AI API Proxy Guard

API Security · Cost Control · AI Infrastructure

Completed

Overview

AI-powered features introduce new operational risks, particularly around cost control, security, and usage visibility. As multiple applications began integrating Gemini models, it became clear that direct client-side API access was unsustainable in production. Any key shipped to the browser can be extracted by users, and calls made directly from each app leave no central place to cap spend or observe usage. AI API Proxy Guard was built as a central control layer between these applications and Google's Gemini API. It enforces quotas, keeps API keys out of frontend code, and provides a single point of governance for all AI traffic.

Key Features

Quota Enforcement: Rate limiting and usage caps to prevent runaway costs and ensure fair resource allocation across applications
Key Abstraction: Removes sensitive API keys from frontend code, centralising credentials in a secure backend layer
Unified Governance: Single point of control for all AI traffic, enabling consistent policies across multiple client applications
Usage Visibility: Real-time monitoring and logging of API calls, costs, and performance metrics via Cloud Monitoring
Cost Control: Budget alerts and automatic throttling to prevent unexpected billing spikes
Secure Architecture: Environment-based secrets management with no credentials exposed to client-side code
Scalable Infrastructure: Cloud Run auto-scaling to handle variable AI workloads efficiently
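Quota enforcement of this kind is often built on a token bucket. Each client application gets a burst allowance that refills at a steady rate. The sketch below is a hypothetical illustration, not the project's code; `TokenBucketLimiter`, `tryConsume` and the client IDs are names invented for the example.

```typescript
// Hypothetical per-client token bucket: an illustration of quota enforcement,
// not the project's actual implementation.
interface Bucket {
  tokens: number; // requests currently available to this client
  updatedAt: number; // time of the last refill, in milliseconds
}

class TokenBucketLimiter {
  private readonly buckets = new Map<string, Bucket>();
  private readonly capacity: number;
  private readonly refillPerSecond: number;

  constructor(capacity: number, refillPerSecond: number) {
    this.capacity = capacity; // maximum burst size per client
    this.refillPerSecond = refillPerSecond; // sustained request rate per client
  }

  // Returns true if the request may proceed, false if the client is over quota.
  tryConsume(clientId: string, nowMs: number = Date.now()): boolean {
    const bucket = this.buckets.get(clientId) ?? { tokens: this.capacity, updatedAt: nowMs };
    // Refill in proportion to elapsed time, never exceeding capacity.
    const elapsedSeconds = Math.max(0, nowMs - bucket.updatedAt) / 1000;
    bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsedSeconds * this.refillPerSecond);
    bucket.updatedAt = nowMs;
    this.buckets.set(clientId, bucket);

    if (bucket.tokens < 1) return false; // caller would respond with HTTP 429
    bucket.tokens -= 1;
    return true;
  }
}
```

The state in this sketch lives in memory, so each Cloud Run instance keeps its own buckets. With several instances running, the effective limit is therefore per instance. A strict global quota would need a shared counter, for example in Redis or Firestore.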

Development Approach

Discovery

Identified operational risks with direct Gemini API access. Analysed cost patterns and security requirements across multiple applications.

Design

Architected proxy layer with quota management, key abstraction, and centralised logging. Defined API contracts for client integrations.
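A client-facing API contract can be enforced by validating every request body before anything is forwarded upstream. The following is a minimal sketch assuming a hypothetical JSON contract with `clientId`, `model` and `prompt` fields. The model allow-list and the prompt length cap are placeholder values, not the project's actual settings.

```typescript
// Hypothetical request contract between client apps and the proxy.
interface GenerateRequest {
  clientId: string; // identifies the calling application for quota accounting
  model: string; // Gemini model name, restricted to an allow-list
  prompt: string; // user prompt text
}

type ValidationResult =
  | { ok: true; value: GenerateRequest }
  | { ok: false; error: string };

// Illustrative values; real limits and model names would come from configuration.
const ALLOWED_MODELS = new Set(["gemini-2.5-flash"]);
const MAX_PROMPT_CHARS = 8000;

function validateGenerateRequest(body: unknown): ValidationResult {
  if (typeof body !== "object" || body === null) {
    return { ok: false, error: "body must be a JSON object" };
  }
  const { clientId, model, prompt } = body as Record<string, unknown>;
  if (typeof clientId !== "string" || clientId.length === 0) {
    return { ok: false, error: "clientId is required" };
  }
  if (typeof model !== "string" || !ALLOWED_MODELS.has(model)) {
    return { ok: false, error: "model is not allowed" };
  }
  if (typeof prompt !== "string" || prompt.trim().length === 0) {
    return { ok: false, error: "prompt is required" };
  }
  if (prompt.length > MAX_PROMPT_CHARS) {
    return { ok: false, error: `prompt exceeds ${MAX_PROMPT_CHARS} characters` };
  }
  return { ok: true, value: { clientId, model, prompt } };
}
```

Restricting models to an allow-list is also a cost lever. A client cannot silently switch to a more expensive model.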

Development

Built Node.js backend service with Cloud Run deployment. Implemented rate limiting, request validation, and Gemini API forwarding.
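The forwarding step is where the API key stays server-side. The proxy attaches the key to its own outbound call, and the browser only ever sees the generated text. The endpoint shape and the `x-goog-api-key` header below follow the public Gemini REST API (`generateContent`). The helper names `buildGeminiRequest` and `forwardToGemini` are hypothetical, and this is a sketch rather than the service's actual handler.

```typescript
// Sketch: build and send the upstream request to the Gemini REST API.
// The key is supplied by the server (e.g. from an environment variable).
const GEMINI_BASE_URL = "https://generativelanguage.googleapis.com/v1beta";

interface UpstreamRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildGeminiRequest(model: string, prompt: string, apiKey: string): UpstreamRequest {
  return {
    url: `${GEMINI_BASE_URL}/models/${encodeURIComponent(model)}:generateContent`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "x-goog-api-key": apiKey, // key travels server-to-server only
      },
      body: JSON.stringify({ contents: [{ role: "user", parts: [{ text: prompt }] }] }),
    },
  };
}

// Forwards the prompt and returns only the generated text to the caller.
async function forwardToGemini(model: string, prompt: string, apiKey: string): Promise<string> {
  const { url, init } = buildGeminiRequest(model, prompt, apiKey);
  const response = await fetch(url, init); // global fetch (Node 18+)
  if (!response.ok) {
    throw new Error(`Gemini API returned HTTP ${response.status}`);
  }
  const data = await response.json();
  // generateContent responses carry text in candidates[].content.parts[].text
  const parts: Array<{ text?: string }> = data?.candidates?.[0]?.content?.parts ?? [];
  return parts.map((part) => part.text ?? "").join("");
}
```

Splitting request construction from the network call keeps the credential-handling logic testable without contacting Google.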

Testing

Load tested the service to verify scaling behaviour. Audited credential handling and API endpoint protection.
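One check that fits a credential audit like this is scanning built frontend assets for anything shaped like a Google API key. Such keys are "AIza" followed by 35 URL-safe characters, a pattern common secret scanners also use. The sketch below assumes the bundle contents have already been read into a map of file names to text. `findExposedKeys` is a hypothetical helper, and the regex is a heuristic, not an exhaustive detector.

```typescript
// Heuristic pattern for Google API keys; the g flag makes match() return every hit.
const GOOGLE_API_KEY_PATTERN = /AIza[0-9A-Za-z_-]{35}/g;

interface Finding {
  file: string;
  match: string; // redacted prefix only
}

// Scans file contents (file name -> text) for strings that look like Google API keys.
function findExposedKeys(files: Record<string, string>): Finding[] {
  const findings: Finding[] = [];
  for (const file of Object.keys(files)) {
    const matches = files[file].match(GOOGLE_API_KEY_PATTERN) ?? [];
    for (const match of matches) {
      // Report a redacted prefix so the audit output doesn't re-leak the key.
      findings.push({ file, match: `${match.slice(0, 8)}...` });
    }
  }
  return findings;
}
```

Run against a production build, an empty result is the expected outcome once all frontends call the proxy instead of Gemini directly.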

Launch

Deployed to Google Cloud Run with monitoring dashboards. Migrated frontend applications to use proxy endpoints.
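Dashboards need per-request usage data from the proxy. On Cloud Run, a single-line JSON object written to stdout is ingested by Cloud Logging as a structured entry, with `severity` and `message` treated as special fields. Log-based metrics can then chart the remaining fields. The sketch below assumes that approach. The entry shape and `formatUsageLog` are hypothetical, not the project's actual log schema.

```typescript
// Hypothetical shape of one usage record per proxied call.
interface UsageLogEntry {
  clientId: string;
  model: string;
  status: number; // HTTP status returned to the client
  latencyMs: number;
  totalTokens?: number; // from Gemini's usageMetadata, when available
}

// Produces one JSON line; severity is derived from the HTTP status.
function formatUsageLog(entry: UsageLogEntry): string {
  const severity = entry.status >= 500 ? "ERROR" : entry.status >= 400 ? "WARNING" : "INFO";
  return JSON.stringify({
    severity,
    message: `gemini proxy ${entry.clientId} ${entry.model} ${entry.status}`,
    ...entry,
  });
}

// Writing to stdout is enough for Cloud Run to forward the entry to Cloud Logging.
function logUsage(entry: UsageLogEntry): void {
  console.log(formatUsageLog(entry));
}
```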

Optimisation

Ongoing cost analysis, quota tuning, and performance improvements based on usage patterns and client feedback.
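Cost analysis and automatic throttling can both build on the token counts Gemini returns in each response's `usageMetadata` (`promptTokenCount`, `candidatesTokenCount`). The following is a minimal sketch assuming placeholder per-million-token prices and a hypothetical in-memory `DailyBudget` that resets at midnight UTC. Real rates depend on the model and Google's current pricing.

```typescript
// Subset of Gemini's usageMetadata used for costing.
interface UsageMetadata {
  promptTokenCount?: number;
  candidatesTokenCount?: number;
}

// Placeholder prices in USD per million tokens; not real Gemini rates.
interface Pricing {
  inputPerMillionUsd: number;
  outputPerMillionUsd: number;
}

function estimateCostUsd(usage: UsageMetadata, pricing: Pricing): number {
  const input = usage.promptTokenCount ?? 0;
  const output = usage.candidatesTokenCount ?? 0;
  return (input * pricing.inputPerMillionUsd + output * pricing.outputPerMillionUsd) / 1_000_000;
}

// Hypothetical daily spend cap that throttles once the limit is reached.
class DailyBudget {
  private readonly limitUsd: number;
  private spentUsd = 0;
  private day = "";

  constructor(limitUsd: number) {
    this.limitUsd = limitUsd;
  }

  // Resets the running total when the UTC date changes.
  private roll(now: Date): void {
    const today = now.toISOString().slice(0, 10);
    if (today !== this.day) {
      this.day = today;
      this.spentUsd = 0;
    }
  }

  // Checked before forwarding: false means the proxy should throttle the request.
  canSpend(now: Date = new Date()): boolean {
    this.roll(now);
    return this.spentUsd < this.limitUsd;
  }

  // Called after a response arrives, with the estimated cost of that call.
  record(costUsd: number, now: Date = new Date()): void {
    this.roll(now);
    this.spentUsd += costUsd;
  }
}
```

An in-process cap like this complements Cloud Billing budget alerts rather than replacing them. The alerts notify, and the proxy is what actually stops further calls.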

Tech Stack

Google Cloud Run - Serverless container hosting
Google Gemini API - AI model integration
Node.js - Backend service runtime
Environment-based configuration - Secure secrets management
Cloud Monitoring - Usage tracking and alerting
Cloud Logging - Centralised log aggregation
Vercel - Frontend application integrations
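Environment-based configuration usually means reading secrets and limits once at startup and refusing to start if something is missing. On Cloud Run, a Secret Manager secret can be exposed to the container as an environment variable. The variable names and defaults below (`GEMINI_API_KEY`, `REQUESTS_PER_MINUTE`, `DAILY_BUDGET_USD`) are hypothetical. This is a sketch, not the project's configuration.

```typescript
interface ProxyConfig {
  geminiApiKey: string;
  requestsPerMinute: number;
  dailyBudgetUsd: number;
}

// Reads configuration from an environment map and fails fast on bad values.
function loadConfig(env: Record<string, string | undefined>): ProxyConfig {
  const geminiApiKey = env.GEMINI_API_KEY;
  if (!geminiApiKey) {
    throw new Error("GEMINI_API_KEY is not set");
  }
  // Illustrative defaults when the limits are not configured.
  const requestsPerMinute = Number(env.REQUESTS_PER_MINUTE ?? "60");
  const dailyBudgetUsd = Number(env.DAILY_BUDGET_USD ?? "10");
  if (!Number.isFinite(requestsPerMinute) || requestsPerMinute <= 0) {
    throw new Error("REQUESTS_PER_MINUTE must be a positive number");
  }
  if (!Number.isFinite(dailyBudgetUsd) || dailyBudgetUsd <= 0) {
    throw new Error("DAILY_BUDGET_USD must be a positive number");
  }
  return { geminiApiKey, requestsPerMinute, dailyBudgetUsd };
}
```

In the service this would be called once, e.g. `const config = loadConfig(process.env);`. Taking the environment as a parameter keeps the function testable without touching the real process environment.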

Development Updates

January 2026

Proxy Guard deployed to production. All frontend applications migrated to use centralised API endpoints.

January 2026

Cloud Monitoring dashboards configured with cost alerts and usage quotas. Rate limiting policies implemented.

December 2025

Core proxy architecture completed. Node.js service deployed to Cloud Run with auto-scaling configuration.

December 2025

Project initiated after identifying cost and security risks with direct client-side Gemini API access.

Related Projects

Shark Match

Interactive Tool

Car Matching

An interactive calculator that helps Australian buyers determine if the BYD Shark plug-in hybrid pickup truck suits their needs.

eLearning Calculator

eLearning

Project Management

A professional web tool that provides accurate eLearning development time estimates based on project scope, complexity, and industry-validated methodologies.

Interested in a similar project?

Let's discuss how we can create a customised solution for your specific needs.
