# LLM Gateway
API gateway for Large Language Model providers with request routing and caching.
## Features
- Multi-provider support for OpenAI, Anthropic, and LangChain
- Request routing based on model availability
- Caching layer with Redis support
- Rate limiting per provider and endpoint
- OpenTelemetry metrics and tracing
- JWT authentication and API key management
- Request validation and error handling
- GraphQL and REST endpoints
- OpenAPI 3.0 specification
## Installation

```bash
npm install @bluefly/llm-gateway
```
## Quick Start

```javascript
import { createGateway } from '@bluefly/llm-gateway';

const gateway = createGateway({
  providers: {
    openai: {
      apiKey: process.env.OPENAI_API_KEY,
    },
    anthropic: {
      apiKey: process.env.ANTHROPIC_API_KEY,
    },
  },
  cache: {
    enabled: true,
    ttl: 3600,
  },
  rateLimit: {
    windowMs: 60000,
    max: 100,
  },
});

await gateway.start(3000);
```
## Configuration

### Environment Variables

```bash
# Provider API Keys
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key

# Cache Configuration
REDIS_URL=redis://localhost:6379
CACHE_TTL=3600

# Rate Limiting
RATE_LIMIT_WINDOW=60000
RATE_LIMIT_MAX=100

# Server Configuration
PORT=3000
NODE_ENV=production
```
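If you assemble the gateway options from the environment yourself, note that the numeric variables arrive as strings. A minimal sketch, assuming the option names from the Quick Start above (the fallback values are illustrative, not documented defaults):

```javascript
// Hypothetical mapping of environment variables to gateway options.
// Option names follow the Quick Start example; defaults are illustrative.
const envConfig = {
  providers: {
    openai: { apiKey: process.env.OPENAI_API_KEY },
    anthropic: { apiKey: process.env.ANTHROPIC_API_KEY },
  },
  cache: {
    enabled: Boolean(process.env.REDIS_URL),
    ttl: parseInt(process.env.CACHE_TTL ?? '3600', 10),
  },
  rateLimit: {
    windowMs: parseInt(process.env.RATE_LIMIT_WINDOW ?? '60000', 10),
    max: parseInt(process.env.RATE_LIMIT_MAX ?? '100', 10),
  },
};
```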
### Configuration File (`gateway.config.js`)
```javascript
module.exports = {
  providers: {
    openai: {
      apiKey: process.env.OPENAI_API_KEY,
      models: ['gpt-4', 'gpt-3.5-turbo'],
      timeout: 30000
    },
    anthropic: {
      apiKey: process.env.ANTHROPIC_API_KEY,
      models: ['claude-3-sonnet', 'claude-3-haiku'],
      timeout: 30000
    }
  },
  cache: {
    enabled: true,
    provider: 'redis',
    url: process.env.REDIS_URL,
    ttl: 3600
  },
  rateLimit: {
    windowMs: 60000,
    max: 100,
    skipSuccessfulRequests: false
  },
  monitoring: {
    enabled: true,
    endpoint: '/metrics'
  }
};
```
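The file exports the same shape that `createGateway` accepts in the Quick Start, so one plausible way to wire it up (a sketch, assuming `createGateway` takes this object directly):

```javascript
import { createGateway } from '@bluefly/llm-gateway';
// Assumption: the config file's export matches the createGateway options shape.
import config from './gateway.config.js';

const gateway = createGateway(config);
await gateway.start(process.env.PORT ? Number(process.env.PORT) : 3000);
```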
## API Endpoints

### REST API

```bash
# Chat completion
POST /api/v1/chat/completions
Content-Type: application/json

{
  "model": "gpt-4",
  "messages": [
    {"role": "user", "content": "Hello, world!"}
  ]
}

# Health check
GET /health

# Metrics (Prometheus format)
GET /metrics
```
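For a quick end-to-end check from the shell, the completion endpoint can be exercised with curl (the Bearer token assumes API key auth is enabled, as in the usage examples below):

```bash
curl -X POST http://localhost:3000/api/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer your-api-key' \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello, world!"}]
  }'
```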
### GraphQL API

```graphql
mutation ChatCompletion($input: ChatInput!) {
  chat(input: $input) {
    id
    choices {
      message {
        role
        content
      }
    }
    usage {
      promptTokens
      completionTokens
      totalTokens
    }
  }
}
```
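Calling the mutation over HTTP looks like any other GraphQL request. A sketch, assuming the gateway serves GraphQL at `/graphql` (the path is not specified above) and that `ChatInput` mirrors the REST request body:

```javascript
const response = await fetch('http://localhost:3000/graphql', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer your-api-key'
  },
  body: JSON.stringify({
    query: `mutation ChatCompletion($input: ChatInput!) {
      chat(input: $input) { id choices { message { role content } } }
    }`,
    // Assumption: ChatInput mirrors the REST chat/completions body.
    variables: {
      input: { model: 'gpt-4', messages: [{ role: 'user', content: 'Hello!' }] }
    }
  })
});

const { data } = await response.json();
console.log(data.chat.choices[0].message.content);
```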
## Usage Examples

### Basic Chat Completion

```javascript
const response = await fetch('http://localhost:3000/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer your-api-key'
  },
  body: JSON.stringify({
    model: 'gpt-4',
    messages: [
      { role: 'user', content: 'Explain quantum computing' }
    ]
  })
});

const result = await response.json();
console.log(result.choices[0].message.content);
```
### Streaming Response

```javascript
const response = await fetch('http://localhost:3000/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer your-api-key'
  },
  body: JSON.stringify({
    model: 'gpt-4',
    messages: [{ role: 'user', content: 'Tell me a story' }],
    stream: true
  })
});

const reader = response.body.getReader();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  const chunk = new TextDecoder().decode(value);
  console.log(chunk);
}
```
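If the gateway relays OpenAI-style server-sent events (an assumption; the wire format is not documented above), each chunk contains `data:` lines that can be parsed incrementally. The `handleChunk` helper below could replace the `console.log(chunk)` call in the loop:

```javascript
// Sketch: parse OpenAI-style SSE lines from the raw chunks above.
// Assumes events are lines of the form `data: {...}` ending with `data: [DONE]`.
const decoder = new TextDecoder();
let buffer = '';

function handleChunk(value) {
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop(); // keep any partial line for the next chunk
  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;
    const payload = line.slice(6);
    if (payload === '[DONE]') return;
    const delta = JSON.parse(payload).choices?.[0]?.delta?.content;
    if (delta) process.stdout.write(delta);
  }
}
```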
## Development

### Setup

```bash
# Clone repository
git clone https://gitlab.bluefly.io/llm/llm-gateway.git
cd llm-gateway

# Install dependencies
npm install

# Start development server
npm run dev

# Run tests
npm test
```
### Scripts

```bash
# Development
npm run dev             # Start with hot reload
npm run dev:debug       # Start with debugging

# Building
npm run build           # Build for production
npm run build:prod      # Build with minification

# Testing
npm test                # Run all tests
npm run test:watch      # Run tests in watch mode
npm run test:coverage   # Generate coverage report

# Linting
npm run lint            # Check code style
npm run lint:fix        # Fix code style issues

# Type checking
npm run typecheck       # Check TypeScript types
```
### Docker

```bash
# Build image
npm run docker:build

# Run container
docker run -p 3000:3000 -e OPENAI_API_KEY=your_key llm-gateway
```
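Since the cache layer expects Redis, running the gateway and Redis together with Compose is convenient. A sketch; the image name matches the build above, everything else is illustrative:

```yaml
# docker-compose.yml - illustrative; image name matches the docker:build output above.
services:
  gateway:
    image: llm-gateway:latest
    ports:
      - "3000:3000"
    environment:
      OPENAI_API_KEY: ${OPENAI_API_KEY}
      ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}
      REDIS_URL: redis://redis:6379
    depends_on:
      - redis
  redis:
    image: redis:7-alpine
```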
## Monitoring

### Health Checks

```bash
# Basic health check
curl http://localhost:3000/health

# Detailed health with provider status
curl http://localhost:3000/health/detailed
```
### Metrics

The gateway exposes Prometheus metrics at `/metrics`:

- `llm_gateway_requests_total` - Total number of requests
- `llm_gateway_request_duration_seconds` - Request duration histogram
- `llm_gateway_provider_errors_total` - Provider error counts
- `llm_gateway_cache_hits_total` - Cache hit/miss counts
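These names plug straight into PromQL. Two example queries (the `_bucket` suffix is the standard Prometheus histogram convention; check the actual label set on `/metrics` before relying on specific labels):

```promql
# Request rate over the last 5 minutes
rate(llm_gateway_requests_total[5m])

# 95th percentile request latency from the duration histogram
histogram_quantile(0.95, sum by (le) (rate(llm_gateway_request_duration_seconds_bucket[5m])))
```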
### Logging

Structured logging with configurable levels:

```bash
# Log levels: error, warn, info, debug
LOG_LEVEL=info

# Log format: json, text
LOG_FORMAT=json
```
## Deployment

### Environment Variables

Required environment variables for production:

```bash
# Provider Configuration
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...

# Cache Configuration
REDIS_URL=redis://localhost:6379

# Security
JWT_SECRET=your-jwt-secret
API_KEY=your-api-key

# Monitoring
OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:14268/api/traces
```
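In Kubernetes these values live in a Secret. The Deployment below reads `openai-api-key` from a Secret named `llm-secrets`, which can be created like this (other keys follow the same pattern):

```bash
kubectl create secret generic llm-secrets \
  --from-literal=openai-api-key=sk-...
```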
### Kubernetes

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm-gateway
spec:
  replicas: 3
  selector:
    matchLabels:
      app: llm-gateway
  template:
    metadata:
      labels:
        app: llm-gateway
    spec:
      containers:
        - name: llm-gateway
          image: llm-gateway:latest
          ports:
            - containerPort: 3000
          env:
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: llm-secrets
                  key: openai-api-key
```
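To route traffic to the replicas you also need a Service; a minimal sketch matching the labels and port above (the external port is illustrative):

```yaml
# Illustrative Service exposing the Deployment above inside the cluster.
apiVersion: v1
kind: Service
metadata:
  name: llm-gateway
spec:
  selector:
    app: llm-gateway
  ports:
    - port: 80
      targetPort: 3000
```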
## Contributing
- Fork the repository on GitLab
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Ensure all tests pass
- Submit a merge request
## Repository

Source is hosted on GitLab: https://gitlab.bluefly.io/llm/llm-gateway
## License
MIT