About
An authenticated, metered API gateway for LLM inference. It exposes OpenAI-compatible endpoints, with per-user rate limiting and usage tracking, in front of a self-hosted llama.cpp backend.
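As a rough illustration of how per-user rate limiting and usage tracking can fit together, here is a minimal in-memory sketch. It assumes a token-bucket policy and a simple per-user token tally. The names TokenBucket, UserMeter, admit and record are hypothetical and are not taken from the project, which may use a different policy or a persistent store.

```python
import time
from collections import defaultdict


class TokenBucket:
    """Hypothetical bucket: bursts up to `capacity`, refilled at `refill_per_sec`."""

    def __init__(self, capacity, refill_per_sec):
        self.capacity = float(capacity)
        self.refill_per_sec = float(refill_per_sec)
        self.tokens = self.capacity
        self.updated = None  # timestamp of the previous call, set on first use

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        if self.updated is not None:
            # Refill in proportion to elapsed time, capped at capacity.
            elapsed = max(0.0, now - self.updated)
            self.tokens = min(self.capacity,
                              self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class UserMeter:
    """Hypothetical per-user limiter plus usage counter (in memory only)."""

    def __init__(self, capacity=5, refill_per_sec=1.0):
        self._capacity = capacity
        self._refill = refill_per_sec
        self._buckets = {}
        self.usage = defaultdict(int)  # user_id -> total tokens consumed

    def admit(self, user_id, now=None):
        # Each user gets an independent bucket, created on first request.
        bucket = self._buckets.get(user_id)
        if bucket is None:
            bucket = self._buckets[user_id] = TokenBucket(self._capacity,
                                                          self._refill)
        return bucket.allow(now)

    def record(self, user_id, prompt_tokens, completion_tokens):
        # Call after a completion returns, using the token counts it reports.
        self.usage[user_id] += prompt_tokens + completion_tokens
```

In a gateway like this, admit would typically run before the request is forwarded to the backend, with a refusal mapped to HTTP 429, and record would run once the completion's token counts are known.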
Key Highlights
- Designed and deployed an OpenAI-compatible LLM inference gateway with JWT authentication and per-user metering
- Backed by a self-hosted llama.cpp server, with no dependency on external APIs
- Enables students and projects to consume LLM completions without direct access to the model
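The JWT authentication step mentioned above could look roughly like the following sketch, which uses only the Python standard library. It assumes HS256-signed tokens with a shared secret and an optional exp claim. The project's actual signing algorithm, claim names and library are not stated, so sign_hs256 and verify_hs256 are hypothetical helpers for illustration only.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding, so restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _mac(signing_input: str, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Issue a token (hypothetical helper, assumes HS256)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    signature = _b64url_encode(_mac(f"{header}.{payload}", secret))
    return f"{header}.{payload}.{signature}"


def verify_hs256(token: str, secret: bytes, now=None) -> dict:
    """Return the claims if the token is valid, else raise ValueError."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(sig_b64)
    except (ValueError, TypeError) as exc:
        raise ValueError("malformed token") from exc
    if not isinstance(header, dict) or not isinstance(claims, dict):
        raise ValueError("malformed token")
    # Pin the algorithm so a token cannot choose its own (e.g. "none").
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = _mac(f"{header_b64}.{payload_b64}", secret)
    # Constant-time comparison avoids leaking signature bytes via timing.
    if not hmac.compare_digest(signature, expected):
        raise ValueError("bad signature")
    now = time.time() if now is None else now
    if "exp" in claims and now >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```

The sub claim of a verified token is a natural key for per-user metering, though which claim the gateway actually uses is an assumption here.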
Technologies
llm api inference authentication metering python
Roles
llm-engineer backend-engineer