Model Gateway

Live Service

About

An authenticated, metered API gateway for LLM inference. It exposes OpenAI-compatible endpoints, with per-user rate limiting and usage tracking, in front of a self-hosted llama.cpp backend.
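Here is a minimal Python sketch of how per-user rate limiting and usage tracking could fit together. It assumes a token-bucket limiter keyed by user ID, with token counts read from the `usage` field (`prompt_tokens`, `completion_tokens`) that OpenAI-style completion responses carry. The names `TokenBucket` and `UsageMeter` and the capacity/rate parameters are hypothetical and are not taken from the service itself.

```python
from __future__ import annotations

import time
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Hypothetical per-user limiter (illustrative, not the service's code)."""
    capacity: float  # maximum burst size, in requests
    rate: float      # refill rate, in requests per second
    tokens: float    # requests currently allowed
    updated: float   # timestamp of the last refill

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class UsageMeter:
    """Hypothetical gateway-side rate limiting plus per-user token accounting."""

    def __init__(self, capacity: float, rate: float) -> None:
        self.capacity = capacity
        self.rate = rate
        self.buckets: dict[str, TokenBucket] = {}
        self.totals: defaultdict[str, dict[str, int]] = defaultdict(
            lambda: {"requests": 0, "prompt_tokens": 0, "completion_tokens": 0}
        )

    def admit(self, user_id: str, now: float | None = None) -> bool:
        """Return True if this user may send another request right now."""
        now = time.monotonic() if now is None else now
        bucket = self.buckets.get(user_id)
        if bucket is None:
            # New users start with a full bucket.
            bucket = TokenBucket(self.capacity, self.rate, self.capacity, now)
            self.buckets[user_id] = bucket
        return bucket.allow(now)

    def record(self, user_id: str, response: dict) -> None:
        """Add the token counts from an OpenAI-style response to the user's totals."""
        usage = response.get("usage", {})
        totals = self.totals[user_id]
        totals["requests"] += 1
        totals["prompt_tokens"] += usage.get("prompt_tokens", 0)
        totals["completion_tokens"] += usage.get("completion_tokens", 0)
```

For example, with `capacity=2` and `rate=1.0`, a user can burst two requests and then regains roughly one request per second. A token bucket is one common choice here because it tolerates short bursts while still bounding sustained load on the backend.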

Key Highlights

▸ Designed and deployed an OpenAI-compatible LLM inference gateway with JWT authentication and per-user metering
▸ Backed by a self-hosted llama.cpp server, with no dependency on external inference APIs
▸ Enables students and projects to consume LLM completions without direct model access
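JWT verification is the step that turns a bearer token into the user ID that metering is keyed on. The sketch below uses only the Python standard library. It assumes HS256 (HMAC-SHA256) tokens signed with a shared secret, with a `sub` claim holding the user ID. The project description does not say which algorithm or claims the gateway actually uses, and `sign_hs256` and `verify_hs256` are hypothetical helpers. A production service would more likely rely on a maintained JWT library.

```python
from __future__ import annotations

import base64
import hashlib
import hmac
import json
import time


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are unpadded base64url; restore padding before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Hypothetical helper that issues an HS256 token (for local testing)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, claims)
    )
    sig = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url_encode(sig)}"


def verify_hs256(token: str, secret: bytes, now: float | None = None) -> str:
    """Hypothetical check: return the user id (`sub` claim) or raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, sig_b64 = parts
    header = json.loads(_b64url_decode(header_b64))
    # Pin the algorithm instead of trusting the token to choose it (e.g. "none").
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        raise ValueError("unexpected alg")
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    now = time.time() if now is None else now
    if "exp" in claims and now >= claims["exp"]:
        raise ValueError("token expired")
    if "sub" not in claims:
        raise ValueError("missing sub claim")
    return claims["sub"]
```

Pinning the expected algorithm and comparing signatures with `hmac.compare_digest` guards against two classic JWT pitfalls: accepting unsigned `alg: none` tokens and leaking signature information through timing.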

Technologies

llm api inference authentication metering python

Roles

llm-engineer backend-engineer