
Model Gateway

Live Service

About

An authenticated, metered API gateway for LLM inference. Provides OpenAI-compatible endpoints with per-user rate limiting and usage tracking over a self-hosted llama.cpp backend.
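Here is a minimal sketch of per-user rate limiting and usage tracking. It assumes a token-bucket policy and an OpenAI-style `usage` object (`prompt_tokens`, `completion_tokens`) in each backend response. The class name `UsageGateway` and its parameters are hypothetical, and the page does not describe the gateway's actual limiting algorithm or storage.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, Tuple


class UsageGateway:
    """Hypothetical per-user token-bucket rate limiter with token metering."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity              # burst size: max requests in a row
        self.refill_per_sec = refill_per_sec  # sustained requests per second
        self.clock = clock                    # injectable for testing
        # user_id -> (tokens remaining, time of last refill)
        self._buckets: Dict[str, Tuple[float, float]] = {}
        # user_id -> cumulative counts taken from backend responses
        self.usage: Dict[str, Dict[str, int]] = defaultdict(
            lambda: {"prompt_tokens": 0, "completion_tokens": 0, "requests": 0})

    def allow(self, user_id: str) -> bool:
        """Consume one token for user_id; return False if the bucket is empty."""
        now = self.clock()
        # New users start with a full bucket.
        tokens, last = self._buckets.get(user_id, (self.capacity, now))
        # Refill in proportion to elapsed time, never above capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_sec)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._buckets[user_id] = (tokens, now)
        return allowed

    def record(self, user_id: str, response: dict) -> None:
        """Add the OpenAI-style "usage" block of a response to user_id's totals."""
        usage = response.get("usage", {})
        totals = self.usage[user_id]
        totals["prompt_tokens"] += int(usage.get("prompt_tokens", 0))
        totals["completion_tokens"] += int(usage.get("completion_tokens", 0))
        totals["requests"] += 1


# Demo with a fake clock so the refill behaviour is deterministic.
fake_time = [0.0]
gw = UsageGateway(capacity=2, refill_per_sec=0.5, clock=lambda: fake_time[0])
print([gw.allow("alice") for _ in range(3)])  # [True, True, False]
fake_time[0] = 2.0  # 2 s later: one request's worth has refilled
print(gw.allow("alice"))  # True
gw.record("alice", {"usage": {"prompt_tokens": 12, "completion_tokens": 30}})
print(gw.usage["alice"])  # {'prompt_tokens': 12, 'completion_tokens': 30, 'requests': 1}
```

The in-memory dictionaries keep the sketch small. A deployed gateway would more likely persist these counters so that metering survives restarts and stays consistent across worker processes.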

Key Highlights

  • Designed and deployed an OpenAI-compatible LLM inference gateway with JWT authentication and per-user metering
  • Backed by a self-hosted llama.cpp server, with no dependency on external inference APIs
  • Enables students and projects to consume LLM completions without direct model access

Technologies

llm api inference authentication metering python

Roles

llm-engineer backend-engineer