💰 Budgets, Rate Limits

Requirements:

Need to a postgres database (e.g. Supabase, Neon, etc) See Setup

Set Budgets

You can set budgets at 3 levels:

For the proxy
For an internal user
For a customer (end-user)
For a key
For a key (model specific budgets)

Apply a budget across all calls on the proxy

Step 1. Modify config.yaml

general_settings:
  master_key: sk-1234

litellm_settings:
  # other litellm settings
  max_budget: 0 # (float) sets max budget as $0 USD
  budget_duration: 30d # (str) frequency of reset - You can set duration as seconds ("30s"), minutes ("30m"), hours ("30h"), days ("30d").

Step 2. Start proxy

litellm /path/to/config.yaml

Step 3. Send test call

curl --location 'http://0.0.0.0:4000/chat/completions' \
    --header 'Autherization: Bearer sk-1234' \
    --header 'Content-Type: application/json' \
    --data '{
    "model": "gpt-3.5-turbo",
    "messages": [
        {
        "role": "user",
        "content": "what llm are you"
        }
    ],
}'

You can: - Add budgets to Teams

info

Step-by step tutorial on setting, resetting budgets on Teams here (API or using Admin UI)

👉 https://docs.litellm.ai/docs/proxy/team_budgets

Add budgets to teams

curl --location 'http://localhost:4000/team/new' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "team_alias": "my-new-team_4",
  "members_with_roles": [{"role": "admin", "user_id": "5c4a0aa3-a1e1-43dc-bd87-3c2da8382a3a"}],
  "rpm_limit": 99
}' 

See Swagger

Sample Response

{
    "team_alias": "my-new-team_4",
    "team_id": "13e83b19-f851-43fe-8e93-f96e21033100",
    "admins": [],
    "members": [],
    "members_with_roles": [
        {
            "role": "admin",
            "user_id": "5c4a0aa3-a1e1-43dc-bd87-3c2da8382a3a"
        }
    ],
    "metadata": {},
    "tpm_limit": null,
    "rpm_limit": 99,
    "max_budget": null,
    "models": [],
    "spend": 0.0,
    "max_parallel_requests": null,
    "budget_duration": null,
    "budget_reset_at": null
}

Add budget duration to teams

budget_duration: Budget is reset at the end of specified duration. If not set, budget is never reset. You can set duration as seconds ("30s"), minutes ("30m"), hours ("30h"), days ("30d").

curl 'http://0.0.0.0:4000/team/new' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "team_alias": "my-new-team_4",
  "members_with_roles": [{"role": "admin", "user_id": "5c4a0aa3-a1e1-43dc-bd87-3c2da8382a3a"}],
  "budget_duration": 10s,
}'

Use this when you want to budget a users spend within a Team

Step 1. Create User

Create a user with user_id=ishaan

curl --location 'http://0.0.0.0:4000/user/new' \
    --header 'Authorization: Bearer sk-1234' \
    --header 'Content-Type: application/json' \
    --data '{
        "user_id": "ishaan"
}'

Step 2. Add User to an existing Team - set `max_budget_in_team`

Set max_budget_in_team when adding a User to a team. We use the same user_id we set in Step 1

curl -X POST 'http://0.0.0.0:4000/team/member_add' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{"team_id": "e8d1460f-846c-45d7-9b43-55f3cc52ac32", "max_budget_in_team": 0.000000000001, "member": {"role": "user", "user_id": "ishaan"}}'

Step 3. Create a Key for Team member from Step 1

Set user_id=ishaan from step 1

curl --location 'http://0.0.0.0:4000/key/generate' \
    --header 'Authorization: Bearer sk-1234' \
    --header 'Content-Type: application/json' \
    --data '{
        "user_id": "ishaan",
        "team_id": "e8d1460f-846c-45d7-9b43-55f3cc52ac32"
}'

Response from /key/generate

We use the key from this response in Step 4

{"key":"sk-RV-l2BJEZ_LYNChSx2EueQ", "models":[],"spend":0.0,"max_budget":null,"user_id":"ishaan","team_id":"e8d1460f-846c-45d7-9b43-55f3cc52ac32","max_parallel_requests":null,"metadata":{},"tpm_limit":null,"rpm_limit":null,"budget_duration":null,"allowed_cache_controls":[],"soft_budget":null,"key_alias":null,"duration":null,"aliases":{},"config":{},"permissions":{},"model_max_budget":{},"key_name":null,"expires":null,"token_id":null}% 

Step 4. Make /chat/completions requests for Team member

Use the key from step 3 for this request. After 2-3 requests expect to see The following error ExceededBudget: Crossed spend within team

curl --location 'http://localhost:4000/chat/completions' \
    --header 'Authorization: Bearer sk-RV-l2BJEZ_LYNChSx2EueQ' \
    --header 'Content-Type: application/json' \
    --data '{
    "model": "llama3",
    "messages": [
        {
        "role": "user",
        "content": "tes4"
        }
    ]
}'

Use this to budget user passed to /chat/completions, without needing to create a key for every user

Step 1. Modify config.yaml Define litellm.max_end_user_budget

general_settings:
  master_key: sk-1234

litellm_settings:
  max_end_user_budget: 0.0001 # budget for 'user' passed to /chat/completions

Make a /chat/completions call, pass 'user' - First call Works

curl --location 'http://0.0.0.0:4000/chat/completions' \
        --header 'Content-Type: application/json' \
        --header 'Authorization: Bearer sk-zi5onDRdHGD24v0Zdn7VBA' \
        --data ' {
        "model": "azure-gpt-3.5",
        "user": "ishaan3",
        "messages": [
            {
            "role": "user",
            "content": "what time is it"
            }
        ]
        }'

Make a /chat/completions call, pass 'user' - Call Fails, since 'ishaan3' over budget

curl --location 'http://0.0.0.0:4000/chat/completions' \
        --header 'Content-Type: application/json' \
        --header 'Authorization: Bearer sk-zi5onDRdHGD24v0Zdn7VBA' \
        --data ' {
        "model": "azure-gpt-3.5",
        "user": "ishaan3",
        "messages": [
            {
            "role": "user",
            "content": "what time is it"
            }
        ]
        }'

Error

{"error":{"message":"Budget has been exceeded: User ishaan3 has exceeded their budget. Current spend: 0.0008869999999999999; Max Budget: 0.0001","type":"auth_error","param":"None","code":401}}%                

Apply a budget on a key.

You can:

Add budgets to keys Jump
Add budget durations, to reset spend Jump

Expected Behaviour

Costs Per key get auto-populated in LiteLLM_VerificationToken Table
After the key crosses it's max_budget, requests fail
If duration set, spend is reset at the end of the duration

By default the max_budget is set to null and is not checked for keys

Add budgets to keys

curl 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "team_id": "core-infra", # [OPTIONAL]
  "max_budget": 10,
}'

Example Request to /chat/completions when key has crossed budget

curl --location 'http://0.0.0.0:4000/chat/completions' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer <generated-key>' \
  --data ' {
  "model": "azure-gpt-3.5",
  "user": "e09b4da8-ed80-4b05-ac93-e16d9eb56fca",
  "messages": [
      {
      "role": "user",
      "content": "respond in 50 lines"
      }
  ],
}'

Expected Response from /chat/completions when key has crossed budget

{
  "detail":"Authentication Error, ExceededTokenBudget: Current spend for token: 7.2e-05; Max Budget for Token: 2e-07"
}   

Add budget duration to keys

budget_duration: Budget is reset at the end of specified duration. If not set, budget is never reset. You can set duration as seconds ("30s"), minutes ("30m"), hours ("30h"), days ("30d").

curl 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "team_id": "core-infra", # [OPTIONAL]
  "max_budget": 10,
  "budget_duration": 10s,
}'

Apply a budget across all calls an internal user (key owner) can make on the proxy.

info

For most use-cases, we recommend setting team-member budgets

LiteLLM exposes a /user/new endpoint to create budgets for this.

You can:

Add budgets to users Jump
Add budget durations, to reset spend Jump

By default the max_budget is set to null and is not checked for keys

Add budgets to users

curl --location 'http://localhost:4000/user/new' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{"models": ["azure-models"], "max_budget": 0, "user_id": "krrish3@berri.ai"}' 

See Swagger

Sample Response

{
    "key": "sk-YF2OxDbrgd1y2KgwxmEA2w",
    "expires": "2023-12-22T09:53:13.861000Z",
    "user_id": "krrish3@berri.ai",
    "max_budget": 0.0
}

Add budget duration to users

budget_duration: Budget is reset at the end of specified duration. If not set, budget is never reset. You can set duration as seconds ("30s"), minutes ("30m"), hours ("30h"), days ("30d").

curl 'http://0.0.0.0:4000/user/new' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "team_id": "core-infra", # [OPTIONAL]
  "max_budget": 10,
  "budget_duration": 10s,
}'

Create new keys for existing user

Now you can just call /key/generate with that user_id (i.e. krrish3@berri.ai) and:

Budget Check: krrish3@berri.ai's budget (i.e. $10) will be checked for this key
Spend Tracking: spend for this key will update krrish3@berri.ai's spend as well

curl --location 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data '{"models": ["azure-models"], "user_id": "krrish3@berri.ai"}'

Apply model specific budgets on a key.

Expected Behaviour

model_spend gets auto-populated in LiteLLM_VerificationToken Table
After the key crosses the budget set for the model in model_max_budget, calls fail

By default the model_max_budget is set to {} and is not checked for keys

info

LiteLLM will track the cost/budgets for the model passed to LLM endpoints (/chat/completions, /embeddings)

Add model specific budgets to keys

curl 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
  model_max_budget={"gpt4": 0.5, "gpt-5": 0.01}
}'

Reset Budgets

Reset budgets across keys/internal users/teams/customers

budget_duration: Budget is reset at the end of specified duration. If not set, budget is never reset. You can set duration as seconds ("30s"), minutes ("30m"), hours ("30h"), days ("30d").

Internal Users
Keys
Teams

curl 'http://0.0.0.0:4000/user/new' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "max_budget": 10,
  "budget_duration": 10s, # 👈 KEY CHANGE
}'

curl 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "max_budget": 10,
  "budget_duration": 10s, # 👈 KEY CHANGE
}'

curl 'http://0.0.0.0:4000/team/new' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "max_budget": 10,
  "budget_duration": 10s, # 👈 KEY CHANGE
}'

Note: By default, the server checks for resets every 10 minutes, to minimize DB calls.

To change this, set proxy_budget_rescheduler_min_time and proxy_budget_rescheduler_max_time

E.g.: Check every 1 seconds

general_settings: 
  proxy_budget_rescheduler_min_time: 1
  proxy_budget_rescheduler_max_time: 1

Set Rate Limits

You can set:

tpm limits (tokens per minute)
rpm limits (requests per minute)
max parallel requests
rpm / tpm limits per model for a given key

Per Team
Per Internal User
Per Key
Per API Key Per model
For customers

Use /team/new or /team/update, to persist rate limits across multiple keys for a team.

curl --location 'http://0.0.0.0:4000/team/new' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{"team_id": "my-prod-team", "max_parallel_requests": 10, "tpm_limit": 20, "rpm_limit": 4}' 

See Swagger

Expected Response

{
    "key": "sk-sA7VDkyhlQ7m8Gt77Mbt3Q",
    "expires": "2024-01-19T01:21:12.816168",
    "team_id": "my-prod-team",
}

Use /user/new or /user/update, to persist rate limits across multiple keys for internal users.

curl --location 'http://0.0.0.0:4000/user/new' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{"user_id": "krrish@berri.ai", "max_parallel_requests": 10, "tpm_limit": 20, "rpm_limit": 4}' 

See Swagger

Expected Response

{
    "key": "sk-sA7VDkyhlQ7m8Gt77Mbt3Q",
    "expires": "2024-01-19T01:21:12.816168",
    "user_id": "krrish@berri.ai",
}

Use /key/generate, if you want them for just that key.

curl --location 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{"max_parallel_requests": 10, "tpm_limit": 20, "rpm_limit": 4}' 

Expected Response

{
    "key": "sk-ulGNRXWtv7M0lFnnsQk0wQ",
    "expires": "2024-01-18T20:48:44.297973",
    "user_id": "78c2c8fc-c233-43b9-b0c3-eb931da27b84"  // 👈 auto-generated
}

Set rate limits per model per api key

Set model_rpm_limit and model_tpm_limit to set rate limits per model per api key

Here gpt-4 is the model_name set on the litellm config.yaml

curl --location 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{"model_rpm_limit": {"gpt-4": 2}, "model_tpm_limit": {"gpt-4":}}' 

Expected Response

{
    "key": "sk-ulGNRXWtv7M0lFnnsQk0wQ",
    "expires": "2024-01-18T20:48:44.297973",
}

Verify Model Rate Limits set correctly for this key

Make /chat/completions request check if x-litellm-key-remaining-requests-gpt-4 returned

curl -i http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-ulGNRXWtv7M0lFnnsQk0wQ" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "user", "content": "Hello, Claude!ss eho ares"}
    ]
  }'

Expected headers

x-litellm-key-remaining-requests-gpt-4: 1
x-litellm-key-remaining-tokens-gpt-4: 179

These headers indicate:

1 request remaining for the GPT-4 model for key=sk-ulGNRXWtv7M0lFnnsQk0wQ
179 tokens remaining for the GPT-4 model for key=sk-ulGNRXWtv7M0lFnnsQk0wQ

info

You can also create a budget id for a customer on the UI, under the 'Rate Limits' tab.

Use this to set rate limits for user passed to /chat/completions, without needing to create a key for every user

Step 1. Create Budget

Set a tpm_limit on the budget (You can also pass rpm_limit if needed)

curl --location 'http://0.0.0.0:4000/budget/new' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
    "budget_id" : "free-tier",
    "tpm_limit": 5
}'

Step 2. Create `Customer` with Budget

We use budget_id="free-tier" from Step 1 when creating this new customers

curl --location 'http://0.0.0.0:4000/customer/new' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
    "user_id" : "palantir",
    "budget_id": "free-tier"
}'

Step 3. Pass `user_id` id in `/chat/completions` requests

Pass the user_id from Step 2 as user="palantir"

curl --location 'http://localhost:4000/chat/completions' \
    --header 'Authorization: Bearer sk-1234' \
    --header 'Content-Type: application/json' \
    --data '{
    "model": "llama3",
    "user": "palantir",
    "messages": [
        {
        "role": "user",
        "content": "gm"
        }
    ]
}'

Set default budget for ALL internal users

Use this to set a default budget for users who you give keys to.

This will apply when a user has user_role="internal_user" (set this via /user/new or /user/update).

This will NOT apply if a key has a team_id (team budgets will apply then). Tell us how we can improve this!

Define max budget in your config.yaml

model_list: 
  - model_name: "gpt-3.5-turbo"
    litellm_params:
      model: gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY

litellm_settings:
  max_internal_user_budget: 0 # amount in USD
  internal_user_budget_duration: "1mo" # reset every month

Create key for user

curl -L -X POST 'http://0.0.0.0:4000/key/generate' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{}'

Expected Response:

{
  ...
  "key": "sk-X53RdxnDhzamRwjKXR4IHg"
}

Test it!

curl -L -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-X53RdxnDhzamRwjKXR4IHg' \
-d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hey, how's it going?"}]
}'

Expected Response:

{
    "error": {
        "message": "ExceededBudget: User=<user_id> over budget. Spend=3.7e-05, Budget=0.0",
        "type": "budget_exceeded",
        "param": null,
        "code": "400"
    }
}

Grant Access to new model

Use model access groups to give users access to select models, and add new ones to it over time (e.g. mistral, llama-2, etc.).

Difference between doing this with /key/generate vs. /user/new? If you do it on /user/new it'll persist across multiple keys generated for that user.

Step 1. Assign model, access group in config.yaml

model_list:
  - model_name: text-embedding-ada-002
    litellm_params:
      model: azure/azure-embedding-model
      api_base: "os.environ/AZURE_API_BASE"
      api_key: "os.environ/AZURE_API_KEY"
      api_version: "2023-07-01-preview"
    model_info:
      access_groups: ["beta-models"] # 👈 Model Access Group

Step 2. Create key with access group

curl --location 'http://localhost:4000/user/new' \
-H 'Authorization: Bearer <your-master-key>' \
-H 'Content-Type: application/json' \
-d '{"models": ["beta-models"], # 👈 Model Access Group
            "max_budget": 0}'

Create new keys for existing internal user

Just include user_id in the /key/generate request.

curl --location 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer <your-master-key>' \
--header 'Content-Type: application/json' \
--data '{"models": ["azure-models"], "user_id": "krrish@berri.ai"}'

💰 Budgets, Rate Limits

Set Budgets​

Add budgets to teams​

Add budget duration to teams​

Step 1. Create User​

Step 2. Add User to an existing Team - set max_budget_in_team​

Step 3. Create a Key for Team member from Step 1​

Step 4. Make /chat/completions requests for Team member​

Add budgets to keys​

Add budget duration to keys​

Add budgets to users​

Add budget duration to users​

Create new keys for existing user​

Add model specific budgets to keys​

Reset Budgets​

Set Rate Limits​

Step 1. Create Budget​

Step 2. Create Customer with Budget​

Step 3. Pass user_id id in /chat/completions requests​

Set default budget for ALL internal users​

Grant Access to new model​

Create new keys for existing internal user​

Set Budgets

Add budgets to teams

Add budget duration to teams

Step 1. Create User

Step 2. Add User to an existing Team - set `max_budget_in_team`

Step 3. Create a Key for Team member from Step 1

Step 4. Make /chat/completions requests for Team member

Add budgets to keys

Add budget duration to keys

Add budgets to users

Add budget duration to users

Create new keys for existing user

Add model specific budgets to keys

Reset Budgets

Set Rate Limits

Step 1. Create Budget

Step 2. Create `Customer` with Budget

Step 3. Pass `user_id` id in `/chat/completions` requests

Set default budget for ALL internal users

Grant Access to new model

Create new keys for existing internal user