Limits
Workers AI is now Generally Available. We've updated our rate limits to reflect this.
Note that model inferences in local mode using Wrangler will also count towards these limits. Beta models may have lower rate limits while we work on performance and scale.
Rate limits apply per task type by default, with some per-model limits defined as follows:
Automatic Speech Recognition
- 720 requests per minute

Image Classification
- 3000 requests per minute

Image-to-Text
- 720 requests per minute

Object Detection
- 3000 requests per minute

Summarization
- 1500 requests per minute

Text Classification
- 2000 requests per minute

Text Embeddings
- 3000 requests per minute
- @cf/baai/bge-large-en-v1.5 is 1500 requests per minute
When using @cf/baai/bge embedding models, the following limits apply:
- The maximum token limit per input is 512 tokens.
- The maximum batch size is 100 inputs per request (see the batching sketch below).
- The total number of tokens across all inputs in the batch must not exceed internal processing limits.
- Larger inputs (closer to 512 tokens) may reduce the maximum batch size due to these constraints.
- Exceeding the batch size limit: If more than 100 inputs are provided, a 400 Bad Request error is returned.
- Exceeding the token limit per input: If a single input exceeds 512 tokens, the request will fail with a 400 Bad Request error.
- Combined constraints: Requests with both a high batch size and large token inputs may fail due to exceeding the model's processing limits.
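To stay within these constraints, split large embedding workloads into batches of at most 100 inputs. The sketch below is a minimal illustration, assuming a Worker with an AI binding exposed as env.AI; the character-based cutoff is only a rough stand-in for the 512-token limit, since exact token counts depend on the model's tokenizer.

```ts
// Minimal sketch: batch embedding inputs so each request stays within the
// 100-input and 512-token-per-input limits. Assumes a Workers AI binding
// exposed to the Worker as `env.AI`.
export interface Env {
  AI: { run(model: string, inputs: Record<string, unknown>): Promise<any> };
}

const MAX_BATCH_SIZE = 100;        // documented per-request input limit
const APPROX_MAX_CHARS = 512 * 4;  // rough character proxy for 512 tokens (assumption)

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { documents } = (await request.json()) as { documents: string[] };

    // Clamp each input so it stays under the per-input token limit.
    const clamped = documents.map((doc) => doc.slice(0, APPROX_MAX_CHARS));

    const vectors: number[][] = [];
    for (let i = 0; i < clamped.length; i += MAX_BATCH_SIZE) {
      // Sending at most 100 inputs per call avoids the 400 Bad Request
      // returned when the batch size limit is exceeded.
      const batch = clamped.slice(i, i + MAX_BATCH_SIZE);
      const result = await env.AI.run("@cf/baai/bge-large-en-v1.5", { text: batch });
      vectors.push(...result.data); // `data` holds one embedding per input
    }

    return Response.json({ count: vectors.length });
  },
};
```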

Text Generation
- 300 requests per minute
- @hf/thebloke/mistral-7b-instruct-v0.1-awq is 400 requests per minute
- @cf/microsoft/phi-2 is 720 requests per minute
- @cf/qwen/qwen1.5-0.5b-chat is 1500 requests per minute
- @cf/qwen/qwen1.5-1.8b-chat is 720 requests per minute
- @cf/qwen/qwen1.5-14b-chat-awq is 150 requests per minute
- @cf/tinyllama/tinyllama-1.1b-chat-v1.0 is 720 requests per minute

Text-to-Image
- 720 requests per minute
- @cf/runwayml/stable-diffusion-v1-5-img2img is 1500 requests per minute

Translation
- 720 requests per minute
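All of the limits above are per-minute request caps, so bursty workloads may occasionally be rejected. As a rough, unofficial sketch, a client calling the Workers AI REST API can back off and retry when a request is rejected for exceeding a rate limit (typically surfaced as HTTP 429); the account ID, API token, model input, and backoff values below are placeholders.

```ts
// Minimal sketch: retry a Workers AI REST call with exponential backoff when a
// per-minute rate limit is exceeded. ACCOUNT_ID and API_TOKEN are placeholders.
const ACCOUNT_ID = "YOUR_ACCOUNT_ID";
const API_TOKEN = "YOUR_API_TOKEN";

async function runWithBackoff(
  model: string,
  inputs: unknown,
  maxAttempts = 5,
): Promise<unknown> {
  const url = `https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/ai/run/${model}`;

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const response = await fetch(url, {
      method: "POST",
      headers: {
        Authorization: `Bearer ${API_TOKEN}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(inputs),
    });

    // Anything other than a rate-limit rejection is returned (or thrown) as-is.
    if (response.status !== 429) {
      if (!response.ok) {
        throw new Error(`Workers AI request failed: ${response.status}`);
      }
      return response.json();
    }

    // Rate limited: wait 1s, 2s, 4s, ... before retrying.
    await new Promise((resolve) => setTimeout(resolve, 1000 * 2 ** attempt));
  }

  throw new Error("Rate limit still exceeded after retries");
}

// Example usage with one of the models listed above:
// await runWithBackoff("@cf/qwen/qwen1.5-0.5b-chat", { prompt: "Hello" });
```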