Choose the plan that's right for you
Developer
Powerful speed and reliability to start your project
100 requests/min rate limit
Up to 100 deployed models
Custom PEFT add-ons
Pay per usage
Business
A plan that scales with your production usage
Everything from the Developer plan
Custom rate limits
Team collaboration features
API telemetry and metrics
Dedicated email support
Enterprise
Personalized configurations for serving at scale
Everything from the Business plan
Custom pricing
Unlimited rate limits
Unlimited deployed models
Custom base models
Dedicated and self-hosted deployments
Specialized enterprise support
Text Models
Per-token pricing applies only to non-enterprise deployments. Contact us for dedicated deployment pricing options.
Input tokens are counted from the prompt you supply in the request; output tokens are the completion tokens the model generates.
| Base model parameter count | $/1M input tokens | $/1M output tokens |
|---|---|---|
| Up to 16B | $0.20 | $0.80 |
| 16.1B - 80B | $0.70 | $2.80 |
| Mixtral 8x7B | $0.40 | $1.60 |
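As an illustration of how per-token billing works, the sketch below computes the cost of a single request from its input and output token counts using the rates in the table above. The tier keys and example token counts are chosen here purely for demonstration and are not official names or defaults.

```python
# Illustrative cost calculation for per-token text model pricing.
# Rates come from the table above (USD per 1M tokens); the tier keys
# are labels invented for this example.
RATES = {
    "up_to_16b":    {"input": 0.20, "output": 0.80},
    "16.1b_to_80b": {"input": 0.70, "output": 2.80},
    "mixtral_8x7b": {"input": 0.40, "output": 1.60},
}

def request_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request under the per-token rates above."""
    rate = RATES[tier]
    return (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000

# Example: a 2,000-token prompt with a 500-token completion on Mixtral 8x7B
# costs 2,000 x $0.40/1M + 500 x $1.60/1M = $0.0008 + $0.0008 = $0.0016.
print(f"${request_cost('mixtral_8x7b', 2_000, 500):.4f}")  # -> $0.0016
```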
Image Models
For image generation models like SDXL, we charge based on the number of inference steps (denoising iterations).
| SDXL, $/step | SDXL w/ ControlNet, $/step |
|---|---|
| $0.0002 | $0.0003 |
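For example, a 30-step SDXL generation would cost 30 × $0.0002 = $0.006, or 30 × $0.0003 = $0.009 with ControlNet (30 steps is an illustrative count, not a prescribed default).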
Multi-Modal
For multi-modal models like LLaVA, each image is billed as 576 prompt tokens.
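For instance, an image attached to a request billed at the up-to-16B input rate would add 576 × $0.20/1M ≈ $0.000115 to that request's input cost.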