AI Model Pricing (LLM Inference)

This page outlines the pricing structure for all AI models available through AISA’s unified LLM inference API.

LLM usage is billed based on token consumption. Each request is charged separately for:

  • Input tokens: the tokens included in your prompt
  • Output tokens: the tokens generated by the model

All prices listed on this page are in USD per 1 million tokens (1M tokens).

How Token-Based Billing Works

When you send a request to an AI model:

  1. Your prompt is converted into input tokens.
  2. The model generates output tokens.
  3. Both input and output tokens are counted separately.
  4. The total cost is calculated using the model’s pricing.

The billing formula is:

Total Cost = (Input tokens ÷ 1,000,000 × Input price) + (Output tokens ÷ 1,000,000 × Output price)

For example:

  • If a model charges $1.00 per 1M input tokens
  • And you send 2,000 input tokens
  • The input cost is:

2,000 ÷ 1,000,000 × 1.00 = $0.002

The same calculation applies to output tokens.
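The billing formula and the worked example above can be sketched in a few lines of Python (the prices here are the ones from the example, not any specific model's):

```python
def token_cost(input_tokens: int, output_tokens: int,
               input_price_per_1m: float, output_price_per_1m: float) -> float:
    """USD cost of one request, given prices per 1 million tokens."""
    return (input_tokens / 1_000_000 * input_price_per_1m
            + output_tokens / 1_000_000 * output_price_per_1m)

# The worked example: 2,000 input tokens at $1.00 per 1M input tokens.
print(token_cost(2_000, 0, 1.00, 0.0))  # 0.002
```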

What Counts as Tokens?

Tokens represent fragments of text processed by the model. They may include:

  • Words
  • Punctuation
  • Numbers
  • Formatting characters

Longer prompts and longer outputs consume more tokens and therefore increase cost.

Streaming responses are billed the same way as non-streaming responses, based on total tokens generated.

Model Versions and Naming

Some models include version identifiers such as:

  • Date-based versions (e.g., -2025-12-11)
  • “thinking” variants
  • “mini” or “flash” variants

These represent distinct models and may have different pricing.

If a model is updated or replaced, pricing may differ between versions.

Group-Based Pricing

If your workspace uses multiple groups, pricing may vary by group.

Group-level pricing rules and ratios are applied automatically during billing. You can view the final calculated cost for each request in the Usage Logs page.
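As an illustration of how a group-level ratio might scale a request's base cost, here is a minimal sketch; the group names and ratio values below are hypothetical, not actual AISA configuration:

```python
# Hypothetical group ratios; real values are configured per workspace.
GROUP_RATIOS = {"default": 1.0, "research": 0.8}

def group_adjusted_cost(base_cost_usd: float, group: str) -> float:
    """Scale a request's base cost by its group's pricing ratio."""
    return base_cost_usd * GROUP_RATIOS.get(group, 1.0)

print(group_adjusted_cost(0.002, "research"))
```

The final, ratio-adjusted amount is what appears for each request in the Usage Logs page.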

AI Model Pricing Table

AISA supports multiple types of AI models. Pricing is categorized based on how the model consumes compute:

  • Token-based pricing: used for text and multimodal LLM inference.
  • Media-based pricing: used for image generation and video generation models.

All token-based prices are listed per 1 million tokens (1M tokens).

Media models are priced per generated asset or per processing duration.

| Model Name | Input (USD / 1M tokens) | Output (USD / 1M tokens) |
| --- | --- | --- |
| gpt-4.1 | 1.4 | 5.6 |
| gpt-4.1-mini | 0.28 | 1.12 |
| gpt-4o | 1.75 | 7 |
| gpt-4o-mini | 0.105 | 0.42 |
| gpt-5 | 0.875 | 7 |
| gpt-5-mini | 0.175 | 1.4 |
| gpt-5.2 | 1.225 | 9.8 |
| gpt-5.2-2025-12-11 | 1.225 | 9.8 |
| gpt-5.2-chat-latest | 1.225 | 9.8 |
| gpt-5.4 | 1.75 | 10.5 |
| gpt-oss-120b | 0.028 | 0.133 |
| claude-3-7-sonnet-20250219 | 2.1 | 10.5 |
| claude-3-7-sonnet-20250219-thinking | 2.1 | 10.5 |
| claude-haiku-4-5-20251001 | 0.7 | 3.5 |
| claude-opus-4-1-20250805 | 10.5 | 52.5 |
| claude-opus-4-1-20250805-thinking | 10.5 | 52.5 |
| claude-opus-4-20250514 | 10.5 | 52.5 |
| claude-opus-4-20250514-thinking | 10.5 | 52.5 |
| claude-opus-4-6 | 3.5 | 17.5 |
| claude-sonnet-4-20250514 | 2.1 | 10.5 |
| claude-sonnet-4-20250514-thinking | 2.1 | 10.5 |
| claude-sonnet-4-5-20250929 | 2.1 | 10.5 |
| claude-sonnet-4-6 | 2.1 | 10.5 |
| claude-sonnet-4-6-thinking | 2.1 | 10.5 |
| deepseek-r1 | 0.4018 | 1.6058 |
| deepseek-v3 | 0.2009 | 0.8029 |
| deepseek-v3-0324 | 0.2009 | 0.8029 |
| deepseek-v3.1 | 0.4018 | 1.2047 |
| gemini-2.5-flash | 0.21 | 1.75 |
| gemini-2.5-flash-lite | 0.07 | 0.28 |
| gemini-2.5-pro | 0.875 | 7 |
| gemini-3-pro-image-preview | 1.4 | 8.4 |
| gemini-3-pro-preview | 1.4 | 8.4 |
| gemini-3.1-pro-preview | 1.4 | 8.4 |
| grok-3 | 2.1 | 10.5 |
| grok-4 | 2.1 | 10.5 |
| kimi-k2-thinking | 0.4018 | 1.6058 |
| kimi-k2.5 | 0.4018 | 2.1077 |
| MiniMax-M2.5 | 0.21 | 0.84 |
| qwen-flash | 0.0225 | 0.18 |
| qwen-mt-flash | 0.072 | 0.2205 |
| qwen-mt-lite | 0.084 | 0.252 |
| qwen-plus-2025-12-01 | 0.28 | 0.84 |
| qwen-vl-max | 0.56 | 2.24 |
| qwen3-coder-480b-a35b-instruct | 1.05 | 5.25 |
| qwen3-coder-flash | 0.21 | 1.05 |
| qwen3-coder-plus | 0.7 | 3.5 |
| qwen3-max | 0.72 | 3.6 |
| qwen3-max-2026-01-23 | 0.72 | 3.6 |
| qwen3-omni-flash | 0.301 | 1.162 |
| qwen3-omni-flash-2025-12-01 | 0.301 | 1.162 |
| qwen3-vl-flash | 0.035 | 0.28 |
| qwen3-vl-flash-2025-10-15 | 0.035 | 0.28 |
| qwen3-vl-plus | 0.14 | 1.12 |
| qwen3-vl-plus-2025-12-19 | 0.14 | 1.12 |
| seed-1-6-250915 | 0.225 | 0.9 |
| seed-1-6-flash-250715 | 0.0675 | 0.27 |
| seed-1-8-251228 | 0.225 | 1.8 |
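Programmatically, the per-model prices can be kept in a lookup table and combined with the billing formula. A minimal sketch using a few entries copied from the table above:

```python
# (input, output) prices in USD per 1M tokens, copied from the table above.
PRICES = {
    "gpt-4.1": (1.4, 5.6),
    "claude-sonnet-4-6": (2.1, 10.5),
    "deepseek-v3": (0.2009, 0.8029),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request for a model in the price table."""
    input_price, output_price = PRICES[model]
    return (input_tokens / 1_000_000 * input_price
            + output_tokens / 1_000_000 * output_price)

# 10,000 input + 2,000 output tokens on claude-sonnet-4-6:
# 10,000/1M * 2.1 + 2,000/1M * 10.5 = 0.021 + 0.021 = 0.042
print(round(request_cost("claude-sonnet-4-6", 10_000, 2_000), 6))  # 0.042
```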

Image & Video Generation Pricing

Some models generate media rather than tokens. These models are billed per asset or processing duration.

| Model Name | Pricing |
| --- | --- |
| gemini-3-pro-image-preview | Token-based pricing (see token table) |
| seedream-4-5-251128 | $0.036 per generated image |
| qwen WAN 2.6 | $0.0688 per second (720p); $0.1144 per second (1080p) |
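The media rates above translate directly into per-asset and per-second cost calculations, for example:

```python
# Rates from the media pricing table above.
SEEDREAM_IMAGE_USD = 0.036                                # per generated image
WAN_VIDEO_USD_PER_SEC = {"720p": 0.0688, "1080p": 0.1144}  # per second of video

def image_cost(num_images: int) -> float:
    """USD cost of a seedream-4-5-251128 image generation request."""
    return num_images * SEEDREAM_IMAGE_USD

def video_cost(seconds: float, resolution: str) -> float:
    """USD cost of a qwen WAN 2.6 video generation request."""
    return seconds * WAN_VIDEO_USD_PER_SEC[resolution]

print(image_cost(4))                      # 0.144
print(round(video_cost(10, "1080p"), 4))  # 1.144
```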

Important Notes

  • All prices are listed in USD.
  • Text-based models are billed per input and output token.
  • Image generation models are billed per generated image.
  • Video generation models are billed per second of generated video.
  • Pricing is usage-based and calculated per request.
  • Model availability and pricing may change over time.
  • Always refer to the Marketplace for the most up-to-date pricing information.
  • The final billed amount for each request can be verified in Usage Logs.