Kimi K2.5 API — Access Moonshot AI's Kimi via AIsa


Kimi K2.5 is Moonshot AI's flagship model: a 1-trillion-parameter mixture-of-experts architecture that activates only 32 billion parameters per request, delivering top-tier reasoning, visual coding, and agentic tool-calling at a fraction of the compute cost of a dense model its size.

Through AIsa, you access Kimi K2.5 at approximately 80% of Moonshot AI's official pricing, under a formal enterprise data agreement that guarantees your data is never retained or stored by Moonshot after processing. One AIsa key. No Moonshot account required.


Kimi K2.5 at a glance

Spec                          | Value
------------------------------|-------------------------------------------
Total parameters              | 1 trillion
Active parameters per request | 32 billion
Architecture                  | Mixture-of-Experts (MoE)
Context window                | 256,000 tokens
Release date                  | January 27, 2026
Input pricing (via AIsa)      | ~$0.48/M tokens (≈80% of $0.60/M official)
Output pricing (via AIsa)     | ~$2.00/M tokens (≈80% of $2.50/M official)
Cache hit pricing             | $0.10/M input tokens

See marketplace.aisa.one/pricing for exact current rates.
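As a rough sketch of what a request costs at the approximate rates in the table above (illustrative figures, not a quote — always check the pricing page):

```python
INPUT_RATE = 0.48   # USD per million input tokens (approximate AIsa rate)
OUTPUT_RATE = 2.00  # USD per million output tokens (approximate AIsa rate)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a single uncached request."""
    return round(
        (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000, 6
    )

# A 10K-token prompt with a 2K-token response costs well under a cent.
print(request_cost(10_000, 2_000))  # 0.0088
```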


Quickstart

Python

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_AISA_API_KEY",
    base_url="https://api.aisa.one/v1"
)

response = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[
        {"role": "system", "content": "You are an expert software engineer and technical architect."},
        {"role": "user", "content": "Design the database schema for a multi-tenant SaaS billing system."}
    ]
)
print(response.choices[0].message.content)

Node.js

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.AISA_API_KEY,
  baseURL: "https://api.aisa.one/v1",
});

const response = await client.chat.completions.create({
  model: "kimi-k2.5",
  messages: [
    { role: "user", content: "Analyse this React component and suggest performance improvements." }
  ],
});
console.log(response.choices[0].message.content);

Streaming

stream = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[{"role": "user", "content": "Walk me through building an agentic coding assistant step by step."}],
    stream=True
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Why Kimi K2.5?

Mixture-of-Experts efficiency

The 1T/32B MoE architecture means Kimi K2.5 has the knowledge capacity of a 1-trillion-parameter model but the inference speed and cost of a 32-billion-parameter model. Only ~3% of the network activates per token, which translates directly to faster responses and lower cost compared to a dense model of equivalent capability.
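The arithmetic behind the "~3%" figure is straightforward:

```python
TOTAL_PARAMS = 1_000_000_000_000   # 1 trillion total parameters
ACTIVE_PARAMS = 32_000_000_000     # 32 billion activated per token

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"{active_fraction:.1%} of the network activates per token")  # 3.2%

# Per-token compute scales with active parameters, so relative to a
# hypothetical dense 1T model the inference work drops by roughly:
compute_reduction = TOTAL_PARAMS / ACTIVE_PARAMS
print(f"~{compute_reduction:.0f}x less compute per token")  # ~31x
```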

Built for agents

Kimi K2.5 was specifically designed for agentic use cases. It supports:

  • Tool calling — natively compatible with OpenAI function calling schema
  • JSON mode — structured output for downstream parsing
  • Partial mode — stream structured data before the full response completes
  • Internet search — built-in search capability (available through AIsa's web search tools)
  • Extended context — 256K tokens with automatic caching for repeated prefixes

Visual coding

Kimi K2.5 excels at reading and reasoning about visual content: UI screenshots, architecture diagrams, database schemas, and code rendered as images. This makes it particularly powerful for:

  • Reviewing UI/UX mockups and generating the corresponding code
  • Describing and debugging rendered output from data visualisations
  • Extracting structured data from screenshots of tables or dashboards
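A minimal sketch of sending a screenshot alongside a prompt, assuming AIsa accepts OpenAI-style `image_url` content parts with base64 data URLs (verify the supported image format against AIsa's docs before relying on this):

```python
import base64

def image_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build a multimodal user message: text part plus an inline base64 image."""
    b64 = base64.b64encode(image_bytes).decode()
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }

# Usage (hypothetical file):
# with open("mockup.png", "rb") as f:
#     msg = image_message("Generate the React component for this mockup.", f.read())
# response = client.chat.completions.create(model="kimi-k2.5", messages=[msg])
```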

Agentic tool calling

Kimi K2.5 handles complex multi-step agentic workflows reliably. Here's an example with multiple tools:

tools = [
    {
        "type": "function",
        "function": {
            "name": "search_codebase",
            "description": "Search the codebase for files matching a pattern",
            "parameters": {
                "type": "object",
                "properties": {
                    "pattern": {"type": "string"},
                    "file_type": {"type": "string", "enum": ["py", "ts", "js", "go", "rust"]}
                },
                "required": ["pattern"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read the contents of a file",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string"}
                },
                "required": ["path"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "write_file",
            "description": "Write content to a file",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string"},
                    "content": {"type": "string"}
                },
                "required": ["path", "content"]
            }
        }
    }
]

messages = [
    {"role": "user", "content": "Find all authentication-related files in the codebase and add rate limiting to each login endpoint."}
]

# Agentic loop — capped so a misbehaving run can't spin forever
for _ in range(20):
    response = client.chat.completions.create(
        model="kimi-k2.5",
        messages=messages,
        tools=tools,
        tool_choice="auto"
    )

    message = response.choices[0].message
    messages.append(message)

    if not message.tool_calls:
        print("Task complete:", message.content)
        break

    # Execute tool calls and feed results back
    for tool_call in message.tool_calls:
        result = execute_tool(tool_call.function.name, tool_call.function.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": str(result)
        })

JSON mode and structured output

import json

response = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[
        {"role": "user", "content": "Extract all API endpoints from this codebase description and return as JSON with: method, path, description, and auth_required fields."}
    ],
    response_format={"type": "json_object"}
)

endpoints = json.loads(response.choices[0].message.content)
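JSON mode guarantees syntactically valid JSON, not that every expected field is present, so a small validation pass after parsing is worth the few extra lines. The helper and field names below are illustrative assumptions matching the prompt above, not part of the API:

```python
REQUIRED_FIELDS = {"method", "path", "description", "auth_required"}

def validate_endpoints(endpoints: list[dict]) -> list[dict]:
    """Keep only entries that carry every required field."""
    return [e for e in endpoints if REQUIRED_FIELDS <= e.keys()]

# Usage: endpoints = validate_endpoints(json.loads(...)["endpoints"])
```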

Using Kimi K2.5 with the 256K context window

The 256K context window covers approximately 200,000 words — enough for most novels, large codebases, or extended research documents:

# Load a large codebase into context
import os

codebase_contents = ""
for root, dirs, files in os.walk("./my-project"):
    # Skip node_modules, .git, etc.
    dirs[:] = [d for d in dirs if d not in ['.git', 'node_modules', '__pycache__']]
    for file in files:
        if file.endswith(('.py', '.ts', '.js')):
            path = os.path.join(root, file)
            # Skip files that aren't valid UTF-8 rather than crashing
            with open(path, encoding="utf-8", errors="ignore") as f:
                codebase_contents += f"\n\n--- {path} ---\n{f.read()}"

response = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[
        {"role": "system", "content": "You are a code reviewer. Analyse the entire codebase provided."},
        {"role": "user", "content": f"Here is the codebase:\n{codebase_contents}\n\nIdentify all security vulnerabilities and provide remediation steps."}
    ]
)
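Before sending a payload like this, it's worth checking that it actually fits. The ~4-characters-per-token ratio below is a rough heuristic for English text and code, not AIsa's tokenizer:

```python
CONTEXT_LIMIT = 256_000  # Kimi K2.5 context window, in tokens

def estimate_tokens(text: str) -> int:
    """Very rough estimate: ~4 characters per token."""
    return len(text) // 4

def fits_in_context(text: str, reserve_for_output: int = 8_000) -> bool:
    """Leave headroom for the model's response when checking fit."""
    return estimate_tokens(text) + reserve_for_output <= CONTEXT_LIMIT

# A ~400K-character codebase (~100K tokens) fits comfortably;
# a ~1.2M-character one (~300K tokens) does not.
```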

Caching: reduce cost on repeated context

Kimi K2.5 supports prompt caching. When the same prefix (e.g., a system prompt, document, or codebase) appears across multiple requests, cache hits cost $0.10/M input instead of the full rate.

# First call — cold (full rate)
response1 = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[
        {"role": "system", "content": large_system_prompt},  # This prefix is cached
        {"role": "user", "content": "Question 1"}
    ]
)

# Second call — cache hit (cached prefix billed at $0.10/M instead of ~$0.48/M)
response2 = client.chat.completions.create(
    model="kimi-k2.5",
    messages=[
        {"role": "system", "content": large_system_prompt},  # Cached — $0.10/M
        {"role": "user", "content": "Question 2"}            # New — full rate
    ]
)
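To see what caching saves, compare the input cost of the two calls at the approximate rates listed earlier (illustrative figures, not a quote):

```python
COLD_RATE = 0.48   # USD per million input tokens, uncached (approximate)
CACHE_RATE = 0.10  # USD per million cached input tokens

def input_cost(prefix_tokens: int, new_tokens: int, cached: bool) -> float:
    """Input cost in USD: the shared prefix at the cache or cold rate,
    plus the new user turn at the full rate."""
    prefix_rate = CACHE_RATE if cached else COLD_RATE
    return round((prefix_tokens * prefix_rate + new_tokens * COLD_RATE) / 1_000_000, 5)

# 100K-token system prompt, 500-token question:
print(input_cost(100_000, 500, cached=False))  # 0.04824  (first call)
print(input_cost(100_000, 500, cached=True))   # 0.01024  (subsequent calls)
```

With a large shared prefix, nearly all input tokens hit the cache, so follow-up calls cost roughly a fifth as much on input.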

Enterprise data privacy

AIsa holds a Supplemental Enterprise Service Agreement with Moonshot AI (effective February 10, 2026) with the following guarantees:

  • Customer data is not retained by Moonshot AI after processing
  • Generated outputs are not stored on Moonshot's infrastructure
  • Data is not used for model training or fine-tuning
  • Processing occurs within the boundaries of the enterprise agreement

For organisations with GDPR, HIPAA, or internal data governance requirements, contact us for a copy of the full data processing agreement.


What's next