GLM-5 Benchmark, Price, API and Architecture

Overview

What Is GLM-5

GLM-5 is Zhipu AI's fifth-generation large language model, targeting complex systems engineering and long-horizon agentic tasks. It scales up from GLM-4.5 with 355B/32B active parameters to 744B/40B active, and grows pretraining data from 23T to 28.5T tokens.

Unified Hybrid Reasoning

Think and non-think modes share one base model, with differences mainly in post-training strategy.

Long Context with Cost Controls

DeepSeek Sparse Attention reduces inference cost while preserving 200K context capability for long-document workflows.

Open Source + API Access

GLM-5 weights are available on Hugging Face and ModelScope under MIT license, with API access through multiple providers.

Architecture

GLM-5 Architecture

GLM-5 uses a unified model architecture with hybrid reasoning mode switching. Think and non-think share the same base, with DSA for long-context efficiency and the slime asynchronous RL infrastructure for training.

1

744B

Total Parameters

2

40B

Active Parameters

3

28.5T Tokens

Pretraining Data

4

Unified Think/Non-think

Reasoning Mode

Stage#1

Unified Base Model

GLM-5 uses a single unified model base, eliminating the need for separate models when switching between think and non-think usage.

Stage#2

Post-Training Specialization

Differences between think and non-think behavior are introduced in post-training, enabling quality and latency tradeoffs.

Stage#3

Hybrid Reasoning Switch

Hybrid reasoning can be enabled based on task complexity and workflow needs for flexible deployment.

Stage#4

DSA for Long Context

DeepSeek Sparse Attention reduces inference cost while preserving long-context capability across large input sequences.

Stage#5

Asynchronous RL Infrastructure

The slime framework provides asynchronous reinforcement learning for higher training throughput and faster iteration cycles.

Stage#6

Function Calling + Parallel Tools

GLM-5 supports Function Calling and parallel tool calls, enabling complex agentic workflows with multi-step orchestration.

Context Window

GLM-5 Context Window

Context window and output limits for GLM-5 across major providers. Displayed limits can vary by platform and routing layer.

1

200K context / up to 128K output

GLM-5 (docs.z.ai)

2

202,752 context

GLM-5 (OpenRouter)

3

up to 202,752

OpenRouter max completion

4

Limits vary by endpoint

Provider caveat

Stage#1

Long Docs and Multi-File Tasks

A 200K context window supports long-document QA, cross-file code analysis, and multi-step planning workflows.

Stage#2

Controllable Output Limits

GLM-5 supports up to 128K output tokens in official docs. Effective output limits can vary by provider and endpoint settings.

Stage#3

Platform-Level Differences

The 202,752 context number comes from OpenRouter. For production, verify effective limits on your target provider.

Benchmark

GLM-5 Benchmark Overview

Published benchmark visuals covering agentic, reasoning, coding, and long-horizon tasks for side-by-side model comparison.

Sources: docs.z.ai and z.ai/blog/glm-5, captured February 12, 2026

GLM-4.7GLM-5Claude Opus 4.5Gemini 3.0 ProGPT-5.2

LLM Performance Evaluation

Eight public benchmarks including Humanity's Last Exam, SWE-bench, Terminal-Bench, MCP-Atlas, and Vending Bench 2.

CC-Bench-V2

Comparison chart for frontend, backend, and long-horizon engineering tasks.

77.8

SWE-bench Verified

GLM-5 score on SWE-bench Verified for real-world code repair tasks.

73.3

SWE-bench Multilingual

Performance on multilingual software engineering tasks across multiple languages.

56.2

Terminal-Bench 2.0

Score on terminal-agent workloads measuring command-line task completion.

$4,432

Vending Bench 2

Final balance achieved by GLM-5 in the business simulation benchmark.

API Price

GLM-5 API Price by Platform

Pricing from docs.z.ai and OpenRouter as of February 12, 2026. All prices shown here are in USD; re-check provider pages before production launch.

1

$1.00 / 1M tokens

GLM-5 Input

2

$0.20 / 1M tokens

GLM-5 Cached Input

3

$3.20 / 1M tokens

GLM-5 Output

4

$1.20 in / $5.00 out

GLM-5-Code

Stage#1

GLM-5 via docs.z.ai

Input $1.00 / 1M, cached input $0.20 / 1M, and output $3.20 / 1M in USD.

Stage#2

GLM-5-Code via docs.z.ai

Input $1.20 / 1M, cached input $0.30 / 1M, and output $5.00 / 1M in USD.

Stage#3

Cached Input Storage

Cached input storage is marked as limited-time free on the official pricing page.

Stage#4

GLM-5 via OpenRouter

GLM-5 is listed at $1 / 1M input and $3.20 / 1M output in USD.

Stage#5

Per-1M Token Units

Both docs.z.ai and OpenRouter list prices per 1M tokens; check provider billing granularity before launch.

Stage#6

Provider Billing Differences

Final billed cost may differ by routing policy, platform markup, and cache behavior across providers.

Benefits

Why Teams Are Evaluating GLM-5 in 2026

GLM-5 stands out for systems-engineering workloads, long-horizon agent tasks, competitive pricing, and record-low hallucination rates.

GLM-5 is positioned for complex systems engineering and high-complexity execution workflows with multi-step tool use.

FAQ

GLM-5 Frequently Asked Questions

Common questions about GLM-5 benchmarks, API pricing, context windows, and model capabilities.

1

What is the GLM-5 API price?

As of February 12, 2026, docs.z.ai lists GLM-5 at $1.00/1M input, $0.20/1M cached input, and $3.20/1M output; GLM-5-Code is $1.20/1M input, $0.30/1M cached input, and $5.00/1M output. OpenRouter lists GLM-5 at $1/1M input and $3.20/1M output.

2

What is the GLM-5 context window?

GLM-5 supports a 200K context window with up to 128K output tokens in official docs. OpenRouter currently lists 202,752 context, and effective limits can vary by provider endpoint.

3

Is GLM-5 a Mixture-of-Experts model?

Yes. GLM-5 has 744B total parameters with 40B active per token, using a sparse-activation MoE architecture with 256 experts and 8 activated per token.

4

How does GLM-5 compare to other frontier models?

GLM-5 scores 77.8 on SWE-bench Verified. In published comparison charts, Claude Opus 4.5 is 80.9 and Gemini 3.0 Pro is 76.2. GLM-5 ranks first among open-weight models on Vending Bench 2.

5

Why do OpenRouter prices differ from api.z.ai?

OpenRouter is a routing platform and may apply platform-level routing and margin policies. Always verify final billing rules and units on each provider page.

6

Are GLM-5 weights open-source?

Yes. GLM-5 weights are available on Hugging Face and ModelScope under the MIT License, supporting local deployment with vLLM and SGLang.

GLM-5: Benchmark, Price, API and Architecture

What People Are Saying About GLM-5

GLM-5 Fully Tested: I GOT EARLY ACCESS & YES, IT BEATS 4.6 OPUS!

GLM-5 Is HERE – Is THIS the BEST Open Source Coding Model?

GLM-5: From Vibe Coding to Agentic Engineering

What Is GLM-5

Unified Hybrid Reasoning

Long Context with Cost Controls

Open Source + API Access

GLM-5 Architecture

Unified Base Model

Post-Training Specialization

Hybrid Reasoning Switch

DSA for Long Context

Asynchronous RL Infrastructure

Function Calling + Parallel Tools

GLM-5 Context Window

Long Docs and Multi-File Tasks

Controllable Output Limits

Platform-Level Differences

GLM-5 Benchmark Overview

LLM Performance Evaluation

CC-Bench-V2

GLM-5 API Price by Platform

GLM-5 via docs.z.ai

GLM-5-Code via docs.z.ai

Cached Input Storage

GLM-5 via OpenRouter

Per-1M Token Units

Provider Billing Differences

Why Teams Are Evaluating GLM-5 in 2026

GLM-5 Frequently Asked Questions

What is the GLM-5 API price?

What is the GLM-5 context window?

Is GLM-5 a Mixture-of-Experts model?

How does GLM-5 compare to other frontier models?

Why do OpenRouter prices differ from api.z.ai?

Are GLM-5 weights open-source?