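Streaming Responses

Stream responses token by token in real time with Server-Sent Events.

Overview

AllToken supports streaming responses over Server-Sent Events (SSE). When stream: true is set, the API returns tokens incrementally as they are generated instead of waiting for the complete response.

This enables real-time UIs: text appears as it is generated, which noticeably reduces the latency users perceive.

Add stream: true to a Chat Completions request: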

TypeScript

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.ALLTOKEN_API_KEY,
  baseURL: 'https://api.alltoken.ai/v1',
});

const stream = await client.chat.completions.create({
  model: 'claude-sonnet-4',
  messages: [{ role: 'user', content: 'Explain quantum computing' }],
  stream: true,
});

// Print each text delta as soon as it arrives.
for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
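
Applications often need the complete text once streaming finishes, for example to store it in conversation history. The following is a minimal sketch of one way to accumulate the deltas. The helper name collectText, the ChunkLike type, and the fakeStream stand-in are illustrative assumptions and not part of the AllToken API or the OpenAI SDK; ChunkLike only describes the fields the loop above already reads.

```typescript
// Minimal structural type covering only the fields this sketch reads.
interface ChunkLike {
  choices: Array<{ delta?: { content?: string | null } }>;
}

// Hypothetical helper: drains a stream and returns the concatenated text.
async function collectText(stream: AsyncIterable<ChunkLike>): Promise<string> {
  let text = '';
  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content;
    if (content) text += content; // skip empty or missing deltas
  }
  return text;
}

// Stand-in stream so the helper can be tried without a network call.
async function* fakeStream(): AsyncGenerator<ChunkLike> {
  yield { choices: [{ delta: { content: 'Hello' } }] };
  yield { choices: [{ delta: {} }] };
  yield { choices: [{ delta: { content: ', world' } }] };
}

collectText(fakeStream()).then((text) => console.log(text)); // prints "Hello, world"
```

Because the parameter is typed structurally, the stream returned by client.chat.completions.create should also be accepted, although that has not been verified against every SDK version.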

Each SSE event is a line that starts with the data: prefix, followed by a JSON object:

Response stream

data: {"choices":[{"delta":{"content":"Hello"},"index":0}]}
data: {"choices":[{"delta":{"content":" world"},"index":0}]}
data: [DONE]
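
When reading the raw response without an SDK, each line has to be classified before its JSON is parsed. The sketch below shows one way to do that for the event shapes shown above. It assumes every event fits on a single data: line, as in the sample; the names SseEvent, parseSseLine, and textFromLines are hypothetical.

```typescript
type SseEvent =
  | { kind: 'data'; payload: unknown }
  | { kind: 'done' }
  | { kind: 'comment'; text: string }
  | { kind: 'other' };

// Classify one line of the stream.
function parseSseLine(line: string): SseEvent {
  if (line.startsWith(':')) return { kind: 'comment', text: line.slice(1).trim() };
  if (!line.startsWith('data:')) return { kind: 'other' }; // blank lines, other fields
  const body = line.slice('data:'.length).trim();
  if (body === '[DONE]') return { kind: 'done' };
  return { kind: 'data', payload: JSON.parse(body) };
}

// Concatenate the text deltas up to the [DONE] marker.
function textFromLines(lines: string[]): string {
  let text = '';
  for (const line of lines) {
    const event = parseSseLine(line);
    if (event.kind === 'done') break;
    if (event.kind !== 'data') continue;
    const payload = event.payload as {
      choices?: Array<{ delta?: { content?: string } }>;
    };
    text += payload.choices?.[0]?.delta?.content ?? '';
  }
  return text;
}

const sample = [
  'data: {"choices":[{"delta":{"content":"Hello"},"index":0}]}',
  'data: {"choices":[{"delta":{"content":" world"},"index":0}]}',
  'data: [DONE]',
];
console.log(textFromLines(sample)); // prints "Hello world"
```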

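The stream ends with data: [DONE]. A cost comment may follow it. Lines that begin with a colon are SSE comments, so standard SSE clients ignore them:

Cost information

: {"cost":"0.0012","input_price":"0.0003","output_price":"0.0009","prompt_tokens":15,"completion_tokens":42}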
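
Since SSE clients typically discard comment lines, reading the cost information likely means inspecting the raw response lines yourself. The sketch below parses such a line, assuming the payload has exactly the shape of the sample above (prices as strings, token counts as numbers). The CostInfo type and parseCostComment function are hypothetical names, and the currency or unit of the price fields is not stated in this guide.

```typescript
// Field names and types mirror the sample comment; this is an assumption,
// not a published schema.
interface CostInfo {
  cost: string;
  input_price: string;
  output_price: string;
  prompt_tokens: number;
  completion_tokens: number;
}

// Returns the parsed cost info, or null when the line is not a cost comment.
function parseCostComment(line: string): CostInfo | null {
  if (!line.startsWith(':')) return null;
  try {
    const parsed: unknown = JSON.parse(line.slice(1).trim());
    if (typeof parsed !== 'object' || parsed === null || !('cost' in parsed)) {
      return null;
    }
    return parsed as CostInfo;
  } catch {
    return null; // comment text that is not JSON
  }
}

const costLine =
  ': {"cost":"0.0012","input_price":"0.0003","output_price":"0.0009","prompt_tokens":15,"completion_tokens":42}';
const info = parseCostComment(costLine);
if (info) {
  console.log(info.cost, info.prompt_tokens + info.completion_tokens); // prints "0.0012 57"
}
```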

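Some models support extended reasoning. When it is enabled, the model streams its internal reasoning before the final answer:

TypeScript

const stream = await client.chat.completions.create({
  model: 'deepseek-reasoner',
  messages: [{ role: 'user', content: 'Prove that the square root of 2 is irrational' }],
  stream: true,
});

for await (const chunk of stream) {
  // reasoning_content is an extension field that the OpenAI SDK typings
  // may not declare, so widen the delta type before reading it.
  const delta = chunk.choices[0]?.delta as
    | { content?: string | null; reasoning_content?: string | null }
    | undefined;

  // Reasoning process (chain of thought)
  if (delta?.reasoning_content) process.stderr.write(delta.reasoning_content);

  // Final answer
  if (delta?.content) process.stdout.write(delta.content);
}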

The reasoning_content field carries the model's step-by-step reasoning, while content carries the final answer.
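
A UI that shows the reasoning in a collapsible panel needs the two fields kept in separate buffers. The following is a minimal sketch of that separation; splitReasoning, ReasoningChunk, and the stand-in stream are hypothetical, and the sketch assumes only the two delta fields described above.

```typescript
interface ReasoningChunk {
  choices: Array<{
    delta?: { content?: string | null; reasoning_content?: string | null };
  }>;
}

// Hypothetical helper: accumulates reasoning and answer text separately.
async function splitReasoning(
  stream: AsyncIterable<ReasoningChunk>,
): Promise<{ reasoning: string; answer: string }> {
  let reasoning = '';
  let answer = '';
  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta;
    if (delta?.reasoning_content) reasoning += delta.reasoning_content;
    if (delta?.content) answer += delta.content;
  }
  return { reasoning, answer };
}

// Stand-in stream: reasoning deltas first, then the answer.
async function* fakeReasoningStream(): AsyncGenerator<ReasoningChunk> {
  yield { choices: [{ delta: { reasoning_content: 'Assume sqrt(2) = p/q. ' } }] };
  yield { choices: [{ delta: { reasoning_content: 'Then p and q are both even.' } }] };
  yield { choices: [{ delta: { content: 'sqrt(2) is irrational.' } }] };
}

splitReasoning(fakeReasoningStream()).then((result) => console.log(result.answer));
// prints "sqrt(2) is irrational."
```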

Errors that occur during streaming are delivered in the error field of an SSE event:

Error response

data: {"error":{"message":"Rate limit exceeded","type":"rate_limit_error","code":429}}
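
Because an error can arrive after some text has already streamed, each parsed event is worth checking before its choices are read. The sketch below turns an error payload of the shape shown above into a thrown exception. StreamApiError and assertNoStreamError are hypothetical names; how a given SDK surfaces these events is not covered here, so the sketch works on the already parsed JSON payload.

```typescript
// Illustrative error class carrying the fields from the sample event.
class StreamApiError extends Error {
  type: string;
  code: number;

  constructor(type: string, code: number, message: string) {
    super(message);
    this.name = 'StreamApiError';
    this.type = type;
    this.code = code;
  }
}

// Throws when the parsed event payload contains an error field.
function assertNoStreamError(payload: unknown): void {
  if (typeof payload !== 'object' || payload === null) return;
  const error = (payload as {
    error?: { message?: string; type?: string; code?: number };
  }).error;
  if (!error) return;
  throw new StreamApiError(
    error.type ?? 'unknown_error',
    error.code ?? 0,
    error.message ?? 'Unknown streaming error',
  );
}

const errorEvent = JSON.parse(
  '{"error":{"message":"Rate limit exceeded","type":"rate_limit_error","code":429}}',
);
try {
  assertNoStreamError(errorEvent);
} catch (err) {
  console.error((err as Error).message); // prints "Rate limit exceeded"
}
```

A caller could inspect the code field to decide whether to retry, for instance after a 429.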

Common streaming errors:

Use an AbortController to cancel an in-flight streaming request:

TypeScript

const controller = new AbortController();

const stream = await client.chat.completions.create(
  {
    model: 'minimax-m2.7',
    messages: [{ role: 'user', content: 'Write a long essay' }],
    stream: true,
  },
  { signal: controller.signal }
);

// Cancel after 5 seconds
setTimeout(() => controller.abort(), 5000);
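
Aborting usually causes the loop that consumes the stream to throw, so the text received so far is lost unless the caller catches that error. The sketch below keeps the partial text and treats an exception as a cancellation only when the signal has actually been aborted. collectUntilAbort and the stand-in stream are hypothetical; the stand-in merely simulates a stream that fails once its signal is aborted, and the exact error a real client throws is not assumed.

```typescript
interface ChunkLike {
  choices: Array<{ delta?: { content?: string | null } }>;
}

interface PartialResult {
  text: string;
  aborted: boolean;
}

// Hypothetical helper: consumes a stream and returns whatever text arrived,
// flagging whether the run ended because of an abort.
async function collectUntilAbort(
  stream: AsyncIterable<ChunkLike>,
  signal: AbortSignal,
  onText: (piece: string) => void = () => {},
): Promise<PartialResult> {
  let text = '';
  try {
    for await (const chunk of stream) {
      const piece = chunk.choices[0]?.delta?.content;
      if (piece) {
        text += piece;
        onText(piece);
      }
    }
  } catch (err) {
    if (!signal.aborted) throw err; // a genuine failure, not a cancellation
    return { text, aborted: true };
  }
  return { text, aborted: false };
}

// Stand-in stream that throws once the signal has been aborted.
async function* abortableFakeStream(signal: AbortSignal): AsyncGenerator<ChunkLike> {
  for (const piece of ['one ', 'two ', 'three']) {
    if (signal.aborted) throw new Error('Request was aborted.');
    yield { choices: [{ delta: { content: piece } }] };
  }
}

const demoController = new AbortController();
collectUntilAbort(
  abortableFakeStream(demoController.signal),
  demoController.signal,
  (piece) => {
    if (piece === 'two ') demoController.abort(); // cancel mid-stream
  },
).then((result) => console.log(result));
```

In this run the helper resolves with the text "one two " and aborted set to true, because the abort happens after the second piece is delivered.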