Streaming

Real-time token-by-token responses via Server-Sent Events.

Overview

AllToken supports streaming via Server-Sent Events (SSE). With stream: true, the API sends tokens as they arrive — no waiting for the complete response.

Text appears progressively in real time, which significantly improves perceived latency.

Basic usage

Add stream: true to any Chat Completions request:

TypeScript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.ALLTOKEN_API_KEY,
  baseURL: 'https://api.alltoken.ai/v1',
});

const stream = await client.chat.completions.create({
  model: 'claude-sonnet-4',
  messages: [{ role: 'user', content: 'Explain quantum computing' }],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}

SSE response format

Each SSE event is a line consisting of the prefix data: followed by a JSON object:

Response stream
data: {"choices":[{"delta":{"content":"Hello"},"index":0}]}
data: {"choices":[{"delta":{"content":" world"},"index":0}]}
data: [DONE]

The stream ends with data: [DONE]. After that, a cost comment may follow:

Cost info
: {"cost":"0.0012","input_price":"0.0003","output_price":"0.0009","prompt_tokens":15,"completion_tokens":42}
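Clients that don't use the SDK can parse this framing by hand. The sketch below classifies a single line of the stream based on the format shown above; parseSseLine is a hypothetical helper name, not part of any SDK.

```typescript
// Classify one line of the SSE stream: a data event, the [DONE]
// sentinel, or a comment line (": {...}" carries the cost object).
type SseLine =
  | { kind: 'chunk'; payload: unknown }
  | { kind: 'done' }
  | { kind: 'comment'; payload: unknown }
  | { kind: 'empty' };

function parseSseLine(line: string): SseLine {
  if (line.trim() === '') return { kind: 'empty' };
  if (line.startsWith(':')) {
    // SSE comment; used here for the trailing cost info
    return { kind: 'comment', payload: JSON.parse(line.slice(1)) };
  }
  if (line.startsWith('data:')) {
    const body = line.slice('data:'.length).trim();
    if (body === '[DONE]') return { kind: 'done' };
    return { kind: 'chunk', payload: JSON.parse(body) };
  }
  throw new Error(`Unexpected SSE line: ${line}`);
}
```

Split the raw response body on newlines and feed each line through a function like this; stop reading deltas once the [DONE] sentinel arrives.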

Thinking mode (extended reasoning)

Some models support extended reasoning. The model streams its reasoning before the final answer:

TypeScript
const stream = await client.chat.completions.create({
  model: 'deepseek-reasoner',
  messages: [{ role: 'user', content: 'Solve this step by step: 15! / 13!' }],
  stream: true,
});

for await (const chunk of stream) {
  // Reasoning content (thinking process)
  const thinking = chunk.choices[0]?.delta?.reasoning_content;
  if (thinking) process.stderr.write(thinking);

  // Final answer content
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}

reasoning_content carries the model's step-by-step reasoning; content carries the final answer.

Error handling

Errors during streaming are delivered as SSE events with an error field:

Error response
data: {"error":{"message":"Rate limit exceeded","type":"rate_limit_error","code":429}}

Common streaming errors:

  • 401 — Invalid or expired API key. AllToken attempts one automatic token refresh.
  • 429 — Rate limit exceeded. Back off and retry.
  • 500 — Upstream provider error. AllToken retries on an alternative provider automatically.
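One way to surface these in-stream errors is to check each parsed payload before reading its delta. The helper below is a sketch under the error format shown above; handleChunk and StreamError are hypothetical names, not part of the SDK.

```typescript
// Throw a typed error for an in-stream error event; otherwise
// return the delta text (or null if the chunk carries none).
class StreamError extends Error {
  constructor(
    message: string,
    readonly code: number,
    readonly retryable: boolean
  ) {
    super(message);
  }
}

function handleChunk(payload: any): string | null {
  if (payload.error) {
    // Per the list above: back off and retry on 429; 401 needs a new
    // key, and 500s are already retried upstream by the gateway.
    const code = payload.error.code;
    throw new StreamError(payload.error.message, code, code === 429);
  }
  return payload.choices?.[0]?.delta?.content ?? null;
}
```

A caller can catch StreamError, inspect retryable, and re-issue the request with exponential backoff when it is true.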

Cancelling requests

Use AbortController to cancel an in-flight streaming request:

TypeScript
const controller = new AbortController();

const stream = await client.chat.completions.create(
  {
    model: 'deepseek-chat',
    messages: [{ role: 'user', content: 'Write a long essay' }],
    stream: true,
  },
  { signal: controller.signal }
);

// Cancel after 5 seconds
setTimeout(() => controller.abort(), 5000);

try {
  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content;
    if (content) process.stdout.write(content);
  }
} catch (err) {
  // The SDK throws an abort error when the request is cancelled;
  // tokens received before the abort have already been written
  if (!controller.signal.aborted) throw err;
}