Streaming Responses

Real-time, token-by-token responses over Server-Sent Events.

Overview

AllToken supports streaming responses over Server-Sent Events (SSE). When you set stream: true, the API returns tokens incrementally as they are generated instead of waiting for the complete response.

This enables real-time UIs: text appears as each token arrives, which substantially reduces the latency users perceive.

Add stream: true to a Chat Completions request:

TypeScript

import OpenAI from 'openai';

// Point the OpenAI SDK at the AllToken endpoint.
const client = new OpenAI({
  apiKey: process.env.ALLTOKEN_API_KEY,
  baseURL: 'https://api.alltoken.ai/v1',
});

const stream = await client.chat.completions.create({
  model: 'claude-sonnet-4',
  messages: [{ role: 'user', content: 'Explain quantum computing' }],
  stream: true,
});

// Print each content delta as soon as it arrives.
for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
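
A UI often needs the full text so far rather than individual deltas. The sketch below shows one way to fold chunks into a running string. The `ChunkLike` type and the `appendDelta` helper are illustrative names, not part of the AllToken API or the OpenAI SDK, and the type models only the fields read in the example above.

```typescript
// Minimal shape of a streamed chunk; only the fields used above are modeled.
// `ChunkLike` is a hypothetical name for this sketch.
type ChunkLike = {
  choices: Array<{ index?: number; delta?: { content?: string | null } }>;
};

// Returns the text so far with this chunk's content delta appended.
// Chunks whose delta carries no content leave the text unchanged.
function appendDelta(textSoFar: string, chunk: ChunkLike): string {
  const content = chunk.choices[0]?.delta?.content;
  return content ? textSoFar + content : textSoFar;
}
```

Inside the for await loop, `text = appendDelta(text, chunk)` keeps `text` current, so the UI can re-render the whole message on every chunk.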

Each SSE event is a line with the data: prefix followed by a JSON object; the final event carries the literal [DONE] instead:

Response stream

data: {"choices":[{"delta":{"content":"Hel"},"index":0}]}
data: {"choices":[{"delta":{"content":"lo"},"index":0}]}
data: [DONE]
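
When reading the raw stream without an SDK, each line has to be classified before it can be used. The following is a minimal sketch that assumes every event fits on a single data: line, as in the sample above. `SseData` and `parseDataLine` are hypothetical names for illustration.

```typescript
// Result of classifying one raw line of the stream.
type SseData =
  | { kind: 'chunk'; payload: unknown } // a JSON chunk
  | { kind: 'done' }                    // the [DONE] terminator
  | { kind: 'other' };                  // blank lines, comments, anything else

function parseDataLine(line: string): SseData {
  if (!line.startsWith('data:')) return { kind: 'other' };
  const body = line.slice('data:'.length).trim();
  if (body === '[DONE]') return { kind: 'done' };
  // JSON.parse throws on malformed input; callers may want to catch that.
  return { kind: 'chunk', payload: JSON.parse(body) };
}
```

A caller would stop reading on `done` and pass each `chunk` payload to whatever renders the delta.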

The stream ends with data: [DONE]. A cost annotation may follow it as an SSE comment, that is, a line beginning with a colon:

Cost information

: {"cost":"0.0012","input_price":"0.0003","output_price":"0.0009","prompt_tokens":15,"completion_tokens":42}
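
Standard SSE clients ignore lines that begin with a colon, so a client library may not surface this annotation; reading it likely means processing the raw response lines yourself. The sketch below parses such a line. The field names are copied from the sample above, while the `CostInfo` type, the `parseCostComment` helper, and the choice to treat every field as optional are assumptions made for illustration.

```typescript
// Fields as they appear in the sample line; all optional as a precaution.
type CostInfo = {
  cost?: string;
  input_price?: string;
  output_price?: string;
  prompt_tokens?: number;
  completion_tokens?: number;
};

// Returns the parsed cost annotation, or null if the line is not one.
function parseCostComment(line: string): CostInfo | null {
  if (!line.startsWith(':')) return null; // not an SSE comment
  const body = line.slice(1).trim();
  if (!body.startsWith('{')) return null; // some other comment, not JSON
  try {
    return JSON.parse(body) as CostInfo;
  } catch {
    return null; // malformed JSON is treated as "no cost info"
  }
}
```

Note that the prices and the total arrive as strings in the sample, so they need converting before any arithmetic.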

Some models support extended reasoning. When it is enabled, the model emits its internal reasoning before the final answer:

TypeScript

// Reuses the client configured in the first example.
const stream = await client.chat.completions.create({
  model: 'deepseek-reasoner',
  messages: [{ role: 'user', content: 'Prove that the square root of 2 is irrational' }],
  stream: true,
});

for await (const chunk of stream) {
  // reasoning_content may not be declared in the SDK's delta type, so widen it here.
  const delta = chunk.choices[0]?.delta as
    | { content?: string | null; reasoning_content?: string | null }
    | undefined;

  // Reasoning process (chain of thought)
  if (delta?.reasoning_content) process.stderr.write(delta.reasoning_content);

  // Final answer
  if (delta?.content) process.stdout.write(delta.content);
}

The reasoning_content field carries the model's step-by-step reasoning, and content carries the final answer.
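
To show the reasoning and the answer in separate panes, the two fields can be accumulated independently. This is a minimal sketch; `ReasoningDelta`, `Transcript`, and `applyDelta` are hypothetical names, and the delta shape models only the two fields described above.

```typescript
// Only the two fields discussed above are modeled.
type ReasoningDelta = { reasoning_content?: string | null; content?: string | null };

// Running totals for the two parts of the response.
type Transcript = { reasoning: string; answer: string };

// Returns a new transcript with whatever this delta contributed to each part.
function applyDelta(t: Transcript, delta: ReasoningDelta | undefined): Transcript {
  return {
    reasoning: t.reasoning + (delta?.reasoning_content ?? ''),
    answer: t.answer + (delta?.content ?? ''),
  };
}
```

Starting from `{ reasoning: '', answer: '' }` and calling `applyDelta` once per chunk yields both texts without assuming that a single chunk carries only one of the fields.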

Errors that occur during streaming are delivered through the error field of an SSE event:

Error response

data: {"error":{"message":"Rate limit exceeded","type":"rate_limit_error","code":429}}
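
Because an error arrives as an ordinary event in the middle of the stream, each parsed event is worth checking before it is treated as a chunk. The sketch below assumes the error object has exactly the three fields shown in the sample above; `StreamErrorPayload` and `extractStreamError` are hypothetical names.

```typescript
// Shape taken from the sample error event; other fields may exist.
type StreamErrorPayload = { message: string; type: string; code: number };

// Returns the error payload if the parsed event carries one, otherwise null.
function extractStreamError(event: unknown): StreamErrorPayload | null {
  if (typeof event !== 'object' || event === null) return null;
  const error = (event as { error?: StreamErrorPayload }).error;
  return error ?? null;
}
```

A caller might stop reading and surface `message` to the user when this returns a non-null value.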

Common streaming errors:

Use AbortController to cancel an in-flight streaming request:

TypeScript

// Reuses the client configured in the first example.
const controller = new AbortController();

const stream = await client.chat.completions.create(
  {
    model: 'minimax-m2.7',
    messages: [{ role: 'user', content: 'Write a long essay' }],
    stream: true,
  },
  { signal: controller.signal } // request options: the signal that cancels the request
);

// Cancel after 5 seconds
setTimeout(() => controller.abort(), 5000);
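
In the example above the timer stays scheduled even when the stream finishes early. One possible refinement is to pair the controller with a timer that can be cleared once the stream completes. `createDeadline` is a hypothetical helper, not part of any SDK; it relies only on the standard AbortController and timer APIs.

```typescript
// Bundles an AbortController with a timeout that can be cleared.
function createDeadline(ms: number) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  return {
    // Pass this as the request's signal.
    signal: controller.signal,
    // Cancel the request immediately, for example from a "Stop" button.
    abort: () => {
      clearTimeout(timer);
      controller.abort();
    },
    // Call once the stream has finished so the timer does not fire later.
    clear: () => clearTimeout(timer),
  };
}
```

The returned `signal` would go where `controller.signal` is passed above, with `clear()` called in a finally block after the loop that reads the stream.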