Multimodal

Send images and text to vision-capable models.

Overview

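Multimodal models can process text and images together in a single request. This enables use cases such as image analysis, document understanding, chart interpretation, and visual question answering.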

Models that support image input include qwen3.6-plus, Claude Sonnet/Opus, and Gemini.

Pass images in the messages array using the image_url content type:

TypeScript

// Assumes `client` is an already-configured OpenAI-compatible SDK client.
const completion = await client.chat.completions.create({
  model: 'qwen3.6-plus',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'What is in this image?' },
        {
          type: 'image_url',
          image_url: { url: 'https://example.com/photo.jpg' },
        },
      ],
    },
  ],
});

You can also send base64-encoded images, which is useful when the image is not publicly accessible:

TypeScript

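import * as fs from 'node:fs';

// Read the local file and encode its bytes as a base64 string.
const base64Image = fs.readFileSync('photo.jpg', 'base64');

// Assumes `client` is an already-configured OpenAI-compatible SDK client.
const completion = await client.chat.completions.create({
  model: 'claude-sonnet-4',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'Describe this image' },
        {
          type: 'image_url',
          image_url: {
            // The MIME type in the data URL should match the actual file format.
            url: `data:image/jpeg;base64,${base64Image}`,
          },
        },
      ],
    },
  ],
});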

Supported formats: JPEG, PNG, GIF, and WebP. The maximum size varies by model (typically 20 MB).
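The format and size constraints can be checked locally before a request is sent. The following is a minimal sketch, not part of the API: `mimeTypeFor`, `toImageDataUrl`, and `ASSUMED_MAX_BYTES` are hypothetical names introduced here, the 20 MB default simply mirrors the typical figure, and applying the limit to the raw file size (rather than the encoded payload) is an assumption.

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

// MIME types for the four supported formats (JPEG, PNG, GIF, WebP).
const MIME_BY_EXTENSION: Record<string, string> = {
  '.jpg': 'image/jpeg',
  '.jpeg': 'image/jpeg',
  '.png': 'image/png',
  '.gif': 'image/gif',
  '.webp': 'image/webp',
};

// Assumption: 20 MB is only the typical limit; the real limit depends on the model.
const ASSUMED_MAX_BYTES = 20 * 1024 * 1024;

// Hypothetical helper: map a file extension to a MIME type, rejecting other formats.
function mimeTypeFor(filePath: string): string {
  const ext = path.extname(filePath).toLowerCase();
  const mime = MIME_BY_EXTENSION[ext];
  if (!mime) {
    throw new Error(`Unsupported image format: ${ext || '(no extension)'}`);
  }
  return mime;
}

// Hypothetical helper: read a local image and return a base64 data URL
// suitable for the `image_url.url` field.
function toImageDataUrl(filePath: string, maxBytes: number = ASSUMED_MAX_BYTES): string {
  const mime = mimeTypeFor(filePath);
  const bytes = fs.readFileSync(filePath);
  // Assumption: the limit is checked against the raw file size.
  if (bytes.length > maxBytes) {
    throw new Error(`Image is ${bytes.length} bytes, above the ${maxBytes}-byte limit`);
  }
  return `data:${mime};base64,${bytes.toString('base64')}`;
}
```

The returned string can be used as the `url` value of an `image_url` content part, in place of the hard-coded `data:image/jpeg;base64,...` template.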

Check a model's input_modalities field to confirm whether it supports image input. Models whose input modalities include image accept multimodal requests.
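The check can be expressed as a small filter. This sketch assumes `input_modalities` is an array of strings on each model record; the `ModelInfo` interface, the `supportsImageInput` helper, and the sample model IDs are hypothetical and only illustrate the idea.

```typescript
// Assumed shape of a model record; only the `input_modalities` field name
// comes from this guide, the rest is illustrative.
interface ModelInfo {
  id: string;
  input_modalities: string[];
}

// Hypothetical helper: true when the model lists `image` among its input modalities.
function supportsImageInput(model: ModelInfo): boolean {
  return model.input_modalities.includes('image');
}

// Hypothetical sample data standing in for a real model listing.
const models: ModelInfo[] = [
  { id: 'example-vision-model', input_modalities: ['text', 'image'] },
  { id: 'example-text-model', input_modalities: ['text'] },
];

// Keep only the IDs of models that accept image input.
const visionModelIds = models.filter(supportsImageInput).map((m) => m.id);
console.log(visionModelIds);
```

With real data, the `models` array would be replaced by whatever model listing the platform exposes; the filter itself stays the same.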

On the Models page, use the "Input modality" filter to find multimodal models.