OpenAI Realtime API¶
Official Documentation
📝 Overview¶
Introduction¶
OpenAI Realtime API provides two connection methods:
- WebRTC - For real-time audio/video interaction in browsers and mobile clients
- WebSocket - For server-to-server application integration
Use Cases¶
- Real-time voice conversations
- Audio/video conferencing
- Real-time translation
- Speech transcription
- Real-time code generation
- Server-side real-time integration
Key Features¶
- Bidirectional audio streaming
- Mixed text and audio conversations
- Function calling support
- Automatic Voice Activity Detection (VAD)
- Audio transcription capabilities
- WebSocket server-side integration
🔐 Authentication & Security¶
Authentication Methods¶
- Standard API Key (server-side only)
- Ephemeral Token (client-side use)
Ephemeral Token¶
- Validity: 1 minute
- Usage limit: Single connection
- Generation: Created via server-side API
```http
POST https://your-newapi-server-address/v1/realtime/sessions
Content-Type: application/json
Authorization: Bearer $API_KEY

{
  "model": "gpt-4o-realtime-preview-2024-12-17",
  "voice": "verse"
}
```
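A minimal server-side sketch of minting an ephemeral token (assuming the Python `requests` package; in OpenAI's session schema the token is returned under `client_secret.value`):

```python
import os
import requests

# Server-side only: the standard API key must never reach the client
resp = requests.post(
    "https://your-newapi-server-address/v1/realtime/sessions",
    headers={"Authorization": f"Bearer {os.environ['API_KEY']}"},
    json={"model": "gpt-4o-realtime-preview-2024-12-17", "voice": "verse"},
)
resp.raise_for_status()
# Valid for about 1 minute and a single connection; hand this to the client
ephemeral_key = resp.json()["client_secret"]["value"]
```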
Security Recommendations¶
- Never expose standard API keys on the client side
- Use HTTPS/WSS for communication
- Implement appropriate access controls
- Monitor for unusual activity
🔌 Connection Establishment¶
WebRTC Connection¶
- URL: `https://your-newapi-server-address/v1/realtime`
- Query parameters: `model`
- Headers: `Authorization: Bearer EPHEMERAL_KEY`, `Content-Type: application/sdp`
WebSocket Connection¶
- URL: `wss://your-newapi-server-address/v1/realtime`
- Query parameters: `model`
- Headers: `Authorization: Bearer YOUR_API_KEY`, `OpenAI-Beta: realtime=v1`
Connection Flow¶
```mermaid
sequenceDiagram
participant Client
participant Server
participant OpenAI
alt WebRTC Connection
Client->>Server: Request ephemeral token
Server->>OpenAI: Create session
OpenAI-->>Server: Return ephemeral token
Server-->>Client: Return ephemeral token
Client->>OpenAI: Create WebRTC offer
OpenAI-->>Client: Return answer
Note over Client,OpenAI: Establish WebRTC connection
Client->>OpenAI: Create data channel
OpenAI-->>Client: Confirm data channel
else WebSocket Connection
Server->>OpenAI: Establish WebSocket connection
OpenAI-->>Server: Confirm connection
Note over Server,OpenAI: Begin real-time conversation
end
```
Data Channel¶
- Name: `oai-events`
- Purpose: Event transmission
- Format: JSON
Audio Stream¶
- Input: `addTrack()`
- Output: `ontrack` event
💬 Conversation Interaction¶
Conversation Modes¶
- Text-only conversations
- Voice conversations
- Mixed conversations
Session Management¶
- Create session
- Update session
- End session
- Session configuration
Event Types¶
- Text events
- Audio events
- Function calls
- Status updates
- Error events
⚙️ Configuration Options¶
Audio Configuration¶
- Input formats: `pcm16`, `g711_ulaw`, `g711_alaw`
- Output formats: `pcm16`, `g711_ulaw`, `g711_alaw`
- Voice types: `alloy`, `echo`, `shimmer`
Model Configuration¶
- Temperature
- Maximum output length
- System prompt
- Tool configuration
VAD Configuration¶
- Threshold
- Silence duration
- Prefix padding
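These options map onto the `turn_detection` block of a `session.update` event. A sketch, assuming `ws` is an open connection such as the `websocket-client` one in the examples below (the values shown are illustrative):

```python
import json

# Tune server-side VAD via session.update
vad_update = {
    "type": "session.update",
    "session": {
        "turn_detection": {
            "type": "server_vad",
            "threshold": 0.6,            # 0.0-1.0: higher = less sensitive
            "prefix_padding_ms": 300,    # audio retained before detected speech
            "silence_duration_ms": 500,  # silence that ends a turn
        }
    },
}
ws.send(json.dumps(vad_update))
```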
💡 Request Examples¶
WebSocket Connection ✅¶
Node.js (ws module)¶
```javascript
import WebSocket from "ws";
const url = "wss://your-newapi-server-address/v1/realtime?model=gpt-4o-realtime-preview-2024-12-17";
const ws = new WebSocket(url, {
headers: {
"Authorization": "Bearer " + process.env.API_KEY,
"OpenAI-Beta": "realtime=v1",
},
});
ws.on("open", function open() {
console.log("Connected to server.");
});
ws.on("message", function incoming(message) {
console.log(JSON.parse(message.toString()));
});
```
Python (websocket-client)¶
```python
# Requires the websocket-client library:
# pip install websocket-client
import os
import json
import websocket
API_KEY = os.environ.get("API_KEY")
url = "wss://your-newapi-server-address/v1/realtime?model=gpt-4o-realtime-preview-2024-12-17"
headers = [
"Authorization: Bearer " + API_KEY,
"OpenAI-Beta: realtime=v1"
]
def on_open(ws):
print("Connected to server.");
def on_message(ws, message):
data = json.loads(message)
print("Received event:", json.dumps(data, indent=2))
ws = websocket.WebSocketApp(
url,
header=headers,
on_open=on_open,
on_message=on_message,
)
ws.run_forever()
```
Browser (Standard WebSocket)¶
```javascript
/*
Note: in browsers and other client-side environments we recommend WebRTC.
In browser-like runtimes such as Deno and Cloudflare Workers, the standard
WebSocket interface can also be used, with authentication passed via
subprotocols as shown here.
*/
const ws = new WebSocket(
"wss://your-newapi-server-address/v1/realtime?model=gpt-4o-realtime-preview-2024-12-17",
[
"realtime",
// Authentication
"openai-insecure-api-key." + API_KEY,
// Optional
"openai-organization." + OPENAI_ORG_ID,
"openai-project." + OPENAI_PROJECT_ID,
// Beta protocol, required
"openai-beta.realtime-v1"
]
);
ws.on("open", function open() {
console.log("Connected to server.");
});
ws.on("message", function incoming(message) {
console.log(message.data);
});
Message Send/Receive Example¶
Node.js/Browser¶
```javascript
// Receive server events (Node.js "ws" module shown; in the browser,
// read event.data inside an addEventListener("message", ...) handler)
ws.on("message", function incoming(message) {
  // Parse the message payload from JSON
  const serverEvent = JSON.parse(message.toString());
  console.log(serverEvent);
});

// Send an event: build a JSON structure conforming to the client event format
const event = {
type: "response.create",
response: {
modalities: ["audio", "text"],
instructions: "Give me a haiku about code.",
}
};
ws.send(JSON.stringify(event));
```
Python¶
```python
# Send client events: serialize a dictionary to JSON
def on_open(ws):
    print("Connected to server.")
event = {
"type": "response.create",
"response": {
"modalities": ["text"],
"instructions": "Please assist the user."
}
}
ws.send(json.dumps(event))
# Received messages: parse the payload from JSON
def on_message(ws, message):
data = json.loads(message)
print("Received event:", json.dumps(data, indent=2))
WebSocket Python Audio Example¶
Example Documentation¶
This is a Python example of a voice conversation over the OpenAI Realtime WebSocket API, supporting real-time voice input and output.
Features¶
- 🎤 Real-time Voice Recording: Automatically detects voice input and sends it to the server
- 🔊 Real-time Audio Playback: Plays AI's voice responses
- 📝 Text Display: Simultaneously displays AI's text responses
- 🎯 Automatic Voice Detection: Uses server-side VAD (Voice Activity Detection)
- 🔄 Bidirectional Communication: Supports continuous conversation
Requirements¶
- Python 3.7+
- Microphone and speakers
- Stable network connection
Install Dependencies¶
Install the Python dependencies with `pip install -r requirements.txt` (the example uses the `websockets` and `pyaudio` packages). On Linux systems, you may also need additional system audio libraries:
```bash
# Ubuntu/Debian
sudo apt-get install portaudio19-dev python3-pyaudio

# CentOS/RHEL
sudo yum install portaudio-devel
```
Configuration¶
In the openai_realtime_client.py file, ensure the following configuration is correct:
```python
WEBSOCKET_URL = "wss://your-newapi-server-address/v1/realtime"
API_KEY = "<your-api-key>"  # never hard-code a real key in source control
MODEL = "gpt-4o-realtime-preview-2024-12-17"
```
Usage¶
- Run the program: `python openai_realtime_client.py`
- Start a conversation:
  - The program automatically starts recording after launch
  - Speak into the microphone
  - AI will respond to your voice in real time
- Stop the program: press `Ctrl+C`
Technical Details¶
Audio Configuration:
- Sample Rate: 24 kHz (OpenAI Realtime API requirement)
- Format: PCM16
- Channels: Mono
- Encoding: Base64
WebSocket Message Types:
- session.update: Session configuration
- input_audio_buffer.append: Send audio data
- input_audio_buffer.commit: Commit audio buffer
- response.audio.delta: Receive audio response
- response.text.delta: Receive text response
Voice Activity Detection: uses server-side VAD with:
- Threshold: 0.5
- Prefix padding: 300 ms
- Silence duration: 500 ms
Troubleshooting¶
Common Issues:
- Audio Device Issues: check that a working microphone and speakers are available (see Requirements above) and that PortAudio is installed
- Permission Issues:
  - Ensure the program has microphone access permissions
  - Linux: check ALSA/PulseAudio configuration
- Network Connection Issues:
  - Check that the WebSocket URL is correct
  - Ensure the API key is valid
  - Check firewall settings

Debug Mode: enable verbose logging, as shown below.
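The full example client below already configures this; the relevant fragment is:

```python
import logging

# Log to both the console and websocket_debug.log at DEBUG level
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.StreamHandler(),
        logging.FileHandler('websocket_debug.log', encoding='utf-8')
    ]
)
```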
Code Structure¶
```text
├── openai_realtime_client.py  # Main program file
├── requirements.txt           # Python dependencies
└── README.md                  # Documentation
```
Main Classes and Methods:
- `OpenAIRealtimeClient`: Main client class
  - `connect()`: Connect to the WebSocket
  - `start_audio_streams()`: Start audio streams
  - `start_recording()`: Start recording
  - `handle_response()`: Handle responses
  - `start_conversation()`: Start the conversation
Notes¶
- Audio Quality: Ensure use in a quiet environment for best results
- Network Latency: Real-time conversation is sensitive to network latency
- Resource Usage: Long-running sessions may consume significant CPU and memory
- API Limits: Be aware of OpenAI API usage limits and costs
License¶
This project is for learning and testing purposes only. Please comply with OpenAI's terms of use.
Example Code¶
```python
#!/usr/bin/env python3
"""
OpenAI Realtime WebSocket Audio Example
Supports real-time voice conversation, including audio recording, sending, and playback
"""
import asyncio
import base64
import json
import logging

import pyaudio
import websockets
# Configure logging
logging.basicConfig(
level=logging.DEBUG,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
handlers=[
logging.StreamHandler(),
logging.FileHandler('websocket_debug.log', encoding='utf-8')
]
)
logger = logging.getLogger(__name__)
class OpenAIRealtimeClient:
def __init__(self,
websocket_url: str,
api_key: str,
model: str = "gpt-4o-realtime-preview-2024-12-17"):
self.websocket_url = websocket_url
self.api_key = api_key
self.model = model
self.websocket = None
self.is_recording = False
self.is_connected = False
# Audio configuration
self.audio_format = pyaudio.paInt16
self.channels = 1
self.rate = 24000 # OpenAI Realtime API required sample rate
self.chunk = 1024
self.audio = pyaudio.PyAudio()
# Audio streams
self.input_stream = None
self.output_stream = None
async def connect(self):
"""Connect to WebSocket server"""
headers = {
"Authorization": f"Bearer {self.api_key}",
"OpenAI-Beta": "realtime=v1"
}
logger.info("=" * 80)
logger.info("🚀 Starting WebSocket connection")
logger.info("=" * 80)
logger.info(f"Connection URL: {self.websocket_url}")
logger.info(f"API Key: {self.api_key[:10]}...")
logger.info(f"Headers: {json.dumps(headers, ensure_ascii=False, indent=2)}")
try:
self.websocket = await websockets.connect(
self.websocket_url,
additional_headers=headers
)
self.is_connected = True
logger.info("✅ WebSocket connection successful")
# Send session configuration
await self.send_session_config()
except Exception as e:
logger.error(f"❌ WebSocket connection failed: {e}")
logger.error(f"Error type: {type(e).__name__}")
raise
async def send_session_config(self):
"""Send session configuration"""
config = {
"type": "session.update",
"session": {
"modalities": ["text", "audio"],
"instructions": "You are a helpful AI assistant that can engage in real-time voice conversations.",
"voice": "alloy",
"input_audio_format": "pcm16",
"output_audio_format": "pcm16",
"input_audio_transcription": {
"model": "whisper-1"
},
"turn_detection": {
"type": "server_vad",
"threshold": 0.5,
"prefix_padding_ms": 300,
"silence_duration_ms": 500
},
"tools": [],
"tool_choice": "auto",
"temperature": 0.8,
"max_response_output_tokens": 4096
}
}
config_json = json.dumps(config, ensure_ascii=False, indent=2)
logger.info("=" * 60)
logger.info("📤 Sending session configuration:")
logger.info(f"Message type: {config['type']}")
logger.info(f"Configuration content:\n{config_json}")
logger.info("=" * 60)
await self.websocket.send(json.dumps(config))
logger.info("✅ Session configuration sent")
def start_audio_streams(self):
"""Start audio input and output streams"""
try:
# Input stream (microphone)
self.input_stream = self.audio.open(
format=self.audio_format,
channels=self.channels,
rate=self.rate,
input=True,
frames_per_buffer=self.chunk
)
# Output stream (speakers)
self.output_stream = self.audio.open(
format=self.audio_format,
channels=self.channels,
rate=self.rate,
output=True,
frames_per_buffer=self.chunk
)
logger.info("Audio streams started")
except Exception as e:
logger.error(f"Failed to start audio streams: {e}")
raise
def stop_audio_streams(self):
"""Stop audio streams"""
if self.input_stream:
self.input_stream.stop_stream()
self.input_stream.close()
self.input_stream = None
if self.output_stream:
self.output_stream.stop_stream()
self.output_stream.close()
self.output_stream = None
logger.info("Audio streams stopped")
async def start_recording(self):
"""Start recording and send audio data"""
self.is_recording = True
logger.info("Starting recording...")
try:
while self.is_recording and self.is_connected:
# Read audio data
audio_data = self.input_stream.read(self.chunk, exception_on_overflow=False)
# Encode audio data as base64
audio_base64 = base64.b64encode(audio_data).decode('utf-8')
# Send audio data
message = {
"type": "input_audio_buffer.append",
"audio": audio_base64
}
# Log audio data sending (every 10 times to avoid excessive logging)
if hasattr(self, '_audio_count'):
self._audio_count += 1
else:
self._audio_count = 1
if self._audio_count % 10 == 0: # Log every 10 times
logger.debug(f"🎤 Sending audio data #{self._audio_count}: length={len(audio_base64)} characters")
await self.websocket.send(json.dumps(message))
# Brief delay to avoid excessive sending
await asyncio.sleep(0.01)
except Exception as e:
logger.error(f"Error during recording: {e}")
finally:
logger.info("Recording stopped")
async def stop_recording(self):
"""Stop recording"""
self.is_recording = False
# Send recording end signal
if self.websocket and self.is_connected:
message = {
"type": "input_audio_buffer.commit"
}
logger.info("=" * 60)
logger.info("📤 Sending recording end signal:")
logger.info(f"Message type: {message['type']}")
logger.info("=" * 60)
await self.websocket.send(json.dumps(message))
logger.info("✅ Recording end signal sent")
async def handle_response(self):
"""Handle WebSocket responses"""
try:
async for message in self.websocket:
data = json.loads(message)
message_type = data.get("type", "unknown")
# Log all received messages in detail
logger.info("=" * 60)
logger.info("📥 Received WebSocket message:")
logger.info(f"Message type: {message_type}")
# Handle different message types
if message_type == "response.audio.delta":
# Handle audio response
audio_data = base64.b64decode(data.get("delta", ""))
logger.info(f"🎵 Audio data: length={len(audio_data)} bytes")
if audio_data and self.output_stream:
self.output_stream.write(audio_data)
logger.info("✅ Audio data played")
elif message_type == "response.text.delta":
# Handle text response
text = data.get("delta", "")
logger.info(f"💬 Text delta: '{text}'")
if text:
print(f"AI: {text}", end="", flush=True)
elif message_type == "response.text.done":
# Text response complete
logger.info("✅ Text response complete")
print("\n")
elif message_type == "response.audio.done":
# Audio response complete
logger.info("✅ Audio response complete")
elif message_type == "error":
# Handle errors
error_info = data.get('error', {})
logger.error("❌ Server error:")
logger.error(f"Error details: {json.dumps(error_info, ensure_ascii=False, indent=2)}")
elif message_type == "session.created":
# Session created successfully
logger.info("✅ Session created")
elif message_type == "session.updated":
# Session updated successfully
logger.info("✅ Session updated")
elif message_type == "conversation.item.created":
# Conversation item created
logger.info("📝 Conversation item created")
elif message_type == "conversation.item.input_audio_buffer.speech_started":
# Speech started
logger.info("🎤 Speech start detected")
elif message_type == "conversation.item.input_audio_buffer.speech_stopped":
# Speech stopped
logger.info("🔇 Speech stop detected")
elif message_type == "conversation.item.input_audio_buffer.committed":
# Audio buffer committed
logger.info("📤 Audio buffer committed")
else:
# Other unknown message types
logger.info(f"❓ Unknown message type: {message_type}")
# Log complete message content (except audio data, as it's too long)
if message_type != "response.audio.delta":
logger.info(f"Complete message content:\n{json.dumps(data, ensure_ascii=False, indent=2)}")
logger.info("=" * 60)
except websockets.exceptions.ConnectionClosed:
logger.info("WebSocket connection closed")
self.is_connected = False
except Exception as e:
logger.error(f"Error handling response: {e}")
self.is_connected = False
async def start_conversation(self):
"""Start conversation"""
try:
# Start audio streams
self.start_audio_streams()
# Create tasks
response_task = asyncio.create_task(self.handle_response())
recording_task = asyncio.create_task(self.start_recording())
logger.info("Conversation started, press Ctrl+C to stop")
# Wait for tasks to complete
await asyncio.gather(response_task, recording_task)
except KeyboardInterrupt:
logger.info("Stop signal received")
except Exception as e:
logger.error(f"Error during conversation: {e}")
finally:
await self.cleanup()
async def cleanup(self):
"""Clean up resources"""
self.is_recording = False
self.is_connected = False
# Stop audio streams
self.stop_audio_streams()
# Close WebSocket connection
if self.websocket:
await self.websocket.close()
logger.info("WebSocket connection closed")
# Terminate PyAudio
self.audio.terminate()
logger.info("Resource cleanup complete")
async def run(self):
"""Run client"""
try:
await self.connect()
await self.start_conversation()
except Exception as e:
logger.error(f"Error running client: {e}")
finally:
await self.cleanup()
async def main():
"""Main function"""
# Configuration parameters
WEBSOCKET_URL = "wss://new-api.weroam.xyz/v1/realtime?model=gpt-4o-realtime-preview-2024-12-17"
API_KEY = "sk-EpnduEXFxjAt0AF55W08WBmzqZHlv9f4tmCDWd9TcJqBwVjV"
MODEL = "gpt-4o-realtime-preview-2024-12-17"
# Create client
client = OpenAIRealtimeClient(
websocket_url=WEBSOCKET_URL,
api_key=API_KEY,
model=MODEL
)
# Run client
await client.run()
if __name__ == "__main__":
print("OpenAI Realtime WebSocket Audio Example")
print("=" * 50)
print("Features:")
print("- Real-time voice conversation")
print("- Automatic speech recognition")
print("- Text and audio responses")
print("- Press Ctrl+C to stop")
print("=" * 50)
try:
asyncio.run(main())
except KeyboardInterrupt:
print("\nProgram stopped")
except Exception as e:
print(f"Program error: {e}")
⚠️ Error Handling¶
Common Errors¶
- Connection errors
  - Network issues
  - Authentication failures
  - Configuration errors
- Audio errors
  - Device permissions
  - Unsupported formats
  - Codec issues
- Session errors
  - Token expiration
  - Session timeout
  - Concurrency limits
Error Recovery¶
- Automatic reconnection (see the sketch after this list)
- Session recovery
- Error retry
- Graceful degradation
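A minimal sketch of automatic reconnection with exponential backoff, using the `websockets` package from the example above (retry counts and delays are illustrative):

```python
import asyncio
import random

import websockets

async def connect_with_retry(url: str, headers: dict, max_attempts: int = 5):
    """Try to connect, backing off exponentially with jitter between attempts."""
    for attempt in range(max_attempts):
        try:
            return await websockets.connect(url, additional_headers=headers)
        except (OSError, websockets.exceptions.WebSocketException) as exc:
            delay = min(2 ** attempt, 30) + random.random()
            print(f"Connection failed ({exc}); retrying in {delay:.1f}s")
            await asyncio.sleep(delay)
    raise RuntimeError("Could not establish a Realtime connection")
```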
📝 Event Reference¶
Common Request Headers¶
All requests must include the following headers:
| Header | Type | Description | Example Value |
|---|---|---|---|
| Authorization | String | Authentication token | Bearer $API_KEY |
| OpenAI-Beta | String | API version | realtime=v1 |
Client Events¶
session.update¶
Update the default configuration for the session.
| Parameter | Type | Required | Description | Example Value/Optional Values |
|---|---|---|---|---|
| event_id | String | No | Client-generated event identifier | event_123 |
| type | String | No | Event type | session.update |
| modalities | String array | No | Modality types the model can respond with | ["text", "audio"] |
| instructions | String | No | System instructions prepended to model calls | "Your knowledge cutoff is 2023-10..." |
| voice | String | No | Voice type used by the model | alloy, echo, shimmer |
| input_audio_format | String | No | Input audio format | pcm16, g711_ulaw, g711_alaw |
| output_audio_format | String | No | Output audio format | pcm16, g711_ulaw, g711_alaw |
| input_audio_transcription.model | String | No | Model used for transcription | whisper-1 |
| turn_detection.type | String | No | Voice detection type | server_vad |
| turn_detection.threshold | Number | No | VAD activation threshold (0.0-1.0) | 0.8 |
| turn_detection.prefix_padding_ms | Integer | No | Audio duration included before speech starts | 500 |
| turn_detection.silence_duration_ms | Integer | No | Silence duration to detect speech stop | 1000 |
| tools | Array | No | List of tools available to the model | [] |
| tool_choice | String | No | How the model chooses tools | auto/none/required |
| temperature | Number | No | Model sampling temperature | 0.8 |
| max_response_output_tokens | String/Integer | No | Maximum tokens per response | "inf"/4096 |
input_audio_buffer.append¶
Append audio data to the input audio buffer.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Client-generated event identifier | event_456 |
| type | String | No | Event type | input_audio_buffer.append |
| audio | String | No | Base64-encoded audio data | Base64EncodedAudioData |
input_audio_buffer.commit¶
Commit the audio data in the buffer as a user message.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Client-generated event identifier | event_789 |
| type | String | No | Event type | input_audio_buffer.commit |
input_audio_buffer.clear¶
Clear all audio data from the input audio buffer.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Client-generated event identifier | event_012 |
| type | String | No | Event type | input_audio_buffer.clear |
conversation.item.create¶
Add a new conversation item to the conversation.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Client-generated event identifier | event_345 |
| type | String | No | Event type | conversation.item.create |
| previous_item_id | String | No | New item will be inserted after this ID | null |
| item.id | String | No | Unique identifier for the conversation item | msg_001 |
| item.type | String | No | Type of conversation item | message/function_call/function_call_output |
| item.status | String | No | Status of conversation item | completed/in_progress/incomplete |
| item.role | String | No | Role of message sender | user/assistant/system |
| item.content | Array | No | Message content | [text/audio/transcript] |
| item.call_id | String | No | ID of function call | call_001 |
| item.name | String | No | Name of called function | function_name |
| item.arguments | String | No | Arguments for function call | {"param": "value"} |
| item.output | String | No | Output result of function call | {"result": "value"} |
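As a sketch, here is how a user text message could be injected and a reply requested over an open connection `ws` (note: in OpenAI's current schema, text parts in user messages use the content type `input_text`):

```python
import json

# Add a user message to the conversation history...
ws.send(json.dumps({
    "type": "conversation.item.create",
    "item": {
        "type": "message",
        "role": "user",
        "content": [{"type": "input_text", "text": "What can you do?"}],
    },
}))
# ...then trigger a response to it
ws.send(json.dumps({"type": "response.create"}))
```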
conversation.item.truncate¶
Truncate audio content in assistant messages.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Client-generated event identifier | event_678 |
| type | String | No | Event type | conversation.item.truncate |
| item_id | String | No | ID of assistant message item to truncate | msg_002 |
| content_index | Integer | No | Index of content part to truncate | 0 |
| audio_end_ms | Integer | No | End time point for audio truncation | 1500 |
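A typical use is barge-in handling: when the user starts speaking over the assistant, cancel the in-flight response and truncate the already-sent audio so the conversation history matches what was actually heard. A sketch, assuming you track the current assistant item ID and playback position:

```python
import json

def handle_interruption(ws, item_id: str, played_ms: int) -> None:
    # Stop the response that is still being generated
    ws.send(json.dumps({"type": "response.cancel"}))
    # Trim the assistant's audio to the point the user actually heard
    ws.send(json.dumps({
        "type": "conversation.item.truncate",
        "item_id": item_id,
        "content_index": 0,
        "audio_end_ms": played_ms,
    }))
```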
conversation.item.delete¶
Delete the specified conversation item from conversation history.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Client-generated event identifier | event_901 |
| type | String | No | Event type | conversation.item.delete |
| item_id | String | No | ID of conversation item to delete | msg_003 |
response.create¶
Trigger response generation.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Client-generated event identifier | event_234 |
| type | String | No | Event type | response.create |
| response.modalities | String array | No | Modality types for response | ["text", "audio"] |
| response.instructions | String | No | Instructions for the model | "Please assist the user." |
| response.voice | String | No | Voice type used by the model | alloy/echo/shimmer |
| response.output_audio_format | String | No | Output audio format | pcm16 |
| response.tools | Array | No | List of tools available to the model | ["type", "name", "description"] |
| response.tool_choice | String | No | How the model chooses tools | auto |
| response.temperature | Number | No | Sampling temperature | 0.7 |
| response.max_output_tokens | Integer/String | No | Maximum output tokens | 150/"inf" |
response.cancel¶
Cancel ongoing response generation.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Client-generated event identifier | event_567 |
| type | String | No | Event type | response.cancel |
Server Events¶
error¶
Event returned when an error occurs.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_890 |
| type | String | No | Event type | error |
| error.type | String | No | Error type | invalid_request_error/server_error |
| error.code | String | No | Error code | invalid_event |
| error.message | String | No | Human-readable error message | "The 'type' field is missing." |
| error.param | String | No | Parameter related to error | null |
| error.event_id | String | No | ID of related event | event_567 |
conversation.item.input_audio_transcription.completed¶
Returned when input audio transcription is enabled and transcription succeeds.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_2122 |
| type | String | No | Event type | conversation.item.input_audio_transcription.completed |
| item_id | String | No | ID of user message item | msg_003 |
| content_index | Integer | No | Index of content part containing audio | 0 |
| transcript | String | No | Transcribed text content | "Hello, how are you?" |
conversation.item.input_audio_transcription.failed¶
Returned when input audio transcription is configured but transcription request for user message fails.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_2324 |
| type | String | No | Event type | conversation.item.input_audio_transcription.failed |
| item_id | String | No | ID of user message item | msg_003 |
| content_index | Integer | No | Index of content part containing audio | 0 |
| error.type | String | No | Error type | transcription_error |
| error.code | String | No | Error code | audio_unintelligible |
| error.message | String | No | Human-readable error message | "The audio could not be transcribed." |
| error.param | String | No | Parameter related to error | null |
conversation.item.truncated¶
Returned when client truncates previous assistant audio message item.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_2526 |
| type | String | No | Event type | conversation.item.truncated |
| item_id | String | No | ID of truncated assistant message item | msg_004 |
| content_index | Integer | No | Index of truncated content part | 0 |
| audio_end_ms | Integer | No | Time point when audio was truncated (milliseconds) | 1500 |
conversation.item.deleted¶
Returned when an item in the conversation is deleted.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_2728 |
| type | String | No | Event type | conversation.item.deleted |
| item_id | String | No | ID of deleted conversation item | msg_005 |
input_audio_buffer.committed¶
Returned when audio buffer data is committed.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_1121 |
| type | String | No | Event type | input_audio_buffer.committed |
| previous_item_id | String | No | New conversation item will be inserted after this ID | msg_001 |
| item_id | String | No | ID of user message item to be created | msg_002 |
input_audio_buffer.cleared¶
Returned when client clears input audio buffer.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_1314 |
| type | String | No | Event type | input_audio_buffer.cleared |
input_audio_buffer.speech_started¶
In server voice detection mode, returned when voice input is detected.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_1516 |
| type | String | No | Event type | input_audio_buffer.speech_started |
| audio_start_ms | Integer | No | Milliseconds from session start to voice detection | 1000 |
| item_id | String | No | ID of user message item to be created when voice stops | msg_003 |
input_audio_buffer.speech_stopped¶
In server voice detection mode, returned when voice input stops.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_1718 |
| type | String | No | Event type | input_audio_buffer.speech_stopped |
| audio_end_ms | Integer | No | Milliseconds from session start to detected voice stop | 2000 |
| item_id | String | No | ID of user message item to be created | msg_003 |
response.created¶
Returned when a new response is created.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_2930 |
| type | String | No | Event type | response.created |
| response.id | String | No | Unique identifier for response | resp_001 |
| response.object | String | No | Object type | realtime.response |
| response.status | String | No | Status of response | in_progress |
| response.status_details | Object | No | Additional details about status | null |
| response.output | Array | No | List of output items generated by response | [] |
| response.usage | Object | No | Usage statistics for response | null |
response.done¶
Returned when response streaming is complete.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_3132 |
| type | String | No | Event type | response.done |
| response.id | String | No | Unique identifier for response | resp_001 |
| response.object | String | No | Object type | realtime.response |
| response.status | String | No | Final status of response | completed/cancelled/failed/incomplete |
| response.status_details | Object | No | Additional details about status | null |
| response.output | Array | No | List of output items generated by response | [...] |
| response.usage.total_tokens | Integer | No | Total tokens | 50 |
| response.usage.input_tokens | Integer | No | Input tokens | 20 |
| response.usage.output_tokens | Integer | No | Output tokens | 30 |
response.output_item.added¶
Returned when a new output item is created during response generation.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_3334 |
| type | String | No | Event type | response.output_item.added |
| response_id | String | No | ID of response the output item belongs to | resp_001 |
| output_index | Integer | No | Index of output item in response | 0 |
| item.id | String | No | Unique identifier for output item | msg_007 |
| item.object | String | No | Object type | realtime.item |
| item.type | String | No | Type of output item | message/function_call/function_call_output |
| item.status | String | No | Status of output item | in_progress/completed |
| item.role | String | No | Role associated with output item | assistant |
| item.content | Array | No | Content of output item | ["type", "text", "audio", "transcript"] |
response.output_item.done¶
Returned when output item streaming is complete.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_3536 |
| type | String | No | Event type | response.output_item.done |
| response_id | String | No | ID of response the output item belongs to | resp_001 |
| output_index | Integer | No | Index of output item in response | 0 |
| item.id | String | No | Unique identifier for output item | msg_007 |
| item.object | String | No | Object type | realtime.item |
| item.type | String | No | Type of output item | message/function_call/function_call_output |
| item.status | String | No | Final status of output item | completed/incomplete |
| item.role | String | No | Role associated with output item | assistant |
| item.content | Array | No | Content of output item | ["type", "text", "audio", "transcript"] |
response.content_part.added¶
Returned when a new content part is added to assistant message item during response generation.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_3738 |
| type | String | No | Event type | response.content_part.added |
| response_id | String | No | ID of response | resp_001 |
| item_id | String | No | ID of message item to add content part to | msg_007 |
| output_index | Integer | No | Index of output item in response | 0 |
| content_index | Integer | No | Index of content part in message item content array | 0 |
| part.type | String | No | Content type | text/audio |
| part.text | String | No | Text content | "Hello" |
| part.audio | String | No | Base64-encoded audio data | "base64_encoded_audio_data" |
| part.transcript | String | No | Transcribed text of audio | "Hello" |
response.content_part.done¶
Returned when streaming of a content part in an assistant message item is complete.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_3940 |
| type | String | No | Event type | response.content_part.done |
| response_id | String | No | ID of response | resp_001 |
| item_id | String | No | ID of message item to add content part to | msg_007 |
| output_index | Integer | No | Index of output item in response | 0 |
| content_index | Integer | No | Index of content part in message item content array | 0 |
| part.type | String | No | Content type | text/audio |
| part.text | String | No | Text content | "Hello" |
| part.audio | String | No | Base64-encoded audio data | "base64_encoded_audio_data" |
| part.transcript | String | No | Transcribed text of audio | "Hello" |
response.text.delta¶
Returned when the text value of a "text" content part is updated.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_4142 |
| type | String | No | Event type | response.text.delta |
| response_id | String | No | ID of response | resp_001 |
| item_id | String | No | ID of message item | msg_007 |
| output_index | Integer | No | Index of output item in response | 0 |
| content_index | Integer | No | Index of content part in message item content array | 0 |
| delta | String | No | Text delta update content | "Sure, I can h" |
response.text.done¶
Returned when "text" type content part text streaming is complete.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_4344 |
| type | String | No | Event type | response.text.done |
| response_id | String | No | ID of response | resp_001 |
| item_id | String | No | ID of message item | msg_007 |
| output_index | Integer | No | Index of output item in response | 0 |
| content_index | Integer | No | Index of content part in message item content array | 0 |
| text | String | No | Final complete text content | "Sure, I can help with that." |
response.audio_transcript.delta¶
Returned when transcription content of model-generated audio output is updated.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_4546 |
| type | String | No | Event type | response.audio_transcript.delta |
| response_id | String | No | ID of response | resp_001 |
| item_id | String | No | ID of message item | msg_008 |
| output_index | Integer | No | Index of output item in response | 0 |
| content_index | Integer | No | Index of content part in message item content array | 0 |
| delta | String | No | Transcription text delta update content | "Hello, how can I a" |
response.audio_transcript.done¶
Returned when streaming of the transcript of model-generated audio output is complete.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_4748 |
| type | String | No | Event type | response.audio_transcript.done |
| response_id | String | No | ID of response | resp_001 |
| item_id | String | No | ID of message item | msg_008 |
| output_index | Integer | No | Index of output item in response | 0 |
| content_index | Integer | No | Index of content part in message item content array | 0 |
| transcript | String | No | Final complete transcribed text of audio | "Hello, how can I assist you today?" |
response.audio.delta¶
Returned when model-generated audio content is updated.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_4950 |
| type | String | No | Event type | response.audio.delta |
| response_id | String | No | ID of response | resp_001 |
| item_id | String | No | ID of message item | msg_008 |
| output_index | Integer | No | Index of output item in response | 0 |
| content_index | Integer | No | Index of content part in message item content array | 0 |
| delta | String | No | Base64-encoded audio data delta | "Base64EncodedAudioDelta" |
response.audio.done¶
Returned when model-generated audio is complete.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_5152 |
| type | String | No | Event type | response.audio.done |
| response_id | String | No | ID of response | resp_001 |
| item_id | String | No | ID of message item | msg_008 |
| output_index | Integer | No | Index of output item in response | 0 |
| content_index | Integer | No | Index of content part in message item content array | 0 |
Function Calling¶
response.function_call_arguments.delta¶
Returned when model-generated function call arguments are updated.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_5354 |
| type | String | No | Event type | response.function_call_arguments.delta |
| response_id | String | No | ID of response | resp_002 |
| item_id | String | No | ID of message item | fc_001 |
| output_index | Integer | No | Index of output item in response | 0 |
| call_id | String | No | ID of function call | call_001 |
| delta | String | No | JSON format function call arguments delta | "{\"location\": \"San\"" |
response.function_call_arguments.done¶
Returned when streaming of model-generated function call arguments is complete.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_5556 |
| type | String | No | Event type | response.function_call_arguments.done |
| response_id | String | No | ID of response | resp_002 |
| item_id | String | No | ID of message item | fc_001 |
| output_index | Integer | No | Index of output item in response | 0 |
| call_id | String | No | ID of function call | call_001 |
| arguments | String | No | Final complete function call arguments (JSON format) | "{\"location\": \"San Francisco\"}" |
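Putting these events together, a sketch of the full roundtrip: when `response.function_call_arguments.done` arrives, execute the function locally, return the result as a `function_call_output` item, and request a follow-up response (`my_get_weather` is a hypothetical local function):

```python
import json

def on_function_call_done(ws, event: dict) -> None:
    args = json.loads(event["arguments"])
    result = my_get_weather(**args)  # hypothetical local implementation
    # Feed the function result back into the conversation...
    ws.send(json.dumps({
        "type": "conversation.item.create",
        "item": {
            "type": "function_call_output",
            "call_id": event["call_id"],
            "output": json.dumps(result),
        },
    }))
    # ...and let the model continue with it
    ws.send(json.dumps({"type": "response.create"}))
```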
Other Status Updates¶
rate_limits.updated¶
Triggered after each "response.done" event to indicate updated rate limits.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_5758 |
| type | String | No | Event type | rate_limits.updated |
| rate_limits | Object array | No | List of rate limit information | [{"name": "requests_per_min", "limit": 60, "remaining": 45, "reset_seconds": 35}] |
conversation.created¶
Returned when conversation is created.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_9101 |
| type | String | No | Event type | conversation.created |
| conversation | Object | No | Conversation resource object | {"id": "conv_001", "object": "realtime.conversation"} |
conversation.item.created¶
Returned when conversation item is created.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_1920 |
| type | String | No | Event type | conversation.item.created |
| previous_item_id | String | No | ID of previous conversation item | msg_002 |
| item | Object | No | Conversation item object | {"id": "msg_003", "object": "realtime.item", "type": "message", "status": "completed", "role": "user", "content": [{"type": "text", "text": "Hello"}]} |
session.created¶
Returned when session is created.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_1234 |
| type | String | No | Event type | session.created |
| session | Object | No | Session object | {"id": "sess_001", "object": "realtime.session", "model": "gpt-4", "modalities": ["text", "audio"]} |
session.updated¶
Returned when session is updated.
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_5678 |
| type | String | No | Event type | session.updated |
| session | Object | No | Updated session object | {"id": "sess_001", "object": "realtime.session", "model": "gpt-4", "modalities": ["text", "audio"]} |
Rate Limit Event Parameter Table¶
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| name | String | Yes | Limit name | requests_per_min |
| limit | Integer | Yes | Limit value | 60 |
| remaining | Integer | Yes | Remaining available amount | 45 |
| reset_seconds | Integer | Yes | Reset time (seconds) | 35 |
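A sketch of consuming the `rate_limits.updated` event described above, e.g. to warn when headroom runs low (the 10% threshold is illustrative):

```python
def on_rate_limits(event: dict) -> None:
    for limit in event.get("rate_limits", []):
        if limit["remaining"] < 0.1 * limit["limit"]:
            print(f"Low headroom on {limit['name']}: "
                  f"{limit['remaining']}/{limit['limit']} left, "
                  f"resets in {limit['reset_seconds']}s")
```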
Function Call Parameter Table¶
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| type | String | Yes | Function type | function |
| name | String | Yes | Function name | get_weather |
| description | String | No | Function description | Get the current weather |
| parameters | Object | Yes | Function parameter definition | {"type": "object", "properties": {...}} |
Audio Format Parameter Table¶
| Parameter | Type | Description | Optional Values |
|---|---|---|---|
| sample_rate | Integer | Sample rate | 8000, 16000, 24000, 44100, 48000 |
| channels | Integer | Number of channels | 1 (mono), 2 (stereo) |
| bits_per_sample | Integer | Bits per sample | 16 (pcm16), 8 (g711) |
| encoding | String | Encoding method | pcm16, g711_ulaw, g711_alaw |
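For reference, a sketch of how these parameters translate into buffer sizes when streaming `pcm16` at 24 kHz mono, as the example above does:

```python
import base64

SAMPLE_RATE = 24000   # Hz, mono
BYTES_PER_SAMPLE = 2  # pcm16 = 16-bit signed little-endian

def chunk_bytes(duration_ms: int) -> int:
    """Bytes of pcm16 audio covering the given duration."""
    return SAMPLE_RATE * BYTES_PER_SAMPLE * duration_ms // 1000

# 100 ms of audio = 4,800 bytes, sent base64-encoded via input_audio_buffer.append
silence = b"\x00\x00" * (chunk_bytes(100) // BYTES_PER_SAMPLE)
payload = base64.b64encode(silence).decode("ascii")
```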
Voice Detection Parameter Table¶
| Parameter | Type | Description | Default Value | Range |
|---|---|---|---|---|
| threshold | Float | VAD activation threshold | 0.5 | 0.0-1.0 |
| prefix_padding_ms | Integer | Voice prefix padding (milliseconds) | 500 | 0-5000 |
| silence_duration_ms | Integer | Silence detection duration (milliseconds) | 1000 | 100-10000 |
Tool Selection Parameter Table¶
| Parameter | Type | Description | Optional Values |
|---|---|---|---|
| tool_choice | String | Tool selection method | auto, none, required |
| tools | Array | Available tools list | [{type, name, description, parameters}] |
Model Configuration Parameter Table¶
| Parameter | Type | Description | Range/Optional Values | Default Value |
|---|---|---|---|---|
| temperature | Float | Sampling temperature | 0.0-2.0 | 1.0 |
| max_output_tokens | Integer/String | Maximum output length | 1-4096/"inf" | "inf" |
| modalities | String array | Response modalities | ["text", "audio"] | ["text"] |
| voice | String | Voice type | alloy, echo, shimmer | alloy |
Event Common Parameter Table¶
| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | Yes | Unique identifier for event | event_123 |
| type | String | Yes | Event type | session.update |
| timestamp | Integer | No | Event timestamp (milliseconds) | 1677649363000 |
Session Status Parameter Table¶
| Parameter | Type | Description | Optional Values |
|---|---|---|---|
| status | String | Session status | active, ended, error |
| error | Object | Error information | {"type": "error_type", "message": "error message"} |
| metadata | Object | Session metadata | {"client_id": "web", "session_type": "chat"} |
Conversation Item Status Parameter Table¶
| Parameter | Type | Description | Optional Values |
|---|---|---|---|
| status | String | Conversation item status | completed, in_progress, incomplete |
| role | String | Sender role | user, assistant, system |
| type | String | Conversation item type | message, function_call, function_call_output |
Content Type Parameter Table¶
| Parameter | Type | Description | Optional Values |
|---|---|---|---|
| type | String | Content type | text, audio, transcript |
| format | String | Content format | plain, markdown, html |
| encoding | String | Encoding method | utf-8, base64 |
Response Status Parameter Table¶
| Parameter | Type | Description | Optional Values |
|---|---|---|---|
| status | String | Response status | completed, cancelled, failed, incomplete |
| status_details | Object | Status details | {"reason": "user_cancelled"} |
| usage | Object | Usage statistics | {"total_tokens": 50, "input_tokens": 20, "output_tokens": 30} |
Audio Transcription Parameter Table¶
| Parameter | Type | Description | Example Value |
|---|---|---|---|
| enabled | Boolean | Whether transcription is enabled | true |
| model | String | Transcription model | whisper-1 |
| language | String | Transcription language | en, zh, auto |
| prompt | String | Transcription prompt | "Transcript of a conversation" |
Audio Stream Parameter Table¶
| Parameter | Type | Description | Optional Values |
|---|---|---|---|
| chunk_size | Integer | Audio chunk size (bytes) | 1024, 2048, 4096 |
| latency | String | Latency mode | low, balanced, high |
| compression | String | Compression method | none, opus, mp3 |
WebRTC Configuration Parameter Table¶
| Parameter | Type | Description | Default Value |
|---|---|---|---|
| ice_servers | Array | ICE server list | [{"urls": "stun:stun.l.google.com:19302"}] |
| audio_constraints | Object | Audio constraints | {"echoCancellation": true} |
| connection_timeout | Integer | Connection timeout (milliseconds) | 30000 |