
OpenAI Realtime API

📝 Overview

Introduction

OpenAI Realtime API provides two connection methods:

  1. WebRTC - For real-time audio/video interaction in browsers and mobile clients

  2. WebSocket - For server-to-server application integration

Use Cases

  • Real-time voice conversations
  • Audio/video conferencing
  • Real-time translation
  • Speech transcription
  • Real-time code generation
  • Server-side real-time integration

Key Features

  • Bidirectional audio streaming
  • Mixed text and audio conversations
  • Function calling support
  • Automatic Voice Activity Detection (VAD)
  • Audio transcription capabilities
  • WebSocket server-side integration

🔐 Authentication & Security

Authentication Methods

  1. Standard API Key (server-side only)
  2. Ephemeral Token (client-side use)

Ephemeral Token

  • Validity: 1 minute
  • Usage limit: Single connection
  • Generation: Created via server-side API
POST https://computevault.unodetech.xyz/v1/realtime/sessions
Content-Type: application/json
Authorization: Bearer $API_KEY

{
  "model": "gpt-4o-realtime-preview-2024-12-17",
  "voice": "verse"
}
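A minimal server-side sketch of the token request above, using only the Python standard library. The endpoint, headers and body come from this guide. The response shape is an assumption modeled on OpenAI's /v1/realtime/sessions response, with the key under client_secret.value. The helper names are hypothetical.

```python
import json
import urllib.request

# Endpoint from this guide
SESSIONS_URL = "https://computevault.unodetech.xyz/v1/realtime/sessions"


def build_session_request(api_key: str, model: str, voice: str) -> urllib.request.Request:
    """Build the server-side POST that mints an ephemeral token (not sent yet)."""
    body = json.dumps({"model": model, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        SESSIONS_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # standard key: server side only
        },
    )


def extract_ephemeral_key(session: dict) -> str:
    """Read the short-lived key from the assumed client_secret.value field."""
    return session["client_secret"]["value"]


def fetch_ephemeral_key(api_key: str, model: str, voice: str) -> str:
    """Call the sessions endpoint and return the ephemeral key for the client."""
    with urllib.request.urlopen(build_session_request(api_key, model, voice)) as resp:
        return extract_ephemeral_key(json.load(resp))
```

Only the returned ephemeral key should be handed to the browser. Because it is valid for about a minute, request it just before the client opens its connection.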

Security Recommendations

  • Never expose standard API keys on the client side
  • Use HTTPS/WSS for communication
  • Implement appropriate access controls
  • Monitor for unusual activity

🔌 Connection Establishment

WebRTC Connection

  • URL: https://computevault.unodetech.xyz/v1/realtime
  • Query parameters: model
  • Headers:
    • Authorization: Bearer EPHEMERAL_KEY
    • Content-Type: application/sdp
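The WebRTC handshake is a single HTTP exchange: the client POSTs its local SDP offer, and the response body is the remote SDP answer. Browsers do this with fetch after RTCPeerConnection.createOffer().

The sketch below shows only that HTTP step in Python. It assumes the offer SDP string comes from a WebRTC stack that is not shown, and it assumes the answer is returned as plain SDP text, as OpenAI's endpoint does. The function names are hypothetical.

```python
import urllib.parse
import urllib.request

REALTIME_URL = "https://computevault.unodetech.xyz/v1/realtime"


def build_sdp_offer_request(ephemeral_key: str, model: str, offer_sdp: str) -> urllib.request.Request:
    """POST the local SDP offer with the model as a query parameter."""
    query = urllib.parse.urlencode({"model": model})
    return urllib.request.Request(
        f"{REALTIME_URL}?{query}",
        data=offer_sdp.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {ephemeral_key}",  # ephemeral key, not the API key
            "Content-Type": "application/sdp",
        },
    )


def exchange_sdp(ephemeral_key: str, model: str, offer_sdp: str) -> str:
    """Send the offer and return the answer SDP to pass to setRemoteDescription()."""
    with urllib.request.urlopen(build_sdp_offer_request(ephemeral_key, model, offer_sdp)) as resp:
        return resp.read().decode("utf-8")
```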

WebSocket Connection

  • URL: wss://computevault.unodetech.xyz/v1/realtime
  • Query parameters: model
  • Headers:
    • Authorization: Bearer YOUR_API_KEY
    • OpenAI-Beta: realtime=v1

Connection Flow

sequenceDiagram
    participant Client
    participant Server
    participant OpenAI
    
    alt WebRTC Connection
        Client->>Server: Request ephemeral token
        Server->>OpenAI: Create session
        OpenAI-->>Server: Return ephemeral token
        Server-->>Client: Return ephemeral token
        
        Client->>OpenAI: Create WebRTC offer
        OpenAI-->>Client: Return answer
        
        Note over Client,OpenAI: Establish WebRTC connection
        
        Client->>OpenAI: Create data channel
        OpenAI-->>Client: Confirm data channel
    else WebSocket Connection
        Server->>OpenAI: Establish WebSocket connection
        OpenAI-->>Server: Confirm connection
        
        Note over Server,OpenAI: Begin real-time conversation
    end

Data Channel

  • Name: oai-events
  • Purpose: Event transmission
  • Format: JSON

Audio Stream

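  • Input: add the local microphone track with RTCPeerConnection.addTrack()
  • Output: receive the model's audio track in the RTCPeerConnection ontrack event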

💬 Conversation Interaction

Conversation Modes

  1. Text-only conversations
  2. Voice conversations
  3. Mixed conversations

Session Management

  • Create session
  • Update session
  • End session
  • Session configuration

Event Types

  • Text events
  • Audio events
  • Function calls
  • Status updates
  • Error events
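Every event is a JSON object whose type field names the event, whether it arrives over a WebSocket or the oai-events data channel. A minimal sketch of routing events by type follows. The EventRouter class and its methods are hypothetical helpers, not part of any SDK.

```python
import json
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class EventRouter:
    """Dispatch JSON server events to handlers registered per event type."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}
        self._fallback: List[Handler] = []

    def on(self, event_type: str, handler: Handler) -> None:
        """Register a handler for one exact event type, e.g. "error"."""
        self._handlers.setdefault(event_type, []).append(handler)

    def on_unhandled(self, handler: Handler) -> None:
        """Register a handler for event types with no specific handler."""
        self._fallback.append(handler)

    def dispatch(self, raw: str) -> str:
        """Parse one message, run its handlers, and return its event type."""
        event = json.loads(raw)
        event_type = event.get("type", "unknown")
        for handler in self._handlers.get(event_type) or self._fallback:
            handler(event)
        return event_type
```

With websocket-client, for example, on_message can simply call router.dispatch(message).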

⚙️ Configuration Options

Audio Configuration

  • Input formats
    • pcm16
    • g711_ulaw
    • g711_alaw
  • Output formats
    • pcm16
    • g711_ulaw
    • g711_alaw
  • Voice types
    • alloy
    • ash
    • ballad
    • coral
    • echo
    • sage
    • shimmer
    • verse

Model Configuration

  • Temperature
  • Maximum output length
  • System prompt
  • Tool configuration

VAD Configuration

  • Threshold
  • Silence duration
  • Prefix padding
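The audio, model and VAD options above are all sent in one session.update event. Below is a minimal builder sketch with basic validation.

Field names follow the example client later in this guide, including max_response_output_tokens. The function name and its defaults are illustrative assumptions. It also assumes that omitting instructions leaves the current system prompt unchanged.

```python
import json
from typing import Optional, Union

AUDIO_FORMATS = ("pcm16", "g711_ulaw", "g711_alaw")


def build_session_update(
    voice: str = "alloy",
    audio_format: str = "pcm16",
    instructions: Optional[str] = None,
    temperature: float = 0.8,
    max_tokens: Union[int, str] = "inf",
    vad_threshold: float = 0.5,
    prefix_padding_ms: int = 300,
    silence_duration_ms: int = 500,
) -> str:
    """Serialize a session.update event from audio, model, and VAD options."""
    if audio_format not in AUDIO_FORMATS:
        raise ValueError(f"unsupported audio format: {audio_format}")
    if not 0.0 <= vad_threshold <= 1.0:
        raise ValueError("VAD threshold must be between 0.0 and 1.0")

    session = {
        "voice": voice,
        "input_audio_format": audio_format,
        "output_audio_format": audio_format,
        "temperature": temperature,
        "max_response_output_tokens": max_tokens,
        "turn_detection": {
            "type": "server_vad",
            "threshold": vad_threshold,                  # higher requires louder speech
            "prefix_padding_ms": prefix_padding_ms,      # audio kept before speech onset
            "silence_duration_ms": silence_duration_ms,  # silence that ends a turn
        },
    }
    # Only include a system prompt when one is given
    if instructions is not None:
        session["instructions"] = instructions
    return json.dumps({"type": "session.update", "session": session})
```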

💡 Request Examples

WebSocket Connection ✅

Node.js (ws module)

import WebSocket from "ws";

const url = "wss://computevault.unodetech.xyz/v1/realtime?model=gpt-4o-realtime-preview-2024-12-17";
const ws = new WebSocket(url, {
  headers: {
    "Authorization": "Bearer " + process.env.API_KEY,
    "OpenAI-Beta": "realtime=v1",
  },
});

ws.on("open", function open() {
  console.log("Connected to server.");
});

ws.on("message", function incoming(message) {
  console.log(JSON.parse(message.toString()));
});

Python (websocket-client)

# Requires websocket-client library:
# pip install websocket-client

import os
import json
import websocket

API_KEY = os.environ["API_KEY"]  # raises KeyError if the variable is not set

url = "wss://computevault.unodetech.xyz/v1/realtime?model=gpt-4o-realtime-preview-2024-12-17"
headers = [
    "Authorization: Bearer " + API_KEY,
    "OpenAI-Beta: realtime=v1"
]

def on_open(ws):
    print("Connected to server.")

def on_message(ws, message):
    data = json.loads(message)
    print("Received event:", json.dumps(data, indent=2))

ws = websocket.WebSocketApp(
    url,
    header=headers,
    on_open=on_open,
    on_message=on_message,
)

ws.run_forever()

Browser (Standard WebSocket)

/*
Note: In browsers and other client environments, WebRTC is recommended.
In browser-like runtimes such as Deno and Cloudflare Workers, you can also
use the standard WebSocket interface. Passing the API key as a subprotocol
exposes it to the client, so do this only in trusted environments.
*/

// API_KEY, OPENAI_ORG_ID, and OPENAI_PROJECT_ID are assumed to be defined elsewhere
const ws = new WebSocket(
  "wss://computevault.unodetech.xyz/v1/realtime?model=gpt-4o-realtime-preview-2024-12-17",
  [
    "realtime",
    // Authentication
    "openai-insecure-api-key." + API_KEY, 
    // Optional
    "openai-organization." + OPENAI_ORG_ID,
    "openai-project." + OPENAI_PROJECT_ID,
    // Beta protocol, required
    "openai-beta.realtime-v1"
  ]
);

// The standard WebSocket interface uses addEventListener (it has no .on() method)
ws.addEventListener("open", () => {
  console.log("Connected to server.");
});

ws.addEventListener("message", (event) => {
  console.log(JSON.parse(event.data));
});

Message Send/Receive Example

Node.js/Browser
// Receive server events (addEventListener works in browsers and with the Node.js ws module)
ws.addEventListener("message", (event) => {
  // Server events arrive as JSON text
  const serverEvent = JSON.parse(event.data);
  console.log(serverEvent);
});

// Send client events only after the connection is open; sending earlier throws
ws.addEventListener("open", () => {
  const event = {
    type: "response.create",
    response: {
      modalities: ["audio", "text"],
      instructions: "Give me a haiku about code.",
    },
  };
  ws.send(JSON.stringify(event));
});
Python
# Send client events, serialize dictionary to JSON
def on_open(ws):
    print("Connected to server.")
    
    event = {
        "type": "response.create",
        "response": {
            "modalities": ["text"],
            "instructions": "Please assist the user."
        }
    }
    ws.send(json.dumps(event))

# Receive server events: parse each message payload from JSON
def on_message(ws, message):
    data = json.loads(message)
    print("Received event:", json.dumps(data, indent=2))

WebSocket Python Audio Example

Example Documentation

This Python example holds a real-time voice conversation over the OpenAI Realtime WebSocket API, with live voice input and output.

Features
  • 🎤 Real-time Voice Recording: Automatically detects voice input and sends it to the server
  • 🔊 Real-time Audio Playback: Plays AI's voice responses
  • 📝 Text Display: Simultaneously displays AI's text responses
  • 🎯 Automatic Voice Detection: Uses server-side VAD (Voice Activity Detection)
  • 🔄 Bidirectional Communication: Supports continuous conversation
Requirements
  • Python 3.9+ (required by websockets 14 and later, which the example relies on)
  • Microphone and speakers
  • Stable network connection
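Install Dependencies
pip install -r requirements.txt

requirements.txt should list websockets and pyaudio. The example passes additional_headers to websockets.connect, which requires websockets 14 or later; older releases name this argument extra_headers.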

System Dependencies: On Linux systems, you may need to install additional audio libraries:

# Ubuntu/Debian
sudo apt-get install portaudio19-dev python3-pyaudio

# CentOS/RHEL
sudo yum install portaudio-devel
Configuration

In openai_realtime_client.py, set the following values (the example defines them in main()):

WEBSOCKET_URL = "wss://computevault.unodetech.xyz/v1/realtime?model=gpt-4o-realtime-preview-2024-12-17"
API_KEY = "YOUR_API_KEY"  # placeholder: never hard-code or publish a real key
MODEL = "gpt-4o-realtime-preview-2024-12-17"
Usage
  1. Run the program:

    python openai_realtime_client.py
  2. Start conversation:

    • The program will automatically start recording after launch
    • Speak into the microphone
    • AI will respond to your voice in real-time
  3. Stop the program:

    • Press Ctrl+C to stop the program
Technical Details

Audio Configuration:

  • Sample Rate: 24kHz (OpenAI Realtime API requirement)
  • Format: PCM16
  • Channels: Mono
  • Encoding: Base64
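These settings fix the size of each audio chunk: one PCM16 mono frame is 2 bytes, so 24,000 frames make one second of audio. The sketch below computes a chunk's duration and wraps it in an input_audio_buffer.append event. The helper names are hypothetical.

```python
import base64

SAMPLE_RATE = 24_000   # Hz, as listed above
BYTES_PER_SAMPLE = 2   # PCM16 = 16-bit samples
CHANNELS = 1           # mono


def chunk_duration_ms(pcm: bytes) -> float:
    """Duration of a raw PCM16 mono chunk in milliseconds."""
    frames = len(pcm) // (BYTES_PER_SAMPLE * CHANNELS)
    return frames * 1000 / SAMPLE_RATE


def to_append_event(pcm: bytes) -> dict:
    """Wrap a raw chunk in an input_audio_buffer.append event with a Base64 payload."""
    if len(pcm) % (BYTES_PER_SAMPLE * CHANNELS):
        raise ValueError("chunk must contain whole 16-bit samples")
    return {
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm).decode("ascii"),
    }
```

For example, the 1024-frame chunk used by the client below is 2048 bytes, or about 42.7 ms of audio, and becomes a 2732-character Base64 string.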

WebSocket Message Types:

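  • session.update: Session configuration
  • input_audio_buffer.append: Send audio data
  • input_audio_buffer.commit: Commit the audio buffer (done automatically when server VAD is enabled)
  • response.audio.delta: Receive audio response
  • response.audio_transcript.delta: Receive the transcript of an audio response
  • response.text.delta: Receive text response (text-only responses)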

Voice Activity Detection: Uses server-side VAD configuration:

  • Threshold: 0.5
  • Prefix padding: 300ms
  • Silence duration: 500ms
Troubleshooting

Common Issues:

  1. Audio Device Issues:

    # Check audio devices
    python -c "import pyaudio; p = pyaudio.PyAudio(); print([p.get_device_info_by_index(i) for i in range(p.get_device_count())])"
  2. Permission Issues:

    • Ensure the program has microphone access permissions
    • Linux: Check ALSA/PulseAudio configuration
  3. Network Connection Issues:

    • Check if the WebSocket URL is correct
    • Ensure the API key is valid
    • Check firewall settings

Debug Mode:

Enable verbose logging:

logging.basicConfig(level=logging.DEBUG)
Code Structure
├── openai_realtime_client.py  # Main program file
├── requirements.txt           # Python dependencies
└── README.md                  # Documentation

Main Classes and Methods:

  • OpenAIRealtimeClient: Main client class
    • connect(): Connect to WebSocket
    • start_audio_streams(): Start audio streams
    • start_recording(): Start recording
    • handle_response(): Handle responses
    • start_conversation(): Start conversation
Notes
  1. Audio Quality: Use a quiet environment, ideally with headphones so the microphone does not pick up the AI's playback
  2. Network Latency: Real-time conversation is sensitive to network latency
  3. Resource Usage: Long-running sessions may consume significant CPU and memory
  4. API Limits: Be aware of OpenAI API usage limits and costs
License

This project is for learning and testing purposes only. Please comply with OpenAI's terms of use.

Example Code
#!/usr/bin/env python3
"""
OpenAI Realtime WebSocket Audio Example
Supports real-time voice conversation, including audio recording, sending, and playback
"""

import asyncio
import base64
import json
import logging

import pyaudio
import websockets

# Configure logging
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.StreamHandler(),
        logging.FileHandler('websocket_debug.log', encoding='utf-8')
    ]
)
logger = logging.getLogger(__name__)

class OpenAIRealtimeClient:
    def __init__(self, 
                 websocket_url: str,
                 api_key: str,
                 model: str = "gpt-4o-realtime-preview-2024-12-17"):
        self.websocket_url = websocket_url
        self.api_key = api_key
        self.model = model
        self.websocket = None
        self.is_recording = False
        self.is_connected = False
        
        # Audio configuration
        self.audio_format = pyaudio.paInt16
        self.channels = 1
        self.rate = 24000  # OpenAI Realtime API required sample rate
        self.chunk = 1024
        self.audio = pyaudio.PyAudio()
        
        # Audio streams
        self.input_stream = None
        self.output_stream = None
        
    async def connect(self):
        """Connect to WebSocket server"""
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "OpenAI-Beta": "realtime=v1"
        }
        
        # The endpoint expects the model as a query parameter; append it if missing
        url = self.websocket_url
        if "model=" not in url:
            separator = "&" if "?" in url else "?"
            url = f"{url}{separator}model={self.model}"
        
        # Redact the key before logging: the log is also written to websocket_debug.log
        safe_headers = dict(headers, Authorization=f"Bearer {self.api_key[:6]}...")
        
        logger.info("=" * 80)
        logger.info("🚀 Starting WebSocket connection")
        logger.info("=" * 80)
        logger.info(f"Connection URL: {url}")
        logger.info(f"Headers: {json.dumps(safe_headers, ensure_ascii=False, indent=2)}")
        
        try:
            self.websocket = await websockets.connect(
                url,
                additional_headers=headers
            )
            self.is_connected = True
            logger.info("✅ WebSocket connection successful")
            
            # Send session configuration
            await self.send_session_config()
            
        except Exception as e:
            logger.error(f"❌ WebSocket connection failed: {e}")
            logger.error(f"Error type: {type(e).__name__}")
            raise
    
    async def send_session_config(self):
        """Send session configuration"""
        config = {
            "type": "session.update",
            "session": {
                "modalities": ["text", "audio"],
                "instructions": "You are a helpful AI assistant that can engage in real-time voice conversations.",
                "voice": "alloy",
                "input_audio_format": "pcm16",
                "output_audio_format": "pcm16",
                "input_audio_transcription": {
                    "model": "whisper-1"
                },
                "turn_detection": {
                    "type": "server_vad",
                    "threshold": 0.5,
                    "prefix_padding_ms": 300,
                    "silence_duration_ms": 500
                },
                "tools": [],
                "tool_choice": "auto",
                "temperature": 0.8,
                "max_response_output_tokens": 4096
            }
        }
        
        config_json = json.dumps(config, ensure_ascii=False, indent=2)
        logger.info("=" * 60)
        logger.info("📤 Sending session configuration:")
        logger.info(f"Message type: {config['type']}")
        logger.info(f"Configuration content:\n{config_json}")
        logger.info("=" * 60)
        
        await self.websocket.send(json.dumps(config))
        logger.info("✅ Session configuration sent")
    
    def start_audio_streams(self):
        """Start audio input and output streams"""
        try:
            # Input stream (microphone)
            self.input_stream = self.audio.open(
                format=self.audio_format,
                channels=self.channels,
                rate=self.rate,
                input=True,
                frames_per_buffer=self.chunk
            )
            
            # Output stream (speakers)
            self.output_stream = self.audio.open(
                format=self.audio_format,
                channels=self.channels,
                rate=self.rate,
                output=True,
                frames_per_buffer=self.chunk
            )
            
            logger.info("Audio streams started")
            
        except Exception as e:
            logger.error(f"Failed to start audio streams: {e}")
            raise
    
    def stop_audio_streams(self):
        """Stop audio streams"""
        if self.input_stream:
            self.input_stream.stop_stream()
            self.input_stream.close()
            self.input_stream = None
            
        if self.output_stream:
            self.output_stream.stop_stream()
            self.output_stream.close()
            self.output_stream = None
            
        logger.info("Audio streams stopped")
    
    async def start_recording(self):
        """Start recording and send audio data"""
        self.is_recording = True
        logger.info("Starting recording...")
        
        audio_count = 0  # counts sent chunks to throttle debug logging
        try:
            while self.is_recording and self.is_connected:
                # Read audio data (blocks for chunk / rate seconds, about 43 ms here)
                audio_data = self.input_stream.read(self.chunk, exception_on_overflow=False)
                
                # Encode audio data as base64
                audio_base64 = base64.b64encode(audio_data).decode('utf-8')
                
                # Send audio data
                message = {
                    "type": "input_audio_buffer.append",
                    "audio": audio_base64
                }
                
                # Log every 10th chunk to avoid excessive logging
                audio_count += 1
                if audio_count % 10 == 0:
                    logger.debug(f"🎤 Sending audio data #{audio_count}: length={len(audio_base64)} characters")
                
                await self.websocket.send(json.dumps(message))
                
                # Yield to the event loop so handle_response() can process incoming
                # events; the blocking read above already paces the loop in real time
                await asyncio.sleep(0.01)
                
        except Exception as e:
            logger.error(f"Error during recording: {e}")
        finally:
            logger.info("Recording stopped")
    
    async def stop_recording(self):
        """Stop recording"""
        self.is_recording = False
        
        # Send recording end signal
        if self.websocket and self.is_connected:
            message = {
                "type": "input_audio_buffer.commit"
            }
            
            logger.info("=" * 60)
            logger.info("📤 Sending recording end signal:")
            logger.info(f"Message type: {message['type']}")
            logger.info("=" * 60)
            
            await self.websocket.send(json.dumps(message))
            logger.info("✅ Recording end signal sent")
    
    async def handle_response(self):
        """Handle WebSocket responses"""
        try:
            async for message in self.websocket:
                data = json.loads(message)
                message_type = data.get("type", "unknown")
                
                # Log all received messages in detail
                logger.info("=" * 60)
                logger.info("📥 Received WebSocket message:")
                logger.info(f"Message type: {message_type}")
                
                # Handle different message types
                if message_type == "response.audio.delta":
                    # Handle audio response
                    audio_data = base64.b64decode(data.get("delta", ""))
                    logger.info(f"🎵 Audio data: length={len(audio_data)} bytes")
                    if audio_data and self.output_stream:
                        self.output_stream.write(audio_data)
                        logger.info("✅ Audio data played")
                
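                elif message_type in ("response.text.delta", "response.audio_transcript.delta"):
                    # Text-only responses stream response.text.delta; audio responses
                    # stream their transcript as response.audio_transcript.delta
                    text = data.get("delta", "")
                    logger.info(f"💬 Text delta: '{text}'")
                    if text:
                        print(text, end="", flush=True)
                
                elif message_type in ("response.text.done", "response.audio_transcript.done"):
                    # Text or transcript complete
                    logger.info("✅ Text response complete")
                    print("\n")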
                
                elif message_type == "response.audio.done":
                    # Audio response complete
                    logger.info("✅ Audio response complete")
                
                elif message_type == "error":
                    # Handle errors
                    error_info = data.get('error', {})
                    logger.error("❌ Server error:")
                    logger.error(f"Error details: {json.dumps(error_info, ensure_ascii=False, indent=2)}")
                
                elif message_type == "session.created":
                    # Session created successfully
                    logger.info("✅ Session created")
                
                elif message_type == "session.updated":
                    # Session updated successfully
                    logger.info("✅ Session updated")
                
                elif message_type == "conversation.item.created":
                    # Conversation item created
                    logger.info("📝 Conversation item created")
                
                elif message_type == "input_audio_buffer.speech_started":
                    # Server VAD detected the start of speech
                    logger.info("🎤 Speech start detected")
                
                elif message_type == "input_audio_buffer.speech_stopped":
                    # Server VAD detected the end of speech
                    logger.info("🔇 Speech stop detected")
                
                elif message_type == "input_audio_buffer.committed":
                    # Audio buffer committed (automatically under server VAD)
                    logger.info("📤 Audio buffer committed")
                
                else:
                    # Other unknown message types
                    logger.info(f"❓ Unknown message type: {message_type}")
                
                # Log complete message content (except audio data, as it's too long)
                if message_type != "response.audio.delta":
                    logger.info(f"Complete message content:\n{json.dumps(data, ensure_ascii=False, indent=2)}")
                
                logger.info("=" * 60)
                    
        except websockets.exceptions.ConnectionClosed:
            logger.info("WebSocket connection closed")
            self.is_connected = False
        except Exception as e:
            logger.error(f"Error handling response: {e}")
            self.is_connected = False
    
    async def start_conversation(self):
        """Start conversation"""
        try:
            # Start audio streams
            self.start_audio_streams()
            
            # Create tasks
            response_task = asyncio.create_task(self.handle_response())
            recording_task = asyncio.create_task(self.start_recording())
            
            logger.info("Conversation started, press Ctrl+C to stop")
            
            # Wait for tasks to complete
            await asyncio.gather(response_task, recording_task)
            
        except (KeyboardInterrupt, asyncio.CancelledError):
            # On Ctrl+C, asyncio.run() cancels the main task (Python 3.11+);
            # re-raise so run() can clean up and the program can exit
            logger.info("Stop signal received")
            raise
        except Exception as e:
            logger.error(f"Error during conversation: {e}")
        # No cleanup here: run() calls cleanup() exactly once in its finally block
    
    async def cleanup(self):
        """Clean up resources"""
        self.is_recording = False
        self.is_connected = False
        
        # Stop audio streams
        self.stop_audio_streams()
        
        # Close WebSocket connection
        if self.websocket:
            await self.websocket.close()
            logger.info("WebSocket connection closed")
        
        # Terminate PyAudio
        self.audio.terminate()
        logger.info("Resource cleanup complete")
    
    async def run(self):
        """Run client"""
        try:
            await self.connect()
            await self.start_conversation()
        except Exception as e:
            logger.error(f"Error running client: {e}")
        finally:
            await self.cleanup()


async def main():
    """Main function"""
    # Configuration parameters
    WEBSOCKET_URL = "wss://computevault.unodetech.xyz/v1/realtime?model=gpt-4o-realtime-preview-2024-12-17"
    API_KEY = "YOUR_API_KEY"  # placeholder: load your key from a secure source, never commit it
    MODEL = "gpt-4o-realtime-preview-2024-12-17"
    
    # Create client
    client = OpenAIRealtimeClient(
        websocket_url=WEBSOCKET_URL,
        api_key=API_KEY,
        model=MODEL
    )
    
    # Run client
    await client.run()


if __name__ == "__main__":
    print("OpenAI Realtime WebSocket Audio Example")
    print("=" * 50)
    print("Features:")
    print("- Real-time voice conversation")
    print("- Automatic speech recognition")
    print("- Text and audio responses")
    print("- Press Ctrl+C to stop")
    print("=" * 50)
    
    try:
        asyncio.run(main())
    except KeyboardInterrupt:
        print("\nProgram stopped")
    except Exception as e:
        print(f"Program error: {e}")

⚠️ Error Handling

Common Errors

  1. Connection errors
    • Network issues
    • Authentication failures
    • Configuration errors
  2. Audio errors
    • Device permissions
    • Unsupported formats
    • Codec issues
  3. Session errors
    • Token expiration
    • Session timeout
    • Concurrency limits

Error Recovery

  1. Automatic reconnection
  2. Session recovery
  3. Error retry
  4. Graceful degradation
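Automatic reconnection and error retry are commonly implemented with exponential backoff and random jitter. Below is a minimal sketch; connect_with_retry and backoff_delays are hypothetical helpers.

It assumes connect is a zero-argument callable returning an awaitable, for example lambda: websockets.connect(url, additional_headers=headers). It retries only OSError (network-level failures such as a refused connection), so authentication and handshake errors surface immediately. It also assumes, as with OpenAI's service, that a new connection starts a fresh session. Session recovery therefore means resending session.update after reconnecting.

```python
import asyncio
import random
from typing import Awaitable, Callable, List


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 30.0) -> List[float]:
    """Upper bounds for each wait: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(cap, base * (2 ** i)) for i in range(attempts)]


async def connect_with_retry(
    connect: Callable[[], Awaitable[object]],
    attempts: int = 5,
    base: float = 0.5,
    cap: float = 30.0,
) -> object:
    """Await connect() until it succeeds, sleeping with full jitter between tries."""
    delays = backoff_delays(attempts, base, cap)
    for i, delay in enumerate(delays):
        try:
            return await connect()
        except OSError:
            if i == len(delays) - 1:
                raise  # out of attempts: surface the last network error
            await asyncio.sleep(random.uniform(0, delay))
    raise ValueError("attempts must be at least 1")
```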

📝 Event Reference

Common Request Headers

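The WebSocket connection handshake must include the following headers. Events themselves are JSON messages sent over the open connection and carry no headers of their own.

| Header | Type | Description | Example Value |
|---|---|---|---|
| Authorization | String | Authentication token | Bearer $API_KEY |
| OpenAI-Beta | String | API version | realtime=v1 |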

Client Events

session.update

Update the default configuration for the session.

| Parameter | Type | Required | Description | Example Value / Optional Values |
|---|---|---|---|---|
| event_id | String | No | Client-generated event identifier | event_123 |
| type | String | Yes | Event type | session.update |
| session.modalities | String array | No | Modalities the model can respond with | ["text", "audio"] |
| session.instructions | String | No | System instructions prepended to model calls | "Your knowledge cutoff is 2023-10..." |
| session.voice | String | No | Voice used by the model | alloy, ash, ballad, coral, echo, sage, shimmer, verse |
| session.input_audio_format | String | No | Input audio format | pcm16, g711_ulaw, g711_alaw |
| session.output_audio_format | String | No | Output audio format | pcm16, g711_ulaw, g711_alaw |
| session.input_audio_transcription.model | String | No | Model used for input transcription | whisper-1 |
| session.turn_detection.type | String | No | Voice activity detection type | server_vad |
| session.turn_detection.threshold | Number | No | VAD activation threshold (0.0-1.0) | 0.8 |
| session.turn_detection.prefix_padding_ms | Integer | No | Audio included before detected speech starts (ms) | 500 |
| session.turn_detection.silence_duration_ms | Integer | No | Silence duration that marks the end of speech (ms) | 1000 |
| session.tools | Array | No | Tools available to the model | [] |
| session.tool_choice | String | No | How the model chooses tools | auto, none, required |
| session.temperature | Number | No | Model sampling temperature | 0.8 |
| session.max_response_output_tokens | Integer or "inf" | No | Maximum output tokens per response | 4096, "inf" |

input_audio_buffer.append

Append audio data to the input audio buffer.

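| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Client-generated event identifier | event_456 |
| type | String | Yes | Event type | input_audio_buffer.append |
| audio | String | Yes | Base64-encoded audio in the session's input_audio_format | Base64EncodedAudioData |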

input_audio_buffer.commit

Commit the audio data in the buffer as a user message.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Client-generated event identifier | event_789 |
| type | String | Yes | Event type | input_audio_buffer.commit |

input_audio_buffer.clear

Clear all audio data from the input audio buffer.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Client-generated event identifier | event_012 |
| type | String | Yes | Event type | input_audio_buffer.clear |

conversation.item.create

Add a new conversation item to the conversation.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Client-generated event identifier | event_345 |
| type | String | Yes | Event type | conversation.item.create |
| previous_item_id | String | No | The new item is inserted after this ID | null |
| item.id | String | No | Unique identifier for the conversation item | msg_001 |
| item.type | String | No | Type of conversation item | message, function_call, function_call_output |
| item.status | String | No | Status of the conversation item | completed, in_progress, incomplete |
| item.role | String | No | Role of the message sender | user, assistant, system |
| item.content | Array | No | Message content parts | [text/audio/transcript] |
| item.call_id | String | No | ID of the function call | call_001 |
| item.name | String | No | Name of the called function | function_name |
| item.arguments | String | No | Arguments for the function call (JSON string) | {"param": "value"} |
| item.output | String | No | Output of the function call (JSON string) | {"result": "value"} |
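Two common uses of conversation.item.create are injecting a user text message and returning a tool result. Minimal builders for both are sketched below.

The content-part type input_text is taken from OpenAI's Realtime API reference rather than this page, so treat it as an assumption. The helper names are hypothetical. In either case, send response.create afterwards to make the model respond.

```python
import json
import uuid
from typing import Optional


def user_text_item(text: str, previous_item_id: Optional[str] = None) -> dict:
    """conversation.item.create carrying a user text message."""
    event = {
        "type": "conversation.item.create",
        "event_id": f"event_{uuid.uuid4().hex[:8]}",  # optional client-side ID
        "item": {
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": text}],  # assumed part type
        },
    }
    if previous_item_id is not None:
        event["previous_item_id"] = previous_item_id  # insert after this item
    return event


def function_output_item(call_id: str, result: dict) -> dict:
    """conversation.item.create returning a tool result; output is a JSON string."""
    return {
        "type": "conversation.item.create",
        "item": {
            "type": "function_call_output",
            "call_id": call_id,
            "output": json.dumps(result),
        },
    }
```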

conversation.item.truncate

Truncate audio content in assistant messages.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Client-generated event identifier | event_678 |
| type | String | Yes | Event type | conversation.item.truncate |
| item_id | String | Yes | ID of the assistant message item to truncate | msg_002 |
| content_index | Integer | Yes | Index of the content part to truncate | 0 |
| audio_end_ms | Integer | Yes | Time point (ms) at which the audio is truncated | 1500 |
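Truncation is typically used when the user interrupts playback. It keeps the server's copy of the assistant message in step with what was actually heard.

Below is a minimal sketch that derives audio_end_ms from the number of bytes already written to the speaker. It assumes 24 kHz PCM16 mono output. Bytes written only approximate what was heard, because of device buffering. The helper names are hypothetical.

```python
SAMPLE_RATE = 24_000
BYTES_PER_SAMPLE = 2  # PCM16 mono


def played_ms(bytes_played: int) -> int:
    """Whole milliseconds of PCM16 mono audio represented by bytes_played."""
    return (bytes_played // BYTES_PER_SAMPLE) * 1000 // SAMPLE_RATE


def truncate_event(item_id: str, bytes_played: int, content_index: int = 0) -> dict:
    """conversation.item.truncate that trims an assistant item to what was played."""
    return {
        "type": "conversation.item.truncate",
        "item_id": item_id,
        "content_index": content_index,
        "audio_end_ms": played_ms(bytes_played),
    }
```

For example, 72,000 bytes of played audio is 36,000 samples, which gives audio_end_ms = 1500.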

conversation.item.delete

Delete the specified conversation item from conversation history.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Client-generated event identifier | event_901 |
| type | String | Yes | Event type | conversation.item.delete |
| item_id | String | Yes | ID of the conversation item to delete | msg_003 |

response.create

Trigger response generation.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Client-generated event identifier | event_234 |
| type | String | Yes | Event type | response.create |
| response.modalities | String array | No | Modalities for the response | ["text", "audio"] |
| response.instructions | String | No | Instructions for the model | "Please assist the user." |
| response.voice | String | No | Voice used by the model | alloy, ash, ballad, coral, echo, sage, shimmer, verse |
| response.output_audio_format | String | No | Output audio format | pcm16 |
| response.tools | Array | No | Tools available to the model (objects with type, name, description, parameters) | [] |
| response.tool_choice | String | No | How the model chooses tools | auto |
| response.temperature | Number | No | Sampling temperature | 0.7 |
| response.max_output_tokens | Integer or "inf" | No | Maximum output tokens | 150, "inf" |

response.cancel

Cancel ongoing response generation.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Client-generated event identifier | event_567 |
| type | String | Yes | Event type | response.cancel |

Server Events

error

Event returned when an error occurs.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for the server event | event_890 |
| type | String | No | Event type | error |
| error.type | String | No | Error type | invalid_request_error, server_error |
| error.code | String | No | Error code | invalid_event |
| error.message | String | No | Human-readable error message | "The 'type' field is missing." |
| error.param | String | No | Parameter related to the error | null |
| error.event_id | String | No | ID of the client event that caused the error | event_567 |

conversation.item.input_audio_transcription.completed

Returned when input audio transcription is enabled and transcription succeeds.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for the server event | event_2122 |
| type | String | No | Event type | conversation.item.input_audio_transcription.completed |
| item_id | String | No | ID of the user message item | msg_003 |
| content_index | Integer | No | Index of the content part containing the audio | 0 |
| transcript | String | No | Transcribed text | "Hello, how are you?" |

conversation.item.input_audio_transcription.failed

Returned when input audio transcription is configured but the transcription request for a user message fails.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for the server event | event_2324 |
| type | String | No | Event type | conversation.item.input_audio_transcription.failed |
| item_id | String | No | ID of the user message item | msg_003 |
| content_index | Integer | No | Index of the content part containing the audio | 0 |
| error.type | String | No | Error type | transcription_error |
| error.code | String | No | Error code | audio_unintelligible |
| error.message | String | No | Human-readable error message | "The audio could not be transcribed." |
| error.param | String | No | Parameter related to the error | null |

conversation.item.truncated

Returned when the client truncates an earlier assistant audio message item.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for the server event | event_2526 |
| type | String | No | Event type | conversation.item.truncated |
| item_id | String | No | ID of the truncated assistant message item | msg_004 |
| content_index | Integer | No | Index of the truncated content part | 0 |
| audio_end_ms | Integer | No | Time point at which the audio was truncated (ms) | 1500 |

conversation.item.deleted

Returned when an item in the conversation is deleted.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for the server event | event_2728 |
| type | String | No | Event type | conversation.item.deleted |
| item_id | String | No | ID of the deleted conversation item | msg_005 |

input_audio_buffer.committed

Returned when the input audio buffer is committed, either by the client or automatically by server VAD.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for the server event | event_1121 |
| type | String | No | Event type | input_audio_buffer.committed |
| previous_item_id | String | No | The new conversation item is inserted after this ID | msg_001 |
| item_id | String | No | ID of the user message item to be created | msg_002 |

input_audio_buffer.cleared

Returned when the client clears the input audio buffer.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for the server event | event_1314 |
| type | String | No | Event type | input_audio_buffer.cleared |

input_audio_buffer.speech_started

In server VAD mode, returned when speech is detected in the input audio buffer.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for the server event | event_1516 |
| type | String | No | Event type | input_audio_buffer.speech_started |
| audio_start_ms | Integer | No | Milliseconds from session start to speech detection | 1000 |
| item_id | String | No | ID of the user message item created when speech stops | msg_003 |

input_audio_buffer.speech_stopped

In server VAD mode, returned when the end of speech is detected in the input audio buffer.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_1718 |
| type | String | No | Event type | input_audio_buffer.speech_stopped |
| audio_end_ms | Integer | No | Milliseconds from session start to when the end of speech was detected | 2000 |
| item_id | String | No | ID of the user message item that will be created | msg_003 |
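In server VAD mode the two events above bracket each detected utterance and share the same `item_id`. A minimal sketch that pairs them to measure how long the user spoke; it assumes the stop event reports its position in `audio_end_ms` (the field name in OpenAI's published Realtime schema), and `SpeechTracker` is a hypothetical helper.

```python
class SpeechTracker:
    """Pair server VAD start/stop events by item_id (hypothetical helper).

    Assumption: speech_stopped reports its position in `audio_end_ms`,
    as in OpenAI's published Realtime schema.
    """

    def __init__(self) -> None:
        self._starts = {}        # item_id -> audio_start_ms
        self.durations_ms = {}   # item_id -> detected speech length in ms

    def handle(self, event: dict) -> None:
        etype = event.get("type")
        if etype == "input_audio_buffer.speech_started":
            self._starts[event["item_id"]] = event["audio_start_ms"]
        elif etype == "input_audio_buffer.speech_stopped":
            start = self._starts.pop(event["item_id"], None)
            if start is not None:  # ignore stops without a matching start
                self.durations_ms[event["item_id"]] = event["audio_end_ms"] - start


if __name__ == "__main__":
    tracker = SpeechTracker()
    tracker.handle({"type": "input_audio_buffer.speech_started",
                    "audio_start_ms": 1000, "item_id": "msg_003"})
    tracker.handle({"type": "input_audio_buffer.speech_stopped",
                    "audio_end_ms": 2000, "item_id": "msg_003"})
    print(tracker.durations_ms)  # {'msg_003': 1000}
```

The `speech_started` branch is also a common place to stop local playback of assistant audio when the user barges in.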

response.created

Returned when a new response is created.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_2930 |
| type | String | No | Event type | response.created |
| response.id | String | No | Unique identifier for response | resp_001 |
| response.object | String | No | Object type | realtime.response |
| response.status | String | No | Status of response | in_progress |
| response.status_details | Object | No | Additional details about status | null |
| response.output | Array | No | List of output items generated by the response | [] |
| response.usage | Object | No | Usage statistics for response | null |

response.done

Returned when a response has finished streaming. It is sent for every response, whatever its final status.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_3132 |
| type | String | No | Event type | response.done |
| response.id | String | No | Unique identifier for response | resp_001 |
| response.object | String | No | Object type | realtime.response |
| response.status | String | No | Final status of response | completed / cancelled / failed / incomplete |
| response.status_details | Object | No | Additional details about status | null |
| response.output | Array | No | List of output items generated by the response | [...] |
| response.usage.total_tokens | Integer | No | Total tokens | 50 |
| response.usage.input_tokens | Integer | No | Input tokens | 20 |
| response.usage.output_tokens | Integer | No | Output tokens | 30 |
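Because `response.done` closes every response, it is a convenient place to record token usage and to flag responses that did not complete. A minimal sketch, assuming the event shape in the table above; `summarize_response_done` is a hypothetical helper and the returned keys are illustrative.

```python
def summarize_response_done(event: dict) -> dict:
    """Extract status and token usage from a response.done event (hypothetical helper)."""
    response = event["response"]
    usage = response.get("usage") or {}  # guard against a null usage object
    return {
        "id": response["id"],
        "status": response["status"],
        # Anything other than "completed" deserves a look at status_details.
        "ok": response["status"] == "completed",
        "status_details": response.get("status_details"),
        "input_tokens": usage.get("input_tokens", 0),
        "output_tokens": usage.get("output_tokens", 0),
        "total_tokens": usage.get("total_tokens", 0),
    }


if __name__ == "__main__":
    done = {
        "event_id": "event_3132",
        "type": "response.done",
        "response": {
            "id": "resp_001",
            "object": "realtime.response",
            "status": "completed",
            "status_details": None,
            "output": [],
            "usage": {"total_tokens": 50, "input_tokens": 20, "output_tokens": 30},
        },
    }
    summary = summarize_response_done(done)
    print(summary["status"], summary["total_tokens"])  # completed 50
```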

response.output_item.added

Returned when a new output item is created during response generation.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_3334 |
| type | String | No | Event type | response.output_item.added |
| response_id | String | No | ID of response the output item belongs to | resp_001 |
| output_index | Integer | No | Index of output item in response | 0 |
| item.id | String | No | Unique identifier for output item | msg_007 |
| item.object | String | No | Object type | realtime.item |
| item.type | String | No | Type of output item | message / function_call / function_call_output |
| item.status | String | No | Status of output item | in_progress / completed |
| item.role | String | No | Role associated with output item | assistant |
| item.content | Array | No | Content parts of the output item; each part may carry type, text, audio, and transcript fields | [] |

response.output_item.done

Returned when output item streaming is complete.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_3536 |
| type | String | No | Event type | response.output_item.done |
| response_id | String | No | ID of response the output item belongs to | resp_001 |
| output_index | Integer | No | Index of output item in response | 0 |
| item.id | String | No | Unique identifier for output item | msg_007 |
| item.object | String | No | Object type | realtime.item |
| item.type | String | No | Type of output item | message / function_call / function_call_output |
| item.status | String | No | Final status of output item | completed / incomplete |
| item.role | String | No | Role associated with output item | assistant |
| item.content | Array | No | Content parts of the output item; each part may carry type, text, audio, and transcript fields | [{"type": "text", "text": "Hello"}] |

response.content_part.added

Returned when a new content part is added to an assistant message item during response generation.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_3738 |
| type | String | No | Event type | response.content_part.added |
| response_id | String | No | ID of response | resp_001 |
| item_id | String | No | ID of the message item the content part is added to | msg_007 |
| output_index | Integer | No | Index of output item in response | 0 |
| content_index | Integer | No | Index of content part in message item content array | 0 |
| part.type | String | No | Content type | text / audio |
| part.text | String | No | Text content | "Hello" |
| part.audio | String | No | Base64-encoded audio data | "base64_encoded_audio_data" |
| part.transcript | String | No | Transcribed text of audio | "Hello" |

response.content_part.done

Returned when a content part of an assistant message item has finished streaming.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_3940 |
| type | String | No | Event type | response.content_part.done |
| response_id | String | No | ID of response | resp_001 |
| item_id | String | No | ID of the message item the content part belongs to | msg_007 |
| output_index | Integer | No | Index of output item in response | 0 |
| content_index | Integer | No | Index of content part in message item content array | 0 |
| part.type | String | No | Content type | text / audio |
| part.text | String | No | Text content | "Hello" |
| part.audio | String | No | Base64-encoded audio data | "base64_encoded_audio_data" |
| part.transcript | String | No | Transcribed text of audio | "Hello" |

response.text.delta

Returned when the text of a "text"-type content part is updated.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_4142 |
| type | String | No | Event type | response.text.delta |
| response_id | String | No | ID of response | resp_001 |
| item_id | String | No | ID of message item | msg_007 |
| output_index | Integer | No | Index of output item in response | 0 |
| content_index | Integer | No | Index of content part in message item content array | 0 |
| delta | String | No | Text delta | "Sure, I can h" |

response.text.done

Returned when a "text"-type content part has finished streaming.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_4344 |
| type | String | No | Event type | response.text.done |
| response_id | String | No | ID of response | resp_001 |
| item_id | String | No | ID of message item | msg_007 |
| output_index | Integer | No | Index of output item in response | 0 |
| content_index | Integer | No | Index of content part in message item content array | 0 |
| text | String | No | Final complete text content | "Sure, I can help with that." |

response.audio_transcript.delta

Returned when the transcript of model-generated audio output is updated.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_4546 |
| type | String | No | Event type | response.audio_transcript.delta |
| response_id | String | No | ID of response | resp_001 |
| item_id | String | No | ID of message item | msg_008 |
| output_index | Integer | No | Index of output item in response | 0 |
| content_index | Integer | No | Index of content part in message item content array | 0 |
| delta | String | No | Transcript delta | "Hello, how can I a" |

response.audio_transcript.done

Returned when the transcript of model-generated audio output has finished streaming.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_4748 |
| type | String | No | Event type | response.audio_transcript.done |
| response_id | String | No | ID of response | resp_001 |
| item_id | String | No | ID of message item | msg_008 |
| output_index | Integer | No | Index of output item in response | 0 |
| content_index | Integer | No | Index of content part in message item content array | 0 |
| transcript | String | No | Final complete transcript of the audio | "Hello, how can I assist you today?" |
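The text and audio-transcript events above follow the same delta/done pattern: append each `delta` for a given `(item_id, content_index)`, then finalize on the matching done event. A minimal sketch, assuming the final string arrives in `text` for `response.text.done` and in `transcript` for `response.audio_transcript.done` (as in OpenAI's published schema); `StreamAccumulator` is a hypothetical helper.

```python
class StreamAccumulator:
    """Rebuild streamed text and audio transcripts (hypothetical helper).

    Deltas are keyed by (item_id, content_index). Assumption: the final
    string is in `text` (response.text.done) or `transcript`
    (response.audio_transcript.done).
    """

    DELTA_TYPES = {"response.text.delta", "response.audio_transcript.delta"}
    DONE_FIELDS = {
        "response.text.done": "text",
        "response.audio_transcript.done": "transcript",
    }

    def __init__(self) -> None:
        self._parts = {}     # key -> list of delta strings
        self.completed = {}  # key -> final string

    def handle(self, event: dict) -> None:
        etype = event.get("type")
        key = (event.get("item_id"), event.get("content_index"))
        if etype in self.DELTA_TYPES:
            self._parts.setdefault(key, []).append(event["delta"])
        elif etype in self.DONE_FIELDS:
            streamed = "".join(self._parts.pop(key, []))
            # Prefer the server's final string; fall back to the joined deltas.
            self.completed[key] = event.get(self.DONE_FIELDS[etype], streamed)


if __name__ == "__main__":
    acc = StreamAccumulator()
    for chunk in ("Sure, I can h", "elp with that."):
        acc.handle({"type": "response.text.delta", "item_id": "msg_007",
                    "content_index": 0, "delta": chunk})
    acc.handle({"type": "response.text.done", "item_id": "msg_007",
                "content_index": 0, "text": "Sure, I can help with that."})
    print(acc.completed[("msg_007", 0)])  # Sure, I can help with that.
```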

response.audio.delta

Returned when model-generated audio content is updated.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_4950 |
| type | String | No | Event type | response.audio.delta |
| response_id | String | No | ID of response | resp_001 |
| item_id | String | No | ID of message item | msg_008 |
| output_index | Integer | No | Index of output item in response | 0 |
| content_index | Integer | No | Index of content part in message item content array | 0 |
| delta | String | No | Base64-encoded audio data delta | "Base64EncodedAudioDelta" |

response.audio.done

Returned when model-generated audio is complete.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_5152 |
| type | String | No | Event type | response.audio.done |
| response_id | String | No | ID of response | resp_001 |
| item_id | String | No | ID of message item | msg_008 |
| output_index | Integer | No | Index of output item in response | 0 |
| content_index | Integer | No | Index of content part in message item content array | 0 |
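Since `response.audio.done` carries no audio of its own, a client has to join the decoded `response.audio.delta` chunks itself. A minimal sketch, assuming the output audio format is pcm16 as 16-bit little-endian mono at 24 kHz; `AudioAssembler` is a hypothetical helper, and playback is left out.

```python
import base64


class AudioAssembler:
    """Collect base64 audio deltas into raw PCM bytes (hypothetical helper).

    Assumption: output audio is pcm16, i.e. 16-bit little-endian mono
    samples at 24 kHz.
    """

    SAMPLE_RATE = 24000
    BYTES_PER_SAMPLE = 2

    def __init__(self) -> None:
        self._chunks = {}   # (item_id, content_index) -> list of bytes
        self.finished = {}  # (item_id, content_index) -> complete PCM bytes

    def handle(self, event: dict) -> None:
        etype = event.get("type")
        key = (event.get("item_id"), event.get("content_index"))
        if etype == "response.audio.delta":
            self._chunks.setdefault(key, []).append(base64.b64decode(event["delta"]))
        elif etype == "response.audio.done":
            # The done event has no audio field; join what was streamed.
            self.finished[key] = b"".join(self._chunks.pop(key, []))

    @classmethod
    def duration_seconds(cls, pcm: bytes) -> float:
        return len(pcm) / (cls.SAMPLE_RATE * cls.BYTES_PER_SAMPLE)


if __name__ == "__main__":
    silence = base64.b64encode(bytes(24000)).decode("ascii")  # 0.5 s of silence
    asm = AudioAssembler()
    for _ in range(2):
        asm.handle({"type": "response.audio.delta", "item_id": "msg_008",
                    "content_index": 0, "delta": silence})
    asm.handle({"type": "response.audio.done", "item_id": "msg_008", "content_index": 0})
    print(AudioAssembler.duration_seconds(asm.finished[("msg_008", 0)]))  # 1.0
```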

Function Calling

response.function_call_arguments.delta

Returned when model-generated function call arguments are updated.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_5354 |
| type | String | No | Event type | response.function_call_arguments.delta |
| response_id | String | No | ID of response | resp_002 |
| item_id | String | No | ID of message item | fc_001 |
| output_index | Integer | No | Index of output item in response | 0 |
| call_id | String | No | ID of function call | call_001 |
| delta | String | No | Delta of the JSON-formatted function call arguments | `{"location": "San` |

response.function_call_arguments.done

Returned when model-generated function call arguments streaming is complete.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_5556 |
| type | String | No | Event type | response.function_call_arguments.done |
| response_id | String | No | ID of response | resp_002 |
| item_id | String | No | ID of message item | fc_001 |
| output_index | Integer | No | Index of output item in response | 0 |
| call_id | String | No | ID of function call | call_001 |
| arguments | String | No | Final complete function call arguments (JSON format) | `{"location": "San Francisco"}` |
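Once the arguments are complete, a client typically runs the function locally, sends the result back as a `function_call_output` conversation item with the same `call_id`, and then requests a new response. A minimal sketch of that round trip; `get_weather`, `FunctionCallCollector`, and `build_function_output_events` are hypothetical names used only for illustration.

```python
import json
from typing import Optional


def get_weather(location: str) -> dict:
    """Hypothetical local tool used only for illustration."""
    return {"location": location, "forecast": "sunny"}


class FunctionCallCollector:
    """Collect streamed function-call arguments by call_id (hypothetical helper)."""

    def __init__(self) -> None:
        self._parts = {}  # call_id -> list of argument deltas

    def partial(self, call_id: str) -> str:
        # Arguments received so far; usually not valid JSON until done.
        return "".join(self._parts.get(call_id, []))

    def handle(self, event: dict) -> Optional[dict]:
        """Return parsed arguments once a call is complete, else None."""
        etype = event.get("type")
        call_id = event.get("call_id")
        if etype == "response.function_call_arguments.delta":
            self._parts.setdefault(call_id, []).append(event["delta"])
        elif etype == "response.function_call_arguments.done":
            self._parts.pop(call_id, None)
            # The done event carries the complete argument string.
            return json.loads(event["arguments"])
        return None


def build_function_output_events(call_id: str, result: dict) -> list:
    """Client events that return a tool result and ask the model to continue."""
    return [
        {
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                "call_id": call_id,
                "output": json.dumps(result),
            },
        },
        {"type": "response.create"},
    ]


if __name__ == "__main__":
    collector = FunctionCallCollector()
    collector.handle({"type": "response.function_call_arguments.delta",
                      "call_id": "call_001", "delta": '{"location": "San'})
    args = collector.handle({"type": "response.function_call_arguments.done",
                             "call_id": "call_001",
                             "arguments": '{"location": "San Francisco"}'})
    for event in build_function_output_events("call_001", get_weather(**args)):
        print(json.dumps(event))
```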

Other Status Updates

rate_limits.updated

Triggered after each "response.done" event to indicate updated rate limits.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_5758 |
| type | String | No | Event type | rate_limits.updated |
| rate_limits | Object array | No | List of rate limit information | [{"name": "requests_per_min", "limit": 60, "remaining": 45, "reset_seconds": 35}] |
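Each entry in `rate_limits` has the fields listed in the Rate Limit Event Parameter Table further down. A minimal sketch that turns an update into two numbers a client might throttle on: the fraction of each limit still available, and how long to wait when a limit is exhausted. Both function names are hypothetical, and the wait policy (longest reset among exhausted limits) is one possible choice.

```python
def remaining_fractions(event: dict) -> dict:
    """Map each limit name to the fraction still available (hypothetical helper)."""
    return {
        limit["name"]: limit["remaining"] / limit["limit"]
        for limit in event.get("rate_limits", [])
        if limit["limit"] > 0  # avoid dividing by zero
    }


def seconds_until_available(event: dict) -> float:
    """Longest reset time among exhausted limits; 0 if none is exhausted."""
    waits = [limit["reset_seconds"]
             for limit in event.get("rate_limits", [])
             if limit["remaining"] <= 0]
    return max(waits, default=0)


if __name__ == "__main__":
    update = {
        "event_id": "event_5758",
        "type": "rate_limits.updated",
        "rate_limits": [{"name": "requests_per_min", "limit": 60,
                         "remaining": 45, "reset_seconds": 35}],
    }
    print(remaining_fractions(update))      # {'requests_per_min': 0.75}
    print(seconds_until_available(update))  # 0
```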

conversation.created

Returned when a conversation is created.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_9101 |
| type | String | No | Event type | conversation.created |
| conversation | Object | No | Conversation resource object | {"id": "conv_001", "object": "realtime.conversation"} |

conversation.item.created

Returned when a conversation item is created.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_1920 |
| type | String | No | Event type | conversation.item.created |
| previous_item_id | String | No | ID of the preceding conversation item | msg_002 |
| item | Object | No | Conversation item object | {"id": "msg_003", "object": "realtime.item", "type": "message", "status": "completed", "role": "user", "content": [{"type": "text", "text": "Hello"}]} |
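`previous_item_id` lets a client keep a local copy of the conversation in the same order as the server, together with `conversation.item.deleted` for removals. A minimal sketch; `LocalConversation` is hypothetical, and two behaviors are assumptions rather than documented rules: a null `previous_item_id` places the item first, and an unknown predecessor (for example after missed events) appends the item at the end.

```python
class LocalConversation:
    """Mirror the server's item order from conversation.item.* events (hypothetical helper)."""

    def __init__(self) -> None:
        self.order = []  # item IDs in conversation order

    def handle(self, event: dict) -> None:
        etype = event.get("type")
        if etype == "conversation.item.created":
            item_id = event["item"]["id"]
            prev = event.get("previous_item_id")
            if prev is None:
                # Assumption: no predecessor means the item starts the conversation.
                self.order.insert(0, item_id)
            elif prev in self.order:
                self.order.insert(self.order.index(prev) + 1, item_id)
            else:
                # Assumption: unknown predecessor, so append as a fallback.
                self.order.append(item_id)
        elif etype == "conversation.item.deleted":
            if event["item_id"] in self.order:
                self.order.remove(event["item_id"])


if __name__ == "__main__":
    conv = LocalConversation()
    for item_id, prev in (("msg_001", None), ("msg_003", "msg_001"), ("msg_002", "msg_001")):
        conv.handle({"type": "conversation.item.created",
                     "previous_item_id": prev, "item": {"id": item_id}})
    print(conv.order)  # ['msg_001', 'msg_002', 'msg_003']
    conv.handle({"type": "conversation.item.deleted", "item_id": "msg_002"})
    print(conv.order)  # ['msg_001', 'msg_003']
```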

session.created

Returned when a session is created.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_1234 |
| type | String | No | Event type | session.created |
| session | Object | No | Session object | {"id": "sess_001", "object": "realtime.session", "model": "gpt-4o-realtime-preview-2024-12-17", "modalities": ["text", "audio"]} |

session.updated

Returned when a session is updated.

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | No | Unique identifier for server event | event_5678 |
| type | String | No | Event type | session.updated |
| session | Object | No | Updated session object | {"id": "sess_001", "object": "realtime.session", "model": "gpt-4o-realtime-preview-2024-12-17", "modalities": ["text", "audio"]} |

Rate Limit Event Parameter Table

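| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| name | String | Yes | Limit name | requests_per_min |
| limit | Integer | Yes | Limit value | 60 |
| remaining | Integer | Yes | Remaining available amount | 45 |
| reset_seconds | Integer | Yes | Time until the limit resets (seconds) | 35 |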

Function Call Parameter Table

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| type | String | Yes | Function type | function |
| name | String | Yes | Function name | get_weather |
| description | String | No | Function description | Get the current weather |
| parameters | Object | Yes | Function parameter definition | {"type": "object", "properties": {...}} |

Audio Format Parameter Table

| Parameter | Type | Description | Optional Values |
|---|---|---|---|
| sample_rate | Integer | Sample rate | 8000, 16000, 24000, 44100, 48000 |
| channels | Integer | Number of channels | 1 (mono), 2 (stereo) |
| bits_per_sample | Integer | Bits per sample | 16 (pcm16), 8 (g711) |
| encoding | String | Encoding method | pcm16, g711_ulaw, g711_alaw |
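The format parameters above fix how many bytes one second of raw audio occupies: sample_rate × channels × bits_per_sample / 8. A small sketch for sizing buffers and chunks; the defaults assume pcm16 mono at 24 kHz, and the function names are hypothetical.

```python
def bytes_per_second(sample_rate: int, channels: int, bits_per_sample: int) -> int:
    """Raw (uncompressed) data rate for the given format."""
    return sample_rate * channels * bits_per_sample // 8


def chunk_duration_ms(num_bytes: int, sample_rate: int = 24000,
                      channels: int = 1, bits_per_sample: int = 16) -> float:
    """Playback time represented by a chunk of raw audio bytes."""
    return num_bytes * 1000 / bytes_per_second(sample_rate, channels, bits_per_sample)


if __name__ == "__main__":
    print(bytes_per_second(24000, 1, 16))  # 48000 (pcm16, 24 kHz, mono)
    print(bytes_per_second(8000, 1, 8))    # 8000 (8-bit g711, 8 kHz, mono)
    print(chunk_duration_ms(4800))         # 100.0
```

For example, a 4096-byte chunk of 24 kHz mono pcm16 covers about 85 ms, while the same chunk of 8 kHz g711 covers 512 ms.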

Voice Detection Parameter Table

| Parameter | Type | Description | Default Value | Range |
|---|---|---|---|---|
| threshold | Float | VAD activation threshold | 0.5 | 0.0-1.0 |
| prefix_padding_ms | Integer | Voice prefix padding (milliseconds) | 500 | 0-5000 |
| silence_duration_ms | Integer | Silence detection duration (milliseconds) | 1000 | 100-10000 |
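A minimal sketch that validates these VAD parameters against the ranges in the table and wraps them in a `session.update` event. The `turn_detection` key and the `"server_vad"` type follow OpenAI's published Realtime schema and are assumptions for this endpoint; `build_vad_update` is a hypothetical helper.

```python
import json

# Ranges taken from the Voice Detection Parameter Table.
VAD_RANGES = {
    "threshold": (0.0, 1.0),
    "prefix_padding_ms": (0, 5000),
    "silence_duration_ms": (100, 10000),
}


def build_vad_update(threshold: float = 0.5,
                     prefix_padding_ms: int = 500,
                     silence_duration_ms: int = 1000) -> dict:
    """session.update payload enabling server VAD (hypothetical helper).

    Assumption: `turn_detection` / "server_vad" as in OpenAI's Realtime schema.
    """
    values = {
        "threshold": threshold,
        "prefix_padding_ms": prefix_padding_ms,
        "silence_duration_ms": silence_duration_ms,
    }
    for name, value in values.items():
        low, high = VAD_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    return {"type": "session.update",
            "session": {"turn_detection": {"type": "server_vad", **values}}}


if __name__ == "__main__":
    # A slightly higher threshold makes VAD less sensitive to background noise.
    print(json.dumps(build_vad_update(threshold=0.6)))
```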

Tool Selection Parameter Table

| Parameter | Type | Description | Optional Values |
|---|---|---|---|
| tool_choice | String | Tool selection method | auto, none, required |
| tools | Array | Available tools list | [{type, name, description, parameters}] |
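Combining this table with the Function Call Parameter Table, a client registers tools by sending them in a `session.update` event. A minimal sketch that checks the required fields (`type`, `name`, `parameters`) and the allowed `tool_choice` values before building the event; `build_tools_update` and the `get_weather` tool definition are hypothetical examples.

```python
import json

ALLOWED_TOOL_CHOICES = {"auto", "none", "required"}
REQUIRED_TOOL_FIELDS = ("type", "name", "parameters")


def build_tools_update(tools: list, tool_choice: str = "auto") -> dict:
    """session.update payload registering function tools (hypothetical helper)."""
    if tool_choice not in ALLOWED_TOOL_CHOICES:
        raise ValueError(f"unsupported tool_choice: {tool_choice!r}")
    for tool in tools:
        missing = [field for field in REQUIRED_TOOL_FIELDS if field not in tool]
        if missing:
            raise ValueError(f"tool is missing required fields: {missing}")
        if tool["type"] != "function":
            raise ValueError(f"unsupported tool type: {tool['type']!r}")
    return {"type": "session.update",
            "session": {"tools": list(tools), "tool_choice": tool_choice}}


# Hypothetical tool definition; parameters is a JSON Schema object.
WEATHER_TOOL = {
    "type": "function",
    "name": "get_weather",
    "description": "Get the current weather",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}

if __name__ == "__main__":
    print(json.dumps(build_tools_update([WEATHER_TOOL]), indent=2))
```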

Model Configuration Parameter Table

| Parameter | Type | Description | Range/Optional Values | Default Value |
|---|---|---|---|---|
| temperature | Float | Sampling temperature | 0.0-2.0 | 1.0 |
| max_output_tokens | Integer/String | Maximum output length | 1-4096 or "inf" | "inf" |
| modalities | String array | Response modalities | ["text", "audio"] | ["text"] |
| voice | String | Voice type | alloy, ash, ballad, coral, echo, sage, shimmer, verse | alloy |

Event Common Parameter Table

| Parameter | Type | Required | Description | Example Value |
|---|---|---|---|---|
| event_id | String | Yes | Unique identifier for event | event_123 |
| type | String | Yes | Event type | session.update |
| timestamp | Integer | No | Event timestamp (milliseconds) | 1677649363000 |
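Client events need a unique `event_id` and a `type`. A minimal sketch of a wrapper that adds both; using a random UUID with an `event_` prefix is an assumption that mirrors the example values, not a documented format, and the optional `timestamp` is left out. `make_client_event` is a hypothetical helper.

```python
import json
import uuid


def make_client_event(event_type: str, **fields) -> dict:
    """Wrap a payload with a unique event_id and its type (hypothetical helper)."""
    # event_id and type are written last so callers cannot override them by accident.
    return {**fields, "event_id": f"event_{uuid.uuid4().hex}", "type": event_type}


if __name__ == "__main__":
    event = make_client_event("session.update", session={"voice": "verse"})
    print(json.dumps(event))
```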

Session Status Parameter Table

| Parameter | Type | Description | Optional Values |
|---|---|---|---|
| status | String | Session status | active, ended, error |
| error | Object | Error information | {"type": "error_type", "message": "error message"} |
| metadata | Object | Session metadata | {"client_id": "web", "session_type": "chat"} |

Conversation Item Status Parameter Table

| Parameter | Type | Description | Optional Values |
|---|---|---|---|
| status | String | Conversation item status | completed, in_progress, incomplete |
| role | String | Sender role | user, assistant, system |
| type | String | Conversation item type | message, function_call, function_call_output |

Content Type Parameter Table

| Parameter | Type | Description | Optional Values |
|---|---|---|---|
| type | String | Content type | text, audio, transcript |
| format | String | Content format | plain, markdown, html |
| encoding | String | Encoding method | utf-8, base64 |

Response Status Parameter Table

| Parameter | Type | Description | Optional Values |
|---|---|---|---|
| status | String | Response status | completed, cancelled, failed, incomplete |
| status_details | Object | Status details | {"reason": "user_cancelled"} |
| usage | Object | Usage statistics | {"total_tokens": 50, "input_tokens": 20, "output_tokens": 30} |

Audio Transcription Parameter Table

| Parameter | Type | Description | Example Value |
|---|---|---|---|
| enabled | Boolean | Whether transcription is enabled | true |
| model | String | Transcription model | whisper-1 |
| language | String | Transcription language | en, zh, auto |
| prompt | String | Transcription prompt | "Transcript of a conversation" |

Audio Stream Parameter Table

| Parameter | Type | Description | Optional Values |
|---|---|---|---|
| chunk_size | Integer | Audio chunk size (bytes) | 1024, 2048, 4096 |
| latency | String | Latency mode | low, balanced, high |
| compression | String | Compression method | none, opus, mp3 |

WebRTC Configuration Parameter Table

| Parameter | Type | Description | Default Value |
|---|---|---|---|
| ice_servers | Array | ICE server list | [{"urls": "stun:stun.l.google.com:19302"}] |
| audio_constraints | Object | Audio constraints | {"echoCancellation": true} |
| connection_timeout | Integer | Connection timeout (milliseconds) | 30000 |
