whisper-large-v3
whisper-large-v3 is a state-of-the-art automatic speech recognition (ASR) and speech translation model developed by OpenAI. Trained on 5 million hours of audio (1 million hours of weakly labeled audio plus 4 million hours of pseudo-labeled audio), it delivers exceptional transcription accuracy across a wide range of languages, accents, and acoustic environments.
Key Features
- Parameters: 1.55 billion
- Architecture: Encoder-decoder Transformer
- Languages: Supports 99 languages, with improved per-language accuracy over earlier Whisper releases
- Capabilities: Transcription (speech-to-text) and translation (speech-to-English)
- Accuracy: Significant improvements over large-v2, with a 10-20% reduction in errors across languages
- Input Type: Processes audio files in various formats (MP3, WAV, M4A, etc.)
API Reference
Endpoints
Our API follows the OpenAI-compatible format:
POST /v1/audio/transcriptions
POST /v1/audio/translations
Model Name
Use the following model identifier in your requests:
whisper-large-v3
Parameters
Common Parameters
- file (file): The audio file to transcribe, uploaded as multipart/form-data.
- model (string): The model identifier ("whisper-large-v3").
- response_format (string, optional): Response format, either "json" (default) or "text".
- temperature (float, optional): Sampling temperature (0.0 to 1.0; default: 0).
- stream (boolean, optional): Whether to stream the response (default: false).
Transcription-Specific Parameters
- language (string, optional): Language code of the input audio (e.g., "en", "es", "fr"). If not specified, the model auto-detects the language.
- prompt (string, optional): Text to guide the model's style or to continue a previous audio segment.
Translation-Specific Parameters
- prompt (string, optional): Text to guide the translation.
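To make the wire format concrete, here is a minimal sketch of a raw transcription request built with Python's requests library. It uses the same placeholder key, host, and file name as the examples below; the prompt text is purely illustrative.

import requests

API_KEY = "your-deeprequest-key"  # placeholder, as in the examples below
URL = "https://api.deeprequest.io/v1/audio/transcriptions"

with open("spanish_meeting.mp3", "rb") as audio_file:
    response = requests.post(
        URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": audio_file},  # uploaded as multipart/form-data
        data={
            "model": "whisper-large-v3",
            "language": "es",                      # optional; omit to auto-detect
            "prompt": "Quarterly budget meeting",  # optional style hint (illustrative)
            "response_format": "json",
        },
    )

response.raise_for_status()
print(response.json()["text"])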
Code Examples
Examples are provided in Python, JavaScript, Go, Ruby, PHP, and cURL.
Python: Standard Request
from openai import OpenAI
# Initialize client with DeepRequest configuration
client = OpenAI(
    api_key="your-deeprequest-key",
    base_url="https://api.deeprequest.io/v1"
)

# Transcribe Spanish audio to text
with open("spanish_meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-large-v3",
        file=audio_file,
        language="es",  # Specify source language (Spanish)
        response_format="json"
    )
print(f"Spanish Transcription: {transcript.text}")

# Translate audio directly to English (reopen the file; the first call consumed it)
with open("spanish_meeting.mp3", "rb") as audio_file:
    translation = client.audio.translations.create(
        model="whisper-large-v3",
        file=audio_file,
        response_format="json"
    )
print(f"English Translation: {translation.text}")
Python: Streaming Request
from openai import OpenAI
# Initialize client with DeepRequest configuration
client = OpenAI(
    api_key="your-deeprequest-key",
    base_url="https://api.deeprequest.io/v1"
)

# Transcribe Spanish audio to text with streaming
audio_file = open("spanish_meeting.mp3", "rb")
stream = client.audio.transcriptions.create(
    model="whisper-large-v3",
    file=audio_file,
    language="es",  # Specify source language (Spanish)
    response_format="json",
    stream=True  # Enable streaming
)

# Process the streaming response
print("\nStreaming transcription:")
for chunk in stream:
    if chunk.data:
        print(chunk.data, end="", flush=True)

# Rewind the file before reusing it for translation
audio_file.seek(0)

# Translate audio directly to English with streaming
stream = client.audio.translations.create(
    model="whisper-large-v3",
    file=audio_file,
    response_format="json",
    stream=True  # Enable streaming
)

# Process the streaming response
print("\nStreaming translation:")
for chunk in stream:
    if chunk.data:
        print(chunk.data, end="", flush=True)

audio_file.close()
JavaScript: Standard Request
// Install: npm install openai
import OpenAI from 'openai';
import { createReadStream } from 'fs';

// Initialize client with DeepRequest configuration
const openai = new OpenAI({
  apiKey: 'your-deeprequest-key',
  baseURL: 'https://api.deeprequest.io/v1'
});

async function processAudio() {
  // Transcribe Spanish audio to text
  const transcription = await openai.audio.transcriptions.create({
    model: 'whisper-large-v3',
    file: createReadStream('spanish_meeting.mp3'),
    language: 'es', // Specify source language (Spanish)
    response_format: 'json'
  });
  console.log('Spanish Transcription:', transcription.text);

  // Translate audio directly to English
  const translation = await openai.audio.translations.create({
    model: 'whisper-large-v3',
    file: createReadStream('spanish_meeting.mp3'),
    response_format: 'json'
  });
  console.log('English Translation:', translation.text);
}
processAudio();
JavaScript: Streaming Request
// Install: npm install openai
import OpenAI from 'openai';
import { createReadStream } from 'fs';

// Initialize client with DeepRequest configuration
const openai = new OpenAI({
  apiKey: 'your-deeprequest-key',
  baseURL: 'https://api.deeprequest.io/v1'
});

async function processAudioStreaming() {
  // Transcribe Spanish audio to text with streaming
  console.log('Streaming transcription:');
  const transcriptionStream = await openai.audio.transcriptions.create({
    model: 'whisper-large-v3',
    file: createReadStream('spanish_meeting.mp3'),
    language: 'es', // Specify source language (Spanish)
    response_format: 'json',
    stream: true // Enable streaming
  });

  // Process the streaming transcription
  let transcriptionText = '';
  for await (const chunk of transcriptionStream) {
    if (chunk.data) {
      process.stdout.write(chunk.data);
      transcriptionText += chunk.data;
    }
  }

  // Translate audio directly to English with streaming
  console.log('\nStreaming translation:');
  const translationStream = await openai.audio.translations.create({
    model: 'whisper-large-v3',
    file: createReadStream('spanish_meeting.mp3'),
    response_format: 'json',
    stream: true // Enable streaming
  });

  // Process the streaming translation
  let translationText = '';
  for await (const chunk of translationStream) {
    if (chunk.data) {
      process.stdout.write(chunk.data);
      translationText += chunk.data;
    }
  }
}
processAudioStreaming();
Go: Standard Request
package main
import ( "context" "fmt" "github.com/sashabaranov/go-openai" "os")
func main() { // Initialize client with DeepRequest configuration config := openai.DefaultConfig("your-deeprequest-key") config.BaseURL = "https://api.deeprequest.io/v1" client := openai.NewClientWithConfig(config)
// Open audio file audioFile, err := os.Open("spanish_meeting.mp3") if err != nil { fmt.Printf("Error opening file: %v\n", err) return } defer audioFile.Close()
// Transcribe Spanish audio to text transcriptionReq := openai.AudioRequest{ Model: "whisper-large-v3", FilePath: audioFile.Name(), Language: "es", // Specify source language (Spanish) Format: "json", }
transcript, err := client.CreateTranscription(context.Background(), transcriptionReq) if err != nil { fmt.Printf("Transcription error: %v\n", err) return } fmt.Printf("Spanish Transcription: %s\n", transcript.Text)
// Translate audio directly to English translationReq := openai.AudioRequest{ Model: "whisper-large-v3", FilePath: audioFile.Name(), Format: "json", }
translation, err := client.CreateTranslation(context.Background(), translationReq) if err != nil { fmt.Printf("Translation error: %v\n", err) return } fmt.Printf("English Translation: %s\n", translation.Text)}
Go: Streaming Request
package main
import (
    "context"
    "fmt"
    "io"

    "github.com/sashabaranov/go-openai"
)

func main() {
    // Initialize client with DeepRequest configuration
    config := openai.DefaultConfig("your-deeprequest-key")
    config.BaseURL = "https://api.deeprequest.io/v1"
    client := openai.NewClientWithConfig(config)

    // Transcribe Spanish audio to text with streaming
    fmt.Println("Streaming transcription:")
    transcriptionReq := openai.AudioRequest{
        Model:    "whisper-large-v3",
        FilePath: "spanish_meeting.mp3",
        Language: "es", // Specify source language (Spanish)
        Format:   "json",
        Stream:   true, // Enable streaming
    }

    transcriptStream, err := client.CreateTranscriptionStream(context.Background(), transcriptionReq)
    if err != nil {
        fmt.Printf("Transcription stream error: %v\n", err)
        return
    }
    defer transcriptStream.Close()

    // Process the streaming transcription
    var transcriptionText string
    for {
        response, err := transcriptStream.Recv()
        if err == io.EOF {
            break
        }
        if err != nil {
            fmt.Printf("Receive error: %v\n", err)
            return
        }
        fmt.Print(response.Data)
        transcriptionText += response.Data
    }

    // Translate audio directly to English with streaming
    fmt.Println("\nStreaming translation:")
    translationReq := openai.AudioRequest{
        Model:    "whisper-large-v3",
        FilePath: "spanish_meeting.mp3",
        Format:   "json",
        Stream:   true, // Enable streaming
    }

    translationStream, err := client.CreateTranslationStream(context.Background(), translationReq)
    if err != nil {
        fmt.Printf("Translation stream error: %v\n", err)
        return
    }
    defer translationStream.Close()

    // Process the streaming translation
    var translationText string
    for {
        response, err := translationStream.Recv()
        if err == io.EOF {
            break
        }
        if err != nil {
            fmt.Printf("Receive error: %v\n", err)
            return
        }
        fmt.Print(response.Data)
        translationText += response.Data
    }
}
Ruby: Standard Request
# Install: gem install ruby-openai
require 'openai'

# Initialize client with DeepRequest configuration
client = OpenAI::Client.new(
  access_token: 'your-deeprequest-key',
  uri_base: 'https://api.deeprequest.io/v1'
)

# Transcribe Spanish audio to text
audio_file = File.open('spanish_meeting.mp3', 'rb')
transcript = client.audio.transcribe(
  parameters: {
    model: 'whisper-large-v3',
    file: audio_file,
    language: 'es', # Specify source language (Spanish)
    response_format: 'json'
  }
)
puts "Spanish Transcription: #{transcript['text']}"

# Translate audio directly to English
audio_file = File.open('spanish_meeting.mp3', 'rb')
translation = client.audio.translate(
  parameters: {
    model: 'whisper-large-v3',
    file: audio_file,
    response_format: 'json'
  }
)
puts "English Translation: #{translation['text']}"
Ruby: Streaming Request
# Install: gem install ruby-openai
require 'openai'

# Initialize client with DeepRequest configuration
client = OpenAI::Client.new(
  access_token: 'your-deeprequest-key',
  uri_base: 'https://api.deeprequest.io/v1'
)

# Transcribe Spanish audio to text with streaming
audio_file = File.open('spanish_meeting.mp3', 'rb')
puts "Streaming transcription:"
transcription_stream = client.audio.transcribe(
  parameters: {
    model: 'whisper-large-v3',
    file: audio_file,
    language: 'es', # Specify source language (Spanish)
    response_format: 'json',
    stream: true # Enable streaming
  }
)

# Process the streaming transcription
transcription_text = ""
transcription_stream.each do |chunk|
  if chunk['data']
    print chunk['data']
    transcription_text += chunk['data']
  end
end

# Translate audio directly to English with streaming
audio_file = File.open('spanish_meeting.mp3', 'rb')
puts "\nStreaming translation:"
translation_stream = client.audio.translate(
  parameters: {
    model: 'whisper-large-v3',
    file: audio_file,
    response_format: 'json',
    stream: true # Enable streaming
  }
)

# Process the streaming translation
translation_text = ""
translation_stream.each do |chunk|
  if chunk['data']
    print chunk['data']
    translation_text += chunk['data']
  end
end
PHP: Standard Request
<?php// Install: composer require openai-php/clientrequire 'vendor/autoload.php';
// Initialize client with DeepRequest configuration$client = OpenAI::client('your-deeprequest-key', [ 'base_uri' => 'https://api.deeprequest.io/v1']);
// Transcribe Spanish audio to text$transcript = $client->audio()->transcribe([ 'model' => 'whisper-large-v3', 'file' => fopen('spanish_meeting.mp3', 'r'), 'language' => 'es', // Specify source language (Spanish) 'response_format' => 'json']);echo "Spanish Transcription: " . $transcript . PHP_EOL;
// Translate audio directly to English$translation = $client->audio()->translate([ 'model' => 'whisper-large-v3', 'file' => fopen('spanish_meeting.mp3', 'r'), 'response_format' => 'json']);echo "English Translation: " . $translation . PHP_EOL;?>
PHP: Streaming Request
<?php// Install: composer require openai-php/clientrequire 'vendor/autoload.php';
// Initialize client with DeepRequest configuration$client = OpenAI::client('your-deeprequest-key', [ 'base_uri' => 'https://api.deeprequest.io/v1']);
// Transcribe Spanish audio to text with streamingecho "Streaming transcription:" . PHP_EOL;$transcriptStream = $client->audio()->transcribeStreamed([ 'model' => 'whisper-large-v3', 'file' => fopen('spanish_meeting.mp3', 'r'), 'language' => 'es', // Specify source language (Spanish) 'response_format' => 'json', 'stream' => true // Enable streaming]);
// Process the streaming transcription$transcriptionText = '';foreach ($transcriptStream as $chunk) { if ($data = $chunk->data) { echo $data; flush(); $transcriptionText .= $data; }}
// Translate audio directly to English with streamingecho "\nStreaming translation:" . PHP_EOL;$translationStream = $client->audio()->translateStreamed([ 'model' => 'whisper-large-v3', 'file' => fopen('spanish_meeting.mp3', 'r'), 'response_format' => 'json', 'stream' => true // Enable streaming]);
// Process the streaming translation$translationText = '';foreach ($translationStream as $chunk) { if ($data = $chunk->data) { echo $data; flush(); $translationText .= $data; }}?>
cURL: Standard Request
# Transcribe Spanish audio to text
curl https://api.deeprequest.io/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F file="@spanish_meeting.mp3" \
  -F model="whisper-large-v3" \
  -F language="es" \
  -F response_format="json"

# Translate audio directly to English
curl https://api.deeprequest.io/v1/audio/translations \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F file="@spanish_meeting.mp3" \
  -F model="whisper-large-v3" \
  -F response_format="json"
cURL: Streaming Request
# Transcribe Spanish audio to text with streaming
curl https://api.deeprequest.io/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F file="@spanish_meeting.mp3" \
  -F model="whisper-large-v3" \
  -F language="es" \
  -F response_format="json" \
  -F stream=true

# Translate audio directly to English with streaming
curl https://api.deeprequest.io/v1/audio/translations \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F file="@spanish_meeting.mp3" \
  -F model="whisper-large-v3" \
  -F response_format="json" \
  -F stream=true
Response Formats
Below are examples of the available response formats:
Text Format Response
This is a transcription of the audio file that was submitted for processing.
JSON Format Response
{ "text": "This is a transcription of the audio file that was submitted for processing."}
Performance Notes
- File Size Limits: Maximum audio file size is 25MB (a client-side check is sketched after this list).
- Duration: Handles audio files up to 4 hours in length.
- Processing Speed: Processing time is roughly half the audio duration (e.g., a 1-minute file finishes in about 30 seconds).
- Best Practices: For optimal results, use clean audio with minimal background noise.
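Because oversized uploads will be rejected, it can help to validate files client-side before sending them. A minimal Python sketch using the 25MB limit stated above; the helper name is ours, for illustration:

import os

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # 25MB limit from the list above (assumed binary MB)

def check_upload_size(path: str) -> None:
    # Raise early instead of waiting for the API to reject the upload
    size = os.path.getsize(path)
    if size > MAX_UPLOAD_BYTES:
        raise ValueError(
            f"{path} is {size / (1024 * 1024):.1f}MB, over the 25MB upload limit; "
            "compress or split the audio first"
        )

check_upload_size("spanish_meeting.mp3")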
Additional Resources
For detailed API documentation, visit our API Docs or ReDoc.
Pricing details are available on the Pricing page.