
whisper-large-v3

whisper-large-v3 is a state-of-the-art automatic speech recognition (ASR) and speech translation model developed by OpenAI. Trained on about 5 million hours of audio (1 million hours weakly labeled and 4 million hours pseudo-labeled using large-v2), it delivers high transcription accuracy across a wide range of languages, accents, and acoustic environments.

Key Features

  • Parameters: 1.55 billion
  • Architecture: Encoder-decoder Transformer
  • Languages: Supports 99 languages with enhanced performance
  • Capabilities: Transcription (speech-to-text) and translation (speech-to-English)
  • Accuracy: Significant improvements over large-v2, with 10-20% error reduction across languages
  • Input Type: Processes audio files in various formats (MP3, WAV, M4A, etc.)

API Reference

Endpoints

Our API follows the OpenAI-compatible format:

POST /v1/audio/transcriptions
POST /v1/audio/translations

Model Name

Use the following model identifier in your requests:

whisper-large-v3
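
For clients that do not use the OpenAI SDK, the endpoints above can be called with a plain multipart/form-data POST. The following is a minimal sketch using only the Python standard library; it builds the request but does not send it. The base URL is the one used in the code examples on this page. The `Authorization: Bearer` header and the helper names `encode_multipart` and `build_request` are assumptions based on the OpenAI-compatible convention, not documented behavior.

```python
import uuid
import urllib.request

BASE_URL = "https://api.deeprequest.io/v1"  # base URL used in this page's examples


def encode_multipart(fields, file_name, file_bytes):
    """Encode text fields plus one file part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    # The audio goes in a part named "file", matching the parameter list.
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode()
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(task, api_key, file_name, file_bytes, **fields):
    """Build (but do not send) a POST for 'transcriptions' or 'translations'."""
    if task not in ("transcriptions", "translations"):
        raise ValueError(f"unknown task: {task}")
    fields.setdefault("model", "whisper-large-v3")
    body, content_type = encode_multipart(fields, file_name, file_bytes)
    return urllib.request.Request(
        f"{BASE_URL}/audio/{task}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": content_type,
        },
    )


# Placeholder bytes stand in for real audio so the sketch runs offline.
req = build_request("transcriptions", "your-deeprequest-key", "clip.mp3", b"...", language="es")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without network access or a valid key.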

Parameters

Common Parameters

  • file (file): The audio file to transcribe or translate, sent as multipart/form-data.
  • model (string): The model identifier (“whisper-large-v3”).
  • response_format (string, optional): Response format, either “json” (default) or “text”.
  • temperature (float, optional): Sampling temperature (0.0 to 1.0, default: 0).
  • stream (boolean, optional): Whether to stream the response (default: false).

Transcription-Specific Parameters

  • language (string, optional): Language code of the input audio (e.g., “en”, “es”, “fr”). If not specified, the model will auto-detect the language.
  • prompt (string, optional): Text to guide the model’s style or continue a previous audio segment.

Translation-Specific Parameters

  • prompt (string, optional): Text to guide the translation.
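
The parameter lists above can be mirrored in a small client-side check that catches bad values before any audio is uploaded. This is a sketch only: `build_params` is a hypothetical helper, and rejecting `language` on translation requests reflects how the lists are organized here, not confirmed server behavior.

```python
ALLOWED_FORMATS = {"json", "text"}


def build_params(task, model="whisper-large-v3", response_format="json",
                 temperature=0.0, stream=False, language=None, prompt=None):
    """Validate options against the documented ranges and return form fields."""
    if task not in ("transcription", "translation"):
        raise ValueError("task must be 'transcription' or 'translation'")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"response_format must be one of {sorted(ALLOWED_FORMATS)}")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    if language is not None and task == "translation":
        # Assumption: language is listed only under transcription parameters.
        raise ValueError("language is only listed for transcription requests")

    params = {
        "model": model,
        "response_format": response_format,
        "temperature": temperature,
        "stream": stream,
    }
    # Optional fields are omitted entirely when unset so server defaults apply.
    if language is not None:
        params["language"] = language
    if prompt is not None:
        params["prompt"] = prompt
    return params


print(build_params("transcription", language="es"))
```

Leaving `language` unset keeps it out of the request, which corresponds to the auto-detect behavior described above.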

Code Examples


Standard Request

from openai import OpenAI

# Initialize client with DeepRequest configuration
client = OpenAI(
    api_key="your-deeprequest-key",
    base_url="https://api.deeprequest.io/v1",
)

# Open the file once; the context manager closes it after both requests
with open("spanish_meeting.mp3", "rb") as audio_file:
    # Transcribe Spanish audio to text
    transcript = client.audio.transcriptions.create(
        model="whisper-large-v3",
        file=audio_file,
        language="es",  # Specify source language (Spanish)
        response_format="json",
    )
    print(f"Spanish Transcription: {transcript.text}")

    # Rewind so the second upload starts at the beginning of the file
    audio_file.seek(0)

    # Translate audio directly to English
    translation = client.audio.translations.create(
        model="whisper-large-v3",
        file=audio_file,
        response_format="json",
    )
    print(f"English Translation: {translation.text}")

Streaming Request

from openai import OpenAI

# Initialize client with DeepRequest configuration
client = OpenAI(
    api_key="your-deeprequest-key",
    base_url="https://api.deeprequest.io/v1",
)

# Open the file once; the context manager closes it after both requests
with open("spanish_meeting.mp3", "rb") as audio_file:
    # Transcribe Spanish audio to text with streaming
    stream = client.audio.transcriptions.create(
        model="whisper-large-v3",
        file=audio_file,
        language="es",  # Specify source language (Spanish)
        response_format="json",
        stream=True,  # Enable streaming
    )

    # Process the streaming response
    print("\nStreaming transcription:")
    for chunk in stream:
        if chunk.data:
            print(chunk.data, end="", flush=True)

    # Rewind so the second upload starts at the beginning of the file
    audio_file.seek(0)

    # Translate audio directly to English with streaming
    stream = client.audio.translations.create(
        model="whisper-large-v3",
        file=audio_file,
        response_format="json",
        stream=True,  # Enable streaming
    )

    # Process the streaming response
    print("\nStreaming translation:")
    for chunk in stream:
        if chunk.data:
            print(chunk.data, end="", flush=True)

Response Formats

Examples of the available response formats:

Text Format Response

This is a transcription of the audio file that was submitted for processing.

JSON Format Response

{
"text": "This is a transcription of the audio file that was submitted for processing."
}
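
When calling the endpoints without the SDK, the two formats above need slightly different handling: “text” is the transcript itself, while “json” wraps it in a `text` field. A minimal sketch, assuming the raw response body is already available as a string (`extract_text` is a hypothetical helper name):

```python
import json


def extract_text(body, response_format="json"):
    """Return the transcript text from a raw response body."""
    if response_format == "text":
        # The body is the transcript; drop any trailing newline.
        return body.strip()
    if response_format == "json":
        # The transcript is stored under the "text" key.
        return json.loads(body)["text"]
    raise ValueError(f"unsupported response_format: {response_format}")


json_body = '{"text": "This is a transcription of the audio file that was submitted for processing."}'
print(extract_text(json_body, "json"))
```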

Performance Notes

  • File Size Limits: Maximum audio file size is 25MB.
  • Duration: Handles audio files up to 4 hours in length.
  • Processing Speed: Processing takes roughly half the audio's duration (e.g., a 1-minute file processes in about 30 seconds).
  • Best Practices: For optimal results, clean audio with minimal background noise is recommended.
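
The limits above can be checked before uploading. The sketch below is illustrative: the helper names are hypothetical, “25MB” is assumed to mean 25 × 1024 × 1024 bytes, and the processing estimate simply applies the approximate ratio quoted above. It also shows that the size limit, not the duration limit, is what constrains long recordings.

```python
MAX_FILE_BYTES = 25 * 1024 * 1024    # assumption: "25MB" read as 25 MiB
MAX_DURATION_SECONDS = 4 * 60 * 60   # 4 hours
PROCESSING_RATIO = 0.5               # approximate processing time per second of audio


def check_audio(size_bytes, duration_seconds):
    """Return a list of limit violations (empty if the file looks acceptable)."""
    problems = []
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file exceeds the 25MB size limit")
    if duration_seconds > MAX_DURATION_SECONDS:
        problems.append("audio is longer than 4 hours")
    return problems


def estimate_processing_seconds(duration_seconds):
    """Rough processing time based on the documented approximate speed."""
    return duration_seconds * PROCESSING_RATIO


def max_bitrate_kbps(duration_seconds):
    """Highest average bitrate (kbit/s) that keeps a file under the size limit."""
    return MAX_FILE_BYTES * 8 / duration_seconds / 1000


print(check_audio(size_bytes=30 * 1024 * 1024, duration_seconds=600))
print(estimate_processing_seconds(60))
print(round(max_bitrate_kbps(MAX_DURATION_SECONDS), 2))
```

Under these assumptions, a full 4-hour recording fits in a single upload only at an average bitrate of roughly 14.5 kbit/s, so longer recordings generally need a low-bitrate encoding or splitting into several files.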

Additional Resources

For detailed API documentation, please visit our API Docs or ReDoc.

For pricing details, see the Pricing page.