whisper-large-v3
whisper-large-v3 is a state-of-the-art automatic speech recognition (ASR) and speech translation model developed by OpenAI. Trained on 5 million hours of audio (1 million hours of weakly labeled audio plus 4 million hours of pseudo-labeled audio), it delivers exceptional transcription accuracy across a wide range of languages, accents, and acoustic environments.
Key Features
- Parameters: 1.55 billion
- Architecture: Encoder-decoder Transformer
- Languages: Supports 99 languages, with improved per-language accuracy over earlier Whisper releases
- Capabilities: Transcription (speech-to-text) and translation (speech-to-English)
- Accuracy: Significant improvements over large-v2, with a 10-20% reduction in errors across languages
- Input Type: Processes audio files in various formats (MP3, WAV, M4A, etc.)
API Reference
Endpoints
Our API follows the OpenAI-compatible format:
POST /v1/audio/transcriptions
POST /v1/audio/translations
Model Name
Use the following model identifier in your requests:
whisper-large-v3
Parameters
Common Parameters
- file (file): The audio file to transcribe, uploaded as multipart/form-data.
- model (string): The model identifier ("whisper-large-v3").
- response_format (string, optional): Response format, either "json" (default) or "text".
- temperature (float, optional): Sampling temperature (0.0 to 1.0; default: 0).
- stream (boolean, optional): Whether to stream the response (default: false).
Transcription-Specific Parameters
- language (string, optional): Language code of the input audio (e.g., "en", "es", "fr"). If not specified, the model auto-detects the language.
- prompt (string, optional): Text to guide the model's style or to continue a previous audio segment.
Translation-Specific Parameters
- prompt (string, optional): Text to guide the translation.
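To make the wire format concrete, here is a minimal sketch of a raw transcription request built with Python's requests library. It uses the same placeholder key, host, and file name as the examples below; the prompt text is purely illustrative.

import requests

API_KEY = "your-deeprequest-key"  # placeholder, as in the examples below
URL = "https://api.deeprequest.io/v1/audio/transcriptions"

with open("spanish_meeting.mp3", "rb") as audio_file:
    response = requests.post(
        URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": audio_file},  # uploaded as multipart/form-data
        data={
            "model": "whisper-large-v3",
            "language": "es",                      # optional; omit to auto-detect
            "prompt": "Quarterly budget meeting",  # optional style hint (illustrative)
            "response_format": "json",
        },
    )

response.raise_for_status()
print(response.json()["text"])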
Code Examples
Examples are provided in Python, JavaScript, Go, Ruby, PHP, and cURL.
Python: Standard Request
from openai import OpenAI
# Initialize client with DeepRequest configuration
client = OpenAI(
    api_key="your-deeprequest-key",
    base_url="https://api.deeprequest.io/v1"
)

# Transcribe Spanish audio to text
with open("spanish_meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-large-v3",
        file=audio_file,
        language="es",  # Specify source language (Spanish)
        response_format="json"
    )
print(f"Spanish Transcription: {transcript.text}")

# Translate audio directly to English (reopen the file; the first call consumed it)
with open("spanish_meeting.mp3", "rb") as audio_file:
    translation = client.audio.translations.create(
        model="whisper-large-v3",
        file=audio_file,
        response_format="json"
    )
print(f"English Translation: {translation.text}")
Python: Streaming Request
from openai import OpenAI
# Initialize client with DeepRequest configuration
client = OpenAI(
    api_key="your-deeprequest-key",
    base_url="https://api.deeprequest.io/v1"
)

# Transcribe Spanish audio to text with streaming
audio_file = open("spanish_meeting.mp3", "rb")
stream = client.audio.transcriptions.create(
    model="whisper-large-v3",
    file=audio_file,
    language="es",  # Specify source language (Spanish)
    response_format="json",
    stream=True  # Enable streaming
)

# Process the streaming response
print("\nStreaming transcription:")
for chunk in stream:
    if chunk.data:
        print(chunk.data, end="", flush=True)

# Rewind the file before reusing it for translation
audio_file.seek(0)

# Translate audio directly to English with streaming
stream = client.audio.translations.create(
    model="whisper-large-v3",
    file=audio_file,
    response_format="json",
    stream=True  # Enable streaming
)

# Process the streaming response
print("\nStreaming translation:")
for chunk in stream:
    if chunk.data:
        print(chunk.data, end="", flush=True)

audio_file.close()
JavaScript: Standard Request
// Install: npm install openai
import OpenAI from 'openai';
import { createReadStream } from 'fs';

// Initialize client with DeepRequest configuration
const openai = new OpenAI({
  apiKey: 'your-deeprequest-key',
  baseURL: 'https://api.deeprequest.io/v1'
});

async function processAudio() {
  // Transcribe Spanish audio to text
  const transcription = await openai.audio.transcriptions.create({
    model: 'whisper-large-v3',
    file: createReadStream('spanish_meeting.mp3'),
    language: 'es', // Specify source language (Spanish)
    response_format: 'json'
  });
  console.log('Spanish Transcription:', transcription.text);

  // Translate audio directly to English
  const translation = await openai.audio.translations.create({
    model: 'whisper-large-v3',
    file: createReadStream('spanish_meeting.mp3'),
    response_format: 'json'
  });
  console.log('English Translation:', translation.text);
}
processAudio();
JavaScript: Streaming Request
// Install: npm install openai
import OpenAI from 'openai';
import { createReadStream } from 'fs';

// Initialize client with DeepRequest configuration
const openai = new OpenAI({
  apiKey: 'your-deeprequest-key',
  baseURL: 'https://api.deeprequest.io/v1'
});

async function processAudioStreaming() {
  // Transcribe Spanish audio to text with streaming
  console.log('Streaming transcription:');
  const transcriptionStream = await openai.audio.transcriptions.create({
    model: 'whisper-large-v3',
    file: createReadStream('spanish_meeting.mp3'),
    language: 'es', // Specify source language (Spanish)
    response_format: 'json',
    stream: true // Enable streaming
  });

  // Process the streaming transcription
  let transcriptionText = '';
  for await (const chunk of transcriptionStream) {
    if (chunk.data) {
      process.stdout.write(chunk.data);
      transcriptionText += chunk.data;
    }
  }

  // Translate audio directly to English with streaming
  console.log('\nStreaming translation:');
  const translationStream = await openai.audio.translations.create({
    model: 'whisper-large-v3',
    file: createReadStream('spanish_meeting.mp3'),
    response_format: 'json',
    stream: true // Enable streaming
  });

  // Process the streaming translation
  let translationText = '';
  for await (const chunk of translationStream) {
    if (chunk.data) {
      process.stdout.write(chunk.data);
      translationText += chunk.data;
    }
  }
}
processAudioStreaming();
Go: Standard Request
package main
import ( "context" "fmt" "github.com/sashabaranov/go-openai" "os")
func main() { // Initialize client with DeepRequest configuration config := openai.DefaultConfig("your-deeprequest-key") config.BaseURL = "https://api.deeprequest.io/v1" client := openai.NewClientWithConfig(config)
// Open audio file audioFile, err := os.Open("spanish_meeting.mp3") if err != nil { fmt.Printf("Error opening file: %v\n", err) return } defer audioFile.Close()
// Transcribe Spanish audio to text transcriptionReq := openai.AudioRequest{ Model: "whisper-large-v3", FilePath: audioFile.Name(), Language: "es", // Specify source language (Spanish) Format: "json", }
transcript, err := client.CreateTranscription(context.Background(), transcriptionReq) if err != nil { fmt.Printf("Transcription error: %v\n", err) return } fmt.Printf("Spanish Transcription: %s\n", transcript.Text)
// Translate audio directly to English translationReq := openai.AudioRequest{ Model: "whisper-large-v3", FilePath: audioFile.Name(), Format: "json", }
translation, err := client.CreateTranslation(context.Background(), translationReq) if err != nil { fmt.Printf("Translation error: %v\n", err) return } fmt.Printf("English Translation: %s\n", translation.Text)}
Go: Streaming Request
package main
import (
    "context"
    "fmt"
    "io"

    "github.com/sashabaranov/go-openai"
)

func main() {
    // Initialize client with DeepRequest configuration
    config := openai.DefaultConfig("your-deeprequest-key")
    config.BaseURL = "https://api.deeprequest.io/v1"
    client := openai.NewClientWithConfig(config)

    // Transcribe Spanish audio to text with streaming
    fmt.Println("Streaming transcription:")
    transcriptionReq := openai.AudioRequest{
        Model:    "whisper-large-v3",
        FilePath: "spanish_meeting.mp3",
        Language: "es", // Specify source language (Spanish)
        Format:   "json",
        Stream:   true, // Enable streaming
    }

    transcriptStream, err := client.CreateTranscriptionStream(context.Background(), transcriptionReq)
    if err != nil {
        fmt.Printf("Transcription stream error: %v\n", err)
        return
    }
    defer transcriptStream.Close()

    // Process the streaming transcription
    var transcriptionText string
    for {
        response, err := transcriptStream.Recv()
        if err == io.EOF {
            break
        }
        if err != nil {
            fmt.Printf("Receive error: %v\n", err)
            return
        }
        fmt.Print(response.Data)
        transcriptionText += response.Data
    }

    // Translate audio directly to English with streaming
    fmt.Println("\nStreaming translation:")
    translationReq := openai.AudioRequest{
        Model:    "whisper-large-v3",
        FilePath: "spanish_meeting.mp3",
        Format:   "json",
        Stream:   true, // Enable streaming
    }

    translationStream, err := client.CreateTranslationStream(context.Background(), translationReq)
    if err != nil {
        fmt.Printf("Translation stream error: %v\n", err)
        return
    }
    defer translationStream.Close()

    // Process the streaming translation
    var translationText string
    for {
        response, err := translationStream.Recv()
        if err == io.EOF {
            break
        }
        if err != nil {
            fmt.Printf("Receive error: %v\n", err)
            return
        }
        fmt.Print(response.Data)
        translationText += response.Data
    }
}
Ruby: Standard Request
# Install: gem install ruby-openai
require 'openai'

# Initialize client with DeepRequest configuration
client = OpenAI::Client.new(
  access_token: 'your-deeprequest-key',
  uri_base: 'https://api.deeprequest.io/v1'
)

# Transcribe Spanish audio to text
audio_file = File.open('spanish_meeting.mp3', 'rb')
transcript = client.audio.transcribe(
  parameters: {
    model: 'whisper-large-v3',
    file: audio_file,
    language: 'es', # Specify source language (Spanish)
    response_format: 'json'
  }
)
puts "Spanish Transcription: #{transcript['text']}"

# Translate audio directly to English
audio_file = File.open('spanish_meeting.mp3', 'rb')
translation = client.audio.translate(
  parameters: {
    model: 'whisper-large-v3',
    file: audio_file,
    response_format: 'json'
  }
)
puts "English Translation: #{translation['text']}"
Ruby: Streaming Request
# Install: gem install ruby-openai
require 'openai'

# Initialize client with DeepRequest configuration
client = OpenAI::Client.new(
  access_token: 'your-deeprequest-key',
  uri_base: 'https://api.deeprequest.io/v1'
)

# Transcribe Spanish audio to text with streaming
audio_file = File.open('spanish_meeting.mp3', 'rb')
puts "Streaming transcription:"
transcription_stream = client.audio.transcribe(
  parameters: {
    model: 'whisper-large-v3',
    file: audio_file,
    language: 'es', # Specify source language (Spanish)
    response_format: 'json',
    stream: true # Enable streaming
  }
)

# Process the streaming transcription
transcription_text = ""
transcription_stream.each do |chunk|
  if chunk['data']
    print chunk['data']
    transcription_text += chunk['data']
  end
end

# Translate audio directly to English with streaming
audio_file = File.open('spanish_meeting.mp3', 'rb')
puts "\nStreaming translation:"
translation_stream = client.audio.translate(
  parameters: {
    model: 'whisper-large-v3',
    file: audio_file,
    response_format: 'json',
    stream: true # Enable streaming
  }
)

# Process the streaming translation
translation_text = ""
translation_stream.each do |chunk|
  if chunk['data']
    print chunk['data']
    translation_text += chunk['data']
  end
end
PHP: Standard Request
<?php// Install: composer require openai-php/clientrequire 'vendor/autoload.php';
// Initialize client with DeepRequest configuration$client = OpenAI::client('your-deeprequest-key', [ 'base_uri' => 'https://api.deeprequest.io/v1']);
// Transcribe Spanish audio to text$transcript = $client->audio()->transcribe([ 'model' => 'whisper-large-v3', 'file' => fopen('spanish_meeting.mp3', 'r'), 'language' => 'es', // Specify source language (Spanish) 'response_format' => 'json']);echo "Spanish Transcription: " . $transcript . PHP_EOL;
// Translate audio directly to English$translation = $client->audio()->translate([ 'model' => 'whisper-large-v3', 'file' => fopen('spanish_meeting.mp3', 'r'), 'response_format' => 'json']);echo "English Translation: " . $translation . PHP_EOL;?>
PHP: Streaming Request
<?php// Install: composer require openai-php/clientrequire 'vendor/autoload.php';
// Initialize client with DeepRequest configuration$client = OpenAI::client('your-deeprequest-key', [ 'base_uri' => 'https://api.deeprequest.io/v1']);
// Transcribe Spanish audio to text with streamingecho "Streaming transcription:" . PHP_EOL;$transcriptStream = $client->audio()->transcribeStreamed([ 'model' => 'whisper-large-v3', 'file' => fopen('spanish_meeting.mp3', 'r'), 'language' => 'es', // Specify source language (Spanish) 'response_format' => 'json', 'stream' => true // Enable streaming]);
// Process the streaming transcription$transcriptionText = '';foreach ($transcriptStream as $chunk) { if ($data = $chunk->data) { echo $data; flush(); $transcriptionText .= $data; }}
// Translate audio directly to English with streamingecho "\nStreaming translation:" . PHP_EOL;$translationStream = $client->audio()->translateStreamed([ 'model' => 'whisper-large-v3', 'file' => fopen('spanish_meeting.mp3', 'r'), 'response_format' => 'json', 'stream' => true // Enable streaming]);
// Process the streaming translation$translationText = '';foreach ($translationStream as $chunk) { if ($data = $chunk->data) { echo $data; flush(); $translationText .= $data; }}?>
cURL: Standard Request
# Transcribe Spanish audio to text
curl https://api.deeprequest.io/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F file="@spanish_meeting.mp3" \
  -F model="whisper-large-v3" \
  -F language="es" \
  -F response_format="json"

# Translate audio directly to English
curl https://api.deeprequest.io/v1/audio/translations \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F file="@spanish_meeting.mp3" \
  -F model="whisper-large-v3" \
  -F response_format="json"
cURL: Streaming Request
# Transcribe Spanish audio to text with streaming
curl https://api.deeprequest.io/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F file="@spanish_meeting.mp3" \
  -F model="whisper-large-v3" \
  -F language="es" \
  -F response_format="json" \
  -F stream=true

# Translate audio directly to English with streaming
curl https://api.deeprequest.io/v1/audio/translations \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F file="@spanish_meeting.mp3" \
  -F model="whisper-large-v3" \
  -F response_format="json" \
  -F stream=true
Response Formats
Below are examples of the available response formats:
Text Format Response
This is a transcription of the audio file that was submitted for processing.
JSON Format Response
{ "text": "This is a transcription of the audio file that was submitted for processing."}
Performance Notes
- File Size Limits: Maximum audio file size is 25MB (a client-side check is sketched after this list).
- Duration: Handles audio files up to 4 hours in length.
- Processing Speed: Processing time is roughly half the audio duration (e.g., a 1-minute file finishes in about 30 seconds).
- Best Practices: For optimal results, use clean audio with minimal background noise.
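Because oversized uploads will be rejected, it can help to validate files client-side before sending them. A minimal Python sketch using the 25MB limit stated above; the helper name is ours, for illustration:

import os

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # 25MB limit from the list above (assumed binary MB)

def check_upload_size(path: str) -> None:
    # Raise early instead of waiting for the API to reject the upload
    size = os.path.getsize(path)
    if size > MAX_UPLOAD_BYTES:
        raise ValueError(
            f"{path} is {size / (1024 * 1024):.1f}MB, over the 25MB upload limit; "
            "compress or split the audio first"
        )

check_upload_size("spanish_meeting.mp3")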
Additional Resources
For detailed API documentation, visit our API Docs or ReDoc.
Pricing details are available on the Pricing page.