Speech Recognition with whisper-large-v3

This guide shows how to transcribe audio, and optionally translate it to English, with whisper-large-v3 over an HTTP API.

Step 1: Prep Your Audio

Prepare your audio file in a common format like WAV, MP3, or M4A.
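
Before uploading, it can help to check the file locally. The sketch below is illustrative only: check_audio_file is a hypothetical helper, the extension list simply mirrors the formats named above, and the size cap assumes the 25 MB limit from the Pro Tips means 25 * 1024 * 1024 bytes.

```python
from pathlib import Path

# Assumptions for illustration: the extensions mirror the formats named in
# this guide, and 25 MB is interpreted as 25 * 1024 * 1024 bytes.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a"}
MAX_BYTES = 25 * 1024 * 1024


def check_audio_file(path):
    """Return a list of problems with the file; an empty list means none found."""
    p = Path(path)
    if not p.is_file():
        return [f"{p} does not exist or is not a file"]

    problems = []
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {p.suffix or '(none)'}")

    size = p.stat().st_size
    if size == 0:
        problems.append("file is empty")
    elif size > MAX_BYTES:
        problems.append(f"file is {size} bytes, over the {MAX_BYTES}-byte limit")
    return problems
```

Calling check_audio_file("path/to/your/audio.mp3") before Step 2 would surface an oversized or misnamed file without spending an API request.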

Step 2: Send a Transcription Request

import requests

url = "https://api.deeprequest.io/v1/audio/transcriptions"
headers = {"Authorization": "Bearer YOUR_API_KEY"}

data = {
    "model": "whisper-large-v3",
    "response_format": "json",  # Options: json, text, srt, vtt, verbose_json
}

# Optional: specify language for better accuracy
# data["language"] = "en"

# Open the file in a context manager so the handle is closed after the upload.
with open("path/to/your/audio.mp3", "rb") as audio_file:
    files = {"file": ("audio.mp3", audio_file)}
    response = requests.post(url, files=files, data=data, headers=headers)

# Raise on HTTP errors (bad key, oversized file) before reading the body.
response.raise_for_status()
print(response.json()["text"])
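
The request above reads the transcript from a JSON body, which fits the json format. For the other response_format options, a small dispatcher like the following sketch may be useful. It assumes that json and verbose_json return a JSON object with a "text" field and that text, srt, and vtt return plain text; that split is an assumption about this API rather than something stated in this guide, and extract_transcript is a hypothetical helper name.

```python
import json

# Assumption: json and verbose_json bodies are JSON objects with a "text"
# field, while text, srt, and vtt bodies are plain text. Verify this against
# the API documentation before relying on it.
JSON_FORMATS = {"json", "verbose_json"}
PLAIN_FORMATS = {"text", "srt", "vtt"}


def extract_transcript(response_format, body):
    """Return the transcript from a raw response body given as a string."""
    if response_format in JSON_FORMATS:
        return json.loads(body)["text"]
    if response_format in PLAIN_FORMATS:
        return body
    raise ValueError(f"unknown response_format: {response_format!r}")
```

With the request from this step, it could be called as extract_transcript(data["response_format"], response.text).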

Step 3: Translation (Optional)

To translate audio directly to English:

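import requests

url = "https://api.deeprequest.io/v1/audio/translations"
headers = {"Authorization": "Bearer YOUR_API_KEY"}

data = {
    "model": "whisper-large-v3",
    "response_format": "json",
}

# Open the file in a context manager so the handle is closed after the upload.
with open("path/to/your/audio.mp3", "rb") as audio_file:
    files = {"file": ("audio.mp3", audio_file)}
    response = requests.post(url, files=files, data=data, headers=headers)

# Raise on HTTP errors before reading the body.
response.raise_for_status()
print(response.json()["text"])  # English translation of the audio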
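
The transcription and translation requests differ only in the endpoint path, so the shared parts can be factored out. The following is a minimal sketch: build_request and the task names "transcribe" and "translate" are hypothetical, and the language field is sent only for transcription because this guide shows it only there.

```python
BASE_URL = "https://api.deeprequest.io/v1/audio"

# Hypothetical task names mapped to the two endpoint paths used in this guide.
ENDPOINTS = {"transcribe": "transcriptions", "translate": "translations"}


def build_request(task, response_format="json", language=None):
    """Return (url, form_data) for a transcription or translation request."""
    if task not in ENDPOINTS:
        raise ValueError(f"task must be one of {sorted(ENDPOINTS)}, got {task!r}")

    data = {"model": "whisper-large-v3", "response_format": response_format}
    # Only transcription is shown with a language hint in this guide, so the
    # field is left out for translation requests.
    if language is not None and task == "transcribe":
        data["language"] = language
    return f"{BASE_URL}/{ENDPOINTS[task]}", data
```

The returned pair can be passed straight to requests.post(url, files=files, data=data, headers=headers), as in the steps above.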

Pro Tips

For high-quality results, use clear audio with minimal background noise.
The file size limit is 25 MB.
For audio that exceeds the limit, split it into smaller segments and transcribe each one.
Set the language parameter (for example, "en") when the language is known; this improves accuracy.
Set response_format to "verbose_json" to get timestamps and confidence scores.
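
As one way to chunk longer recordings, the sketch below splits an uncompressed WAV file into fixed-length pieces using only Python's standard wave module. split_wav is a hypothetical helper; it handles WAV only (MP3 or M4A would need an external tool such as ffmpeg), and it cuts at fixed time boundaries, so a word may be split across two chunks.

```python
import wave


def split_wav(src_path, chunk_seconds, out_prefix):
    """Split a WAV file into consecutive chunks of at most chunk_seconds each.

    Returns the list of chunk file paths, in order.
    """
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")

    out_paths = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        # Always read at least one frame per chunk so the loop makes progress.
        frames_per_chunk = max(1, int(src.getframerate() * chunk_seconds))
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            out_path = f"{out_prefix}_{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Copy channels, sample width, and rate; the frame count in
                # the header is corrected when the file is closed.
                dst.setparams(params)
                dst.writeframes(frames)
            out_paths.append(out_path)
            index += 1
    return out_paths
```

Each chunk can then be sent through the transcription request from Step 2 and the resulting texts joined in order. Choose chunk_seconds so that every chunk stays under the file size limit.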