
Speech Recognition with whisper-large-v3

Transcribe audio with whisper-large-v3 through the audio transcription API.

Step 1: Prep Your Audio

Prepare your audio file in a common format like WAV, MP3, or M4A.
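A quick local check before uploading can catch an unsupported extension or an oversized file early. The sketch below is illustrative only: the helper name check_audio_file is hypothetical, the extension set covers just the three formats named above, and the size limit is taken from the Pro Tips section with the assumption that 25 MB means 25 * 1024 * 1024 bytes.

```python
from pathlib import Path

# Assumptions: only the formats named in this guide, and 25 MB read as MiB.
ALLOWED_EXTENSIONS = {".wav", ".mp3", ".m4a"}
MAX_BYTES = 25 * 1024 * 1024


def check_audio_file(path: str) -> list[str]:
    """Return a list of problems found; an empty list means the file looks fine."""
    p = Path(path)
    if not p.is_file():
        return [f"{path}: file not found"]
    problems = []
    if p.suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append(f"{path}: extension {p.suffix!r} is not in {sorted(ALLOWED_EXTENSIONS)}")
    if p.stat().st_size > MAX_BYTES:
        problems.append(f"{path}: {p.stat().st_size} bytes exceeds {MAX_BYTES} bytes")
    return problems
```

Calling check_audio_file("path/to/your/audio.mp3") and printing any returned messages before Step 2 avoids a round trip to the API for files that would likely be rejected.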

Step 2: Send a Transcription Request

import requests
url = "https://api.deeprequest.io/v1/audio/transcriptions"
headers = {"Authorization": "Bearer YOUR_API_KEY"}
data = {
    "model": "whisper-large-v3",
    "response_format": "json",  # Options: json, text, srt, vtt, verbose_json
}
# Optional: specify language for better accuracy
# data["language"] = "en"
# Open the file in a with-block so the handle is closed after the upload
with open("path/to/your/audio.mp3", "rb") as audio:
    files = {"file": ("audio.mp3", audio)}
    response = requests.post(url, files=files, data=data, headers=headers)
response.raise_for_status()  # Stop here on an HTTP error (bad key, oversized file, ...)
print(response.json()["text"])
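The request above parses the body as JSON, which only works for the JSON-based response formats. A minimal sketch of a helper that handles the other options follows. It assumes, without confirmation from this guide, that text, srt, and vtt come back as plain-text bodies and that verbose_json carries a top-level "text" field like json does. The name extract_transcript is hypothetical.

```python
import json

# Assumption: these two formats return a JSON object with a top-level "text" key.
JSON_FORMATS = {"json", "verbose_json"}


def extract_transcript(body: str, response_format: str = "json") -> str:
    """Return the transcript from a raw response body for the given format."""
    if response_format in JSON_FORMATS:
        return json.loads(body)["text"]
    # Assumption: text, srt, and vtt are returned as-is in the body.
    return body
```

With the request from this step, that would be called as extract_transcript(response.text, data["response_format"]).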

Step 3: Translation (Optional)

To translate audio directly to English:

import requests
url = "https://api.deeprequest.io/v1/audio/translations"
headers = {"Authorization": "Bearer YOUR_API_KEY"}
data = {
    "model": "whisper-large-v3",
    "response_format": "json",
}
# Open the file in a with-block so the handle is closed after the upload
with open("path/to/your/audio.mp3", "rb") as audio:
    files = {"file": ("audio.mp3", audio)}
    response = requests.post(url, files=files, data=data, headers=headers)
response.raise_for_status()  # Stop here on an HTTP error (bad key, oversized file, ...)
print(response.json()["text"])

Pro Tips

  • For high-quality results, use clear audio with minimal background noise.
  • The file size limit is 25 MB.
  • For longer audio files, split them into smaller segments and transcribe each one separately.
  • Set the language parameter when the language is known; this improves accuracy.
  • Set response_format to "verbose_json" to get timestamps and confidence scores.
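The chunking tip can be sketched for WAV input using only Python's standard wave module. This is one possible approach, not a method prescribed by the API: the helper name split_wav and the 600-second default are assumptions, and the chunk length has to be chosen so that each part stays under the file size limit for your sample rate and channel count. Cutting at fixed intervals can also split a word across two chunks.

```python
import wave
from pathlib import Path


def split_wav(path: str, chunk_seconds: int = 600) -> list[str]:
    """Split a WAV file into consecutive parts of at most chunk_seconds each."""
    src_path = Path(path)
    out_paths = []
    with wave.open(str(src_path), "rb") as src:
        params = src.getparams()
        frames_per_chunk = src.getframerate() * chunk_seconds
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break  # End of the source file
            out_path = src_path.with_name(f"{src_path.stem}_part{index:03d}.wav")
            with wave.open(str(out_path), "wb") as dst:
                dst.setparams(params)    # Same channels, sample width, and rate
                dst.writeframes(frames)  # Header frame count is fixed up on close
            out_paths.append(str(out_path))
            index += 1
    return out_paths
```

Each returned path can then be sent through the transcription request from Step 2, and the resulting texts joined in order.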