# Accuracy Benchmarks
Salad Transcription API delivers industry-leading accuracy across a wide range of languages and public benchmark datasets. Below is a breakdown of results by language and dataset.

## Languages with Accuracy ≥ 90%
- English
- Portuguese
- French
- Spanish
- German
- Italian
- Russian
## Languages with Accuracy 80%–89%
- Hindi
- Hebrew
## Languages with Accuracy < 80%
- Urdu
- Kazakh
- Thai (in progress)
## English
Dataset | Sub-Dataset | Accuracy (Full) | WER (Full) | Accuracy (Lite) | WER (Lite) | Source |
---|---|---|---|---|---|---|
TED-LIUM | tedlium | 95.8% | 4.2% | 91.8% | 8.2% | TED-LIUM on Hugging Face |
Meanwhile | Meanwhile | 95.7% | 4.3% | 83.3% | 16.7% | Meanwhile on Hugging Face |
CommonVoice | cv-corpus-5.1-2020-06-22 | 95.1% | 4.9% | 81.3% | 18.7% | Common Voice |
CommonVoice | cv-corpus-20.0-delta-2024-12-06 | 93.1% | 6.9% | 78.1% | 21.9% | Common Voice |
## Portuguese
Dataset | Sub-Dataset | Accuracy | WER | Source |
---|---|---|---|---|
CommonVoice | cv-corpus-8.0-2022-01-19 | 92.0% | 8.0% | Common Voice |
## French
Dataset | Sub-Dataset | Accuracy | WER | Source |
---|---|---|---|---|
CommonVoice | cv-corpus-10.0-delta-2022-07-04 | 92.0% | 8.0% | Common Voice |
## Spanish
Dataset | Sub-Dataset | Accuracy | WER | Source |
---|---|---|---|---|
CommonVoice | cv-corpus-12.0-delta-2022-12-07 | 94.0% | 6.0% | Common Voice |
CommonVoice | cv-corpus-14.0-delta-2023-06-23 | 96.8% | 3.2% | Common Voice |
CommonVoice | cv-corpus-16.1-delta-2023-12-06 | 95.7% | 4.3% | Common Voice |
## German
Dataset | Sub-Dataset | Accuracy | WER | Source |
---|---|---|---|---|
CommonVoice | cv-corpus-13.0-delta-2023-03-09 | 96.3% | 3.7% | Common Voice |
## Hindi
Dataset | Sub-Dataset | Accuracy | WER | Source |
---|---|---|---|---|
CommonVoice | cv-corpus-20.0-2024-12-06 | 84.0% | 16.0% | Common Voice |
## Italian
Dataset | Sub-Dataset | Accuracy | WER | Source |
---|---|---|---|---|
CommonVoice | — | 93.3% | 6.7% | Common Voice |
## Russian
Dataset | Sub-Dataset | Accuracy | WER | Source |
---|---|---|---|---|
CommonVoice | — | 96.4% | 3.6% | Common Voice |
## Hebrew
Dataset | Sub-Dataset | Accuracy | WER | Source |
---|---|---|---|---|
CommonVoice | cv-corpus-17.0-2024-03-15 | 84.2% | 15.8% | Common Voice |
## Kazakh
Dataset | Sub-Dataset | Accuracy | WER | Source |
---|---|---|---|---|
CommonVoice | cv-corpus-19.0-2024-09-13 | 51.0% | 49.0% | Common Voice |
## Urdu
Dataset | Sub-Dataset | Accuracy | WER | Source |
---|---|---|---|---|
CommonVoice | cv-corpus-9.0-2022-04-27 | 78.8% | 21.2% | Common Voice |
## Thai (in progress)
Dataset | Sub-Dataset | Accuracy | WER | Source |
---|---|---|---|---|
CommonVoice | cv-corpus-10.0-delta-2022-07-04 | 33.0% | 67.0%* | Common Voice |
\*Thai WER may need to be recalculated due to formatting issues.
## Methodology
To ensure fair and repeatable accuracy evaluation, we adopted a benchmarking methodology similar to AssemblyAI's:

- Public datasets were used for transparency and reproducibility
- Transcripts were normalized using the Whisper Normalizer
- Word Error Rate (WER) was computed using the JiWER library; accuracy is reported as 100% − WER
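The steps above can be sketched in a few lines of Python. Our pipeline uses the Whisper Normalizer and JiWER; the sketch below substitutes a simplified normalizer (lowercasing and punctuation stripping) and a hand-rolled word-level Levenshtein distance purely for illustration, so the exact scores will differ from production results.

```python
import re


def normalize(text: str) -> str:
    # Simplified stand-in for the Whisper Normalizer: lowercase and strip
    # punctuation so formatting differences are not counted as word errors.
    return re.sub(r"[^\w\s]", "", text.lower()).strip()


def wer(reference: str, hypothesis: str) -> float:
    # WER = (substitutions + deletions + insertions) / reference word count,
    # i.e. word-level Levenshtein distance over the normalized transcripts.
    ref = normalize(reference).split()
    hyp = normalize(hypothesis).split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + cost, # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)


reference = "The quick brown fox jumps over the lazy dog."
hypothesis = "the quick brown fox jumped over the lazy dog"
error = wer(reference, hypothesis)  # 1 substitution out of 9 reference words
print(f"WER: {error:.1%}, Accuracy: {1 - error:.1%}")
```

In production, the hand-rolled distance would simply be replaced by `jiwer.wer(reference, hypothesis)` applied to the normalized strings; the accuracy figures in the tables above are then `100% − WER`.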