Voxtral Transcribe 2: Mistral AI's Latest Transcription Model Now Available

February 10, 2026, by Raisetalk (10 min read)

AiForGood

Speech To Text, Mistral AI, Diarization, Conversational Analysis, Made in France

Key Takeaways

Voxtral Mini Transcribe V2 from Mistral AI is now available in production on Raisetalk
Native diarization: automatic speaker identification, with no additional step required
Benchmark-leading performance: approximately 4% WER on FLEURS, outperforming GPT-4o mini Transcribe, Gemini 2.5 Flash, and Deepgram Nova
13 natively supported languages, 100% French technology (Mistral AI, Paris)
Ultra-competitive pricing: simply the lowest price available on Raisetalk

New: Voxtral Transcribe 2 Available on Raisetalk

In January, we announced the integration of Mistral AI as the LLM engine for conversational analysis. Today, we are taking a new step forward: Mistral AI enters the Speech-to-Text race with Voxtral Transcribe 2, and we have deployed it in production on Raisetalk.

Voxtral Mini Transcribe V2 is a batch transcription model designed for demanding professional environments. It delivers a highly anticipated feature: native diarization — the ability to automatically identify who is speaking and when, built directly into the transcription process.

Voxtral Mini Transcribe V2: Key Specifications

Feature	Details
Type	Batch transcription
Diarization	✅ Native, built-in
Maximum duration	Up to 30 minutes per recording
Languages	13 languages: FR, EN, ES, DE, IT, NL, PT, ZH, JA, KO, HI, AR, RU
Timestamps	Per segment and word-level
Context biasing	✅ Steers toward domain-specific vocabulary
WER	~4% on FLEURS
Translation	❌ Not available
Pseudonymization	❌ Not available

Quick Glossary

Diarization: the ability to identify and distinguish different speakers in a conversation ("who speaks when")
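WER (Word Error Rate): the number of word-level errors (substitutions, deletions, and insertions) divided by the number of words in the reference transcript; the lower, the better the transcription. A 4% WER means roughly 4 errors for every 100 reference words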
Context biasing: the ability to steer the model toward domain-specific vocabulary (product names, technical terms) to improve accuracy
Per-segment timestamps: timestamping of each transcription segment, enabling precise synchronization with the audio
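To make the WER definition concrete, here is a minimal Python sketch that computes it as a word-level edit distance divided by the reference length. The function name and the simple lowercase-and-split normalization are assumptions made for illustration; benchmarks such as FLEURS typically apply more careful text normalization before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumption: naive normalization (lowercase + whitespace split).
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution, or a match if cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


reference = "the customer asked for a refund on the last invoice"
hypothesis = "the customer asked for refund on the last invoice"
# One dropped word out of ten reference words.
print(f"{word_error_rate(reference, hypothesis):.0%}")  # prints "10%"
```

Because insertions count as errors, WER is measured against the reference length and can in principle exceed 100%, which is why "96 correct words out of 100" is only an approximation of a 4% WER.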

Diarization: A Decisive Advantage

In our STT model comparison, we highlighted the importance of diarization for conversational analysis. For Quality Monitoring, you need to know whether the agent or the customer said a given sentence. For Sales Compliance, you need to attribute each commitment to the right speaker. For Voice of Customer, you need to isolate the customer's verbatims.

Until now, only Parakeet and Gemini 2.5 Pro achieved 5/5 in diarization in our comparison — but Parakeet offers neither translation nor pseudonymization, and Gemini 2.5 Pro is the slowest and most expensive model.

Voxtral Mini V2 changes the game: it combines top-tier diarization with the lowest price available on Raisetalk. That makes it a particularly compelling option for organizations that process large volumes of conversations and need reliable speaker identification.
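As an illustration of why speaker labels matter downstream, the sketch below shows how diarized segments can be filtered per speaker (for example, to isolate the customer's verbatims) and how talk time can be summed. The `Segment` structure, its field names, and the `speaker_0` / `speaker_1` labels are hypothetical; they are not the actual output format of Voxtral or Raisetalk.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical diarized segment: who spoke, when, and what was said."""
    speaker: str   # assumed label such as "speaker_0"
    start: float   # seconds from the beginning of the recording
    end: float     # seconds from the beginning of the recording
    text: str


def verbatims_for(segments: list[Segment], speaker: str) -> list[str]:
    """Return the text of every segment attributed to one speaker."""
    return [s.text for s in segments if s.speaker == speaker]


def talk_time(segments: list[Segment]) -> dict[str, float]:
    """Sum the speaking time, in seconds, for each speaker."""
    totals: dict[str, float] = {}
    for s in segments:
        totals[s.speaker] = totals.get(s.speaker, 0.0) + (s.end - s.start)
    return totals


# Made-up sample conversation: speaker_0 is the agent, speaker_1 the customer.
SEGMENTS = [
    Segment("speaker_0", 0.0, 4.0, "Thank you for calling, how can I help?"),
    Segment("speaker_1", 4.0, 9.5, "I was charged twice on my last invoice."),
    Segment("speaker_0", 9.5, 12.0, "I will refund the duplicate today."),
    Segment("speaker_1", 12.0, 15.0, "Great, thank you."),
]

customer_verbatims = verbatims_for(SEGMENTS, "speaker_1")
agent_commitments = verbatims_for(SEGMENTS, "speaker_0")
```

Without diarization, the refund commitment in the third segment could not be attributed to the agent rather than the customer.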

Performance: The Numbers

Voxtral Transcribe 2 reaches a WER of about 4% on the FLEURS benchmark, ahead of the competing models listed below:

Criterion	Voxtral Mini V2	Positioning
WER (FLEURS)	~4%	Outperforms GPT-4o mini Transcribe, Gemini 2.5 Flash, Assembly Universal, Deepgram Nova
Max duration	30 minutes	Mid-range among the models compared
Languages	13	Broad coverage including Asian languages

Sovereignty and "Made in France"

Integrating Voxtral continues our technological independence strategy. Mistral AI is a French company, founded in Paris in 2023 by former researchers from Meta and Google DeepMind.

For French and European organizations, choosing Voxtral for audio transcription addresses the same concerns as choosing Mistral AI for the LLM:

Digital sovereignty: your audio data is processed by European technology
Regulatory compliance: a provider subject to the GDPR and the AI Act
Local ecosystem: supporting the development of European technology champions

Updated Comparison: 8 STT Models on Raisetalk

In January, we compared 7 STT models. Voxtral now joins the lineup as the eighth. Here is the updated table:

#	Model	Region	Max duration	Translation	Pseudonymization	Cost	Speed	Quality	Diarization	Drift	Bonus	Total
1	`Voxtral`	EU	< 30 min	❌	❌	5/5	5/5	5/5	5/5	5/5	+8	33
2	`Light`	EU & US	All	✅	✅	4/5	5/5	4/5	3/5	5/5	+20	41
3	`Best`	EU & US	All	✅	✅	2/5	5/5	5/5	3/5	5/5	+20	40
4	`Gemini 2 Flash`	EU & US	< 10 min	✅	✅	5/5	5/5	4/5	4/5	2/5	+16	36
5	`Gemini 2.5 Flash`	EU & US	< 25 min	✅	✅	4/5	4/5	4/5	4/5	3/5	+17	36
6	`Gemini 2.5 Pro`	EU & US	< 45 min	✅	✅	1/5	1/5	5/5	5/5	4/5	+19	35
7	`Parakeet`	EU	All	❌	❌	3/5	5/5	4/5	5/5	5/5	+8	30
8	`Whisper`	EU	All	❌	❌	3/5	5/5	3/5	5/5	5/5	+8	29

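Legend: each criterion (cost, speed, quality, diarization, drift) is scored out of 5, higher is better. The feature bonus rewards multi-region availability, long duration support, translation, and pseudonymization. The total is the sum of the five criterion scores plus the bonus.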
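The Total column can be reproduced as the sum of the five criterion scores plus the feature bonus. The short Python sketch below checks this against the figures in the table; the dictionary and function names are ours, chosen for illustration.

```python
# (cost, speed, quality, diarization, drift, bonus), copied from the table.
SCORES = {
    "Voxtral":          (5, 5, 5, 5, 5, 8),
    "Light":            (4, 5, 4, 3, 5, 20),
    "Best":             (2, 5, 5, 3, 5, 20),
    "Gemini 2 Flash":   (5, 5, 4, 4, 2, 16),
    "Gemini 2.5 Flash": (4, 4, 4, 4, 3, 17),
    "Gemini 2.5 Pro":   (1, 1, 5, 5, 4, 19),
    "Parakeet":         (3, 5, 4, 5, 5, 8),
    "Whisper":          (3, 5, 3, 5, 5, 8),
}

# Totals as published in the last column of the table.
PUBLISHED_TOTALS = {
    "Voxtral": 33, "Light": 41, "Best": 40, "Gemini 2 Flash": 36,
    "Gemini 2.5 Flash": 36, "Gemini 2.5 Pro": 35, "Parakeet": 30, "Whisper": 29,
}


def computed_totals() -> dict[str, int]:
    """Sum the five criterion scores and the bonus for each model."""
    return {name: sum(row) for name, row in SCORES.items()}


def criteria_only() -> dict[str, int]:
    """Sum the five criterion scores, leaving the feature bonus out."""
    return {name: sum(row[:5]) for name, row in SCORES.items()}
```

Leaving out the bonus, Voxtral is the only model with 25 out of 25 on the five criteria; its lower total comes entirely from the feature bonus, which rewards translation, pseudonymization, multi-region availability, and long durations.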

Our recommendation: Voxtral is the ideal choice if you are looking for the best combination of quality + diarization + cost, and you do not need pseudonymization or translation. For use cases where pseudonymization is mandatory, stick with Light or Best.

When Should You Choose Voxtral?

Voxtral is particularly well-suited if:

Diarization is important for your analyses (Quality Monitoring, Sales Compliance)
You process large volumes and cost is a deciding factor
You value technological sovereignty (French-made solution)
You do not need pseudonymization or translation built into the STT
Your recordings are moderately long (up to 30 minutes)

Voxtral is not recommended if:

STT-level pseudonymization is mandatory in your context -- use Light or Best instead
You need built-in translation -- use Light, Best, or the Gemini models instead
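The selection criteria above can be sketched as a simple filter over the comparison table. This is an illustrative Python example, not a Raisetalk feature: the function and field names are hypothetical, and the duration limits are treated as strict upper bounds, following the "< 30 min" notation of the comparison table.

```python
from typing import Optional

# name: (max duration in minutes or None for "All", translation, pseudonymization)
# Values copied from the comparison table.
MODELS: dict[str, tuple[Optional[int], bool, bool]] = {
    "Voxtral":          (30, False, False),
    "Light":            (None, True, True),
    "Best":             (None, True, True),
    "Gemini 2 Flash":   (10, True, True),
    "Gemini 2.5 Flash": (25, True, True),
    "Gemini 2.5 Pro":   (45, True, True),
    "Parakeet":         (None, False, False),
    "Whisper":          (None, False, False),
}


def eligible_models(duration_min: float,
                    need_translation: bool = False,
                    need_pseudonymization: bool = False) -> list[str]:
    """List the models whose table entries satisfy the given requirements."""
    result = []
    for name, (max_min, translation, pseudonymization) in MODELS.items():
        # Assumption: the limit is exclusive, as written in the table.
        if max_min is not None and duration_min >= max_min:
            continue
        if need_translation and not translation:
            continue
        if need_pseudonymization and not pseudonymization:
            continue
        result.append(name)
    return result


# A 20-minute call with no translation or pseudonymization requirement
# keeps Voxtral in the running; requiring pseudonymization rules it out.
print(eligible_models(20))
print(eligible_models(20, need_pseudonymization=True))
```

Among the eligible models, the choice then comes down to the cost, quality, and diarization scores in the table.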

How to Enable Voxtral on Raisetalk

On Raisetalk, you select the STT model each time you submit an analysis. You can:

Test Voxtral on a sample of conversations and compare the results with your current models
Mix and match models according to your use cases

Our team can also help you identify the optimal configuration based on your volumes, budget, and quality requirements.

Try It Yourself

The best way to judge is to test it firsthand.

Our trial space lets you transcribe your own conversations with the model of your choice: https://app.raisetalk.com/try