Key Takeaways

  • Voxtral Mini Transcribe V2 from Mistral AI is now available in production on Raisetalk
  • Native diarization: automatic speaker identification, with no additional step required
  • Benchmark-leading performance: approximately 4% WER, outperforming GPT-4o mini Transcribe, Gemini 2.5 Flash, and Deepgram Nova
  • 13 natively supported languages, 100% French technology (Mistral AI, Paris)
  • Ultra-competitive pricing: the lowest-priced STT model available on Raisetalk

New: Voxtral Transcribe 2 Available on Raisetalk

In January, we announced the integration of Mistral AI as the LLM engine for conversational analysis. Today, we are taking a new step forward: Mistral AI enters the Speech-to-Text race with Voxtral Transcribe 2, and we have deployed it in production on Raisetalk.

Voxtral Mini Transcribe V2 is a batch transcription model designed for demanding professional environments. It delivers a highly anticipated feature: native diarization — the ability to automatically identify who is speaking and when, built directly into the transcription process.

Voxtral Mini Transcribe V2: Key Specifications

| Feature | Details |
| --- | --- |
| Type | Batch transcription |
| Diarization | ✅ Native, built-in |
| Maximum duration | Up to 30 minutes per recording |
| Languages | 13 languages: FR, EN, ES, DE, IT, NL, PT, ZH, JA, KO, HI, AR, RU |
| Timestamps | Per segment and word-level |
| Context biasing | ✅ Steers toward domain-specific vocabulary |
| WER | ~4% on FLEURS |
| Translation | ❌ Not available |
| Pseudonymization | ❌ Not available |

Quick Glossary

  • Diarization: the ability to identify and distinguish different speakers in a conversation ("who speaks when")
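  • WER (Word Error Rate): the number of word-level errors (substitutions, deletions, and insertions) divided by the number of words in the reference transcript; the lower, the better the transcription. A 4% WER means roughly 4 errors for every 100 words spoken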
  • Context biasing: the ability to steer the model toward domain-specific vocabulary (product names, technical terms) to improve accuracy
  • Per-segment timestamps: timestamping of each transcription segment, enabling precise synchronization with the audio
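
To make the WER definition concrete, here is a minimal sketch of how the metric is commonly computed: a word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The function name `word_error_rate` and the sample sentences are illustrative assumptions, not part of Raisetalk or Mistral tooling. Published benchmarks also normalize casing and punctuation before scoring, which this sketch does not do.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1       # reference word missing from output
            insertion = current[j - 1] + 1   # extra word in output
            current[j] = min(substitution, deletion, insertion)
        previous = current
    return previous[-1] / len(ref)


# Illustrative sentences (invented for this example)
reference = "the customer asked to cancel the contract before the end of the month"
hypothesis = "the customer asked to cancel a contract before the end of month"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.1%}")
```

Here the output contains one substituted word and one dropped word against a 13-word reference. Because insertions count as errors too, WER is not strictly "100% minus the share of correct words", and it can exceed 100% on very noisy output.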

Diarization: A Decisive Advantage

In our STT model comparison, we highlighted the importance of diarization for conversational analysis. For Quality Monitoring, you need to know whether the agent or the customer said a given sentence. For Sales Compliance, you need to attribute each commitment to the right speaker. For Voice of Customer, you need to isolate what the customer said, word for word.
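
As a sketch of why speaker labels matter downstream, the snippet below filters a diarized transcript down to one speaker's statements and sums talk time per speaker. The `Segment` structure, its field names, and the speaker labels are hypothetical; they do not describe the actual Voxtral or Raisetalk output format.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str   # hypothetical label assigned by diarization
    start: float   # segment start, in seconds
    end: float     # segment end, in seconds
    text: str


def statements_by(segments: list[Segment], speaker: str) -> list[str]:
    """Return the text of every segment attributed to the given speaker."""
    return [s.text for s in segments if s.speaker == speaker]


def talk_time(segments: list[Segment]) -> dict[str, float]:
    """Total speaking time per speaker, in seconds."""
    totals: dict[str, float] = {}
    for s in segments:
        totals[s.speaker] = totals.get(s.speaker, 0.0) + (s.end - s.start)
    return totals


# Invented example conversation
transcript = [
    Segment("agent", 0.0, 4.0, "Thank you for calling, how can I help?"),
    Segment("customer", 4.0, 9.5, "I was charged twice this month."),
    Segment("agent", 9.5, 14.0, "I will refund the duplicate charge today."),
    Segment("customer", 14.0, 16.0, "Great, thank you."),
]

customer_quotes = statements_by(transcript, "customer")   # Voice of Customer
agent_statements = statements_by(transcript, "agent")     # Sales Compliance
```

Without reliable speaker attribution, neither list can be built: the agent's refund commitment and the customer's complaint would be indistinguishable in the raw text.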

Until now, only Parakeet and Gemini 2.5 Pro achieved 5/5 in diarization in our comparison — but Parakeet offers neither translation nor pseudonymization, and Gemini 2.5 Pro is the slowest and most expensive model.

Voxtral Mini V2 changes that: it combines top-tier diarization with the lowest price of any model on Raisetalk. It is a particularly compelling option for organizations processing large volumes of conversations that require reliable speaker identification.

Performance: The Numbers

Voxtral Transcribe 2 delivers impressive results in independent benchmarks:

| Criterion | Voxtral Mini V2 | Positioning |
| --- | --- | --- |
| WER (FLEURS) | ~4% | Outperforms GPT-4o mini Transcribe, Gemini 2.5 Flash, Assembly Universal, Deepgram Nova |
| Max duration | 30 minutes | Average |
| Languages | 13 | Broad coverage including Asian languages |

Sovereignty and "Made in France"

The integration of Voxtral is a continuation of our technological independence strategy. Mistral AI is a French company, founded in Paris in 2023 by former researchers from Meta and Google DeepMind.

For French and European organizations, choosing Voxtral for audio transcription addresses the same concerns as for the LLM:

  • Digital sovereignty: your audio data is processed by European technology
  • Regulatory compliance: a provider subject to the GDPR and the AI Act
  • Local ecosystem: supporting the development of European technology champions

Updated Comparison: 8 STT Models on Raisetalk

In January, we compared 7 STT models. Voxtral now joins the lineup, bringing the total to 8. Here is the updated table:

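| # | Model | Region | Durations | Transl. | Pseudo. | Cost | Speed | Quality | Diarization | Drift | Bonus | Total |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Voxtral | EU | < 30 min | | | 5/5 | 5/5 | 5/5 | 5/5 | 5/5 | +8 | 33 |
| 2 | Light | EU & US | All | | | 4/5 | 5/5 | 4/5 | 3/5 | 5/5 | +20 | 41 |
| 3 | Best | EU & US | All | | | 2/5 | 5/5 | 5/5 | 3/5 | 5/5 | +20 | 40 |
| 4 | Gemini 2 Flash | EU & US | < 10 min | | | 5/5 | 5/5 | 4/5 | 4/5 | 2/5 | +16 | 36 |
| 5 | Gemini 2.5 Flash | EU & US | < 25 min | | | 4/5 | 4/5 | 4/5 | 4/5 | 3/5 | +17 | 36 |
| 6 | Gemini 2.5 Pro | EU & US | < 45 min | | | 1/5 | 1/5 | 5/5 | 5/5 | 4/5 | +19 | 35 |
| 7 | Parakeet | EU | All | | | 3/5 | 5/5 | 4/5 | 5/5 | 5/5 | +8 | 30 |
| 8 | Whisper | EU | All | | | 3/5 | 5/5 | 3/5 | 5/5 | 5/5 | +8 | 29 |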

Legend: all scores are out of 5, higher is better. The feature bonus rewards multi-region availability, long duration support, translation, and pseudonymization.
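
The totals in the table appear to be the sum of the five scores plus the feature bonus. That formula is inferred from the published numbers rather than stated explicitly, so treat the short check below as a sketch; the variable and function names are illustrative.

```python
# name: ((cost, speed, quality, diarization, drift), bonus, published total)
rows = {
    "Voxtral":          ((5, 5, 5, 5, 5), 8, 33),
    "Light":            ((4, 5, 4, 3, 5), 20, 41),
    "Best":             ((2, 5, 5, 3, 5), 20, 40),
    "Gemini 2 Flash":   ((5, 5, 4, 4, 2), 16, 36),
    "Gemini 2.5 Flash": ((4, 4, 4, 4, 3), 17, 36),
    "Gemini 2.5 Pro":   ((1, 1, 5, 5, 4), 19, 35),
    "Parakeet":         ((3, 5, 4, 5, 5), 8, 30),
    "Whisper":          ((3, 5, 3, 5, 5), 8, 29),
}


def total(scores: tuple[int, ...], bonus: int) -> int:
    """Assumed formula: sum of the five scores plus the feature bonus."""
    return sum(scores) + bonus


# Names of rows where the assumed formula disagrees with the published total
mismatches = [
    name
    for name, (scores, bonus, published) in rows.items()
    if total(scores, bonus) != published
]
print(mismatches)
```

For Voxtral, for example, the five scores sum to 25 and the bonus adds 8, which gives the published total of 33.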

Our recommendation: Voxtral is the ideal choice if you are looking for the best combination of quality + diarization + cost, and you do not need pseudonymization or translation. For use cases where pseudonymization is mandatory, stick with Light or Best.

When Should You Choose Voxtral?

Voxtral is particularly well-suited if:

  • Diarization is important for your analyses (Quality Monitoring, Sales Compliance)
  • You process large volumes and cost is a deciding factor
  • You value technological sovereignty (French-made solution)
  • You do not need pseudonymization or translation built into the STT
  • Your recordings are moderately long (up to 30 minutes)

Voxtral is not recommended if:

  • STT-level pseudonymization is mandatory in your context: use Light or Best instead
  • You need built-in translation: use Light, Best, or the Gemini models instead
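
The hard constraints in the criteria above can be expressed as a small helper. This is a sketch under stated assumptions: the function `voxtral_fits` and its parameters are hypothetical, not a Raisetalk API, and the softer criteria (cost sensitivity, sovereignty) are left to human judgment.

```python
VOXTRAL_MAX_MINUTES = 30  # "up to 30 minutes per recording" in the specifications


def voxtral_fits(
    duration_minutes: float,
    needs_pseudonymization: bool = False,
    needs_translation: bool = False,
) -> tuple[bool, str]:
    """Return (fits, reason) based on the hard constraints listed above."""
    if needs_pseudonymization:
        return False, "STT-level pseudonymization required: use Light or Best"
    if needs_translation:
        return False, "built-in translation required: use Light, Best, or a Gemini model"
    if duration_minutes > VOXTRAL_MAX_MINUTES:
        return False, "recording exceeds 30 minutes"
    return True, "Voxtral fits"


# Illustrative calls
print(voxtral_fits(12))
print(voxtral_fits(12, needs_pseudonymization=True))
print(voxtral_fits(45))
```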

How to Enable Voxtral on Raisetalk

On Raisetalk, you select the STT model each time you submit an analysis. You can:

  • Test Voxtral on a sample of conversations and compare the results with your current models
  • Mix and match approaches based on your use cases

Our team can also help you identify the optimal configuration based on your volumes, budget, and quality requirements.

Try It Yourself

The best way to judge is to test it firsthand.

Our trial space lets you transcribe your own conversations with the model of your choice: https://app.raisetalk.com/try