Stream live microphone audio to Mistral's Voxtral model for real-time cloud transcription using decibri and the official Mistral AI SDK.
This integration captures live audio from your microphone using decibri and streams it to Mistral's Voxtral Realtime API via the official SDK. Transcription text arrives progressively as you speak. No model download, no local inference, and no audio format conversion. decibri's default output matches Voxtral's expected input exactly.
Voxtral is an open-weights model (Apache 2.0) built by Mistral, a European AI company. It supports 13 languages with automatic detection. Choose this when you want a cloud STT with open-weights transparency, or when you plan to self-host via vLLM. For use cases where audio must stay on your device, see the sherpa-onnx or whisper.cpp local integrations instead.
This integration uses ESM import syntax, not require(). Add "type": "module" to your package.json, or use a .mjs file extension. This differs from the other decibri integration examples, which use CommonJS. For environment variables, use import 'dotenv/config' instead of require('dotenv').config().
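If your project doesn't already use ESM, a minimal package.json could look like this (the name and script are illustrative, only "type": "module" is required):

```json
{
  "name": "voxtral-live-transcribe",
  "type": "module",
  "scripts": {
    "start": "node transcribe.mjs"
  }
}
```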
Create a .env file in your project root:

```
MISTRAL_API_KEY=your_key_here
```
The dotenv package loads your API key from the .env file. If you set environment variables another way, you can omit dotenv from the install command.
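If the install command isn't shown nearby, a typical setup (package names taken from the imports below) looks like:

```shell
npm install decibri @mistralai/mistralai dotenv
```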
No model download is required. All processing happens in Mistral's cloud.
Import decibri, the Mistral Realtime SDK, and dotenv. decibri is a CommonJS package; the default import works in ESM via Node.js interop.
```javascript
import 'dotenv/config';
import Decibri from 'decibri';
import { RealtimeTranscription, AudioEncoding } from '@mistralai/mistralai/extra/realtime';

const API_KEY = process.env.MISTRAL_API_KEY;
const MODEL = 'voxtral-mini-transcribe-realtime-2602';

const audioFormat = {
  encoding: AudioEncoding.PcmS16le,
  sampleRate: 16000,
};
```
The audio format matches decibri's default output exactly: 16-bit signed integer PCM, little-endian, 16 kHz mono. No configuration needed on the decibri side.
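As a sanity check on the format, the raw data rate works out to 32,000 bytes per second: 2 bytes per sample × 16,000 samples per second × 1 channel. A quick sketch:

```javascript
// Raw PCM data rate for decibri's default output (s16le, 16 kHz, mono).
const bytesPerSample = 2;   // 16-bit signed integer
const sampleRate = 16000;   // samples per second
const channels = 1;         // mono

const bytesPerSecond = bytesPerSample * sampleRate * channels;
console.log(bytesPerSecond); // 32000, i.e. ~32 kB of audio per second
```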
The Mistral SDK expects an AsyncGenerator<Uint8Array> as its audio input. Wrap decibri's Readable stream as an async generator that yields each chunk as a Uint8Array.
```javascript
async function* createAudioStream() {
  const mic = new Decibri({ sampleRate: 16000, channels: 1 });
  console.log('Listening... Speak into your microphone. (Ctrl+C to stop)');
  try {
    for await (const chunk of mic) {
      yield new Uint8Array(chunk);
    }
  } finally {
    mic.stop();
  }
}
```
The for await...of loop consumes decibri as an async iterable (all Node.js Readable streams support this). The finally block ensures the microphone is stopped when the generator is closed.
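The same wrapping pattern works with any Readable, so it can be tried without a microphone. This is a standalone illustration using a stream built from in-memory Buffers, not part of the integration:

```javascript
import { Readable } from 'node:stream';

// Any Node.js Readable is an async iterable, so a generator can wrap it
// exactly like createAudioStream() wraps the microphone stream.
async function* toUint8Chunks(stream) {
  for await (const chunk of stream) {
    yield new Uint8Array(chunk);
  }
}

// A fake "audio" stream: two small Buffer chunks.
const fake = Readable.from([Buffer.from([1, 2]), Buffer.from([3, 4])]);

const chunks = [];
for await (const c of toUint8Chunks(fake)) {
  chunks.push(c);
}
console.log(chunks.length); // 2
```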
Create a RealtimeTranscription client and call transcribeStream() with the audio generator, model, and format. The SDK handles the WebSocket connection internally.
```javascript
const client = new RealtimeTranscription({ apiKey: API_KEY });
const audioStream = createAudioStream();

for await (const event of client.transcribeStream(
  audioStream,
  MODEL,
  { audioFormat }
)) {
  // handle events
}
```
The SDK emits three event types. Handle progressive text deltas, completion, and errors.
```javascript
if (event.type === 'transcription.text.delta') {
  process.stdout.write(event.text);
} else if (event.type === 'transcription.done') {
  process.stdout.write('\n');
  break;
} else if (event.type === 'error') {
  const msg = typeof event.error.message === 'string'
    ? event.error.message
    : JSON.stringify(event.error.message);
  console.error('\nError:', msg);
  break;
}
```
transcription.text.delta events contain partial transcription text that arrives word-by-word as you speak. transcription.done signals the end of a transcription segment.
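If you want the full transcript in a string rather than printed incrementally, accumulate the deltas. A minimal sketch with mocked events (the event shapes are assumed to match the handler above):

```javascript
// Collect delta text into a full transcript instead of writing to stdout.
function collectTranscript(events) {
  let transcript = '';
  for (const event of events) {
    if (event.type === 'transcription.text.delta') {
      transcript += event.text;
    } else if (event.type === 'transcription.done') {
      break;
    }
  }
  return transcript;
}

// Mocked event sequence standing in for the SDK's stream.
const mocked = [
  { type: 'transcription.text.delta', text: 'Hello ' },
  { type: 'transcription.text.delta', text: 'world' },
  { type: 'transcription.done' },
];
console.log(collectTranscript(mocked)); // "Hello world"
```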
Wrap the transcription loop in a try/finally block. Calling audioStream.return() triggers the generator's finally block, which stops the microphone. Ctrl+C interrupts the async iterator and flows through the same cleanup path.
```javascript
try {
  for await (const event of client.transcribeStream(audioStream, MODEL, { audioFormat })) {
    // ... handle events
  }
} finally {
  await audioStream.return?.();
}
```
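The cleanup mechanism can be seen in isolation: calling return() on an async generator runs its finally block, which in the integration is where mic.stop() lives. A standalone sketch:

```javascript
let stopped = false;

async function* fakeAudioStream() {
  try {
    while (true) {
      yield new Uint8Array(320); // pretend chunk of silence
    }
  } finally {
    stopped = true; // stands in for mic.stop()
  }
}

const gen = fakeAudioStream();
await gen.next();    // consume one chunk
await gen.return();  // closes the generator, triggering its finally block
console.log(stopped); // true
```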
```javascript
import 'dotenv/config';
import Decibri from 'decibri';
import { RealtimeTranscription, AudioEncoding } from '@mistralai/mistralai/extra/realtime';

const API_KEY = process.env.MISTRAL_API_KEY;
const MODEL = 'voxtral-mini-transcribe-realtime-2602';

const audioFormat = {
  encoding: AudioEncoding.PcmS16le,
  sampleRate: 16000,
};

async function* createAudioStream() {
  const mic = new Decibri({ sampleRate: 16000, channels: 1 });
  console.log('Listening... Speak into your microphone. (Ctrl+C to stop)');
  try {
    for await (const chunk of mic) {
      yield new Uint8Array(chunk);
    }
  } finally {
    mic.stop();
  }
}

const client = new RealtimeTranscription({ apiKey: API_KEY });
const audioStream = createAudioStream();

try {
  for await (const event of client.transcribeStream(
    audioStream,
    MODEL,
    { audioFormat }
  )) {
    if (event.type === 'transcription.text.delta') {
      process.stdout.write(event.text);
    } else if (event.type === 'transcription.done') {
      process.stdout.write('\n');
      break;
    } else if (event.type === 'error') {
      const msg = typeof event.error.message === 'string'
        ? event.error.message
        : JSON.stringify(event.error.message);
      console.error('\nError:', msg);
      break;
    }
  }
} finally {
  await audioStream.return?.();
}
```
Save this as a .mjs file (e.g. transcribe.mjs) or add "type": "module" to your package.json, then run with node transcribe.mjs.
Options passed to transcribeStream() and the RealtimeTranscription client.
| Option | Default | Description |
|---|---|---|
| model | 'voxtral-mini-transcribe-realtime-2602' | Realtime transcription model. Currently the only realtime-capable model. |
| encoding | AudioEncoding.PcmS16le | Audio encoding. PCM 16-bit signed little-endian matches decibri's default. |
| sampleRate | 16000 | Sample rate in Hz. 16 kHz matches decibri's default. |
| targetStreamingDelayMs | none | Optional. Milliseconds to wait before starting transcription to gather context. 480 ms is a good balance between latency and accuracy. Range: 240–2400 ms. |
| serverURL | 'wss://api.mistral.ai' | Optional. WebSocket endpoint. Override for self-hosted deployments via vLLM. |
Check Mistral's model documentation for the latest model version and available options.