Stream live microphone audio to Azure Speech-to-Text using decibri and PushAudioInputStream. Fills the missing microphone support in Azure's Node.js SDK.
This integration captures live audio from your microphone using decibri and pushes it into Azure's PushAudioInputStream for real-time cloud transcription. Results come back via event callbacks as you speak, with both partial and final results.
Azure's fromDefaultMicrophoneInput() method is browser-only. decibri provides the missing audio capture layer in Node.js, feeding PCM audio into Azure's PushAudioInputStream.
Choose this when you need Azure's speech models, enterprise Azure integration, or the most generous free tier (5 hours/month). For use cases where audio must stay on your device, see the sherpa-onnx or whisper.cpp local integrations instead.
Azure Speech authenticates with a subscription key and region, which is simpler than AWS's IAM credentials or Google's service account JSON.
Configure credentials using one of these methods:
Option 1: .env file with dotenv
AZURE_SPEECH_KEY=your_subscription_key
AZURE_SPEECH_REGION=australiaeast
Option 2: Environment variables
export AZURE_SPEECH_KEY=your_subscription_key
export AZURE_SPEECH_REGION=australiaeast
The dotenv package loads your credentials from a .env file. If you set environment variables another way, you can skip it.
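Missing credentials surface as opaque connection errors from the SDK, so it can help to fail fast up front. requireEnv below is a hypothetical helper introduced for illustration, not part of decibri or the Azure SDK:

```javascript
'use strict';

// Hypothetical helper: read a required variable or fail fast with a clear
// message. Run it after require('dotenv').config() has loaded your .env file.
function requireEnv(name) {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage:
//   const key = requireEnv('AZURE_SPEECH_KEY');
//   const region = requireEnv('AZURE_SPEECH_REGION');
```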
No model download is required. All processing happens in Azure's cloud.
Import decibri, the Azure Speech SDK, and dotenv. Create a SpeechConfig with your subscription key and region.
'use strict';
require('dotenv').config();
const Decibri = require('decibri');
const sdk = require('microsoft-cognitiveservices-speech-sdk');
const speechConfig = sdk.SpeechConfig.fromSubscription(
process.env.AZURE_SPEECH_KEY,
process.env.AZURE_SPEECH_REGION
);
speechConfig.speechRecognitionLanguage = 'en-US';
Azure's Node.js SDK cannot access the microphone directly. Instead, create a PushAudioInputStream and wire it into a SpeechRecognizer. The default push stream format is 16 kHz, 16-bit, mono PCM, which matches decibri's default output exactly.
const pushStream = sdk.AudioInputStream.createPushStream();
const audioConfig = sdk.AudioConfig.fromStreamInput(pushStream);
const recognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);
Create a decibri instance at 16 kHz mono and push each audio chunk into the push stream.
const mic = new Decibri({ sampleRate: 16000, channels: 1 });
mic.on('data', (chunk) => {
pushStream.write(chunk.buffer.slice(
chunk.byteOffset,
chunk.byteOffset + chunk.byteLength
));
});
pushStream.write() expects an ArrayBuffer, not a Node.js Buffer. Use chunk.buffer.slice(chunk.byteOffset, chunk.byteOffset + chunk.byteLength) to safely extract the underlying ArrayBuffer. This handles cases where Node.js pools Buffers with a shared backing store.
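The extraction can be wrapped in a small helper to keep the data handler readable. toArrayBuffer is a name introduced here for illustration; the behavior it guards against (Node.js pooling small Buffers into a shared backing ArrayBuffer) is standard Node.js:

```javascript
'use strict';

// Copy exactly this Buffer's bytes out of its (possibly shared) backing store.
// Small Buffers are often views into a pooled ArrayBuffer, so passing
// chunk.buffer directly could hand Azure unrelated bytes from the pool.
function toArrayBuffer(chunk) {
  return chunk.buffer.slice(chunk.byteOffset, chunk.byteOffset + chunk.byteLength);
}

// A pooled Buffer's backing ArrayBuffer can be larger than the Buffer itself,
// but the copied slice always has exactly the Buffer's length.
const sample = Buffer.from([0x01, 0x02, 0x03, 0x04]);
const ab = toArrayBuffer(sample);
console.log(ab.byteLength); // 4, regardless of the pool's size
```

With this helper, the data handler reduces to `pushStream.write(toArrayBuffer(chunk))`.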
Azure Speech uses event callbacks for results. Unlike other integrations that use async iteration or stream events, Azure fires recognizing and recognized events on the recognizer:
recognizer.recognizing = (s, e) => {
process.stdout.write(`\r [partial] ${e.result.text} `);
};
recognizer.recognized = (s, e) => {
if (e.result.reason === sdk.ResultReason.RecognizedSpeech) {
console.log(`\n [final] ${e.result.text}`);
}
};
recognizer.canceled = (s, e) => {
if (e.reason === sdk.CancellationReason.Error) {
console.error(`Error: ${e.errorDetails}`);
}
};
Use startContinuousRecognitionAsync() for ongoing microphone input. Do not use recognizeOnceAsync(), which stops after a single utterance.
recognizer.startContinuousRecognitionAsync();
When the user presses Ctrl+C, stop recognition, then close the recognizer, microphone, and push stream.
process.on('SIGINT', () => {
console.log('\nStopping...');
recognizer.stopContinuousRecognitionAsync(() => {
recognizer.close();
mic.stop();
pushStream.close();
process.exit(0);
});
});
Recognition may process buffered audio after Ctrl+C before the session fully stops. This is normal.
'use strict';
require('dotenv').config();
const Decibri = require('decibri');
const sdk = require('microsoft-cognitiveservices-speech-sdk');
async function main() {
const speechConfig = sdk.SpeechConfig.fromSubscription(
process.env.AZURE_SPEECH_KEY,
process.env.AZURE_SPEECH_REGION
);
speechConfig.speechRecognitionLanguage = 'en-US';
// Create push stream (default format: 16kHz, 16-bit, mono PCM)
const pushStream = sdk.AudioInputStream.createPushStream();
const audioConfig = sdk.AudioConfig.fromStreamInput(pushStream);
const recognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);
// Open microphone and push audio to Azure
const mic = new Decibri({ sampleRate: 16000, channels: 1 });
mic.on('data', (chunk) => {
pushStream.write(chunk.buffer.slice(
chunk.byteOffset,
chunk.byteOffset + chunk.byteLength
));
});
console.log('Azure Speech-to-Text streaming test');
console.log('Speak into your microphone. Press Ctrl+C to stop.\n');
// Handle results
recognizer.recognizing = (s, e) => {
process.stdout.write(`\r [partial] ${e.result.text} `);
};
recognizer.recognized = (s, e) => {
if (e.result.reason === sdk.ResultReason.RecognizedSpeech) {
console.log(`\n [final] ${e.result.text}`);
}
};
recognizer.canceled = (s, e) => {
if (e.reason === sdk.CancellationReason.Error) {
console.error(`Error: ${e.errorDetails}`);
}
};
recognizer.sessionStopped = (s, e) => {
console.log('\nSession stopped.');
};
// Start continuous recognition
recognizer.startContinuousRecognitionAsync(
() => console.log('Recognition started.\n'),
(err) => console.error('Error starting recognition:', err)
);
// Clean shutdown
process.on('SIGINT', () => {
console.log('\nStopping...');
recognizer.stopContinuousRecognitionAsync(() => {
recognizer.close();
mic.stop();
pushStream.close();
process.exit(0);
});
});
}
main().catch(console.error);
The SpeechConfig controls how Azure processes your audio. Here are the most useful options:
| Option | Default | Description |
|---|---|---|
| speechRecognitionLanguage | 'en-US' | BCP-47 language code (e.g. 'fr-FR', 'de-DE', 'ja-JP'). Supports 100+ languages. |
| outputFormat | Simple | Set to sdk.OutputFormat.Detailed for confidence scores and alternatives. |
| enableDictation | false | Enable dictation mode for longer speech with automatic punctuation. |
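Putting these options together, a sketch of a config tuned for French dictation with detailed results might look like the following. This assumes the SDK is imported as sdk and a recognizer is wired up as shown earlier; the exact JSON shape of detailed results is defined by the Azure service response:

```javascript
const speechConfig = sdk.SpeechConfig.fromSubscription(
  process.env.AZURE_SPEECH_KEY,
  process.env.AZURE_SPEECH_REGION
);

speechConfig.speechRecognitionLanguage = 'fr-FR';      // BCP-47 code
speechConfig.outputFormat = sdk.OutputFormat.Detailed; // confidence + alternatives
speechConfig.enableDictation();                        // dictation mode

// With Detailed output, the raw service response (including NBest
// alternatives and their confidence scores) is available as JSON:
recognizer.recognized = (s, e) => {
  const json = e.result.properties.getProperty(
    sdk.PropertyId.SpeechServiceResponse_JsonResult
  );
  console.log(JSON.parse(json));
};
```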
See the Azure Speech-to-Text documentation for the complete list of options.