Deepgram Real-Time Transcription

Stream live microphone audio to Deepgram for real-time cloud transcription using decibri and the official Deepgram SDK.

What this does

This integration captures live audio from your microphone using decibri and streams it to Deepgram's cloud API over a WebSocket. Transcription results return in real time as you speak. There is no model download, no local inference, and no format conversion required.

Choose this when you need the highest accuracy, support for 30+ languages, or features like speaker diarization and smart formatting without managing local models. For use cases where audio must stay on your device, see the sherpa-onnx or whisper.cpp local integrations instead.

Cloud vs local

Note: Deepgram is a cloud service. Audio is sent to Deepgram's servers for processing. Deepgram does not store audio by default. If your use case requires audio to stay entirely on-device, use the local integrations: sherpa-onnx (real-time streaming) or whisper.cpp (batch transcription).

Prerequisites

Get an API key

  1. Sign up at console.deepgram.com (free tier includes $200 in credits, no credit card required)
  2. Create an API key from the dashboard
  3. Store it in a .env file in your project root:
DEEPGRAM_API_KEY=your_key_here

Install packages

$ npm install decibri @deepgram/sdk dotenv

The dotenv package loads your API key from the .env file. If you set environment variables another way, you can skip it.

No model download is required. All processing happens in Deepgram's cloud.

Code walkthrough

1. Configuration

Import decibri, the Deepgram SDK, and dotenv. Create a Deepgram client with your API key.

require('dotenv').config();

const Decibri = require('decibri');
const { DeepgramClient } = require('@deepgram/sdk');

const deepgram = new DeepgramClient({ apiKey: process.env.DEEPGRAM_API_KEY });
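If the key never made it into the environment, the failure otherwise surfaces later as an opaque auth error at connect time. A small guard fails fast instead; requireApiKey is a hypothetical helper name, not part of either SDK:

```javascript
// Hypothetical helper: fail fast if the API key was not loaded,
// rather than letting the WebSocket fail with an opaque auth error.
function requireApiKey(env) {
  const key = env.DEEPGRAM_API_KEY;
  if (!key) {
    throw new Error('DEEPGRAM_API_KEY is not set. Add it to your .env file.');
  }
  return key;
}

// Usage: const deepgram = new DeepgramClient({ apiKey: requireApiKey(process.env) });
```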

2. Connect to Deepgram

Create a WebSocket connection with audio parameters that match decibri's configuration. The encoding, sample_rate, and channels must match what decibri outputs.

const socket = await deepgram.listen.v1.createConnection({
  model: 'nova-3',
  language: 'en',
  encoding: 'linear16',
  sample_rate: 16000,
  channels: 1,
  punctuate: true,
  smart_format: true,
});

socket.connect();
await socket.waitForOpen();

Send audio only after waitForOpen() resolves; chunks sent before the WebSocket is open are dropped silently.
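If your setup can't guarantee the microphone starts after the connection opens, a small buffer avoids losing the first chunks. ChunkGate below is a hypothetical helper (not part of decibri or the Deepgram SDK); send stands in for socket.sendMedia:

```javascript
// Hypothetical helper: hold audio chunks until the socket is open,
// then flush them in arrival order.
class ChunkGate {
  constructor(send) {
    this.send = send;     // e.g. (chunk) => socket.sendMedia(chunk)
    this.open = false;
    this.pending = [];
  }
  // Call once the WebSocket is open (after waitForOpen() resolves).
  markOpen() {
    this.open = true;
    for (const chunk of this.pending) this.send(chunk);
    this.pending = [];
  }
  // Route every mic 'data' chunk through here instead of sending directly.
  push(chunk) {
    if (this.open) this.send(chunk);
    else this.pending.push(chunk);
  }
}
```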

3. Open the microphone

Create a decibri instance at 16 kHz mono. The default format is 16-bit signed integer PCM, which matches Deepgram's linear16 encoding directly.

const mic = new Decibri({ sampleRate: 16000, channels: 1 });
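At this configuration the upstream data rate is easy to work out: 16,000 samples per second, two bytes per sample, one channel.

```javascript
// Back-of-envelope upstream bandwidth for 16 kHz mono Int16 PCM.
const sampleRate = 16000;    // samples per second
const bytesPerSample = 2;    // 16-bit signed PCM
const channels = 1;
const bytesPerSecond = sampleRate * bytesPerSample * channels;
console.log(`${bytesPerSecond} bytes/s`); // 32000 bytes/s, about 31 KiB/s
```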

4. Stream audio

Send each audio chunk directly to Deepgram. No format conversion is needed. decibri's raw Int16 PCM Buffer is sent as-is via sendMedia().

mic.on('data', (chunk) => {
  socket.sendMedia(chunk);
});

5. Handle transcription results

Listen for message events on the socket. Results arrive with data.type === 'Results' and contain one or more transcript alternatives.

socket.on('message', (data) => {
  if (data.type === 'Results' && data.channel?.alternatives?.[0]) {
    const transcript = data.channel.alternatives[0].transcript;
    if (transcript) {
      console.log(transcript);
    }
  }
});
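If you enable interim_results (listed in the configuration options below), each Results message also carries an is_final flag, and the same utterance arrives several times as it refines. A small helper, sketched here with the same field names as the handler above, keeps only finalized text:

```javascript
// Sketch: extract the transcript from a Results message only when
// Deepgram has finalized it (is_final is true) and it is non-empty.
// Returns null for interim results, empty transcripts, and other
// message types.
function finalTranscript(data) {
  if (data.type !== 'Results' || !data.is_final) return null;
  const alt = data.channel?.alternatives?.[0];
  return alt && alt.transcript ? alt.transcript : null;
}

// Usage inside the message handler:
// const text = finalTranscript(data);
// if (text) console.log(text);
```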

6. Clean shutdown

Register error handlers, then stop the microphone and close the Deepgram connection when the user presses Ctrl+C.

socket.on('close', () => {
  console.log('Connection closed.');
  process.exit(0);
});

socket.on('error', (err) => {
  console.error('Deepgram error:', err);
});

mic.on('error', (err) => {
  console.error('Mic error:', err.message);
});

process.on('SIGINT', () => {
  mic.stop();
  socket.requestClose();
});

console.log('Listening... (Ctrl+C to stop)');

Full example

'use strict';
require('dotenv').config();

const Decibri = require('decibri');
const { DeepgramClient } = require('@deepgram/sdk');

const live = async () => {
  const deepgram = new DeepgramClient({ apiKey: process.env.DEEPGRAM_API_KEY });

  const socket = await deepgram.listen.v1.createConnection({
    model: 'nova-3',
    language: 'en',
    encoding: 'linear16',
    sample_rate: 16000,
    channels: 1,
    punctuate: true,
    smart_format: true,
  });

  socket.on('message', (data) => {
    if (data.type === 'Results' && data.channel?.alternatives?.[0]) {
      const transcript = data.channel.alternatives[0].transcript;
      if (transcript) {
        console.log(transcript);
      }
    }
  });

  socket.on('close', () => {
    console.log('Connection closed.');
    process.exit(0);
  });

  socket.on('error', (err) => {
    console.error('Deepgram error:', err);
  });

  socket.connect();
  await socket.waitForOpen();

  const mic = new Decibri({ sampleRate: 16000, channels: 1 });

  mic.on('data', (chunk) => {
    socket.sendMedia(chunk);
  });

  mic.on('error', (err) => {
    console.error('Mic error:', err.message);
  });

  process.on('SIGINT', () => {
    console.log('\nStopping...');
    mic.stop();
    socket.requestClose();
  });

  console.log('Listening... (Ctrl+C to stop)\n');
};

live().catch(console.error);

Configuration options

The connection options control how Deepgram processes your audio. Here are the most useful ones:

| Option | Default | Description |
|---|---|---|
| model | 'nova-3' | Transcription model. Nova-3 is the latest and most accurate. |
| language | 'en' | Language code. Use 'multi' for automatic language detection. |
| punctuate | false | Add punctuation to transcripts. |
| smart_format | false | Apply formatting for numerals, currency, and dates. |
| diarize | false | Identify different speakers in the audio. |
| interim_results | false | Return progressive results that refine as more audio is processed. |
| endpointing | 10 | Milliseconds of silence before a final result is triggered. |

See the Deepgram streaming API reference for the complete list of options.
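As an example of diarize in action: with it enabled, each word object in channel.alternatives[0].words carries a numeric speaker index alongside the word text. The sketch below groups consecutive words into per-speaker lines; formatSpeakers is illustrative, not part of the SDK:

```javascript
// Sketch: turn a diarized word list into "Speaker N: ..." lines.
// Assumes each word object has { word, speaker }, as in Deepgram's
// word payload when diarize is enabled.
function formatSpeakers(words) {
  const runs = [];
  let current = null;
  for (const w of words) {
    if (current && current.speaker === w.speaker) {
      current.text += ' ' + w.word;          // same speaker: extend the run
    } else {
      current = { speaker: w.speaker, text: w.word };
      runs.push(current);                     // speaker changed: new run
    }
  }
  return runs.map((r) => `Speaker ${r.speaker}: ${r.text}`);
}
```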