AssemblyAI Real-Time Transcription

Stream live microphone audio to AssemblyAI for real-time cloud transcription using decibri and the official AssemblyAI SDK.

What this does

This integration captures live audio from your microphone using decibri and streams it to AssemblyAI's cloud API over a WebSocket. Transcription results return in real-time using a turn-based model, where speech is grouped into natural segments with partial and final results for each turn. There is no model download, no local inference, and no format conversion required.

Choose this when you need turn-based transcription with speech understanding features, keyterm prompting support, or EU data residency. For a free-tier cloud option, see Deepgram. For use cases where audio must stay on your device, see the sherpa-onnx or whisper.cpp local integrations.

Cloud vs local

Note: AssemblyAI is a cloud service. Audio is sent to AssemblyAI's servers for processing. An EU endpoint is available at streaming.eu.assemblyai.com for data residency requirements. If your use case requires audio to stay entirely on-device, use the local integrations: sherpa-onnx (real-time streaming) or whisper.cpp (batch transcription).

Prerequisites

Get an API key

  1. Sign up at assemblyai.com
  2. Upgrade your account (Settings > Billing > add a payment method). Streaming is only available on upgraded accounts.
  3. Copy your API key from the dashboard
  4. Store it in a .env file in your project root:
ASSEMBLYAI_API_KEY=your_key_here

Install packages

$ npm install decibri assemblyai dotenv

The dotenv package loads your API key from the .env file. If you set environment variables another way, you can skip it.

No model download is required. All processing happens in AssemblyAI's cloud.

Code walkthrough

1. Configuration

Import decibri, the AssemblyAI SDK, and dotenv. Create a client with your API key.

require('dotenv').config();

const Decibri = require('decibri');
const { AssemblyAI } = require('assemblyai');

const client = new AssemblyAI({ apiKey: process.env.ASSEMBLYAI_API_KEY });
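
If the key is missing from the environment, the SDK will fail later with an opaque authentication error. A small guard at startup makes the failure obvious instead. This is a sketch, not part of either SDK; requireEnv is a hypothetical helper name.

```javascript
// requireEnv: read an environment variable or fail loudly at startup,
// instead of sending an empty API key and getting a 401 mid-session.
const requireEnv = (name) => {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing ${name}. Add it to your .env file.`);
  }
  return value;
};
```

With the helper in place, the client creation becomes `new AssemblyAI({ apiKey: requireEnv('ASSEMBLYAI_API_KEY') })`.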

2. Create streaming transcriber

Create a streaming transcriber with audio parameters that match decibri's configuration. The speechModel option is required and has no default. Omitting it will cause the connection to fail.

const transcriber = client.streaming.transcriber({
  speechModel: 'u3-rt-pro',
  sampleRate: 16_000,
});
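
Because the transcriber and the microphone must agree on the sample rate, one way to keep them in sync is to define the value once and read it in both places. A minimal sketch (the surrounding setup is elided):

```javascript
// Define the sample rate once so the transcriber and the microphone
// cannot drift apart; a mismatch produces garbled transcripts.
const SAMPLE_RATE = 16_000;

// Both configurations then read the same constant:
// client.streaming.transcriber({ speechModel: 'u3-rt-pro', sampleRate: SAMPLE_RATE });
// new Decibri({ sampleRate: SAMPLE_RATE, channels: 1 });
```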

3. Register event handlers

Register event handlers before calling connect(). This ensures no events are missed during the connection handshake.

transcriber.on('open', ({ id }) => {
  console.log('Session:', id);
});

transcriber.on('turn', (turn) => {
  if (turn.transcript) {
    console.log(turn.transcript);
  }
});

transcriber.on('error', (err) => {
  console.error('AssemblyAI error:', err);
});

transcriber.on('close', (code, reason) => {
  console.log('Connection closed:', code, reason);
});

4. Connect and open microphone

Connect to AssemblyAI, then start the microphone. Send audio only after connect() resolves.

await transcriber.connect();

const mic = new Decibri({ sampleRate: 16000, channels: 1 });
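
connect() rejects on failures such as a bad API key or a network outage. If you want a clear message instead of an unhandled promise rejection, the call can be wrapped in a helper. connectOrExit is a hypothetical name, not part of either SDK; the sketch returns a boolean so the caller decides whether to exit.

```javascript
// connectOrExit: attempt the WebSocket handshake and report failure
// (bad key, network outage) with a readable message rather than an
// unhandled promise rejection.
const connectOrExit = async (transcriber) => {
  try {
    await transcriber.connect();
    return true;
  } catch (err) {
    console.error('Could not connect to AssemblyAI:', err.message);
    return false;
  }
};
```

A caller might use `if (!(await connectOrExit(transcriber))) process.exit(1);` before opening the microphone.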

5. Stream audio

Send each audio chunk directly to AssemblyAI. No format conversion is needed. decibri's raw Int16 PCM Buffer is sent as-is via sendAudio().

mic.on('data', (chunk) => {
  transcriber.sendAudio(chunk);
});
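
For logging or throughput checks, it helps to know how much audio each chunk represents. At 16 kHz mono with 16-bit (2-byte) samples, the stream carries 16000 × 2 = 32,000 bytes per second. A small helper (hypothetical, not part of either SDK) converts a chunk's byte length into milliseconds of audio:

```javascript
// chunkDurationMs: convert a raw PCM chunk's byte length into
// milliseconds of audio, given the sample rate and sample width.
const chunkDurationMs = (byteLength, sampleRate = 16000, bytesPerSample = 2) =>
  (byteLength / (sampleRate * bytesPerSample)) * 1000;
```

For example, a 3,200-byte chunk at these settings is 100 ms of audio.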

6. Understanding turn-based results

AssemblyAI groups speech into turns, which are natural segments of speech separated by pauses. Each turn emits multiple 'turn' events as audio is processed: partial results (end_of_turn is false) while the speaker is still talking, followed by one final result (end_of_turn is true) once the turn completes.

To show only final results, filter on end_of_turn:

transcriber.on('turn', (turn) => {
  if (turn.end_of_turn && turn.transcript) {
    console.log(turn.transcript);
  }
});
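
Another common display pattern is to redraw a single terminal line for each partial result and commit it with a newline when the turn ends. A sketch, assuming a TTY; renderTurn is a hypothetical helper, and turn.transcript / turn.end_of_turn come from the SDK's 'turn' event:

```javascript
// renderTurn: format a turn for a single-line live display.
// Partials redraw the current line with '\r'; finals commit it
// with a trailing '\n'. Empty transcripts render nothing.
const renderTurn = (turn) => {
  if (!turn.transcript) return '';
  return '\r' + turn.transcript + (turn.end_of_turn ? '\n' : '');
};
```

It wires up as `transcriber.on('turn', (turn) => process.stdout.write(renderTurn(turn)));`.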

7. Clean shutdown

Stop the microphone and close the AssemblyAI connection when the user presses Ctrl+C.

process.on('SIGINT', async () => {
  console.log('\nStopping...');
  mic.stop();
  await transcriber.close();
  process.exit(0);
});

Full example

'use strict';
require('dotenv').config();

const Decibri = require('decibri');
const { AssemblyAI } = require('assemblyai');

const run = async () => {
  const client = new AssemblyAI({ apiKey: process.env.ASSEMBLYAI_API_KEY });

  const transcriber = client.streaming.transcriber({
    speechModel: 'u3-rt-pro',
    sampleRate: 16_000,
  });

  transcriber.on('open', ({ id }) => {
    console.log('AssemblyAI connected. Session:', id);
  });

  transcriber.on('turn', (turn) => {
    if (turn.end_of_turn && turn.transcript) {
      console.log(turn.transcript);
    }
  });

  transcriber.on('error', (err) => {
    console.error('AssemblyAI error:', err);
  });

  transcriber.on('close', (code, reason) => {
    console.log('Connection closed:', code, reason);
  });

  await transcriber.connect();

  const mic = new Decibri({ sampleRate: 16000, channels: 1 });

  mic.on('data', (chunk) => {
    transcriber.sendAudio(chunk);
  });

  mic.on('error', (err) => {
    console.error('Mic error:', err.message);
  });

  process.on('SIGINT', async () => {
    console.log('\nStopping...');
    mic.stop();
    await transcriber.close();
    process.exit(0);
  });

  console.log('Listening... (Ctrl+C to stop)\n');
};

run().catch(console.error);

Configuration options

The transcriber options control how AssemblyAI processes your audio. Here are the key ones:

Option        Value          Description
------        -----          -----------
speechModel   'u3-rt-pro'    Required. Universal-3 Pro Streaming model for highest accuracy.
sampleRate    16000          Must match decibri's sample rate.

Additional options such as keyterm prompting and speaker diarization are available. See the AssemblyAI streaming documentation for the complete list.