
Sherpa-ONNX Keyword Spotting

Detect specific spoken keywords and wake phrases in real time using decibri and sherpa-onnx. Runs entirely offline with no API key, no cloud service, and no network dependency.

What this does

This integration captures live microphone audio using decibri and feeds it to a sherpa-onnx keyword spotting (KWS) engine. When a user speaks one of your defined keywords or phrases, the engine detects it and reports which phrase was matched.

Choose this for wake word detection, voice command triggers, or any scenario where you need to listen for specific phrases without transcribing everything.

Prerequisites

Install packages

$ npm install decibri sherpa-onnx

Download a keyword spotting model

Download a KWS model from the sherpa-onnx releases. For example, the Zipformer transducer KWS model:

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/kws-models/sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01.tar.bz2
tar xvf sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01.tar.bz2

This creates a directory containing the model files: the encoder, decoder, and joiner ONNX models (named like encoder-epoch-12-avg-2-chunk-16-left-64.onnx), plus tokens.txt and bpe.model.

Code walkthrough

1. Configuration

Define your model paths and the keywords you want to detect. Keywords are encoded as BPE token sequences using the model's tokens.txt file.

const Decibri = require('decibri');
const sherpa = require('sherpa-onnx');

const modelDir = './sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01';

const config = {
  featConfig: { sampleRate: 16000, featureDim: 80 },
  modelConfig: {
    transducer: {
      encoder: `${modelDir}/encoder-epoch-12-avg-2-chunk-16-left-64.onnx`,
      decoder: `${modelDir}/decoder-epoch-12-avg-2-chunk-16-left-64.onnx`,
      joiner: `${modelDir}/joiner-epoch-12-avg-2-chunk-16-left-64.onnx`,
    },
    tokens: `${modelDir}/tokens.txt`,
    numThreads: 2,
    provider: 'cpu',
  },
  keywordsFile: `${modelDir}/keywords.txt`,
};
Keywords file format: Each line contains a keyword phrase followed by its BPE-encoded token IDs and an optional detection threshold. See the sherpa-onnx KWS documentation for the exact format.
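
For illustration only (the token sequences below are made up; real ones depend on the model's bpe.model), a keywords file for an English BPE model might look like:

```text
▁HE LL O ▁WORLD @HELLO WORLD
▁TURN ▁OFF :2.0 #0.25 @TURN OFF
```

Each line lists the BPE tokens of the phrase; an optional :score boosts the keyword, an optional #threshold overrides the detection threshold for that phrase, and the text after @ is the string the spotter reports when the phrase is detected.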

2. Create the KWS engine

Instantiate the keyword spotter and create a detection stream.

const kws = new sherpa.KeywordSpotter(config);
const stream = kws.createStream();

3. Open the microphone

Create a decibri instance at 16 kHz mono to match the model's expected input.

const mic = new Decibri({ sampleRate: 16000, channels: 1 });

4. Process audio and detect keywords

Convert each incoming Int16 buffer to Float32, feed it to the KWS engine, and check for keyword detections.
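The conversion step can be factored into a small helper that is easy to unit-test on its own. This is a sketch; it assumes (as the handler below does) that decibri emits little-endian signed 16-bit PCM as Node.js Buffers:

```javascript
// Convert a Buffer of little-endian Int16 PCM samples to Float32,
// scaling from [-32768, 32767] into roughly [-1, 1).
function int16BufferToFloat32(chunk) {
  // chunk.length is in bytes; each Int16 sample is 2 bytes.
  // byteOffset matters because small Node Buffers share a pooled ArrayBuffer.
  const int16 = new Int16Array(chunk.buffer, chunk.byteOffset, chunk.length / 2);
  const float32 = new Float32Array(int16.length);
  for (let i = 0; i < int16.length; i++) {
    float32[i] = int16[i] / 32768;
  }
  return float32;
}
```
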

mic.on('data', (chunk) => {
  // Convert Int16 PCM to Float32
  const int16 = new Int16Array(chunk.buffer, chunk.byteOffset, chunk.length / 2);
  const float32 = new Float32Array(int16.length);
  for (let i = 0; i < int16.length; i++) {
    float32[i] = int16[i] / 32768;
  }

  // Feed audio to the KWS engine
  stream.acceptWaveform(16000, float32);
  while (kws.isReady(stream)) {
    kws.decode(stream);
  }

  // Check for keyword detections
  const result = kws.getResult(stream);
  if (result.keyword) {
    console.log(`Detected: "${result.keyword}"`);
    // Reset the stream after each detection so the same keyword is not
    // reported repeatedly (required by recent sherpa-onnx versions)
    kws.reset(stream);
  }
});
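Once a phrase is detected, you will usually want to route it to an action rather than just log it. A minimal dispatch table is sketched below; the phrase strings and handler names are hypothetical, and the keys must match the text your keywords file maps each phrase to:

```javascript
// Hypothetical mapping from detected phrases to actions. Keys must match
// the keyword strings the spotter reports (typically the text after '@'
// in keywords.txt).
const handlers = {
  'HEY ASSISTANT': () => console.log('Wake word heard'),
  'TURN OFF': () => console.log('Shutting down'),
};

// Look up and invoke the handler for a detected keyword.
// Returns true if a handler fired, false otherwise.
function dispatchKeyword(keyword, table) {
  const handler = table[keyword.trim().toUpperCase()];
  if (!handler) return false;
  handler();
  return true;
}
```

Inside the 'data' handler, you would then call dispatchKeyword(keyword, handlers) instead of logging the match directly.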

5. Clean shutdown

Stop the microphone and free resources when the user presses Ctrl+C.

process.on('SIGINT', () => {
  mic.stop();
  stream.free();
  kws.free();
  process.exit(0);
});

console.log('Listening for keywords... (Ctrl+C to stop)');

Full example

const Decibri = require('decibri');
const sherpa = require('sherpa-onnx');

const modelDir = './sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01';

const config = {
  featConfig: { sampleRate: 16000, featureDim: 80 },
  modelConfig: {
    transducer: {
      encoder: `${modelDir}/encoder-epoch-12-avg-2-chunk-16-left-64.onnx`,
      decoder: `${modelDir}/decoder-epoch-12-avg-2-chunk-16-left-64.onnx`,
      joiner: `${modelDir}/joiner-epoch-12-avg-2-chunk-16-left-64.onnx`,
    },
    tokens: `${modelDir}/tokens.txt`,
    numThreads: 2,
    provider: 'cpu',
  },
  keywordsFile: `${modelDir}/keywords.txt`,
};

const kws = new sherpa.KeywordSpotter(config);
const stream = kws.createStream();
const mic = new Decibri({ sampleRate: 16000, channels: 1 });

mic.on('data', (chunk) => {
  const int16 = new Int16Array(chunk.buffer, chunk.byteOffset, chunk.length / 2);
  const float32 = new Float32Array(int16.length);
  for (let i = 0; i < int16.length; i++) {
    float32[i] = int16[i] / 32768;
  }

  stream.acceptWaveform(16000, float32);
  while (kws.isReady(stream)) {
    kws.decode(stream);
  }

  const result = kws.getResult(stream);
  if (result.keyword) {
    console.log(`Detected: "${result.keyword}"`);
    // Reset the stream after each detection so the same keyword is not
    // reported repeatedly (required by recent sherpa-onnx versions)
    kws.reset(stream);
  }
});

process.on('SIGINT', () => {
  mic.stop();
  stream.free();
  kws.free();
  process.exit(0);
});

console.log('Listening for keywords... (Ctrl+C to stop)');
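
The model expects 16 kHz audio. If your capture device cannot be opened at 16 kHz (whether decibri resamples internally is not documented here, so treat that as an assumption), one crude fallback is to capture at 48 kHz and decimate by a factor of three. The sketch below averages each group of three samples; a production resampler should low-pass filter first to avoid aliasing:

```javascript
// Naive 48 kHz -> 16 kHz decimation: average each group of 3 samples.
// Averaging acts as a weak anti-alias filter but is not a substitute for
// a proper low-pass stage.
function downsample48to16(input) {
  const out = new Float32Array(Math.floor(input.length / 3));
  for (let i = 0; i < out.length; i++) {
    out[i] = (input[3 * i] + input[3 * i + 1] + input[3 * i + 2]) / 3;
  }
  return out;
}
```

The resulting Float32Array can be passed to stream.acceptWaveform(16000, ...) exactly as in the example above.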