decibri captures audio. These integrations process it. Use the table below to find the right one for your use case.
| Use case | Integration | Latency | Cost | Offline |
|---|---|---|---|---|
| Real-time local STT | Sherpa-ONNX | Low | Free | Yes |
| High-accuracy local STT | Whisper.cpp | Medium | Free | Yes |
| Wake word detection | Sherpa-ONNX KWS | Low | Free | Yes |
| Voice activity detection | Sherpa-ONNX VAD | Low | Free | Yes |
| Real-time cloud STT | Deepgram | Low | Pay-per-use (free tier) | No |
| Real-time cloud STT | AssemblyAI | Low | Pay-per-use | No |
| Real-time cloud STT | OpenAI Realtime | Low | Pay-per-use | No |
Local integrations (Sherpa-ONNX, Whisper.cpp) run entirely on-device. No API key, no network, no usage fees. Audio never leaves the machine. Trade-off: you supply the compute and manage the model files.
Cloud integrations (Deepgram, AssemblyAI, OpenAI) stream audio to an external API. Higher accuracy on some benchmarks, no local GPU required, and managed model updates. Trade-off: you need an API key and network connectivity, and you incur per-use costs.
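The trade-offs above can be sketched as a small selection helper. This is an illustrative snippet, not part of decibri's API: it encodes the comparison table as data and filters integrations by your constraints.

```python
# Hypothetical helper: encode the comparison table and pick matching
# integrations for a given set of constraints.
from dataclasses import dataclass

@dataclass(frozen=True)
class Integration:
    name: str
    use_case: str
    latency: str   # "Low" or "Medium"
    free: bool     # True if there is no per-use cost
    offline: bool  # True if it runs without network access

INTEGRATIONS = [
    Integration("Sherpa-ONNX", "Real-time local STT", "Low", True, True),
    Integration("Whisper.cpp", "High-accuracy local STT", "Medium", True, True),
    Integration("Sherpa-ONNX KWS", "Wake word detection", "Low", True, True),
    Integration("Sherpa-ONNX VAD", "Voice activity detection", "Low", True, True),
    Integration("Deepgram", "Real-time cloud STT", "Low", False, False),
    Integration("AssemblyAI", "Real-time cloud STT", "Low", False, False),
    Integration("OpenAI Realtime", "Real-time cloud STT", "Low", False, False),
]

def pick(offline=None, max_latency=None, free=None):
    """Return names of integrations satisfying every given constraint."""
    order = {"Low": 0, "Medium": 1}
    matches = []
    for item in INTEGRATIONS:
        if offline is not None and item.offline != offline:
            continue
        if free is not None and item.free != free:
            continue
        if max_latency is not None and order[item.latency] > order[max_latency]:
            continue
        matches.append(item.name)
    return matches

# Example: everything that runs offline with low latency.
print(pick(offline=True, max_latency="Low"))
# → ['Sherpa-ONNX', 'Sherpa-ONNX KWS', 'Sherpa-ONNX VAD']
```

For instance, `pick(free=True)` returns all four local integrations, since every on-device option in the table is free.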
- Real-time local transcription
- Detect spoken keywords with Sherpa-ONNX
- Detect speech vs. silence with Silero VAD