Deepgram is a speech-to-text API. In Genesis it is used for inbound audio/voice-note transcription through tools.media.audio and for Voice Call streaming STT through plugins.entries.voice-call.config.streaming.

For batch transcription, Genesis uploads the complete audio file to Deepgram and injects the transcript into the reply pipeline ({{Transcript}} + [Audio] block). For Voice Call streaming, Genesis forwards live G.711 u-law frames over Deepgram's WebSocket listen endpoint and emits partial or final transcripts as Deepgram returns them.

Detail Value
Website deepgram.com
Docs developers.deepgram.com
Auth DEEPGRAM_API_KEY
Default model nova-3

Getting started

Set your API key

Add your Deepgram API key to the environment:

```
DEEPGRAM_API_KEY=dg_...
```

Enable the audio provider

```json5
{
  tools: {
    media: {
      audio: {
        enabled: true,
        models: [{ provider: "deepgram", model: "nova-3" }],
      },
    },
  },
}
```

Send a voice note

Send an audio message through any connected channel. Genesis transcribes it
via Deepgram and injects the transcript into the reply pipeline.

Configuration options

Option Path Description
model tools.media.audio.models[].model Deepgram model id (default: nova-3)
language tools.media.audio.models[].language Language hint (optional)
detect_language tools.media.audio.providerOptions.deepgram.detect_language Enable language detection (optional)
punctuate tools.media.audio.providerOptions.deepgram.punctuate Enable punctuation (optional)
smart_format tools.media.audio.providerOptions.deepgram.smart_format Enable smart formatting (optional)

With language hint

```json5
{
  tools: {
    media: {
      audio: {
        enabled: true,
        models: [{ provider: "deepgram", model: "nova-3", language: "en" }],
      },
    },
  },
}
```

With Deepgram options

```json5
{
  tools: {
    media: {
      audio: {
        enabled: true,
        providerOptions: {
          deepgram: {
            detect_language: true,
            punctuate: true,
            smart_format: true,
          },
        },
        models: [{ provider: "deepgram", model: "nova-3" }],
      },
    },
  },
}
```

Voice Call streaming STT

The bundled deepgram plugin also registers a realtime transcription provider for the Voice Call plugin.

Setting Config path Default
API key plugins.entries.voice-call.config.streaming.providers.deepgram.apiKey Falls back to DEEPGRAM_API_KEY
Model ...deepgram.model nova-3
Language ...deepgram.language (unset)
Encoding ...deepgram.encoding mulaw
Sample rate ...deepgram.sampleRate 8000
Endpointing ...deepgram.endpointingMs 800
Interim results ...deepgram.interimResults true
{
  plugins: {
    entries: {
      "voice-call": {
        config: {
          streaming: {
            enabled: true,
            provider: "deepgram",
            providers: {
              deepgram: {
                apiKey: "${DEEPGRAM_API_KEY}",
                model: "nova-3",
                endpointingMs: 800,
                language: "en-US",
              },
            },
          },
        },
      },
    },
  },
}
Voice Call receives telephony audio as 8 kHz G.711 u-law. The Deepgram streaming provider defaults to `encoding: "mulaw"` and `sampleRate: 8000`, so Twilio media frames can be forwarded directly.

Notes

Authentication

Authentication follows the standard provider auth order. `DEEPGRAM_API_KEY` is
the simplest path.

Proxy and custom endpoints

Override endpoints or headers with `tools.media.audio.baseUrl` and
`tools.media.audio.headers` when using a proxy.

Output behavior

Output follows the same audio rules as other providers (size caps, timeouts,
transcript injection).

Related

  • Media tools Audio, image, and video processing pipeline overview.

  • Configuration Full config reference including media tool settings.

  • Troubleshooting Common issues and debugging steps.

  • FAQ Frequently asked questions about Genesis setup.