Genesis ships a bundled xai provider plugin for Grok models.

Getting started

Create an API key

Create an API key in the [xAI console](https://console.x.ai/).

Set your API key

Set `XAI_API_KEY`, or run:

```bash
genesis onboard --auth-choice xai-api-key
```

Pick a model

```json5
{
  agents: { defaults: { model: { primary: "xai/grok-4" } } },
}
```

Genesis uses the xAI Responses API as the bundled xAI transport. The same `XAI_API_KEY` can also power Grok-backed `web_search`, first-class `x_search`, and remote `code_execution`. If you store an xAI key under `plugins.entries.xai.config.webSearch.apiKey`, the bundled xAI model provider reuses that key as a fallback too. `code_execution` tuning lives under `plugins.entries.xai.config.codeExecution`.
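The key fallback described above can be pictured roughly as follows. This is a minimal sketch, not the plugin's actual code: the `XaiKeySources` shape and the `resolveXaiApiKey` name are hypothetical, and the precedence (environment variable first, stored web-search key second) is an assumption based on the word "fallback".

```typescript
// Hypothetical input shape; only the two key sources come from the documentation.
interface XaiKeySources {
  /** Value of the XAI_API_KEY environment variable, if set. */
  envKey?: string;
  /** Value stored at plugins.entries.xai.config.webSearch.apiKey, if any. */
  webSearchApiKey?: string;
}

// Assumed precedence: the environment variable wins, and the stored
// web-search key is only consulted as a fallback. Blank strings count as unset.
function resolveXaiApiKey(sources: XaiKeySources): string | undefined {
  for (const candidate of [sources.envKey, sources.webSearchApiKey]) {
    const trimmed = candidate?.trim();
    if (trimmed) return trimmed;
  }
  return undefined;
}
```

With both sources present the environment value is returned; with only the stored web-search key present, that key is used instead.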

Built-in catalog

Genesis includes these xAI model families out of the box:

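| Family         | Model ids                                                                  |
| -------------- | -------------------------------------------------------------------------- |
| Grok 3         | `grok-3`, `grok-3-fast`, `grok-3-mini`, `grok-3-mini-fast`                 |
| Grok 4         | `grok-4`, `grok-4-0709`                                                    |
| Grok 4 Fast    | `grok-4-fast`, `grok-4-fast-non-reasoning`                                 |
| Grok 4.1 Fast  | `grok-4-1-fast`, `grok-4-1-fast-non-reasoning`                             |
| Grok 4.20 Beta | `grok-4.20-beta-latest-reasoning`, `grok-4.20-beta-latest-non-reasoning`   |
| Grok Code      | `grok-code-fast-1`                                                         |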

The plugin also forward-resolves newer `grok-4*` and `grok-code-fast*` ids when they follow the same API shape.
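As an illustration of forward resolution, the sketch below classifies a model id as a catalog entry, a forward-resolved id, or an unsupported one. It assumes forward resolution is a plain prefix match, which the plugin may implement differently; `classifyXaiModelId` and the `Resolution` type are hypothetical names. The unsupported id comes from the "Known limits" section of this page.

```typescript
// Ids listed in the built-in catalog.
const CATALOG_IDS: readonly string[] = [
  "grok-3", "grok-3-fast", "grok-3-mini", "grok-3-mini-fast",
  "grok-4", "grok-4-0709",
  "grok-4-fast", "grok-4-fast-non-reasoning",
  "grok-4-1-fast", "grok-4-1-fast-non-reasoning",
  "grok-4.20-beta-latest-reasoning", "grok-4.20-beta-latest-non-reasoning",
  "grok-code-fast-1",
];

// Assumption: forward resolution is a prefix match on these two families.
const FORWARD_PREFIXES: readonly string[] = ["grok-4", "grok-code-fast"];

// Ids that match a prefix but need a different upstream API surface.
const UNSUPPORTED_IDS: readonly string[] = [
  "grok-4.20-multi-agent-experimental-beta-0304",
];

type Resolution = "catalog" | "forward" | "unsupported" | "unknown";

function classifyXaiModelId(id: string): Resolution {
  if (UNSUPPORTED_IDS.includes(id)) return "unsupported";
  if (CATALOG_IDS.includes(id)) return "catalog";
  if (FORWARD_PREFIXES.some((prefix) => id.startsWith(prefix))) return "forward";
  return "unknown";
}
```

A made-up id such as `grok-code-fast-2` would be classified as `"forward"` here, while an id outside both families falls through to `"unknown"`.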

`grok-4-fast`, `grok-4-1-fast`, and the `grok-4.20-beta-*` variants are the current image-capable Grok refs in the bundled catalog.

Genesis feature coverage

The bundled plugin maps xAI's current public API surface onto Genesis's shared provider and tool contracts. Capabilities that don't fit the shared contract (for example, streaming TTS and realtime voice) are not exposed; the table below lists the status of each one.

| xAI capability             | Genesis surface                             | Status                                                              |
| -------------------------- | ------------------------------------------- | ------------------------------------------------------------------- |
| Chat / Responses           | `xai/<model>` model provider                | Yes                                                                 |
| Server-side web search     | `web_search` provider `grok`                | Yes                                                                 |
| Server-side X search       | `x_search` tool                             | Yes                                                                 |
| Server-side code execution | `code_execution` tool                       | Yes                                                                 |
| Images                     | `image_generate`                            | Yes                                                                 |
| Videos                     | `video_generate`                            | Yes                                                                 |
| Batch text-to-speech       | `messages.tts.provider: "xai"` / `tts`      | Yes                                                                 |
| Streaming TTS              | —                                           | Not exposed; Genesis's TTS contract returns complete audio buffers  |
| Batch speech-to-text       | `tools.media.audio` / media understanding   | Yes                                                                 |
| Streaming speech-to-text   | Voice Call `streaming.provider: "xai"`      | Yes                                                                 |
| Realtime voice             | —                                           | Not exposed yet; different session/WebSocket contract               |
| Files / batches            | Generic model API compatibility only        | Not a first-class Genesis tool                                      |

Genesis uses xAI's REST image/video/TTS/STT APIs for media generation, speech, and batch transcription, xAI's streaming STT WebSocket for live voice-call transcription, and the Responses API for model, search, and code-execution tools. Features that need different Genesis contracts, such as Realtime voice sessions, are documented here as upstream capabilities rather than hidden plugin behavior.

Fast-mode mappings

`/fast on` or `agents.defaults.models["xai/<model>"].params.fastMode: true` rewrites native xAI requests as follows:

| Source model  | Fast-mode target   |
| ------------- | ------------------ |
| `grok-3`      | `grok-3-fast`      |
| `grok-3-mini` | `grok-3-mini-fast` |
| `grok-4`      | `grok-4-fast`      |
| `grok-4-0709` | `grok-4-fast`      |
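The fast-mode rewrite amounts to a lookup over the mappings listed above. The sketch below shows one way to express it; `applyFastMode` is a hypothetical name, and leaving unmapped models unchanged is an assumption rather than documented behavior.

```typescript
// Source model -> fast-mode target, copied from the mappings above.
const FAST_MODE_TARGETS = new Map<string, string>([
  ["grok-3", "grok-3-fast"],
  ["grok-3-mini", "grok-3-mini-fast"],
  ["grok-4", "grok-4-fast"],
  ["grok-4-0709", "grok-4-fast"],
]);

// Assumption: models without a mapping are sent unchanged.
function applyFastMode(modelId: string, fastMode: boolean): string {
  if (!fastMode) return modelId;
  return FAST_MODE_TARGETS.get(modelId) ?? modelId;
}
```

For example, `applyFastMode("grok-4-0709", true)` yields `grok-4-fast`, and any id is returned as-is when `fastMode` is `false`.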

Legacy compatibility aliases

Legacy aliases still normalize to the canonical bundled ids:

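| Legacy alias               | Canonical id                           |
| -------------------------- | -------------------------------------- |
| `grok-4-fast-reasoning`    | `grok-4-fast`                          |
| `grok-4-1-fast-reasoning`  | `grok-4-1-fast`                        |
| `grok-4.20-reasoning`      | `grok-4.20-beta-latest-reasoning`      |
| `grok-4.20-non-reasoning`  | `grok-4.20-beta-latest-non-reasoning`  |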
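Alias normalization can be sketched the same way. The helper below is illustrative only: `normalizeXaiModelRef` is a hypothetical name, and accepting refs both with and without the `xai/` provider prefix is an assumption about how callers pass model refs.

```typescript
// Legacy alias -> canonical bundled id, copied from the aliases above.
const LEGACY_ALIASES = new Map<string, string>([
  ["grok-4-fast-reasoning", "grok-4-fast"],
  ["grok-4-1-fast-reasoning", "grok-4-1-fast"],
  ["grok-4.20-reasoning", "grok-4.20-beta-latest-reasoning"],
  ["grok-4.20-non-reasoning", "grok-4.20-beta-latest-non-reasoning"],
]);

const PROVIDER_PREFIX = "xai/";

// Normalizes "grok-..." and "xai/grok-..." refs, preserving the prefix if present.
function normalizeXaiModelRef(ref: string): string {
  const hasPrefix = ref.startsWith(PROVIDER_PREFIX);
  const id = hasPrefix ? ref.slice(PROVIDER_PREFIX.length) : ref;
  const canonical = LEGACY_ALIASES.get(id) ?? id;
  return hasPrefix ? PROVIDER_PREFIX + canonical : canonical;
}
```

Canonical ids pass through untouched, so the function is safe to apply to every configured model ref.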

Features

Web search

The bundled `grok` web-search provider uses `XAI_API_KEY` too:

```bash
genesis config set tools.web.search.provider grok
```

Video generation

The bundled `xai` plugin registers video generation through the shared
`video_generate` tool.

- Default video model: `xai/grok-imagine-video`
- Modes: text-to-video, image-to-video, remote video edit, and remote video
  extension
- Aspect ratios: `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `3:2`, `2:3`
- Resolutions: `480P`, `720P`
- Duration: 1-15 seconds for generation/image-to-video, 2-10 seconds for
  extension

<div class="callout warning">
Local video buffers are not accepted. Use remote `http(s)` URLs for
video edit/extend inputs. Image-to-video accepts local image buffers because
Genesis can encode those as data URLs for xAI.
</div>
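The limits above lend themselves to a pre-flight check before calling `video_generate`. The following is a rough sketch under stated assumptions: the `VideoRequest` shape, its field names, and `validateVideoRequest` are hypothetical and do not mirror the shared tool's real parameters. No duration range is documented for edit mode, so the sketch leaves it unchecked.

```typescript
type VideoMode = "text-to-video" | "image-to-video" | "edit" | "extend";

// Hypothetical request shape used only for this illustration.
interface VideoRequest {
  mode: VideoMode;
  aspectRatio?: string;
  resolution?: string;
  durationSeconds?: number;
  /** Input video for edit/extend; must be a remote http(s) URL. */
  videoUrl?: string;
}

const VIDEO_ASPECT_RATIOS = ["1:1", "16:9", "9:16", "4:3", "3:4", "3:2", "2:3"];
const VIDEO_RESOLUTIONS = ["480P", "720P"];

// Returns a list of human-readable problems; an empty list means the request looks valid.
function validateVideoRequest(req: VideoRequest): string[] {
  const errors: string[] = [];
  if (req.aspectRatio !== undefined && !VIDEO_ASPECT_RATIOS.includes(req.aspectRatio)) {
    errors.push(`unsupported aspect ratio: ${req.aspectRatio}`);
  }
  if (req.resolution !== undefined && !VIDEO_RESOLUTIONS.includes(req.resolution)) {
    errors.push(`unsupported resolution: ${req.resolution}`);
  }
  if (req.durationSeconds !== undefined) {
    // 1-15 s for generation and image-to-video, 2-10 s for extension.
    let range: [number, number] | undefined;
    if (req.mode === "extend") range = [2, 10];
    else if (req.mode !== "edit") range = [1, 15];
    if (range && (req.durationSeconds < range[0] || req.durationSeconds > range[1])) {
      errors.push(`duration must be ${range[0]}-${range[1]} seconds for ${req.mode}`);
    }
  }
  if (req.mode === "edit" || req.mode === "extend") {
    // Local video buffers are rejected; only remote URLs are accepted.
    if (!req.videoUrl || !/^https?:\/\//i.test(req.videoUrl)) {
      errors.push("edit/extend input must be a remote http(s) URL");
    }
  }
  return errors;
}
```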

To use xAI as the default video provider:

```json5
{
  agents: {
    defaults: {
      videoGenerationModel: {
        primary: "xai/grok-imagine-video",
      },
    },
  },
}
```

<div class="callout note">
See [Video Generation](/tools/video-generation) for shared tool parameters,
provider selection, and failover behavior.
</div>

Image generation

The bundled `xai` plugin registers image generation through the shared
`image_generate` tool.

- Default image model: `xai/grok-imagine-image`
- Additional model: `xai/grok-imagine-image-pro`
- Modes: text-to-image and reference-image edit
- Reference inputs: one `image` or up to five `images`
- Aspect ratios: `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `2:3`, `3:2`
- Resolutions: `1K`, `2K`
- Count: up to 4 images

Genesis asks xAI for `b64_json` image responses so generated media can be
stored and delivered through the normal channel attachment path. Local
reference images are converted to data URLs; remote `http(s)` references are
passed through.
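The reference-image handling described above can be sketched as a small mapping step. This is an illustration, not the plugin's code: the `ReferenceImage` type and `toXaiImageReferences` are hypothetical, and the local case assumes the image bytes have already been base64-encoded upstream.

```typescript
// Hypothetical input type: either a remote URL or already-encoded local bytes.
type ReferenceImage =
  | { kind: "remote"; url: string }
  | { kind: "local"; mimeType: string; base64: string };

const MAX_REFERENCE_IMAGES = 5;

// Remote http(s) references pass through; local images become data URLs.
function toXaiImageReferences(images: ReferenceImage[]): string[] {
  if (images.length > MAX_REFERENCE_IMAGES) {
    throw new Error(`at most ${MAX_REFERENCE_IMAGES} reference images are supported`);
  }
  return images.map((image) => {
    if (image.kind === "remote") {
      if (!/^https?:\/\//i.test(image.url)) {
        throw new Error(`not an http(s) URL: ${image.url}`);
      }
      return image.url;
    }
    return `data:${image.mimeType};base64,${image.base64}`;
  });
}
```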

To use xAI as the default image provider:

```json5
{
  agents: {
    defaults: {
      imageGenerationModel: {
        primary: "xai/grok-imagine-image",
      },
    },
  },
}
```

<div class="callout note">
xAI also documents `quality`, `mask`, `user`, and additional native ratios
such as `1:2`, `2:1`, `9:20`, and `20:9`. Genesis forwards only the
shared cross-provider image controls today; unsupported native-only knobs
are intentionally not exposed through `image_generate`.
</div>

Text-to-speech

The bundled `xai` plugin registers text-to-speech through the shared `tts`
provider surface.

- Voices: `eve`, `ara`, `rex`, `sal`, `leo`, `una`
- Default voice: `eve`
- Formats: `mp3`, `wav`, `pcm`, `mulaw`, `alaw`
- Language: BCP-47 code or `auto`
- Speed: provider-native speed override
- Native Opus voice-note format is not supported

To use xAI as the default TTS provider:

```json5
{
  messages: {
    tts: {
      provider: "xai",
      providers: {
        xai: {
          voiceId: "eve",
        },
      },
    },
  },
}
```

<div class="callout note">
Genesis uses xAI's batch `/v1/tts` endpoint. xAI also offers streaming TTS
over WebSocket, but the Genesis speech provider contract currently expects
a complete audio buffer before reply delivery.
</div>

Speech-to-text

The bundled `xai` plugin registers batch speech-to-text through Genesis's
media-understanding transcription surface.

- Default model: `grok-stt`
- Endpoint: xAI REST `/v1/stt`
- Input path: multipart audio file upload
- Supported by Genesis wherever inbound audio transcription uses
  `tools.media.audio`, including Discord voice-channel segments and
  channel audio attachments

To force xAI for inbound audio transcription:

```json5
{
  tools: {
    media: {
      audio: {
        models: [
          {
            type: "provider",
            provider: "xai",
            model: "grok-stt",
          },
        ],
      },
    },
  },
}
```

Language can be supplied through the shared audio media config or per-call
transcription request. Prompt hints are accepted by the shared Genesis
surface, but the xAI REST STT integration only forwards file, model, and
language because those map cleanly to the current public xAI endpoint.
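To make the forwarding rule concrete, the sketch below builds the set of fields that would reach xAI from a shared transcription request. The `TranscriptionRequest` shape and `buildXaiSttFields` are hypothetical, the file is represented by its name rather than real audio bytes, and the exact wire field names are an assumption.

```typescript
// Hypothetical shape of a shared transcription request.
interface TranscriptionRequest {
  fileName: string;
  model?: string;
  language?: string;
  /** Accepted by the shared surface, but not forwarded to xAI. */
  prompt?: string;
}

// Only file, model, and language are forwarded; prompt hints are dropped.
function buildXaiSttFields(req: TranscriptionRequest): Record<string, string> {
  const fields: Record<string, string> = {
    file: req.fileName,
    model: req.model ?? "grok-stt", // documented default model
  };
  if (req.language) fields["language"] = req.language;
  return fields;
}
```

A request that carries a `prompt` therefore produces the same upstream fields as one without it.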

Streaming speech-to-text

The bundled `xai` plugin also registers a realtime transcription provider
for live voice-call audio.

- Endpoint: xAI WebSocket `wss://api.x.ai/v1/stt`
- Default encoding: `mulaw`
- Default sample rate: `8000`
- Default endpointing: `800ms`
- Interim transcripts: enabled by default

Voice Call's Twilio media stream sends G.711 µ-law audio frames, so the
xAI provider can forward those frames directly without transcoding:

```json5
{
  plugins: {
    entries: {
      "voice-call": {
        config: {
          streaming: {
            enabled: true,
            provider: "xai",
            providers: {
              xai: {
                apiKey: "${XAI_API_KEY}",
                endpointingMs: 800,
                language: "en",
              },
            },
          },
        },
      },
    },
  },
}
```

Provider-owned config lives under
`plugins.entries.voice-call.config.streaming.providers.xai`. Supported
keys are `apiKey`, `baseUrl`, `sampleRate`, `encoding` (`pcm`, `mulaw`, or
`alaw`), `interimResults`, `endpointingMs`, and `language`.
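Combining the supported keys with the defaults listed earlier, resolving a provider config might look like the sketch below. The key names and default values come from this page; the `resolveXaiStreamingConfig` helper and both interface names are hypothetical.

```typescript
type XaiSttEncoding = "pcm" | "mulaw" | "alaw";

// Keys supported under plugins.entries.voice-call.config.streaming.providers.xai.
interface XaiStreamingConfig {
  apiKey?: string;
  baseUrl?: string;
  sampleRate?: number;
  encoding?: XaiSttEncoding;
  interimResults?: boolean;
  endpointingMs?: number;
  language?: string;
}

interface ResolvedXaiStreamingConfig extends XaiStreamingConfig {
  sampleRate: number;
  encoding: XaiSttEncoding;
  interimResults: boolean;
  endpointingMs: number;
}

// Fills in the documented defaults for any key the user left unset.
function resolveXaiStreamingConfig(config: XaiStreamingConfig = {}): ResolvedXaiStreamingConfig {
  return {
    ...config,
    encoding: config.encoding ?? "mulaw",
    sampleRate: config.sampleRate ?? 8000,
    endpointingMs: config.endpointingMs ?? 800,
    interimResults: config.interimResults ?? true,
  };
}
```

Using `??` rather than `||` keeps an explicit `interimResults: false` intact instead of replacing it with the default.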

<div class="callout note">
This streaming provider is for Voice Call's realtime transcription path.
Discord voice currently records short segments and uses the batch
`tools.media.audio` transcription path instead.
</div>

x_search configuration

The bundled xAI plugin exposes `x_search` as a Genesis tool for searching
X (formerly Twitter) content via Grok.

Config path: `plugins.entries.xai.config.xSearch`

| Key                | Type    | Default            | Description                          |
| ------------------ | ------- | ------------------ | ------------------------------------ |
| `enabled`          | boolean | —                  | Enable or disable x_search           |
| `model`            | string  | `grok-4-1-fast`    | Model used for x_search requests     |
| `inlineCitations`  | boolean | —                  | Include inline citations in results  |
| `maxTurns`         | number  | —                  | Maximum conversation turns           |
| `timeoutSeconds`   | number  | —                  | Request timeout in seconds           |
| `cacheTtlMinutes`  | number  | —                  | Cache time-to-live in minutes        |

```json5
{
  plugins: {
    entries: {
      xai: {
        config: {
          xSearch: {
            enabled: true,
            model: "grok-4-1-fast",
            inlineCitations: true,
          },
        },
      },
    },
  },
}
```

Code execution configuration

The bundled xAI plugin exposes `code_execution` as a Genesis tool for
remote code execution in xAI's sandbox environment.

Config path: `plugins.entries.xai.config.codeExecution`

| Key               | Type    | Default            | Description                              |
| ----------------- | ------- | ------------------ | ---------------------------------------- |
| `enabled`         | boolean | `true` (if key available) | Enable or disable code execution  |
| `model`           | string  | `grok-4-1-fast`    | Model used for code execution requests   |
| `maxTurns`        | number  | —                  | Maximum conversation turns               |
| `timeoutSeconds`  | number  | —                  | Request timeout in seconds               |

<div class="callout note">
This is remote xAI sandbox execution, not local [`exec`](/tools/exec).
</div>

```json5
{
  plugins: {
    entries: {
      xai: {
        config: {
          codeExecution: {
            enabled: true,
            model: "grok-4-1-fast",
          },
        },
      },
    },
  },
}
```

Known limits

- Auth is API-key only today. There is no xAI OAuth or device-code flow in
  Genesis yet.
- `grok-4.20-multi-agent-experimental-beta-0304` is not supported on the
  normal xAI provider path because it requires a different upstream API
  surface than the standard Genesis xAI transport.
- xAI Realtime voice is not registered as a Genesis provider yet. It
  needs a different bidirectional voice session contract than batch STT or
  streaming transcription.
- xAI image `quality`, image `mask`, and extra native-only aspect ratios are
  not exposed until the shared `image_generate` tool has corresponding
  cross-provider controls.

Advanced notes

- Genesis applies xAI-specific tool-schema and tool-call compatibility fixes
  automatically on the shared runner path.
- Native xAI requests default `tool_stream: true`. Set
  `agents.defaults.models["xai/<model>"].params.tool_stream` to `false` to
  disable it.
- The bundled xAI wrapper strips unsupported strict tool-schema flags and
  reasoning payload keys before sending native xAI requests.
- `web_search`, `x_search`, and `code_execution` are exposed as Genesis
  tools. Genesis enables the specific xAI built-in it needs inside each tool
  request instead of attaching all native tools to every chat turn.
- `x_search` and `code_execution` are owned by the bundled xAI plugin rather
  than hardcoded into the core model runtime.
- `code_execution` is remote xAI sandbox execution, not local
  [`exec`](/tools/exec).
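Two of the notes above, the `tool_stream` default and the removal of strict tool-schema flags, can be pictured with the sketch below. It is an approximation under assumptions: the `ToolDefinition` shape, the `strict` key name, and `buildNativeRequestOptions` are hypothetical, and the reasoning payload keys are not modeled because they are not named here.

```typescript
// Per-model params, as set under agents.defaults.models["xai/<model>"].params.
interface XaiModelParams {
  tool_stream?: boolean;
}

// Hypothetical tool definition; `strict` is the assumed name of the strict-schema flag.
interface ToolDefinition {
  name: string;
  parameters: Record<string, unknown>;
  strict?: boolean;
}

interface NativeRequestOptions {
  tool_stream: boolean;
  tools: Array<{ name: string; parameters: Record<string, unknown> }>;
}

function buildNativeRequestOptions(
  params: XaiModelParams,
  tools: ToolDefinition[],
): NativeRequestOptions {
  return {
    // Defaults to true unless explicitly disabled.
    tool_stream: params.tool_stream ?? true,
    // Copy only the supported fields so the strict flag never reaches xAI.
    tools: tools.map((tool) => ({ name: tool.name, parameters: tool.parameters })),
  };
}
```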

Live testing

The xAI media paths are covered by unit tests and opt-in live suites. The live commands load secrets from your login shell, including `~/.profile`, before probing `XAI_API_KEY`.

```bash
pnpm test extensions/xai
GENESIS_LIVE_TEST=1 GENESIS_LIVE_TEST_QUIET=1 pnpm test:live -- extensions/xai/xai.live.test.ts
GENESIS_LIVE_TEST=1 GENESIS_LIVE_TEST_QUIET=1 GENESIS_LIVE_IMAGE_GENERATION_PROVIDERS=xai pnpm test:live -- test/image-generation.runtime.live.test.ts
```

The provider-specific live file synthesizes normal TTS and telephony-friendly PCM TTS, transcribes audio through xAI batch STT, streams the same PCM through xAI realtime STT, generates text-to-image output, and edits a reference image. The shared image live file verifies the same xAI provider through Genesis's runtime selection, fallback, normalization, and media attachment path.

Related