The Google plugin provides access to Gemini models through Google AI Studio, plus image generation, media understanding (image/audio/video), text-to-speech, and web search via Gemini Grounding.

  • Provider: `google`
  • Auth: `GEMINI_API_KEY` or `GOOGLE_API_KEY`
  • API: Google Gemini API
  • Runtime option: `agents.defaults.embeddedHarness.runtime: "google-gemini-cli"` reuses Gemini CLI OAuth while keeping model refs canonical as `google/*`.

Getting started

Choose your preferred auth method and follow the setup steps.

API key

**Best for:** standard Gemini API access through Google AI Studio.

Run onboarding

    ```bash
    genesis onboard --auth-choice gemini-api-key
    ```

    Or pass the key directly:

    ```bash
    genesis onboard --non-interactive \
      --mode local \
      --auth-choice gemini-api-key \
      --gemini-api-key "$GEMINI_API_KEY"
    ```

Set a default model

    ```json5
    {
      agents: {
        defaults: {
          model: { primary: "google/gemini-3.1-pro-preview" },
        },
      },
    }
    ```

Verify the model is available

    ```bash
    genesis models list --provider google
    ```
  




<div class="callout tip">
The environment variables `GEMINI_API_KEY` and `GOOGLE_API_KEY` are both accepted. Use whichever you already have configured.
</div>

Gemini CLI (OAuth)

**Best for:** reusing an existing Gemini CLI login via PKCE OAuth instead of a separate API key.

<div class="callout warning">
The `google-gemini-cli` provider is an unofficial integration. Some users
report account restrictions when using OAuth this way. Use at your own risk.
</div>

Install the Gemini CLI

    The local `gemini` command must be available on `PATH`.

    ```bash
    # Homebrew
    brew install gemini-cli

    # or npm
    npm install -g @google/gemini-cli
    ```

    Genesis supports both Homebrew installs and global npm installs, including
    common Windows/npm layouts.

Log in via OAuth

    ```bash
    genesis models auth login --provider google-gemini-cli --set-default
    ```

Verify the model is available

    ```bash
    genesis models list --provider google
    ```
  




- Default model: `google/gemini-3.1-pro-preview`
- Runtime: `google-gemini-cli`
- Alias: `gemini-cli`

**Environment variables:**

- `GENESIS_GEMINI_OAUTH_CLIENT_ID`
- `GENESIS_GEMINI_OAUTH_CLIENT_SECRET`

(Or the `GEMINI_CLI_*` variants.)

<div class="callout note">
If Gemini CLI OAuth requests fail after login, set `GOOGLE_CLOUD_PROJECT` or
`GOOGLE_CLOUD_PROJECT_ID` on the gateway host and retry.
</div>

<div class="callout note">
If login fails before the browser flow starts, make sure the local `gemini`
command is installed and on `PATH`.
</div>

`google-gemini-cli/*` model refs are legacy compatibility aliases. New
configs should use `google/*` model refs plus the `google-gemini-cli`
runtime when they want local Gemini CLI execution.

Capabilities

| Capability | Supported |
| --- | --- |
| Chat completions | Yes |
| Image generation | Yes |
| Music generation | Yes |
| Text-to-speech | Yes |
| Realtime voice | Yes (Google Live API) |
| Image understanding | Yes |
| Audio transcription | Yes |
| Video understanding | Yes |
| Web search (Grounding) | Yes |
| Thinking/reasoning | Yes (Gemini 2.5+ / Gemini 3+) |
| Gemma 4 models | Yes |

Gemini 3 models use `thinkingLevel` rather than `thinkingBudget`. Genesis maps Gemini 3, Gemini 3.1, and `gemini-*-latest` alias reasoning controls to `thinkingLevel` so default/low-latency runs do not send disabled `thinkingBudget` values.

`/think adaptive` keeps Google's dynamic thinking semantics instead of choosing a fixed Genesis level. Gemini 3 and Gemini 3.1 omit a fixed `thinkingLevel` so Google can choose the level; Gemini 2.5 sends Google's dynamic sentinel `thinkingBudget: -1`.

Gemma 4 models (for example `gemma-4-26b-a4b-it`) support thinking mode. Genesis rewrites `thinkingBudget` to a supported Google `thinkingLevel` for Gemma 4. Setting thinking to `off` keeps thinking disabled instead of mapping it to `MINIMAL`.
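The control selection described above can be sketched as follows. This is an illustrative sketch, not Genesis's implementation: the function names are hypothetical, and matching model families by id prefix is an assumption made for the example. Only the cases the text states explicitly are covered.

```python
import re

# Assumption: families are recognized by id prefix/pattern; Genesis may differ.
LATEST_ALIAS = re.compile(r"gemini-.*-latest")


def thinking_control(model: str) -> str:
    """Return which Google reasoning field the text says applies to a model."""
    name = model.split("/")[-1]  # drop the "google/" provider prefix
    if LATEST_ALIAS.fullmatch(name) or name.startswith(("gemini-3", "gemma-4")):
        return "thinkingLevel"
    if name.startswith("gemini-2.5"):
        return "thinkingBudget"
    raise ValueError(f"model not covered by this sketch: {model}")


def adaptive_thinking_config(model: str) -> dict:
    """Fields sent for `/think adaptive`, per the two cases described above."""
    name = model.split("/")[-1]
    if name.startswith("gemini-3"):
        return {}  # omit a fixed thinkingLevel so Google chooses the level
    if name.startswith("gemini-2.5"):
        return {"thinkingBudget": -1}  # Google's dynamic sentinel
    raise ValueError(f"adaptive behavior not documented for: {model}")
```

Raising on unknown models keeps the sketch from guessing at behavior the text does not describe, such as adaptive mode on Gemma 4 or on `gemini-*-latest` aliases.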

Image generation

The bundled `google` image-generation provider defaults to `google/gemini-3.1-flash-image-preview`.

  • Also supports `google/gemini-3-pro-image-preview`
  • Generate: up to 4 images per request
  • Edit mode: enabled, up to 5 input images
  • Geometry controls: `size`, `aspectRatio`, and `resolution`

To use Google as the default image provider:

```json5
{
  agents: {
    defaults: {
      imageGenerationModel: {
        primary: "google/gemini-3.1-flash-image-preview",
      },
    },
  },
}
```

See [Image Generation](/tools/image-generation) for shared tool parameters, provider selection, and failover behavior.

Video generation

The bundled `google` plugin also registers video generation through the shared `video_generate` tool.

  • Default video model: `google/veo-3.1-fast-generate-preview`
  • Modes: text-to-video, image-to-video, and single-video reference flows
  • Supports `aspectRatio`, `resolution`, and `audio`
  • Current duration clamp: 4 to 8 seconds
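As a minimal illustration of the duration clamp listed above, a requested length outside the range would be pulled to the nearest bound. The function name is hypothetical, and whether the provider also rounds to specific allowed durations is not stated here, so this sketch only clamps.

```python
MIN_SECONDS = 4
MAX_SECONDS = 8


def clamp_video_duration(seconds: float) -> float:
    """Clamp a requested duration into the documented 4 to 8 second range."""
    return max(MIN_SECONDS, min(MAX_SECONDS, seconds))
```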

To use Google as the default video provider:

```json5
{
  agents: {
    defaults: {
      videoGenerationModel: {
        primary: "google/veo-3.1-fast-generate-preview",
      },
    },
  },
}
```

See [Video Generation](/tools/video-generation) for shared tool parameters, provider selection, and failover behavior.

Music generation

The bundled `google` plugin also registers music generation through the shared `music_generate` tool.

  • Default music model: `google/lyria-3-clip-preview`
  • Also supports `google/lyria-3-pro-preview`
  • Prompt controls: `lyrics` and `instrumental`
  • Output format: `mp3` by default, plus `wav` on `google/lyria-3-pro-preview`
  • Reference inputs: up to 10 images
  • Session-backed runs detach through the shared task/status flow, including `action: "status"`

To use Google as the default music provider:

```json5
{
  agents: {
    defaults: {
      musicGenerationModel: {
        primary: "google/lyria-3-clip-preview",
      },
    },
  },
}
```

See [Music Generation](/tools/music-generation) for shared tool parameters, provider selection, and failover behavior.

Text-to-speech

The bundled `google` speech provider uses the Gemini API TTS path with `gemini-3.1-flash-tts-preview`.

  • Default voice: `Kore`
  • Auth: `messages.tts.providers.google.apiKey`, `models.providers.google.apiKey`, `GEMINI_API_KEY`, or `GOOGLE_API_KEY`
  • Output: WAV for regular TTS attachments, PCM for Talk/telephony
  • Native voice-note output: not supported on this Gemini API path because the API returns PCM rather than Opus
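The auth sources listed above can be read as a fallback chain. The sketch below assumes they are tried in the order listed, which the text suggests but does not state outright; `dig` and `resolve_google_tts_key` are hypothetical helper names, and the config is modeled as a plain nested dictionary.

```python
import os


def dig(config, *path):
    """Walk nested dicts, returning None if any segment is missing."""
    node = config
    for key in path:
        if not isinstance(node, dict):
            return None
        node = node.get(key)
    return node


def resolve_google_tts_key(config, env=os.environ):
    """Return the first non-empty key, assuming listed order is precedence."""
    candidates = (
        dig(config, "messages", "tts", "providers", "google", "apiKey"),
        dig(config, "models", "providers", "google", "apiKey"),
        env.get("GEMINI_API_KEY"),
        env.get("GOOGLE_API_KEY"),
    )
    return next((value for value in candidates if value), None)
```

Passing `env` explicitly makes it easy to check which source wins without touching the real process environment.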

To use Google as the default TTS provider:

```json5
{
  messages: {
    tts: {
      auto: "always",
      provider: "google",
      providers: {
        google: {
          model: "gemini-3.1-flash-tts-preview",
          voiceName: "Kore",
          audioProfile: "Speak professionally with a calm tone.",
        },
      },
    },
  },
}
```

Gemini API TTS uses natural-language prompting for style control. Set `audioProfile` to prepend a reusable style prompt before the spoken text. Set `speakerName` when your prompt text refers to a named speaker.

Gemini API TTS also accepts expressive square-bracket audio tags in the text, such as `[whispers]` or `[laughs]`. To keep tags out of the visible chat reply while sending them to TTS, put them inside a `[[tts:text]]...[[/tts:text]]` block:

```text
Here is the clean reply text.

[[tts:text]][whispers] Here is the spoken version.[[/tts:text]]
```

A Google Cloud Console API key restricted to the Gemini API is valid for this provider. This is not the separate Cloud Text-to-Speech API path.
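To make the `[[tts:text]]` behavior concrete, here is a rough sketch of how a reply could be split into visible and spoken text. It is not Genesis's parser: `split_reply` is a hypothetical name, and the choice to speak only the block contents when a block is present (and the visible text otherwise) is an assumption.

```python
import re

# Matches one [[tts:text]]...[[/tts:text]] block, including multi-line bodies.
TTS_BLOCK = re.compile(r"\[\[tts:text\]\](.*?)\[\[/tts:text\]\]", re.DOTALL)


def split_reply(raw: str):
    """Return (visible_text, spoken_text) for a raw model reply."""
    spoken_parts = [part.strip() for part in TTS_BLOCK.findall(raw)]
    visible = TTS_BLOCK.sub("", raw).strip()
    # Assumption: with no TTS block, the visible text is what gets spoken.
    spoken = " ".join(spoken_parts) if spoken_parts else visible
    return visible, spoken
```

Applied to the example reply above, the visible text keeps only the clean sentence, while the spoken text carries the `[whispers]` tag through to TTS.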

Realtime voice

The bundled `google` plugin registers a realtime voice provider backed by the Gemini Live API for backend audio bridges such as Voice Call and Google Meet.

| Setting | Config path | Default |
| --- | --- | --- |
| Model | `plugins.entries.voice-call.config.realtime.providers.google.model` | `gemini-2.5-flash-native-audio-preview-12-2025` |
| Voice | `...google.voice` | `Kore` |
| Temperature | `...google.temperature` | (unset) |
| VAD start sensitivity | `...google.startSensitivity` | (unset) |
| VAD end sensitivity | `...google.endSensitivity` | (unset) |
| Silence duration | `...google.silenceDurationMs` | (unset) |
| API key | `...google.apiKey` | Falls back to `models.providers.google.apiKey`, `GEMINI_API_KEY`, or `GOOGLE_API_KEY` |

Example Voice Call realtime config:

```json5
{
  plugins: {
    entries: {
      "voice-call": {
        enabled: true,
        config: {
          realtime: {
            enabled: true,
            provider: "google",
            providers: {
              google: {
                model: "gemini-2.5-flash-native-audio-preview-12-2025",
                voice: "Kore",
              },
            },
          },
        },
      },
    },
  },
}
```

Google Live API uses bidirectional audio and function calling over a WebSocket. Genesis adapts telephony/Meet bridge audio to Gemini's PCM Live API stream and keeps tool calls on the shared realtime voice contract.

Leave `temperature` unset unless you need sampling changes; Genesis omits non-positive values because Google Live can return transcripts without audio for `temperature: 0`.

Gemini API transcription is enabled without `languageCodes`; the current Google SDK rejects language-code hints on this API path.

Control UI Talk browser sessions still require a realtime voice provider with a browser WebRTC session implementation. Today that path is OpenAI Realtime; the Google provider is for backend realtime bridges.
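The `temperature` handling described above amounts to dropping the field unless it is a positive number. A minimal sketch, with a hypothetical function name and a plain dictionary standing in for the Live API session parameters:

```python
def live_sampling_params(temperature=None) -> dict:
    """Build sampling params, omitting unset or non-positive temperature."""
    params = {}
    # Non-positive values are dropped: the text notes that temperature 0 can
    # make Google Live return transcripts without audio.
    if temperature is not None and temperature > 0:
        params["temperature"] = temperature
    return params
```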

Advanced configuration

Direct Gemini cache reuse

For direct Gemini API runs (`api: "google-generative-ai"`), Genesis
passes a configured `cachedContent` handle through to Gemini requests.

- Configure per-model or global params with either
  `cachedContent` or legacy `cached_content`
- If both are present, `cachedContent` wins
- Example value: `cachedContents/prebuilt-context`
- Gemini cache-hit usage is normalized into Genesis `cacheRead` from
  upstream `cachedContentTokenCount`

```json5
{
  agents: {
    defaults: {
      models: {
        "google/gemini-2.5-pro": {
          params: {
            cachedContent: "cachedContents/prebuilt-context",
          },
        },
      },
    },
  },
}
```
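The precedence and usage rules above can be sketched as two small helpers. Both names are hypothetical, and treating an empty `cachedContent` value as absent is an assumption of this sketch rather than documented behavior.

```python
def resolve_cached_content(params: dict):
    """Pick the cache handle; `cachedContent` wins over legacy `cached_content`."""
    for key in ("cachedContent", "cached_content"):
        value = params.get(key)
        if value:  # assumption: empty values count as not configured
            return value
    return None


def cache_read_tokens(usage_metadata: dict) -> int:
    """Map upstream `cachedContentTokenCount` to the Genesis `cacheRead` count."""
    return int(usage_metadata.get("cachedContentTokenCount") or 0)
```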

Gemini CLI JSON usage notes

When using the `google-gemini-cli` OAuth provider, Genesis normalizes
the CLI JSON output as follows:

- Reply text comes from the CLI JSON `response` field.
- Usage falls back to `stats` when the CLI leaves `usage` empty.
- `stats.cached` is normalized into Genesis `cacheRead`.
- If `stats.input` is missing, Genesis derives input tokens from
  `stats.input_tokens - stats.cached`.
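The normalization rules above can be illustrated with a short sketch. It is not the Genesis implementation: `normalize_cli_output` and the output keys `text` and `input` are hypothetical, the sample payload in the usage note is made up, and only the fields named in the list are handled.

```python
import json


def normalize_cli_output(raw_json: str) -> dict:
    """Normalize Gemini CLI JSON output into reply text plus usage counts."""
    payload = json.loads(raw_json)
    result = {"text": payload.get("response", "")}

    usage = payload.get("usage") or {}
    if usage:
        # The CLI reported usage directly; pass it through unchanged.
        result["usage"] = usage
        return result

    # Fall back to `stats` when `usage` is empty.
    stats = payload.get("stats") or {}
    cached = stats.get("cached", 0)
    if "input" in stats:
        input_tokens = stats["input"]
    else:
        # Derive uncached input tokens from the total minus cached tokens.
        input_tokens = stats.get("input_tokens", 0) - cached
    result["usage"] = {"input": input_tokens, "cacheRead": cached}
    return result
```

For a payload whose `stats` holds `input_tokens: 120` and `cached: 20` with no `input` field, this yields 100 input tokens and a `cacheRead` of 20.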

Environment and daemon setup

If the Gateway runs as a daemon (launchd/systemd), make sure `GEMINI_API_KEY`
is available to that process (for example, in `~/.genesis/.env` or via
`env.shellEnv`).

Related