Inferrs - Genesis Docs

inferrs can serve local models behind an OpenAI-compatible /v1 API. Genesis works with inferrs through the generic openai-completions path.

inferrs is currently best treated as a custom self-hosted OpenAI-compatible backend, not a dedicated Genesis provider plugin.

Getting started

Start inferrs with a model

```bash
inferrs serve google/gemma-4-E2B-it \
  --host 127.0.0.1 \
  --port 8080 \
  --device metal
```

Verify the server is reachable

```bash
curl http://127.0.0.1:8080/health
curl http://127.0.0.1:8080/v1/models
```

Add an Genesis provider entry

Add an explicit provider entry and point your default model at it. See the full config example below.

Full config example

This example uses Gemma 4 on a local inferrs server.

{
  agents: {
    defaults: {
      model: { primary: "inferrs/google/gemma-4-E2B-it" },
      models: {
        "inferrs/google/gemma-4-E2B-it": {
          alias: "Gemma 4 (inferrs)",
        },
      },
    },
  },
  models: {
    mode: "merge",
    providers: {
      inferrs: {
        baseUrl: "http://127.0.0.1:8080/v1",
        apiKey: "inferrs-local",
        api: "openai-completions",
        models: [
          {
            id: "google/gemma-4-E2B-it",
            name: "Gemma 4 E2B (inferrs)",
            reasoning: false,
            input: ["text"],
            cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
            contextWindow: 131072,
            maxTokens: 4096,
            compat: {
              requiresStringContent: true,
            },
          },
        ],
      },
    },
  },
}

Advanced configuration

Why requiresStringContent matters

Some `inferrs` Chat Completions routes accept only string
`messages[].content`, not structured content-part arrays.

<div class="callout warning">
If Genesis runs fail with an error like:

```text
messages[1].content: invalid type: sequence, expected a string
```

set `compat.requiresStringContent: true` in your model entry.
</div>

```json5
compat: {
  requiresStringContent: true
}
```

Genesis will flatten pure text content parts into plain strings before sending
the request.

Gemma and tool-schema caveat

Some current `inferrs` + Gemma combinations accept small direct
`/v1/chat/completions` requests but still fail on full Genesis agent-runtime
turns.

If that happens, try this first:

```json5
compat: {
  requiresStringContent: true,
  supportsTools: false
}
```

That disables Genesis's tool schema surface for the model and can reduce prompt
pressure on stricter local backends.

If tiny direct requests still work but normal Genesis agent turns continue to
crash inside `inferrs`, the remaining issue is usually upstream model/server
behavior rather than Genesis's transport layer.

Manual smoke test

Once configured, test both layers:

```bash
curl http://127.0.0.1:8080/v1/chat/completions \
  -H 'content-type: application/json' \
  -d '{"model":"google/gemma-4-E2B-it","messages":[{"role":"user","content":"What is 2 + 2?"}],"stream":false}'
```

```bash
genesis infer model run \
  --model inferrs/google/gemma-4-E2B-it \
  --prompt "What is 2 + 2? Reply with one short sentence." \
  --json
```

If the first command works but the second fails, check the troubleshooting section below.

Proxy-style behavior

`inferrs` is treated as a proxy-style OpenAI-compatible `/v1` backend, not a
native OpenAI endpoint.

- Native OpenAI-only request shaping does not apply here
- No `service_tier`, no Responses `store`, no prompt-cache hints, and no
  OpenAI reasoning-compat payload shaping
- Hidden Genesis attribution headers (`originator`, `version`, `User-Agent`)
  are not injected on custom `inferrs` base URLs

Troubleshooting

curl /v1/models fails

`inferrs` is not running, not reachable, or not bound to the expected
host/port. Make sure the server is started and listening on the address you
configured.

messages[].content expected a string

Set `compat.requiresStringContent: true` in the model entry. See the
`requiresStringContent` section above for details.

Direct /v1/chat/completions calls pass but genesis infer model run fails

Try setting `compat.supportsTools: false` to disable the tool schema surface.
See the Gemma tool-schema caveat above.

inferrs still crashes on larger agent turns

If Genesis no longer gets schema errors but `inferrs` still crashes on larger
agent turns, treat it as an upstream `inferrs` or model limitation. Reduce
prompt pressure or switch to a different local backend or model.

For general help, see [Troubleshooting](/help/troubleshooting) and [FAQ](/help/faq).

Local models Running Genesis against local model servers.
Gateway troubleshooting Debugging local OpenAI-compatible backends that pass probes but fail agent runs.
Model selection Overview of all providers, model refs, and failover behavior.

Getting started

Start inferrs with a model

Verify the server is reachable

Add an Genesis provider entry

Full config example

Advanced configuration

Why requiresStringContent matters

Gemma and tool-schema caveat

Manual smoke test

Proxy-style behavior

Troubleshooting

curl /v1/models fails

messages[].content expected a string

Direct /v1/chat/completions calls pass but genesis infer model run fails

inferrs still crashes on larger agent turns

Related