`inferrs` can serve local models behind an OpenAI-compatible `/v1` API.
Genesis works with `inferrs` through the generic `openai-completions` path.
`inferrs` is currently best treated as a custom self-hosted OpenAI-compatible
backend, not a dedicated Genesis provider plugin.
Getting started
Start inferrs with a model
```bash
inferrs serve google/gemma-4-E2B-it \
  --host 127.0.0.1 \
  --port 8080 \
  --device metal
```
Verify the server is reachable
```bash
curl http://127.0.0.1:8080/health
curl http://127.0.0.1:8080/v1/models
```
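If the server is started from a script, the model may still be loading when the first probe runs. The following is a minimal sketch of a retry loop around the same `/health` check. The function name `wait_for_inferrs` and its arguments are hypothetical, and it assumes `/health` returns a success status once the server is ready.

```bash
# Hypothetical helper (not part of inferrs or Genesis):
# poll <base_url>/health until it answers with a success status.
# Arguments: base URL, max attempts, seconds to sleep between attempts.
wait_for_inferrs() {
  local base_url="${1:-http://127.0.0.1:8080}"
  local attempts="${2:-30}"
  local delay="${3:-1}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: treat HTTP error statuses as failure, -s: silent, -o: discard the body
    if curl -f -s --max-time 2 -o /dev/null "$base_url/health"; then
      echo "inferrs is up after $i attempt(s)"
      return 0
    fi
    # Skip the sleep after the final attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "inferrs did not respond at $base_url/health" >&2
  return 1
}
```

For example, `wait_for_inferrs http://127.0.0.1:8080 30 1` would wait up to roughly 30 seconds before giving up.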
Add a Genesis provider entry
Add an explicit provider entry and point your default model at it. See the full config example below.
Full config example
This example uses Gemma 4 on a local inferrs server.
```json5
{
  agents: {
    defaults: {
      model: { primary: "inferrs/google/gemma-4-E2B-it" },
      models: {
        "inferrs/google/gemma-4-E2B-it": {
          alias: "Gemma 4 (inferrs)",
        },
      },
    },
  },
  models: {
    mode: "merge",
    providers: {
      inferrs: {
        baseUrl: "http://127.0.0.1:8080/v1",
        apiKey: "inferrs-local",
        api: "openai-completions",
        models: [
          {
            id: "google/gemma-4-E2B-it",
            name: "Gemma 4 E2B (inferrs)",
            reasoning: false,
            input: ["text"],
            cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
            contextWindow: 131072,
            maxTokens: 4096,
            compat: {
              requiresStringContent: true,
            },
          },
        ],
      },
    },
  },
}
```
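A quick way to catch a mismatched model `id` is to compare it with what the server reports. This is a rough sketch: `model_listed` is a hypothetical helper, and it assumes `/v1/models` returns an OpenAI-style JSON list in which each model id appears as a quoted string.

```bash
# Hypothetical helper: read a /v1/models response on stdin and succeed
# if the given model id appears in it as a quoted JSON string.
# This is a plain substring match, not a real JSON parse.
model_listed() {
  local model_id="$1"
  grep -F -q "\"$model_id\""
}
```

Usage would look like `curl -s http://127.0.0.1:8080/v1/models | model_listed google/gemma-4-E2B-it && echo "id matches"`.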
Advanced configuration
Why requiresStringContent matters
Some `inferrs` Chat Completions routes accept only string
`messages[].content`, not structured content-part arrays.
<div class="callout warning">
If Genesis runs fail with an error like:
```text
messages[1].content: invalid type: sequence, expected a string
```
set `compat.requiresStringContent: true` in your model entry.
</div>
```json5
compat: {
  requiresStringContent: true
}
```
Genesis will flatten pure text content parts into plain strings before sending
the request.
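To check whether a given route actually needs this flag, one option is to send it a content-part array directly and look at the response. The sketch below assumes the OpenAI-style part shape `{"type":"text","text":"..."}`; both function names are hypothetical, and the payload builder does no JSON escaping, so the text must not contain quotes or backslashes.

```bash
# Build a chat request whose message content is a content-part array
# rather than a plain string. No JSON escaping is performed.
build_parts_payload() {
  local model="$1"
  local text="$2"
  printf '{"model":"%s","messages":[{"role":"user","content":[{"type":"text","text":"%s"}]}],"stream":false}' \
    "$model" "$text"
}

# Hypothetical probe: post the array-shaped request and print the raw response.
# Arguments: base URL, model id.
probe_content_parts() {
  local base_url="${1:-http://127.0.0.1:8080}"
  local model="${2:-google/gemma-4-E2B-it}"
  build_parts_payload "$model" "ping" |
    curl -sS "$base_url/v1/chat/completions" \
      -H 'content-type: application/json' \
      -d @-
}
```

If `probe_content_parts` prints an error such as `invalid type: sequence, expected a string`, the route likely accepts only string content and `compat.requiresStringContent: true` is worth setting.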
Gemma and tool-schema caveat
Some current `inferrs` + Gemma combinations accept small direct
`/v1/chat/completions` requests but still fail on full Genesis agent-runtime
turns.
If that happens, try this first:
```json5
compat: {
  requiresStringContent: true,
  supportsTools: false
}
```
That disables Genesis's tool schema surface for the model and can reduce prompt
pressure on stricter local backends.
If tiny direct requests still work but normal Genesis agent turns continue to
crash inside `inferrs`, the remaining issue is usually upstream model/server
behavior rather than Genesis's transport layer.
Manual smoke test
Once configured, test both layers:
```bash
curl http://127.0.0.1:8080/v1/chat/completions \
  -H 'content-type: application/json' \
  -d '{"model":"google/gemma-4-E2B-it","messages":[{"role":"user","content":"What is 2 + 2?"}],"stream":false}'
```
```bash
genesis infer model run \
  --model inferrs/google/gemma-4-E2B-it \
  --prompt "What is 2 + 2? Reply with one short sentence." \
  --json
```
If the first command works but the second fails, check the troubleshooting section below.
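The outcome of the two commands maps onto a small decision table. As an illustrative sketch, the hypothetical helper below takes the exit status of each command and prints the next step suggested by this page; it is not part of Genesis or `inferrs`.

```bash
# Hypothetical helper: map the exit statuses of the two smoke tests
# to the next step suggested by this page.
# Arguments: exit status of the direct curl call, exit status of the Genesis run.
triage_smoke_test() {
  local direct_status="$1"
  local genesis_status="$2"
  if [ "$direct_status" -ne 0 ]; then
    echo "direct request failed: check that inferrs is running and reachable"
  elif [ "$genesis_status" -ne 0 ]; then
    echo "direct ok, genesis failed: try compat.requiresStringContent: true, then compat.supportsTools: false"
  else
    echo "both layers ok"
  fi
}
```

After running each command, capture `$?` and pass both values in, for example `triage_smoke_test 0 1`. Note that `curl` exits with 0 on HTTP error responses unless `-f` is added, so the first status is only meaningful with that flag or after inspecting the response body.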
Proxy-style behavior
`inferrs` is treated as a proxy-style OpenAI-compatible `/v1` backend, not a
native OpenAI endpoint.
- Native OpenAI-only request shaping does not apply here
- No `service_tier`, no Responses `store`, no prompt-cache hints, and no
OpenAI reasoning-compat payload shaping
- Hidden Genesis attribution headers (`originator`, `version`, `User-Agent`)
are not injected on custom `inferrs` base URLs
Troubleshooting
curl /v1/models fails
`inferrs` is not running, not reachable, or not bound to the expected
host/port. Make sure the server is started and listening on the address you
configured.
messages[].content expected a string
Set `compat.requiresStringContent: true` in the model entry. See the
`requiresStringContent` section above for details.
Direct /v1/chat/completions calls pass but genesis infer model run fails
Try setting `compat.supportsTools: false` to disable the tool schema surface.
See the Gemma tool-schema caveat above.
inferrs still crashes on larger agent turns
If Genesis no longer gets schema errors but `inferrs` still crashes on larger
agent turns, treat it as an upstream `inferrs` or model limitation. Reduce
prompt pressure or switch to a different local backend or model.
For general help, see [Troubleshooting](/help/troubleshooting) and [FAQ](/help/faq).
Related
- Local models: Running Genesis against local model servers.
- Gateway troubleshooting: Debugging local OpenAI-compatible backends that pass probes but fail agent runs.
- Model selection: Overview of all providers, model refs, and failover behavior.