The web_fetch tool does a plain HTTP GET and extracts readable content (HTML to markdown or text). It does not execute JavaScript.

For JS-heavy sites or login-protected pages, use the Web Browser instead.

Quick start

web_fetch is enabled by default -- no configuration needed. The agent can call it immediately:

await web_fetch({ url: "https://example.com/article" });

Tool parameters

url (string)

URL to fetch. http(s) only.

extractMode ('markdown' | 'text')

Output format after main-content extraction.

maxChars (number)

Truncate output to this many characters. Values above tools.web.fetch.maxCharsCap are clamped to the cap.
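
As a rough illustration of the parameter list above, the sketch below models the three parameters as a TypeScript type and checks the "http(s) only" rule. The names WebFetchParams and checkParams are hypothetical and not part of the tool. Treating extractMode and maxChars as optional is inferred from the quick-start call, which passes only url. The positive-integer check on maxChars is an assumption.

```typescript
// Hypothetical parameter shape, inferred from the parameter list above.
type ExtractMode = "markdown" | "text";

interface WebFetchParams {
  url: string;
  extractMode?: ExtractMode; // assumed optional
  maxChars?: number; // assumed optional
}

// Returns an error message, or null when the params look valid.
function checkParams(params: WebFetchParams): string | null {
  let parsed: URL;
  try {
    parsed = new URL(params.url);
  } catch {
    return "url is not a valid URL";
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return "url must use http or https";
  }
  // Assumption: maxChars, when given, should be a positive integer.
  if (
    params.maxChars !== undefined &&
    (!Number.isInteger(params.maxChars) || params.maxChars <= 0)
  ) {
    return "maxChars must be a positive integer";
  }
  return null;
}
```

For example, checkParams({ url: "ftp://example.com/file" }) returns an error string, while the quick-start URL passes.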

How it works

Fetch

Sends an HTTP GET with a Chrome-like `User-Agent` and an `Accept-Language`
header. Blocks private/internal hostnames, re-checks every redirect target
against the same rules, and follows at most `maxRedirects` redirects.
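
The hostname block and redirect re-check might look roughly like the sketch below. This page does not document the actual block rules, so the patterns here (localhost, .local, .internal, and the common private IPv4 ranges) are assumptions chosen for illustration, and isPrivateHostname and firstBlockedHop are hypothetical names. A real check would also need to cover IPv6 and the addresses a hostname resolves to.

```typescript
// Illustrative only: the real blocklist is not documented on this page.
function isPrivateHostname(hostname: string): boolean {
  const host = hostname.toLowerCase().replace(/\.$/, "");
  if (
    host === "localhost" ||
    host.endsWith(".localhost") ||
    host.endsWith(".local") ||
    host.endsWith(".internal")
  ) {
    return true;
  }
  // Dotted-quad IPv4 literals in loopback, private, or link-local ranges.
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (m === null) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254)
  );
}

// Checks the initial URL and each redirect target in order.
// Returns the first blocked URL, or null if every hop is allowed.
function firstBlockedHop(urls: string[]): string | null {
  for (const u of urls) {
    if (isPrivateHostname(new URL(u).hostname)) return u;
  }
  return null;
}
```

The point of checking every hop is that a public URL can redirect to an internal address, so validating only the first URL is not enough.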

Extract

Runs Readability (main-content extraction) on the HTML response.

Fallback (optional)

If Readability fails and Firecrawl is configured, retries through the
Firecrawl API with bot-circumvention mode.

Cache

Results are cached for 15 minutes by default (configurable via
`cacheTtlMinutes`) to reduce repeated fetches of the same URL.
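
To make the cache step concrete, here is a minimal time-to-live cache sketch. It is not the tool's implementation. TtlCache is a hypothetical class, and keying on the URL alone is an assumption (a real cache key may also include parameters such as extractMode). The clock is injectable so expiry can be exercised without waiting.

```typescript
// Minimal TTL cache sketch; entries expire ttlMs after they are stored.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value, or undefined if missing or expired.
  get(key: string): V | undefined {
    const hit = this.entries.get(key);
    if (hit === undefined) return undefined;
    if (this.now() >= hit.expiresAt) {
      this.entries.delete(key); // drop the stale entry on read
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```

With cacheTtlMinutes: 15, the equivalent construction would be new TtlCache<string>(15 * 60 * 1000).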

Config

{
  tools: {
    web: {
      fetch: {
        enabled: true, // default: true
        provider: "firecrawl", // optional; omit for auto-detect
        maxChars: 50000, // max output chars
        maxCharsCap: 50000, // hard cap for maxChars param
        maxResponseBytes: 2000000, // max download size before truncation
        timeoutSeconds: 30,
        cacheTtlMinutes: 15,
        maxRedirects: 3,
        readability: true, // use Readability extraction
        userAgent: "Mozilla/5.0 ...", // override User-Agent
      },
    },
  },
}

Firecrawl fallback

If Readability extraction fails, web_fetch can fall back to Firecrawl for bot-circumvention and better extraction:

{
  tools: {
    web: {
      fetch: {
        provider: "firecrawl", // optional; omit for auto-detect from available credentials
      },
    },
  },
  plugins: {
    entries: {
      firecrawl: {
        enabled: true,
        config: {
          webFetch: {
            apiKey: "fc-...", // optional if FIRECRAWL_API_KEY is set
            baseUrl: "https://api.firecrawl.dev",
            onlyMainContent: true,
            maxAgeMs: 86400000, // cache duration (1 day)
            timeoutSeconds: 60,
          },
        },
      },
    },
  },
}

`plugins.entries.firecrawl.config.webFetch.apiKey` supports SecretRef objects. Legacy `tools.web.fetch.firecrawl.*` config is auto-migrated by `genesis doctor --fix`.

If Firecrawl is enabled and its SecretRef is unresolved with no `FIRECRAWL_API_KEY` env fallback, gateway startup fails fast.
Firecrawl `baseUrl` overrides are locked down: they must use `https://` and the official Firecrawl host (`api.firecrawl.dev`).
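
The baseUrl restriction can be expressed as a small check like the one below. This is a sketch of the stated rule (https scheme plus the official host), not the validation code Genesis ships, and isAllowedFirecrawlBaseUrl is a hypothetical name.

```typescript
// Accepts only https URLs whose host is exactly api.firecrawl.dev.
function isAllowedFirecrawlBaseUrl(baseUrl: string): boolean {
  let parsed: URL;
  try {
    parsed = new URL(baseUrl);
  } catch {
    return false; // not a parseable URL
  }
  return parsed.protocol === "https:" && parsed.hostname === "api.firecrawl.dev";
}
```

Comparing the parsed hostname, rather than searching the raw string, avoids accepting look-alike hosts such as api.firecrawl.dev.example.com.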

Current runtime behavior:

  • tools.web.fetch.provider selects the fetch fallback provider explicitly.
  • If provider is omitted, Genesis auto-detects the first ready web-fetch provider from available credentials. Today the bundled provider is Firecrawl.
  • If Readability is disabled, web_fetch skips straight to the selected provider fallback. If no provider is available, it fails closed.
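
The three rules above can be read as a small decision function. The sketch below is one possible reading, not the runtime's code. FetchProvider, Plan, pickProvider, and planFetch are hypothetical names, and treating an explicitly configured but not-ready provider as "no provider" is an assumption this page does not confirm.

```typescript
interface FetchProvider {
  id: string;
  ready: boolean; // hypothetical flag: true when credentials are available
}

type Plan =
  | { kind: "readability-then-provider"; provider: string | null }
  | { kind: "provider-only"; provider: string }
  | { kind: "fail-closed" };

// Explicit provider wins; otherwise take the first ready provider.
function pickProvider(
  configured: string | undefined,
  providers: FetchProvider[],
): string | null {
  if (configured !== undefined) {
    // Assumption: a configured provider that is not ready counts as unavailable.
    const match = providers.find((p) => p.id === configured && p.ready);
    return match ? match.id : null;
  }
  const firstReady = providers.find((p) => p.ready);
  return firstReady ? firstReady.id : null;
}

function planFetch(
  readability: boolean,
  configured: string | undefined,
  providers: FetchProvider[],
): Plan {
  const provider = pickProvider(configured, providers);
  if (readability) return { kind: "readability-then-provider", provider };
  if (provider === null) return { kind: "fail-closed" };
  return { kind: "provider-only", provider };
}
```

With readability: false and no ready provider, planFetch returns { kind: "fail-closed" }, matching the last rule in the list.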

Limits and safety

  • maxChars is clamped to tools.web.fetch.maxCharsCap
  • Response body is capped at maxResponseBytes before parsing; oversized responses are truncated with a warning
  • Private/internal hostnames are blocked
  • Redirects are checked and limited by maxRedirects
  • web_fetch is best-effort -- some sites need the Web Browser
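
A minimal sketch of the clamping and truncation rules above, assuming "characters" means JavaScript string length (UTF-16 code units). The helper names clampMaxChars and truncate are hypothetical.

```typescript
// Resolve the effective limit: the per-call value if given, else the
// configured default, never exceeding the hard cap.
function clampMaxChars(
  requested: number | undefined,
  defaultMax: number,
  cap: number,
): number {
  const wanted = requested ?? defaultMax;
  return Math.min(wanted, cap);
}

// Cut text to maxChars and report whether anything was dropped.
function truncate(
  text: string,
  maxChars: number,
): { text: string; truncated: boolean } {
  if (text.length <= maxChars) return { text, truncated: false };
  return { text: text.slice(0, maxChars), truncated: true };
}
```

With the config shown earlier (maxChars: 50000, maxCharsCap: 50000), a call that asks for maxChars: 200000 would still be limited to 50000 characters.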

Tor and .onion services

web_fetch can route .onion URLs through a Tor SOCKS5 proxy. Clearnet URLs always stay on the normal direct fetch path, even when Tor is enabled.

{
  tools: {
    web: {
      tor: {
        enabled: true,
        mode: "external",
        socksHost: "127.0.0.1",
        socksPort: 9050,
      },
    },
  },
}

Field       Default        Description
----------  -------------  -----------------------------------------------
enabled     false          Enable Tor routing for .onion URLs.
mode        "external"     Proxy mode. Only "external" is supported today.
socksHost   "127.0.0.1"    SOCKS5 proxy host.
socksPort   9050           SOCKS5 proxy port.

When tools.web.tor.enabled is true and a web_fetch URL's hostname ends with .onion, the request is sent through the configured SOCKS5 proxy. DNS resolution is skipped and left to the Tor proxy. The hostname allowlist from tools.web.fetch.ssrfPolicy.hostnameAllowlist still applies.

.onion URLs are blocked when Tor is not enabled.
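
The routing rules in this section reduce to a three-way decision, sketched below. TorConfig, Route, and routeFor are hypothetical names. The socks5h:// scheme is the curl-style convention for letting the proxy resolve hostnames, and whether Genesis represents the proxy this way internally is not stated here. The hostname allowlist check is left out of the sketch.

```typescript
interface TorConfig {
  enabled: boolean;
  mode: "external";
  socksHost: string;
  socksPort: number;
}

type Route =
  | { kind: "direct" }
  | { kind: "tor"; proxy: string }
  | { kind: "blocked"; reason: string };

function routeFor(url: string, tor: TorConfig): Route {
  const hostname = new URL(url).hostname.toLowerCase();
  // Clearnet URLs always take the direct path, even when Tor is enabled.
  if (!hostname.endsWith(".onion")) return { kind: "direct" };
  // .onion URLs are blocked unless Tor routing is enabled.
  if (!tor.enabled) {
    return { kind: "blocked", reason: ".onion URLs require tools.web.tor.enabled" };
  }
  // DNS is left to the proxy, hence the "h" in socks5h (illustrative).
  return { kind: "tor", proxy: `socks5h://${tor.socksHost}:${tor.socksPort}` };
}
```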

Tool profiles

If you use tool profiles or allowlists, add web_fetch or group:web:

{
  tools: {
    allow: ["web_fetch"],
    // or: allow: ["group:web"]  (includes web_fetch, web_search, and x_search)
  },
}
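
As an illustration of how an allowlist entry such as group:web could be resolved, the sketch below expands groups before checking a tool name. GROUPS and isToolAllowed are hypothetical. The membership of group:web is taken from the config comment above, and the expansion logic itself is an assumption.

```typescript
// Group membership as described in the config comment above.
const GROUPS: Record<string, string[]> = {
  "group:web": ["web_fetch", "web_search", "x_search"],
};

// A tool is allowed if it is listed directly or via a listed group.
function isToolAllowed(tool: string, allow: string[]): boolean {
  return allow.some(
    (entry) => entry === tool || (GROUPS[entry] ?? []).includes(tool),
  );
}
```

Both allow: ["web_fetch"] and allow: ["group:web"] make web_fetch available, but only the group form also allows web_search and x_search.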

Related

  • Web Search -- search the web with multiple providers
  • Web Browser -- full browser automation for JS-heavy sites
  • Firecrawl -- Firecrawl search and scrape tools