The web_fetch tool does a plain HTTP GET and extracts readable content
(HTML to markdown or text). It does not execute JavaScript.
For JS-heavy sites or login-protected pages, use the Web Browser instead.
Quick start
web_fetch is enabled by default -- no configuration needed. The agent can
call it immediately:
await web_fetch({ url: "https://example.com/article" });
Tool parameters
url (string)
URL to fetch. http(s) only.
extractMode ('markdown' | 'text')
Output format after main-content extraction.
maxChars (number)
Truncate output to this many characters.
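As a rough illustration of how the three parameters fit together, the sketch below builds an argument object and rejects non-http(s) URLs before the tool is called. The `WebFetchArgs` type and `buildWebFetchArgs` helper are hypothetical names introduced for this example, not part of Genesis; only the parameter names and the http(s) restriction come from the list above.

```typescript
// Hypothetical types mirroring the documented parameters (not Genesis's own typings).
type ExtractMode = "markdown" | "text";

interface WebFetchArgs {
  url: string; // http(s) only
  extractMode?: ExtractMode; // output format after main-content extraction
  maxChars?: number; // truncate output to this many characters
}

// Illustrative pre-flight check: build the argument object and reject
// anything that is not an http(s) URL.
function buildWebFetchArgs(
  url: string,
  extractMode?: ExtractMode,
  maxChars?: number,
): WebFetchArgs {
  const protocol = new URL(url).protocol; // throws on malformed URLs
  if (protocol !== "http:" && protocol !== "https:") {
    throw new Error(`web_fetch accepts http(s) URLs only, got ${protocol}`);
  }
  const args: WebFetchArgs = { url };
  if (extractMode !== undefined) args.extractMode = extractMode;
  if (maxChars !== undefined) args.maxChars = maxChars;
  return args;
}

const args = buildWebFetchArgs("https://example.com/article", "text", 8000);
// The agent would then call: await web_fetch(args);
```

Passing only `url` yields the same call shape as the quick start example.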
How it works
Fetch
Sends an HTTP GET with a Chrome-like User-Agent and `Accept-Language`
header. Blocks private/internal hostnames and re-checks redirects.
Extract
Runs Readability (main-content extraction) on the HTML response.
Fallback (optional)
If Readability fails and Firecrawl is configured, retries through the
Firecrawl API with bot-circumvention mode.
Cache
Results are cached for 15 minutes (configurable) to reduce repeated
fetches of the same URL.
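The caching step can be pictured as a small time-to-live map keyed by URL. This is a minimal sketch under that assumption; `TtlCache` is a hypothetical class, and the real cache key and storage used by web_fetch are not described here. The clock is passed in explicitly so the expiry behavior is easy to follow.

```typescript
// Hypothetical in-memory TTL cache; illustrates the 15-minute default only.
interface CacheEntry {
  value: string;
  expiresAt: number; // epoch milliseconds
}

class TtlCache {
  private entries = new Map<string, CacheEntry>();
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Returns the cached value, or undefined if it is missing or expired.
  get(key: string, now: number = Date.now()): string | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (now >= entry.expiresAt) {
      this.entries.delete(key); // drop stale entries on access
      return undefined;
    }
    return entry.value;
  }

  // Stores a value that stays valid for ttlMs from `now`.
  set(key: string, value: string, now: number = Date.now()): void {
    this.entries.set(key, { value, expiresAt: now + this.ttlMs });
  }
}

const MINUTE_MS = 60 * 1000;
const cache = new TtlCache(15 * MINUTE_MS); // mirrors cacheTtlMinutes: 15
cache.set("https://example.com/article", "# Article", 0);
const fresh = cache.get("https://example.com/article", 14 * MINUTE_MS); // still cached
const stale = cache.get("https://example.com/article", 15 * MINUTE_MS); // expired
```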
Config
{
  tools: {
    web: {
      fetch: {
        enabled: true, // default: true
        provider: "firecrawl", // optional; omit for auto-detect
        maxChars: 50000, // max output chars
        maxCharsCap: 50000, // hard cap for maxChars param
        maxResponseBytes: 2000000, // max download size before truncation
        timeoutSeconds: 30,
        cacheTtlMinutes: 15,
        maxRedirects: 3,
        readability: true, // use Readability extraction
        userAgent: "Mozilla/5.0 ...", // override User-Agent
      },
    },
  },
}
Firecrawl fallback
If Readability extraction fails, web_fetch can fall back to
Firecrawl for bot-circumvention and better extraction:
{
  tools: {
    web: {
      fetch: {
        provider: "firecrawl", // optional; omit for auto-detect from available credentials
      },
    },
  },
  plugins: {
    entries: {
      firecrawl: {
        enabled: true,
        config: {
          webFetch: {
            apiKey: "fc-...", // optional if FIRECRAWL_API_KEY is set
            baseUrl: "https://api.firecrawl.dev",
            onlyMainContent: true,
            maxAgeMs: 86400000, // cache duration (1 day)
            timeoutSeconds: 60,
          },
        },
      },
    },
  },
}
plugins.entries.firecrawl.config.webFetch.apiKey supports SecretRef objects.
Legacy tools.web.fetch.firecrawl.* config is auto-migrated by genesis doctor --fix.
Current runtime behavior:
- tools.web.fetch.provider selects the fetch fallback provider explicitly.
- If provider is omitted, Genesis auto-detects the first ready web-fetch provider from available credentials. Today the bundled provider is Firecrawl.
- If Readability is disabled, web_fetch skips straight to the selected provider fallback. If no provider is available, it fails closed.
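The decision order described above can be sketched as a small planning function. `planFetch`, `FetchPlan`, and the `readyProviders` list are hypothetical names for illustration; the sketch also assumes an explicitly configured provider is always usable, which the text does not state.

```typescript
// Hypothetical result shape: which extraction path runs, and which
// provider (if any) is available as the fallback.
interface FetchPlan {
  useReadability: boolean;
  fallbackProvider: string | null;
}

interface PlanOptions {
  readability: boolean; // tools.web.fetch.readability
  provider?: string; // tools.web.fetch.provider, if set
  readyProviders: string[]; // providers with usable credentials, in detection order
}

function planFetch(opts: PlanOptions): FetchPlan {
  // An explicit provider wins; otherwise take the first ready provider.
  // Assumption: an explicit provider is treated as available.
  const fallbackProvider = opts.provider ?? opts.readyProviders[0] ?? null;

  // With Readability disabled there is nothing to try before the provider,
  // so a missing provider is an error rather than a silent empty result.
  if (!opts.readability && fallbackProvider === null) {
    throw new Error("web_fetch: Readability is disabled and no provider is available");
  }
  return { useReadability: opts.readability, fallbackProvider };
}

const autoDetected = planFetch({ readability: true, readyProviders: ["firecrawl"] });
const providerOnly = planFetch({ readability: false, provider: "firecrawl", readyProviders: [] });
```

Throwing when neither path is available is one way to model "fails closed"; the actual error surface in Genesis may differ.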
Limits and safety
- maxChars is clamped to tools.web.fetch.maxCharsCap
- Response body is capped at maxResponseBytes before parsing; oversized responses are truncated with a warning
- Private/internal hostnames are blocked
- Redirects are checked and limited by maxRedirects
- web_fetch is best-effort -- some sites need the Web Browser
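The clamping rule for maxChars can be illustrated with a few lines of arithmetic. `effectiveMaxChars` and `truncateChars` are hypothetical helpers; the sketch assumes that an omitted maxChars parameter falls back to the configured tools.web.fetch.maxChars, which is not spelled out above.

```typescript
// Resolve the character limit for one call.
// Assumption: when the caller omits maxChars, the configured default applies.
function effectiveMaxChars(
  requested: number | undefined,
  configuredDefault: number, // tools.web.fetch.maxChars
  cap: number, // tools.web.fetch.maxCharsCap
): number {
  const wanted = requested ?? configuredDefault;
  // Never exceed the hard cap, and never go below zero.
  return Math.max(0, Math.min(wanted, cap));
}

// Cut extracted text down to the resolved limit.
function truncateChars(text: string, maxChars: number): string {
  return text.length > maxChars ? text.slice(0, maxChars) : text;
}

const limit = effectiveMaxChars(200000, 50000, 50000); // request above the cap
const preview = truncateChars("extracted article text", 9);
```

With the defaults shown in the config section, a request for 200000 characters resolves to 50000.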
Tor and .onion services
web_fetch can route .onion URLs through a Tor SOCKS5 proxy. Clearnet URLs
always stay on the normal direct fetch path, even when Tor is enabled.
{
  tools: {
    web: {
      tor: {
        enabled: true,
        mode: "external",
        socksHost: "127.0.0.1",
        socksPort: 9050,
      },
    },
  },
}
| Field | Default | Description |
|---|---|---|
| enabled | false | Enable Tor routing for .onion URLs. |
| mode | "external" | Proxy mode. Only "external" is supported today. |
| socksHost | "127.0.0.1" | SOCKS5 proxy host. |
| socksPort | 9050 | SOCKS5 proxy port. |
When tools.web.tor.enabled is true and a web_fetch URL's hostname ends
with .onion, the request is sent through the configured SOCKS5 proxy. DNS
resolution is skipped and left to the Tor proxy. The hostname allowlist from
tools.web.fetch.ssrfPolicy.hostnameAllowlist still applies.
.onion URLs are blocked when Tor is not enabled.
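The routing rule above reduces to a check on the hostname suffix. The following is a minimal sketch of that decision only; `routeFor`, `Route`, and `TorConfig` are hypothetical names, and the sketch does not open a SOCKS5 connection or apply the hostname allowlist.

```typescript
type Route = "tor" | "direct" | "blocked";

// Mirrors the tools.web.tor fields used by the decision.
interface TorConfig {
  enabled: boolean;
  socksHost: string;
  socksPort: number;
}

// Decide how a URL would be routed, based on its hostname alone.
function routeFor(url: string, tor: TorConfig): Route {
  const hostname = new URL(url).hostname.toLowerCase();
  if (!hostname.endsWith(".onion")) {
    return "direct"; // clearnet URLs never use the proxy
  }
  // .onion hosts need the proxy; without Tor they are refused.
  return tor.enabled ? "tor" : "blocked";
}

const torOn: TorConfig = { enabled: true, socksHost: "127.0.0.1", socksPort: 9050 };
const torOff: TorConfig = { ...torOn, enabled: false };

const onionRoute = routeFor("http://exampleonionaddress.onion/page", torOn);
const clearnetRoute = routeFor("https://example.com/article", torOn);
const refusedRoute = routeFor("http://exampleonionaddress.onion/page", torOff);
```

The .onion hostname in the example is a placeholder, not a real service address.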
Tool profiles
If you use tool profiles or allowlists, add web_fetch or group:web:
{
  tools: {
    allow: ["web_fetch"],
    // or: allow: ["group:web"] (includes web_fetch, web_search, and x_search)
  },
}
Related
- Web Search -- search the web with multiple providers
- Web Browser -- full browser automation for JS-heavy sites
- Firecrawl -- Firecrawl search and scrape tools