Benchmark · AgentWebBench

The Benchmark

Two agent types, three strategies, four tasks

Information is organized by website domain; each website maintains its own retrieval infrastructure. Coordination strategies progressively introduce model reasoning for selection and agent-to-agent communication.

User Agent

Acts on behalf of the user, processes queries, and coordinates information gathering across many websites. It has no direct corpus access and must work through website-specific interfaces, reflecting realistic access constraints.

Content Agents ×100

Each operates autonomously for one website domain. It retrieves within its domain and returns both natural-language summaries and full document contents, supporting downstream ranking, evidence aggregation, and synthesis.
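The split between the two agent types can be pictured as a small interface. This is a minimal sketch, not the benchmark's actual code: `Document`, `ContentResponse`, `ContentAgent`, and the toy `KeywordContentAgent` are hypothetical names introduced here for illustration. The one property taken from the description above is that a content agent searches only its own domain and returns both a summary and full documents.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Document:
    doc_id: str
    text: str


@dataclass
class ContentResponse:
    domain: str
    summary: str  # natural-language summary of what was found
    documents: list[Document] = field(default_factory=list)  # full contents


class ContentAgent(Protocol):
    """Hypothetical interface the user agent would call; one per website."""

    domain: str

    def answer(self, query: str) -> ContentResponse: ...


class KeywordContentAgent:
    """Toy content agent: keyword match over its own domain's documents only."""

    def __init__(self, domain: str, docs: list[Document]):
        self.domain = domain
        self._docs = docs  # never exposed directly to the user agent

    def answer(self, query: str) -> ContentResponse:
        terms = query.lower().split()
        hits = [d for d in self._docs if any(t in d.text.lower() for t in terms)]
        summary = f"{len(hits)} matching document(s) on {self.domain}"
        return ContentResponse(self.domain, summary, hits)
```

A user agent in this sketch would hold only a list of `ContentAgent` objects and call `answer`, which mirrors the constraint that it has no direct corpus access.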

Coordination strategies

Baseline

Classical

Centralized retrieval over the full index, with universal access and no decentralization constraint. This is the upper-reference baseline against which the decentralized strategies are compared.

Direct corpus access: yes
LLM website selection: no
Agent-to-agent comms: no

Strategy

Tool_E

A lightweight baseline: websites are chosen by embedding similarity, and documents are fetched with a dense-retrieval tool. There is no agent communication.

Direct corpus access: no
LLM website selection: no
Agent-to-agent comms: no
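Website selection by embedding similarity can be sketched as a top-k ranking by cosine similarity. This is an illustrative sketch under assumptions: the function names, the plain-list vector format, and the default `k` are hypothetical, and the benchmark's actual embedding model and cutoff are not specified here.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_websites(
    query_vec: list[float],
    site_vecs: dict[str, list[float]],  # website name -> embedding of its description
    k: int = 3,
) -> list[str]:
    """Return the k websites whose embeddings are most similar to the query."""
    ranked = sorted(
        site_vecs,
        key=lambda site: cosine(query_vec, site_vecs[site]),
        reverse=True,
    )
    return ranked[:k]
```

No language model is involved in this step: the ranking depends only on vector similarity, which is what keeps the strategy lightweight.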

Strategy

Tool_P

Uses LLM reasoning over website descriptions to select sites, while keeping tool-based retrieval.

Direct corpus access: no
LLM website selection: yes
Agent-to-agent comms: no
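Selection by LLM reasoning over website descriptions might look like the sketch below. Everything here is an assumption made for illustration: `llm` stands in for any text-in, text-out model call, and the prompt wording and comma-separated reply format are hypothetical rather than the benchmark's actual prompt.

```python
from typing import Callable


def select_with_llm(
    query: str,
    descriptions: dict[str, str],   # website name -> short description
    llm: Callable[[str], str],      # hypothetical model call: prompt -> reply
    k: int = 3,
) -> list[str]:
    """Ask the model to pick websites, then keep only names that really exist."""
    catalog = "\n".join(f"- {name}: {desc}" for name, desc in descriptions.items())
    prompt = (
        f"Query: {query}\n"
        f"Websites:\n{catalog}\n"
        f"Reply with up to {k} website names, comma-separated."
    )
    reply = llm(prompt)

    picked: list[str] = []
    for name in (part.strip() for part in reply.split(",")):
        # Drop hallucinated or repeated names instead of trusting the reply.
        if name in descriptions and name not in picked:
            picked.append(name)
    return picked[:k]
```

Retrieval from the chosen websites would still go through the same retrieval tool as in Tool_E; only the selection step changes.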

Strategy

Multi-Agent

Full agent-to-agent communication: both user and content agents reason autonomously, with iterative refinement and adaptive gathering.

Direct corpus access: no
LLM website selection: yes
Agent-to-agent comms: yes
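The iterative refinement described for the Multi-Agent strategy can be sketched as a loop in which the user agent queries content agents, checks whether the evidence suffices, and rewrites the query if not. This is a simplified sketch under assumptions: the callables `refine` and `enough`, the snippet-list return type, and the `max_rounds` cap are all hypothetical, and in the real setting both sides would be LLM-driven agents rather than plain functions.

```python
from typing import Callable

# Hypothetical signatures, for illustration only.
ContentAgentFn = Callable[[str], list[str]]   # query -> evidence snippets
RefineFn = Callable[[str, list[str]], str]    # (query, evidence so far) -> next query


def gather(
    query: str,
    agents: dict[str, ContentAgentFn],        # website name -> content agent
    refine: RefineFn,
    enough: Callable[[list[str]], bool],      # stopping test on collected evidence
    max_rounds: int = 3,
) -> list[str]:
    """Collect evidence over several rounds, refining the query between rounds."""
    evidence: list[str] = []
    current = query
    for _ in range(max_rounds):
        for ask in agents.values():
            for snippet in ask(current):
                if snippet not in evidence:   # keep first occurrence only
                    evidence.append(snippet)
        if enough(evidence):
            break
        current = refine(current, evidence)
    return evidence
```

The loop stops early once `enough` is satisfied, so the number of agent calls adapts to how hard the query turns out to be.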

Four information-seeking tasks
