The Benchmark

Two agent types, three strategies, four tasks

Information is organized by website domain; each website maintains its own retrieval infrastructure. Coordination strategies progressively introduce model reasoning for selection and agent-to-agent communication.

User Agent

Acts on behalf of the user, processes queries, and coordinates information gathering across many websites. It has no direct corpus access and must work through website-specific interfaces, reflecting realistic access constraints.

Content Agents ×100

Each operates autonomously for one website domain. It retrieves within its domain and returns both natural-language summaries and full document contents, supporting downstream ranking, evidence aggregation, and synthesis.
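
The two agent roles can be pictured with a small sketch. Everything here is an illustrative assumption rather than the benchmark's actual interface: the class names `Document`, `ContentResponse`, `ContentAgent`, and `UserAgent`, the method names, and the toy term-overlap retrieval are all hypothetical. The point is only that the user agent sees agent interfaces, never the corpus, and that each content agent returns a summary together with full documents.

```python
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    text: str


@dataclass
class ContentResponse:
    domain: str
    summary: str               # natural-language summary of what was found
    documents: list[Document]  # full document contents for downstream use


class ContentAgent:
    """Hypothetical content agent: owns the documents of one website domain."""

    def __init__(self, domain: str, documents: list[Document]):
        self.domain = domain
        self._documents = documents

    def answer(self, query: str, k: int = 3) -> ContentResponse:
        # Toy retrieval (an assumption): score by how many query terms
        # appear as substrings of the document text.
        terms = set(query.lower().split())
        scored = [(sum(t in d.text.lower() for t in terms), d) for d in self._documents]
        ranked = sorted(scored, key=lambda pair: -pair[0])
        hits = [d for score, d in ranked if score > 0][:k]
        summary = f"{self.domain}: {len(hits)} matching document(s)"
        return ContentResponse(self.domain, summary, hits)


class UserAgent:
    """Hypothetical user agent: holds agent interfaces only, never the corpus."""

    def __init__(self, agents: list[ContentAgent]):
        self._agents = agents

    def gather(self, query: str) -> list[ContentResponse]:
        # Ask every content agent and keep the responses that carry documents.
        responses = [agent.answer(query) for agent in self._agents]
        return [r for r in responses if r.documents]
```

A real content agent would sit in front of its website's own retrieval infrastructure; the in-memory list only keeps the sketch self-contained.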

Coordination strategies

Baseline

Classical

Centralized retrieval over the full index, with universal access and no decentralized constraint. It serves as the upper-bound reference baseline.

  • Direct corpus access: yes
  • LLM website selection: no
  • Agent-to-agent comms: no
Strategy

ToolE

A lightweight baseline: websites chosen by embedding similarity; documents fetched by a dense-retrieval tool. No agent communication.

  • Direct corpus access: no
  • LLM website selection: no (embedding similarity)
  • Agent-to-agent comms: no
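
As a rough illustration of choosing websites by embedding similarity, the sketch below ranks website descriptions against a query by cosine similarity. It is a minimal sketch under stated assumptions: the function names `embed`, `cosine`, and `select_websites` are hypothetical, and a bag-of-words count vector stands in for whatever dense encoder the benchmark actually uses.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in embedding (an assumption): bag-of-words term counts.
    # A real system would call a dense text encoder here.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_websites(query: str, descriptions: dict[str, str], top_k: int = 2) -> list[str]:
    # Rank websites by similarity between the query and each description.
    q = embed(query)
    ranked = sorted(
        descriptions,
        key=lambda site: cosine(q, embed(descriptions[site])),
        reverse=True,
    )
    return ranked[:top_k]
```

No language model is involved in the selection step; the ranking depends only on vector similarity, which is what keeps this strategy lightweight.
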
Strategy

ToolP

Uses LLM reasoning over website descriptions to select sites, while keeping tool-based retrieval.

  • Direct corpus access: no
  • LLM website selection: yes
  • Agent-to-agent comms: no
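
One way to picture LLM-based selection over website descriptions is sketched below. The prompt wording, the function names `build_selection_prompt` and `select_with_llm`, and the reply format are assumptions made for illustration; the model is passed in as a plain callable so that no particular LLM API is implied.

```python
from typing import Callable


def build_selection_prompt(query: str, descriptions: dict[str, str]) -> str:
    # Hypothetical prompt: list every website with its description.
    lines = [f"- {site}: {desc}" for site, desc in descriptions.items()]
    return (
        "Select the websites most likely to answer the query.\n"
        f"Query: {query}\n"
        "Websites:\n" + "\n".join(lines) + "\n"
        "Reply with one website name per line."
    )


def select_with_llm(
    query: str,
    descriptions: dict[str, str],
    llm: Callable[[str], str],
) -> list[str]:
    reply = llm(build_selection_prompt(query, descriptions))
    picked = [line.strip("- ").strip() for line in reply.splitlines()]
    # Keep only names that were actually offered, in order, without duplicates.
    selected: list[str] = []
    for name in picked:
        if name in descriptions and name not in selected:
            selected.append(name)
    return selected
```

Filtering the reply against the offered names guards against the model inventing a website; retrieval on the selected sites would then proceed through the same tool as before.
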
Strategy

Multi-Agent

Full agent-to-agent communication: both user and content agents reason autonomously, with iterative refinement and adaptive gathering.

  • Direct corpus access: no
  • LLM website selection: yes
  • Agent-to-agent comms: yes
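
Iterative refinement and adaptive gathering can be sketched as a bounded loop in which content agents may hand back a refined query along with their evidence. This is only one possible reading: the `Reply` type, the `follow_up` field, the `gather` function, and the stopping rule (`max_rounds`, `enough`) are all hypothetical and not taken from the benchmark.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Reply:
    evidence: list[str]
    follow_up: Optional[str] = None  # refined query suggested by a content agent


def gather(
    query: str,
    agents: list[Callable[[str], Reply]],
    max_rounds: int = 3,
    enough: int = 2,
) -> list[str]:
    evidence: list[str] = []
    pending = [query]
    asked: set[str] = set()
    for _ in range(max_rounds):
        # Stop when nothing is left to ask or enough evidence was collected.
        if not pending or len(evidence) >= enough:
            break
        current = pending.pop(0)
        if current in asked:
            continue
        asked.add(current)
        for agent in agents:
            reply = agent(current)
            for item in reply.evidence:
                if item not in evidence:
                    evidence.append(item)
            # Adaptive step: queue any refined query the agent proposed.
            if reply.follow_up and reply.follow_up not in asked:
                pending.append(reply.follow_up)
    return evidence
```

The evidence check runs at the start of each round, so a round that has begun always queries every agent before the loop decides whether to stop.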

Four information-seeking tasks