Task Taxonomy

20 verified, skill-dependent tasks spanning 6 categories and 15 sub-domains, derived from a community-driven taxonomy of real-world skill usage patterns. Each task contains multiple instances to evaluate skill reusability.

6 Categories
15 Sub-domains
20 Tasks
100 Instances

Browse Tasks

20 verified tasks at a glance.

T01 (2 instances)
python-scala-translation
Software Engineering · Code Generation

Translate a Python Tokenizer module (with multiple classes and functions) into idiomatic Scala 2.13, following Scala conventions, proper abstractions, and standard library usage. Must pass provided test specs.

T02 (3 instances)
nlp-paper-reproduction
Software Engineering · Code Generation

Reproduce the simpo_loss function of SimPOTrainer from a research paper (PDF provided). Run unit tests with fixed input tensors and save the loss values for verification.

T03 (5 instances)
dependency-vulnerability-check
Software Engineering · Debug & Analysis

Perform a security audit of a package-lock.json to identify HIGH and CRITICAL vulnerabilities in third-party dependencies, collecting CVE IDs, CVSS scores, fixed versions, and references.

T04 (5 instances)
github-repo-analytics
Software Engineering · Version Control

Analyze PR metrics for a GitHub repository in a specific quarter: count PRs changing ≥8 files, compute merge rates, average time-to-merge, and identify top contributors.

T05 (3 instances)
fix-security-bug
Software Engineering · Infrastructure

Fix a JavaScript injection CVE in Apache Druid 0.20.0 where authenticated attackers can execute arbitrary code through malicious payloads. Write patches, apply them, and rebuild with Maven.

T06 (6 instances)
enterprise-information-search
Information Retrieval · Web Search

Retrieve information from heterogeneous enterprise data files to answer a set of questions, outputting structured answers with token consumption tracking.

T07 (5 instances)
travel-planning
Information Retrieval · Web Search

Build a 7-day travel itinerary using a provided database of cities, restaurants, accommodations, attractions, and driving distances. Must follow budget, cuisine, pet-friendly, and transportation constraints.

T08 (5 instances)
schedule-planning
Productivity Tools · Team Communication

Parse meeting request emails and a visual PDF calendar, identify available time slots (treating blue blocks as overwritable), and generate structured meeting reply files.

T09 (6 instances)
offer-letter-generator
Productivity Tools · Document Systems

Fill a Word template with employee data from JSON, handling conditional sections (e.g., relocation packages) using IF/END_IF markers.

T10 (6 instances)
court-form-filling
Productivity Tools · Document Systems

Fill a California Small Claims Court form (SC-100 PDF) with case details from a text description, including plaintiff/defendant information, claim amounts, and dates.

T11 (6 instances)
earthquake-plate-calculation
Data & Analytics · Data Processing

Use GeoPandas to find specific earthquakes relative to tectonic plate boundaries. Compute distances and output structured results with earthquake details.

T12 (6 instances)
financial-analysis
Data & Analytics · Data Processing

Analyze SEC 13F hedge fund filings across quarters, answering questions about AUM, stock holdings, investment changes, and top fund managers.

T13 (6 instances)
weighted-gdp-calculation
Data & Analytics · Math & Calculation

Work in Excel: use INDEX/MATCH formulas to fill in data, calculate net exports as a percentage of GDP for GCC countries, and compute statistics and a weighted mean via SUMPRODUCT.

T14 (5 instances)
dbscan-parameter-tuning
Data & Analytics · Math & Calculation

Optimize DBSCAN hyperparameters to cluster citizen-science Mars cloud annotations. Grid search over min_samples, epsilon, and shape_weight to find the Pareto frontier.

T15 (5 instances)
stock-data-visualization
Data & Analytics · Data Visualization

Build a D3.js (v6) single-page web app: bubble chart of 50 stocks sized by market cap, colored by sector, force-clustered, with an interactive data table synchronized to the chart.

T16 (5 instances)
anthropic-poster-design
Content & Creative · Image Generation

Generate a technical 'exploded-view' poster for a hardware device using Anthropic brand colors and typography. Must correctly apply brand guidelines from the provided skill.

T17 (5 instances)
chinese-poem-generator
Content & Creative · Text Generation

Compose a Chinese seven-character regulated verse (七律) following strict format rules: 8 lines of 7 characters, with proper rhyming based on modern Mandarin pronunciation.

T18 (5 instances)
video-object-counting
Content & Creative · Audio & Video

Extract keyframes from a Super Mario video, convert to grayscale, and use template matching to count coins, enemies, and turtles in each frame. Output results to CSV.

T19 (6 instances)
organize-messy-files
Utilities & Other · Local File Control

Sort 100+ PDF/PPTX/DOCX files into 5 subject folders based on content analysis. Files span LLM, trapped ion & quantum computing, black hole, DNA, and music history.

T20 (5 instances)
temperature-simulation
Utilities & Other · Command Execution

Run the General Lake Model (GLM) to simulate vertical water temperature for Lake Mendota. Calibrate 5 parameters within published ranges to meet RMSE thresholds.


Task Table

The full taxonomy, grouped by category and sub-domain.

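| Category | Sub-domain | Task | Instances |
|---|---|---|---|
| Software Engineering | Code Generation | python-scala-translation | 2 |
| Software Engineering | Code Generation | nlp-paper-reproduction | 3 |
| Software Engineering | Debug & Analysis | dependency-vulnerability-check | 5 |
| Software Engineering | Version Control | github-repo-analytics | 5 |
| Software Engineering | Infrastructure | fix-security-bug | 3 |
| Information Retrieval | Web Search | enterprise-information-search | 6 |
| Information Retrieval | Web Search | travel-planning | 5 |
| Productivity Tools | Team Communication | schedule-planning | 5 |
| Productivity Tools | Document Systems | offer-letter-generator | 6 |
| Productivity Tools | Document Systems | court-form-filling | 6 |
| Data & Analytics | Data Processing | earthquake-plate-calculation | 6 |
| Data & Analytics | Data Processing | financial-analysis | 6 |
| Data & Analytics | Math & Calculation | weighted-gdp-calculation | 6 |
| Data & Analytics | Math & Calculation | dbscan-parameter-tuning | 5 |
| Data & Analytics | Data Visualization | stock-data-visualization | 5 |
| Content & Creative | Image Generation | anthropic-poster-design | 5 |
| Content & Creative | Text Generation | chinese-poem-generator | 5 |
| Content & Creative | Audio & Video | video-object-counting | 5 |
| Utilities & Other | Local File Control | organize-messy-files | 6 |
| Utilities & Other | Command Execution | temperature-simulation | 5 |
| Total: 6 categories | 15 sub-domains | 20 tasks | 100 |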
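
For readers who want to work with the taxonomy programmatically, the table can be written down as plain data and the headline totals recomputed from it. The following is a minimal sketch, not an official loader: the names `TAXONOMY`, `summarize`, and `instances_per_category` are illustrative assumptions, and the rows are copied by hand from the table.

```python
from collections import Counter

# (category, sub-domain, task, instances), transcribed from the task table.
# This structure is an assumption for illustration, not a published format.
TAXONOMY = [
    ("Software Engineering", "Code Generation", "python-scala-translation", 2),
    ("Software Engineering", "Code Generation", "nlp-paper-reproduction", 3),
    ("Software Engineering", "Debug & Analysis", "dependency-vulnerability-check", 5),
    ("Software Engineering", "Version Control", "github-repo-analytics", 5),
    ("Software Engineering", "Infrastructure", "fix-security-bug", 3),
    ("Information Retrieval", "Web Search", "enterprise-information-search", 6),
    ("Information Retrieval", "Web Search", "travel-planning", 5),
    ("Productivity Tools", "Team Communication", "schedule-planning", 5),
    ("Productivity Tools", "Document Systems", "offer-letter-generator", 6),
    ("Productivity Tools", "Document Systems", "court-form-filling", 6),
    ("Data & Analytics", "Data Processing", "earthquake-plate-calculation", 6),
    ("Data & Analytics", "Data Processing", "financial-analysis", 6),
    ("Data & Analytics", "Math & Calculation", "weighted-gdp-calculation", 6),
    ("Data & Analytics", "Math & Calculation", "dbscan-parameter-tuning", 5),
    ("Data & Analytics", "Data Visualization", "stock-data-visualization", 5),
    ("Content & Creative", "Image Generation", "anthropic-poster-design", 5),
    ("Content & Creative", "Text Generation", "chinese-poem-generator", 5),
    ("Content & Creative", "Audio & Video", "video-object-counting", 5),
    ("Utilities & Other", "Local File Control", "organize-messy-files", 6),
    ("Utilities & Other", "Command Execution", "temperature-simulation", 5),
]


def summarize(rows):
    """Recompute the headline counts from the table rows."""
    return {
        "categories": len({category for category, _, _, _ in rows}),
        # Key sub-domains by (category, name) so equal names in
        # different categories would not be merged.
        "sub_domains": len({(category, sub) for category, sub, _, _ in rows}),
        "tasks": len(rows),
        "instances": sum(count for _, _, _, count in rows),
    }


def instances_per_category(rows):
    """Sum instance counts for each category."""
    totals = Counter()
    for category, _, _, count in rows:
        totals[category] += count
    return dict(totals)


if __name__ == "__main__":
    print(summarize(TAXONOMY))
    print(instances_per_category(TAXONOMY))
```

Recomputing the counts this way is a quick consistency check: the per-task instance numbers should add up to the stated totals of 6 categories, 15 sub-domains, 20 tasks, and 100 instances.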