20 verified, skill-dependent tasks spanning 6 categories and 15 sub-domains, derived from a community-driven taxonomy of real-world skill usage patterns. Each task contains multiple instances to evaluate skill reusability.
20 verified tasks at a glance.
Translate a Python Tokenizer module (with multiple classes and functions) into idiomatic Scala 2.13, following Scala conventions, proper abstractions, and standard library usage. Must pass provided test specs.
Reproduce the simpo_loss function of SimPOTrainer from a research paper (PDF provided). Run unit tests with fixed input tensors and save the loss values for verification.
Perform a security audit of a package-lock.json to identify HIGH and CRITICAL vulnerabilities in third-party dependencies, collecting CVE IDs, CVSS scores, fixed versions, and references.
Analyze PR metrics for a GitHub repository in a specific quarter: count PRs changing ≥8 files, compute merge rates, average time-to-merge, and identify top contributors.
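As a rough illustration of the metrics named in the PR-analytics task above, the sketch below computes them over a small in-memory list. The `PullRequest` record shape, the sample data, and the choice to rank contributors by number of PRs opened are all assumptions made for this example; they do not reflect the benchmark's input format or the GitHub API schema.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class PullRequest:
    # Hypothetical record shape, chosen only for this illustration.
    author: str
    changed_files: int
    created_at: datetime
    merged_at: Optional[datetime]  # None if the PR was never merged


def pr_metrics(prs, min_files=8, top_n=3):
    """Summarize a list of PullRequest records."""
    merged = [pr for pr in prs if pr.merged_at is not None]
    # Time-to-merge in hours, for merged PRs only.
    hours = [(pr.merged_at - pr.created_at).total_seconds() / 3600 for pr in merged]
    return {
        "large_prs": sum(1 for pr in prs if pr.changed_files >= min_files),
        "merge_rate": len(merged) / len(prs) if prs else 0.0,
        "avg_hours_to_merge": sum(hours) / len(hours) if hours else None,
        # Assumption: "top contributors" means most PRs opened.
        "top_contributors": Counter(pr.author for pr in prs).most_common(top_n),
    }


# Made-up sample data.
sample = [
    PullRequest("alice", 12, datetime(2024, 1, 1, 9), datetime(2024, 1, 2, 9)),
    PullRequest("bob", 3, datetime(2024, 1, 3, 9), None),
    PullRequest("alice", 8, datetime(2024, 1, 5, 9), datetime(2024, 1, 5, 21)),
    PullRequest("carol", 1, datetime(2024, 1, 6, 9), datetime(2024, 1, 6, 15)),
]
metrics = pr_metrics(sample)
```

Restricting the input list to PRs created in the target quarter before calling `pr_metrics` would cover the "specific quarter" part of the task.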
Fix a JavaScript injection CVE in Apache Druid 0.20.0 where authenticated attackers can execute arbitrary code through malicious payloads. Write patches, apply them, and rebuild with Maven.
Retrieve information from heterogeneous enterprise data files to answer a set of questions, outputting structured answers with token consumption tracking.
Build a 7-day travel itinerary using a provided database of cities, restaurants, accommodations, attractions, and driving distances. Must follow budget, cuisine, pet-friendly, and transportation constraints.
Parse meeting request emails and a visual PDF calendar, identify available time slots (treating blue blocks as overwritable), and generate structured meeting reply files.
Fill a Word template with employee data from JSON, handling conditional sections (e.g., relocation packages) using IF/END_IF markers.
Fill a California Small Claims Court form (SC-100 PDF) with case details from a text description, including plaintiff/defendant information, claim amounts, and dates.
Use GeoPandas to find specific earthquakes relative to tectonic plate boundaries. Compute distances and output structured results with earthquake details.
Analyze SEC 13F hedge fund filings across quarters, answering questions about AUM, stock holdings, investment changes, and top fund managers.
Work in Excel: use INDEX/MATCH formulas to fill in data, calculate net exports as % of GDP for GCC countries, compute statistics, and derive a weighted mean via SUMPRODUCT.
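For readers less familiar with the spreadsheet idiom, a weighted mean via SUMPRODUCT is usually written as `=SUMPRODUCT(values, weights) / SUM(weights)`. The Python sketch below mirrors that arithmetic; the numbers are made up, and weighting by GDP is an assumption for illustration rather than something the task description states.

```python
def sumproduct(xs, ys):
    """Mirror of Excel's SUMPRODUCT for two equal-length ranges."""
    if len(xs) != len(ys):
        raise ValueError("ranges must have the same length")
    return sum(x * y for x, y in zip(xs, ys))


def weighted_mean(values, weights):
    """Equivalent to =SUMPRODUCT(values, weights) / SUM(weights)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sumproduct(values, weights) / total


# Made-up numbers: net exports as % of GDP, weighted by GDP (assumed weights).
net_exports_pct = [10.0, 20.0, -5.0]
gdp = [100.0, 300.0, 100.0]
result = weighted_mean(net_exports_pct, gdp)  # (1000 + 6000 - 500) / 500
```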
Optimize DBSCAN hyperparameters to cluster citizen-science Mars cloud annotations. Grid search over min_samples, epsilon, and shape_weight to find the Pareto frontier.
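The grid-search-plus-Pareto-frontier pattern in the task above can be sketched without any clustering library. The two objectives here (`score` to maximize, `error` to minimize) and the `toy_evaluate` stand-in are hypothetical; the task's actual objectives and evaluation are defined in its spec, and a real evaluator would run DBSCAN on the annotations.

```python
from itertools import product


def pareto_frontier(results):
    """Keep results not dominated by any other (maximize score, minimize error)."""
    def dominates(a, b):
        no_worse = a["score"] >= b["score"] and a["error"] <= b["error"]
        better = a["score"] > b["score"] or a["error"] < b["error"]
        return no_worse and better

    return [r for r in results if not any(dominates(o, r) for o in results)]


def grid_search(evaluate, min_samples_grid, epsilon_grid, shape_weight_grid):
    """Evaluate every parameter combination; `evaluate` returns (score, error)."""
    results = []
    for ms, eps, sw in product(min_samples_grid, epsilon_grid, shape_weight_grid):
        score, error = evaluate(ms, eps, sw)
        results.append({"min_samples": ms, "epsilon": eps, "shape_weight": sw,
                        "score": score, "error": error})
    return results


def toy_evaluate(min_samples, epsilon, shape_weight):
    # Stand-in with made-up values; it ignores shape_weight entirely.
    table = {
        (3, 10): (0.60, 15.0),
        (3, 20): (0.70, 25.0),
        (5, 10): (0.55, 22.0),
        (5, 20): (0.70, 18.0),
    }
    return table[(min_samples, epsilon)]


results = grid_search(toy_evaluate, [3, 5], [10, 20], [1.0])
frontier = pareto_frontier(results)
```

With these toy values, two of the four combinations survive: one with the lowest error and one with the highest score, which is the trade-off a Pareto frontier is meant to expose.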
Build a D3.js (v6) single-page web app: bubble chart of 50 stocks sized by market cap, colored by sector, force-clustered, with an interactive data table synchronized to the chart.
Generate a technical 'exploded-view' poster for a hardware device using Anthropic brand colors and typography. Must correctly apply brand guidelines from the provided skill.
Compose a Chinese seven-character regulated verse (七律) following strict format rules: 8 lines of 7 characters, with proper rhyming based on modern Mandarin pronunciation.
Extract keyframes from a Super Mario video, convert to grayscale, and use template matching to count coins, enemies, and turtles in each frame. Output results to CSV.
Sort 100+ PDF/PPTX/DOCX files into 5 subject folders based on content analysis. Files span LLMs, trapped-ion and quantum computing, black holes, DNA, and music history.
Run the General Lake Model (GLM) to simulate vertical water temperature for Lake Mendota. Calibrate 5 parameters within published ranges to meet RMSE thresholds.
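The RMSE criterion in the lake-temperature task above is the standard root-mean-square error between simulated and observed values. A minimal sketch follows; the temperatures and the 1.5 threshold are made up for illustration and are not the task's actual data or thresholds.

```python
import math


def rmse(simulated, observed):
    """Root-mean-square error between paired simulated and observed values."""
    if len(simulated) != len(observed) or not simulated:
        raise ValueError("inputs must be non-empty and the same length")
    squared = [(s - o) ** 2 for s, o in zip(simulated, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up water temperatures in degrees Celsius at four depths.
simulated = [20.0, 18.0, 12.0, 8.0]
observed = [21.0, 17.0, 14.0, 8.0]
error = rmse(simulated, observed)   # sqrt((1 + 1 + 4 + 0) / 4)
meets_threshold = error <= 1.5      # hypothetical threshold
```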
The full taxonomy — grouped by category and sub-domain.
| Category | Sub-domain | Task | #Instances |
|---|---|---|---|
| Software Engineering | Code Generation | python-scala-translation | 2 |
| Software Engineering | Code Generation | nlp-paper-reproduction | 3 |
| Software Engineering | Debug & Analysis | dependency-vulnerability-check | 5 |
| Software Engineering | Version Control | github-repo-analytics | 5 |
| Software Engineering | Infrastructure | fix-security-bug | 3 |
| Information Retrieval | Web Search | enterprise-information-search | 6 |
| Information Retrieval | Web Search | travel-planning | 5 |
| Productivity Tools | Team Communication | schedule-planning | 5 |
| Productivity Tools | Document Systems | offer-letter-generator | 6 |
| Productivity Tools | Document Systems | court-form-filling | 6 |
| Data & Analytics | Data Processing | earthquake-plate-calculation | 6 |
| Data & Analytics | Data Processing | financial-analysis | 6 |
| Data & Analytics | Math & Calculation | weighted-gdp-calculation | 6 |
| Data & Analytics | Math & Calculation | dbscan-parameter-tuning | 5 |
| Data & Analytics | Data Visualization | stock-data-visualization | 5 |
| Content & Creative | Image Generation | anthropic-poster-design | 5 |
| Content & Creative | Text Generation | chinese-poem-generator | 5 |
| Content & Creative | Audio & Video | video-object-counting | 5 |
| Utilities & Other | Local File Control | organize-messy-files | 6 |
| Utilities & Other | Command Execution | temperature-simulation | 5 |
| Total: 6 categories | 15 sub-domains | 20 tasks | 100 |