Scorecard schema

A scorecard is the structured output of anc check <binary> --output json. The site reads scorecards under scorecards/ and renders them at /scorecards and at each tool's /score/<name> page. This page documents every field that appears in a scorecard, what it means, and where it comes from. It is intentionally exhaustive: anything in the JSON should be explainable from this page.

Filename

Scorecards are stored on disk as:

scorecards/<name>-v<version>.json

Where <name> matches the registry's name field (URL slug) and <version> is the SemVer string captured at scoring time. The filename's <version> segment is the canonical version anchor: the site reads it directly off disk and displays it as the scored version on every per-tool page. The scorecard's tool.version field (added in schema 0.4) is informational, but when it carries a version that disagrees with the filename, the build aborts with an integrity error. The filename never lies.
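The name/version join can be recovered from the filename alone. A minimal parsing sketch, assuming SemVer-shaped versions; the regex and the parse_scorecard_filename helper are illustrative, not the site's actual implementation. The "-v" separator anchors the split, so registry slugs that themselves contain hyphens still parse:

```python
import re

# Hypothetical helper: split "<name>-v<version>.json" into the registry
# slug and the canonical scored version. The greedy name group backtracks
# until "-v<digits>" matches, so hyphenated slugs parse correctly.
FILENAME_RE = re.compile(r"^(?P<name>.+)-v(?P<version>\d+\.\d+(?:\.\d+)?)\.json$")

def parse_scorecard_filename(filename: str) -> tuple[str, str]:
    m = FILENAME_RE.match(filename)
    if m is None:
        raise ValueError(f"not a scorecard filename: {filename!r}")
    return m.group("name"), m.group("version")

print(parse_scorecard_filename("ripgrep-v15.1.0.json"))  # ('ripgrep', '15.1.0')
```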

Top-level fields

{
  "schema_version": "0.4",
  "spec_version": "...",
  "tool":   { "name": "...", "binary": "...", "version": "..." },
  "anc":    { "version": "..." },
  "run":    { "invocation": "...", "started_at": "...", "duration_ms": 0,
              "platform": { "os": "...", "arch": "..." } },
  "target": { "kind": "...", "path": null, "command": "..." },
  "audience": "...",
  "audience_reason": null,
  "audit_profile": null,
  "summary": { ... },
  "coverage_summary": { ... },
  "results": [ ... ]
}
| Field | Type | Source | Meaning |
| --- | --- | --- | --- |
| schema_version | string | anc emitted | Version of the JSON envelope itself. Pre-1.0; bumped when fields are added, removed, or renamed. |
| spec_version | string | anc emitted | Version of the agentnative spec the run conformed to. Independent of schema_version. |
| tool | object | anc emitted | Self-describing identity for the tool that was scored. See tool below. Added in 0.4. |
| anc | object | anc emitted | Provenance of the anc build that produced the scorecard. See anc below. Added in 0.4. |
| run | object | anc emitted | Run context: what was invoked, when, on what platform. See run below. Added in 0.4. |
| target | object | anc emitted | What the run was scoring (command vs. binary vs. project). See target below. Added in 0.4. |
| audience | string \| null | anc emitted | One-line audience classification. See audience signal. |
| audience_reason | string \| null | anc emitted | Set to "suppressed" when an active audit profile blanks the audience signal; otherwise null. |
| audit_profile | string \| null | registry | Exception category passed to anc via --audit-profile. See audit profiles. |
| summary | object | derived | Tally of how the runner's checks finished. See summary below. |
| coverage_summary | object | derived | Tally of how many spec requirements the run verified. See coverage_summary below. |
| results | array of result objects | anc emitted | One entry per behavioral check that ran. See results below. |

tool

Self-describing tool identity. Lets a downstream consumer answer "what was scored?" without cross-referencing the registry.

"tool": {
  "name": "rg",
  "binary": "rg",
  "version": "ripgrep 15.1.0"
}
| Field | Type | Meaning |
| --- | --- | --- |
| name | string | The literal --command argv passed to anc. For tools where the registry name differs from the binary (e.g., registry ripgrep → binary rg), this is the binary, not the registry slug. The filename slug owns the registry-name side of the join. |
| binary | string | Executable name resolved from $PATH at scoring time. Usually equals tool.name for command-mode runs. |
| version | string \| null | Best-effort version string. The CLI dumps the first line of <binary> --version here without further parsing, so it may carry a marketing string ("eza - A modern, maintained replacement for ls"), a full multi-line block, or null when the binary doesn't print anything parseable. The filename's <version> is canonical; this field is a courtesy. |

Build-time invariant: when tool.version contains a SemVer-shaped token (X.Y or X.Y.Z), it must equal the filename version. Drift fails the build with a parser-asymmetry error — the regen script's version_extract snippet and the CLI's internal probe are the only two places that derive a version from the binary, and they must agree.

anc

Provenance for the scorecard: which anc build produced it.

"anc": {
  "version": "0.2.1"
}
| Field | Type | Meaning |
| --- | --- | --- |
| version | string | Self-reported version of the anc binary that ran the score. Read from Cargo.toml at build time. |

The per-tool page renders anc.version.

run

Run-context: the literal invocation, when it ran, how long it took, and what platform it ran on.

"run": {
  "invocation": "anc check --command rg --output json",
  "started_at": "2026-04-30T04:18:53.099683344Z",
  "duration_ms": 53,
  "platform": { "os": "linux", "arch": "x86_64" }
}
| Field | Type | Meaning |
| --- | --- | --- |
| invocation | string | The verbatim argv that produced this scorecard, joined with single spaces. The per-tool "Reproduce locally" CTA renders this directly for command-mode runs. |
| started_at | string | RFC 3339 timestamp of when the run started, in UTC. Validated as parseable by Date at build time. |
| duration_ms | integer | Wall-clock duration of the entire scoring run, in milliseconds. |
| platform.os | string | OS the binary ran on (linux, darwin, windows). |
| platform.arch | string | CPU architecture the binary ran on (x86_64, aarch64, …). |

Security note — run.invocation: for command-mode runs the invocation is the canonical anc check --command <name> [--audit-profile <X>] [--output json] shape, which is safe to embed publicly. For project-mode runs (target.kind: "project") the invocation may include a local filesystem path (anc check ./local/repo); the site falls back to the synthesized form for those runs to avoid leaking machine-local paths into HTML, markdown, and /llms-full.txt. Mirror this gate downstream if you fetch the JSON directly.
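The same gate, sketched for downstream consumers. reproduce_command is a hypothetical helper, and the synthesized fallback shape is an assumption based on the canonical form above:

```python
def reproduce_command(scorecard: dict) -> str:
    """Render run.invocation verbatim only for command-mode runs;
    otherwise synthesize a path-free form (assumed shape)."""
    if scorecard["target"]["kind"] == "command":
        return scorecard["run"]["invocation"]
    # project/binary runs may embed machine-local paths; never echo those
    name = scorecard["tool"]["name"]
    return f"anc check --command {name} --output json"
```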

target

What the run was scoring: a command, a binary on disk, or a project tree.

"target": {
  "kind": "command",
  "path": null,
  "command": "rg"
}
| Field | Type | Meaning |
| --- | --- | --- |
| kind | string | One of command, binary, or project. Drives whether the per-tool page's reproduce CTA renders run.invocation verbatim (command) or falls back to the synthesized form (others). Future source-layer scorecards, at a later schema version, will use project. |
| path | string \| null | Filesystem path when kind is project or binary; null for command-mode runs. |
| command | string \| null | The --command argv string when kind is command; null otherwise. For command-mode runs this equals tool.name. |

Security note — target.path: when kind is project, this can carry a local directory path (/home/me/dev/foo). It is not currently rendered on any per-tool page (every leaderboard entry today is command-mode), but downstream consumers reading the JSON should treat it as machine-local.
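A consumer that republishes scorecards might scrub the field before rendering. sanitize_target and the "<redacted>" placeholder are illustrative choices, not anything anc does:

```python
def sanitize_target(target: dict) -> dict:
    """Return a copy of the target block with machine-local paths masked
    (hypothetical downstream-consumer hygiene, not anc behavior)."""
    t = dict(target)
    if t.get("kind") in ("project", "binary") and t.get("path"):
        t["path"] = "<redacted>"
    return t
```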

summary

Counts of the checks the runner actually executed. The five buckets sum to total.

"summary": {
  "total": 11,
  "pass": 8,
  "warn": 0,
  "fail": 0,
  "skip": 3,
  "error": 0
}
| Field | Type | Meaning |
| --- | --- | --- |
| total | integer | Number of checks the runner attempted on this tool. Equals pass + warn + fail + skip + error. |
| pass | integer | Checks that succeeded with no concerns. |
| warn | integer | Checks that found a soft signal: partial compliance, a deprecated pattern, a mild inconsistency. |
| fail | integer | Checks that found a clear non-compliance. |
| skip | integer | Checks the runner correctly judged inapplicable, either because the tool's shape made the check meaningless or because the active audit_profile suppressed it. |
| error | integer | The check itself crashed and produced no signal. Not evidence of a defect; not blended into the score. |

The headline score on the leaderboard is pass / (pass + warn + fail); skip and error are excluded from the denominator on purpose, as documented on the methodology page.

coverage_summary

Tally of how much of the agentnative spec the run verified. Distinct from summary, which counts the runner's checks. One implemented check can verify zero, one, or many spec requirements; many spec requirements aren't yet covered by any implemented check.

"coverage_summary": {
  "must":   { "total": 23, "verified": 9 },
  "should": { "total": 16, "verified": 0 },
  "may":    { "total": 7,  "verified": 0 }
}
| Field | Type | Meaning |
| --- | --- | --- |
| must.total | integer | Number of MUST-tier requirements in the active spec version. Same for every tool scored at that spec. |
| must.verified | integer | MUSTs satisfied by passing checks for this tool. |
| should.total | integer | Number of SHOULD-tier requirements in the spec. |
| should.verified | integer | SHOULDs satisfied by passing checks for this tool. |
| may.total | integer | Number of MAY-tier requirements in the spec. |
| may.verified | integer | MAYs satisfied by passing checks for this tool. |

If coverage_summary.must.verified doesn't equal summary.pass, that's expected: a single passing check can map to multiple MUSTs, and a passing check can also verify none. If should.verified and may.verified are zero across the board, that's also expected: those tiers are aspirational and will fill in as the runner grows checks mapped to them.
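Per-tier coverage reduces naturally to ratios for display or tracking over time. coverage_ratios is an illustrative helper, not part of anc:

```python
def coverage_ratios(cov: dict) -> dict[str, float]:
    """verified/total per tier, 0.0 when a tier has no requirements."""
    return {
        tier: (c["verified"] / c["total"]) if c["total"] else 0.0
        for tier, c in cov.items()
    }
```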

results

Array of one object per check the runner attempted. Order is stable across runs of the same anc version.

{
  "id": "p3-help",
  "label": "Help flag produces useful output",
  "group": "P3",
  "layer": "behavioral",
  "status": "pass",
  "evidence": null,
  "confidence": "high"
}
| Field | Type | Meaning |
| --- | --- | --- |
| id | string | Stable identifier (e.g., p3-help, p1-non-interactive). Citeable in commits and PRs. |
| label | string | Human-readable name for the check. |
| group | string | Principle group this check belongs to: P1–P7. Drives the principles-met column on the leaderboard. |
| layer | string | behavioral, project, or source. See layers on methodology. |
| status | string | pass, warn, fail, skip, or error. Definitions match the summary table above. |
| evidence | string \| null | Short explanation when status is skip, warn, or fail. Often references the suppressing audit profile or the input that triggered the check. |
| confidence | string | high, medium, or low. Reflects how directly the check observed the property: direct flag presence is high; inference from --help text is lower. |
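Since summary is derived, a consumer can cross-check it against results. recompute_summary is an illustrative sketch of that tally:

```python
from collections import Counter

def recompute_summary(results: list[dict]) -> dict:
    """Re-derive the summary block by tallying result statuses;
    total equals the sum of the five buckets."""
    counts = Counter(r["status"] for r in results)
    summary = {s: counts.get(s, 0) for s in ("pass", "warn", "fail", "skip", "error")}
    summary["total"] = sum(summary.values())
    return summary
```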

status semantics in detail

What is not in the scorecard (yet)

The site is transparent about gaps that future schema bumps may fill. Schema 0.4 closed the tool-identity / generated-at gap (see tool, anc, run, target above). Still outstanding today:

When the anc CLI grows fields for any of the above, this page will be updated and schema_version will bump.

Why is audience null for some tools?

When a tool is scored with audit_profile: human-tui, anc suppresses one or more of the four signal checks the audience classifier consumes. With incomplete inputs, the classifier emits audience: null and sets audience_reason: "suppressed" rather than guess. This is correct, intentional, and shared across every TUI on the leaderboard. The methodology page covers the full classifier definition.