llm

Access large language models from the command-line

$ anc audit llm --json

notable Python simonw/llm

MUST 19 / 28SHOULD 13 / 211 failing17 warningsaudited 2026-06-01 · anc 0.5.0

68pass rate

3/8principles met

Eight principles, scored

Behavioral and source checks. Every non-pass row carries a copy-paste remediation prompt.

Non-Interactive by Default

2 flag(s) found in --help but no `[env: NAME]` bindings advertised · no default-value annotations found in --help. SHOULD-tier — agents reading help text need to see what value a flag falls back to when omitted (`[default: <value>]` per clap convention).

Remediation · P1

Make llm satisfy P1 (Non-Interactive by Default) from the agent-native CLI standard. Address:
- Flags advertise env-var bindings in --help (2 flag(s) found in --help but no `[env: NAME]` bindings advertised)
- `--help` advertises default values for flags (no default-value annotations found in --help. SHOULD-tier — agents reading help text need to see what value a flag falls back to when omitted (`[default: <value>]` per clap convention).)
- Rich-TUI affordance for TTY contexts (no rich-TUI affordance detected (no `--tui`/`--interactive`/`--ui` flag, no spinner/progress/tui mention in --help). MAY-tier — rich TUI in TTY contexts is a nice-to-have, not required.)
Requirements: https://anc.dev/p1

warn

Structured, Parseable Output

--output/--format flag detected but could not validate JSON via safe probes (--help/--version override output flags in most CLIs) · no --json or --jsonl short alias found. Agents and pipelines benefit from short forms alongside the canonical `--output` enum.

Remediation · P2

Make llm satisfy P2 (Structured, Parseable Output) from the agent-native CLI standard. Address:
- Structured output support (--output/--format flag detected but could not validate JSON via safe probes (--help/--version override output flags in most CLIs))
- --json / --jsonl short aliases for --output (no --json or --jsonl short alias found. Agents and pipelines benefit from short forms alongside the canonical `--output` enum.)
- `--raw` flag for pipe-safe unformatted output (no `--raw` flag advertised. MAY-tier — useful for pipelines that want to strip formatting before piping to other tools.)
Requirements: https://anc.dev/p2

warn

Progressive Help Discovery

`--version` works but no short alias responded (tried -V, -v, -version). Adding one shortens version probes for agents. · `--version` works but no short alias responded (tried -V, -v, -version). Adding one shortens version probes for agents.

Remediation · P3

Make llm satisfy P3 (Progressive Help Discovery) from the agent-native CLI standard. Address:
- Version flag works (`--version` plus short alias) (`--version` works but no short alias responded (tried -V, -v, -version). Adding one shortens version probes for agents.)
- Version flag works (`--version` plus short alias) (`--version` works but no short alias responded (tried -V, -v, -version). Adding one shortens version probes for agents.)
- `examples` subcommand or `--examples` flag for curated usage patterns (no `examples` subcommand or `--examples` flag found. MAY-tier — a curated usage block keeps agents from hunting through long help text.)
- Short `-h` summary differs from `--help` long form (`-h` and `--help` produce byte-identical output. SHOULD-tier — clap renders the short summary on `-h` and the full description on `--help` when `long_about` is set; collapsing them gives agents no concise list-level grep target.)
- Each subcommand's `--help` ships at least one invocation example (subcommands missing example invocations in their `--help`: aliases, chat, collections, embed, embed-models, fragments, install, keys, logs, models, openai, plugins, schemas, templates, tools, uninstall. Examples teach agents the call shape faster than option tables; use clap's `after_help` or a dedicated `Examples:` block.)
- Help text pairs human and `--output json` example invocations (no paired text + `--output json` example found within 5 lines in top-level or any subcommand `--help`. Pairing keeps agents from reverse-engineering the JSON invocation from the text one.)
Requirements: https://anc.dev/p3

fail

P4
Fail-Fast, Actionable Errors
All 3 checks pass.
pass
P5
Safe Retries & Mutation Boundaries
All 2 checks pass.
pass

Composable, Predictable Command Structure

3/18 subcommand(s) follow standard verb names. Non-standard: aliases, chat, collections, embed, embed-models, embed-multi, fragments, keys, models, openai, plugins, schemas, similar, templates, tools. MAY-tier — community-standard verbs (get/list/create/update/delete) help agents predict subcommand behavior across CLIs. · no `--color` flag advertised. MAY-tier — `auto|always|never` lets agents and pipelines override the TTY-based default.

Remediation · P6

Make llm satisfy P6 (Composable, Predictable Command Structure) from the agent-native CLI standard. Address:
- Subcommand verbs follow community-standard names (3/18 subcommand(s) follow standard verb names. Non-standard: aliases, chat, collections, embed, embed-models, embed-multi, fragments, keys, models, openai, plugins, schemas, similar, templates, tools. MAY-tier — community-standard verbs (get/list/create/update/delete) help agents predict subcommand behavior across CLIs.)
- `--color` flag for explicit color control (no `--color` flag advertised. MAY-tier — `auto|always|never` lets agents and pipelines override the TTY-based default.)
- Subcommand naming follows a consistent verb/noun convention (subcommand naming is inconsistent: 7 non-verb subcommand(s) (aliases, collections, fragments, keys, logs, schemas, templates) mix verb and non-verb children at the second level, so an agent cannot predict where the action lives. SHOULD-tier: pick a consistent shape (all verb-first, all noun-verb hierarchy, or any combination where each non-verb group's children are uniformly verbs). The verb list is a heuristic; inspect `--help` to confirm.)
Requirements: https://anc.dev/p6

warn

Bounded, High-Signal Responses

no --quiet/-q flag detected in --help output · no `--verbose` / `-v` flag advertised. SHOULD-tier — agents debugging failures need a way to escalate diagnostic detail.

Remediation · P7

Make llm satisfy P7 (Bounded, High-Signal Responses) from the agent-native CLI standard. Address:
- Quiet mode available (no --quiet/-q flag detected in --help output)
- `--verbose` flag for diagnostic escalation (no `--verbose` / `-v` flag advertised. SHOULD-tier — agents debugging failures need a way to escalate diagnostic detail.)
- Help text advertises TTY-aware verbosity behavior (no TTY-aware language found in `--help`. MAY-tier — automatic verbosity reduction when stdout is piped or redirected lets agents skip the explicit `--quiet` flag. Behavioral probes cannot simulate a real TTY without a pty crate, so this audit relies on documented intent.)
Requirements: https://anc.dev/p7

warn

P8
Discoverable Through Agent Skill Bundles
All 3 checks pass.
pass

All Audits

P1: Non-Interactive by Default

PASS	Non-interactive by default
SKIP	Non-interactive gate flag advertised in --help	target satisfies P1 via alternative gate (help-on-bare or stdin-primary)
WARN	Flags advertise env-var bindings in --help	2 flag(s) found in --help but no `[env: NAME]` bindings advertised
PASS	Secret-bearing flags expose stdin or *-file companion
WARN	`--help` advertises default values for flags	no default-value annotations found in --help. SHOULD-tier — agents reading help text need to see what value a flag falls back to when omitted (`[default: <value>]` per clap convention).
WARN	Rich-TUI affordance for TTY contexts	no rich-TUI affordance detected (no `--tui`/`--interactive`/`--ui` flag, no spinner/progress/tui mention in --help). MAY-tier — rich TUI in TTY contexts is a nice-to-have, not required.

P2: Structured, Parseable Output

WARN	Structured output support	--output/--format flag detected but could not validate JSON via safe probes (--help/--version override output flags in most CLIs)
SKIP	Structured-output CLI exposes its schema at runtime	no structured-output indicator (--output / --format / json / jsonl) in --help
WARN	--json / --jsonl short aliases for --output	no --json or --jsonl short alias found. Agents and pipelines benefit from short forms alongside the canonical `--output` enum.
WARN	`--raw` flag for pipe-safe unformatted output	no `--raw` flag advertised. MAY-tier — useful for pipelines that want to strip formatting before piping to other tools.
SKIP	`--output` advertises additional formats beyond text/json	no `--output` or `--format` flag advertised; vacuous skip for MAY-tier extra formats.
PASS	Bad invocation exits with structured usage-error code (2)
SKIP	Errors emit JSON envelope with `error`/`kind`/`message` under `--output json`	binary does not advertise `--output json` in --help; MUST applies only to CLIs that opt into the JSON contract.
SKIP	JSON success and error envelopes share their non-payload key set	binary does not advertise `--output json` in --help; envelope-consistency only applies to CLIs that opt into the JSON contract.

P3: Progressive Help Discovery

PASS	Help flag produces useful output
WARN	Version flag works (`--version` plus short alias)	`--version` works but no short alias responded (tried -V, -v, -version). Adding one shortens version probes for agents.
WARN	Version flag works (`--version` plus short alias)	`--version` works but no short alias responded (tried -V, -v, -version). Adding one shortens version probes for agents.
WARN	`examples` subcommand or `--examples` flag for curated usage patterns	no `examples` subcommand or `--examples` flag found. MAY-tier — a curated usage block keeps agents from hunting through long help text.
WARN	Short `-h` summary differs from `--help` long form	`-h` and `--help` produce byte-identical output. SHOULD-tier — clap renders the short summary on `-h` and the full description on `--help` when `long_about` is set; collapsing them gives agents no concise list-level grep target.
FAIL	Each subcommand's `--help` ships at least one invocation example	subcommands missing example invocations in their `--help`: aliases, chat, collections, embed, embed-models, fragments, install, keys, logs, models, openai, plugins, schemas, templates, tools, uninstall. Examples teach agents the call shape faster than option tables; use clap's `after_help` or a dedicated `Examples:` block.
WARN	Help text pairs human and `--output json` example invocations	no paired text + `--output json` example found within 5 lines in top-level or any subcommand `--help`. Pairing keeps agents from reverse-engineering the JSON invocation from the text one.

P4: Fail-Fast, Actionable Errors

PASS	Rejects invalid arguments
PASS	Error messages include a hint or remediation phrase
SKIP	`--output json` produces JSON-formatted errors	binary does not advertise `--output json` in --help; SHOULD applies only to CLIs that opt into the JSON contract.

P5: Safe Retries & Mutation Boundaries

SKIP	Destructive subcommands require `--force` or `--yes`	no destructive subcommands detected; MUST applies conditionally to CLIs with destructive operations.
SKIP	Read and write surfaces are both visible in subcommand list	no recognizable read or write subcommand verbs; the read/write distinction is unobservable from the help surface alone.

P6: Composable, Predictable Command Structure

PASS	Handles SIGPIPE gracefully
SKIP	Pager-using CLI ships --no-pager escape hatch	no pager signal (less/more/$PAGER/--pager) in --help
PASS	Respects NO_COLOR
WARN	Subcommand verbs follow community-standard names	3/18 subcommand(s) follow standard verb names. Non-standard: aliases, chat, collections, embed, embed-models, embed-multi, fragments, keys, models, openai, plugins, schemas, similar, templates, tools. MAY-tier — community-standard verbs (get/list/create/update/delete) help agents predict subcommand behavior across CLIs.
WARN	`--color` flag for explicit color control	no `--color` flag advertised. MAY-tier — `auto\|always\|never` lets agents and pipelines override the TTY-based default.
SKIP	Input-accepting commands read from stdin when no file is given	no input-accepting subcommand detected (process/parse/convert/transform/analyze/validate/format/lint/audit); vacuous skip for the conditional SHOULD.
WARN	Subcommand naming follows a consistent verb/noun convention	subcommand naming is inconsistent: 7 non-verb subcommand(s) (aliases, collections, fragments, keys, logs, schemas, templates) mix verb and non-verb children at the second level, so an agent cannot predict where the action lives. SHOULD-tier: pick a consistent shape (all verb-first, all noun-verb hierarchy, or any combination where each non-verb group's children are uniformly verbs). The verb list is a heuristic; inspect `--help` to confirm.
PASS	Operations are subcommands, not verb-shaped flags

P7: Bounded, High-Signal Responses

WARN	Quiet mode available	no --quiet/-q flag detected in --help output
WARN	`--verbose` flag for diagnostic escalation	no `--verbose` / `-v` flag advertised. SHOULD-tier — agents debugging failures need a way to escalate diagnostic detail.
SKIP	`--limit` / `--max-results` flag for list operations	no list-style subcommand detected (list/ls/search/query/find/show/get); vacuous skip for the list-only SHOULD.
SKIP	Cursor-based pagination flags for list traversal	no list-style subcommand detected; vacuous skip for the list-only MAY.
SKIP	`--timeout` flag for long-running operations	no long-running subcommand detected (serve/daemon/watch/tail/monitor/follow/run/start/stream); vacuous skip for the conditional SHOULD.
WARN	Help text advertises TTY-aware verbosity behavior	no TTY-aware language found in `--help`. MAY-tier — automatic verbosity reduction when stdout is piped or redirected lets agents skip the explicit `--quiet` flag. Behavioral probes cannot simulate a real TTY without a pty crate, so this audit relies on documented intent.

P8: Discoverable Through Agent Skill Bundles

PASS	Skill bundle has install path (`tool skill install [<host>]`)
PASS	`skill install --all` for multi-runtime install
PASS	`skill update` / `skill upgrade` for bundle refresh

Details

Version scored: 0.31
Audit date: 2026-06-01 17:36:00 UTC
Duration: 16.4s
Platform: linux/x86_64
Mode: command
Anc build: 0.5.0
Install: brew install llm

Embed the badge

The badge floor is 70%; this scorecard is at 68% (2 points below). Once the score clears the floor, the embed snippet will appear here. The top issues above are the place to start.

Reproduce this scorecard for llm locally and inspect the failing audits:

anc audit --command llm --output json

Install anc first if you don't have it. Add --output json to get the same JSON shape committed under scorecards/.