External Validation Plan

Canonical plan for the first external-repo validation wave. Scope: current v0.1 security providers and repository-local agent artifacts only.

Goal

Run lintai against a fixed cohort of 24 public repositories that materially match the current v0.1 scan surface, then record:

stable findings
preview findings
runtime parser issues
false positives
possible false negatives
repo-level fit/noise verdicts

This wave is evaluation-only. It does not add new rules or change the public API.

Cohort Design

The cohort is locked to 24 repositories with fixed category quotas:

10 mcp
6 cursor_plugin
8 skills

Important: category means the primary validation focus for the repo, not an exclusive taxonomy. A Cursor Plugin repo can be placed in the mcp bucket if the main external-validation target is its embedded mcp.json surface.
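
The quota check can be expressed as a small script. This is a minimal sketch, assuming each shortlist entry is a mapping with a `category` key; that entry shape and the `check_cohort` helper are hypothetical and are not taken from repo-shortlist.toml.

```python
from collections import Counter

# Quotas as stated in the plan. The entry shape used below is an assumption.
QUOTAS = {"mcp": 10, "cursor_plugin": 6, "skills": 8}


def check_cohort(entries):
    """Return a list of human-readable quota violations (empty if none)."""
    counts = Counter(entry["category"] for entry in entries)
    problems = []
    for category, expected in QUOTAS.items():
        actual = counts.get(category, 0)
        if actual != expected:
            problems.append(f"{category}: expected {expected}, found {actual}")
    # Anything outside the three known buckets is reported separately.
    for category in sorted(set(counts) - set(QUOTAS)):
        problems.append(f"{category}: unknown category")
    return problems


# Hypothetical cohort that satisfies the quotas exactly.
example = [
    {"repo": f"example/{category}-{i}", "category": category}
    for category, quota in QUOTAS.items()
    for i in range(quota)
]

assert sum(QUOTAS.values()) == 24
print(check_cohort(example))  # prints []
```

Because category is the primary validation focus rather than an exclusive taxonomy, the check counts only the assigned bucket and does not inspect which artifacts a repo contains.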

Each category includes:

stress repos: richer network/auth/config surfaces
control repos: expected clean or low-noise cases

Selection Rules

A repo is admissible only if it:

is public and non-archived
contains at least one current v0.1 target artifact
is meaningfully about the selected validation focus
is not just a list/index repo without scannable artifacts
is not a giant unrelated monorepo where AI-agent artifacts are marginal

Target artifacts for this wave:

SKILL.md
CLAUDE.md
.mdc
.cursorrules
mcp.json
.cursor-plugin/plugin.json
.cursor-plugin/hooks.json
.cursor-plugin/hooks/**/*.sh
.cursor-plugin/commands/**/*.md
.cursor-plugin/agents/**/*.md
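
The artifact list above can be turned into a rough inventory helper. This is a sketch under stated assumptions: `.mdc` is read as "any file with the .mdc extension", artifacts are matched at any depth, and `inventory_surfaces` is a hypothetical name. lintai's own discovery logic may differ.

```python
import tempfile
from pathlib import Path

# Glob patterns derived from the artifact list. Matching at any depth and
# treating ".mdc" as a file extension are assumptions of this sketch.
TARGET_PATTERNS = [
    "**/SKILL.md",
    "**/CLAUDE.md",
    "**/*.mdc",
    "**/.cursorrules",
    "**/mcp.json",
    "**/.cursor-plugin/plugin.json",
    "**/.cursor-plugin/hooks.json",
    "**/.cursor-plugin/hooks/**/*.sh",
    "**/.cursor-plugin/commands/**/*.md",
    "**/.cursor-plugin/agents/**/*.md",
]


def inventory_surfaces(repo_root):
    """Map each pattern to the matching repo-relative paths (non-empty only)."""
    root = Path(repo_root)
    found = {}
    for pattern in TARGET_PATTERNS:
        matches = sorted(
            path.relative_to(root).as_posix()
            for path in root.glob(pattern)
            # Skip directories and anything inside the .git directory.
            if path.is_file() and ".git" not in path.relative_to(root).parts
        )
        if matches:
            found[pattern] = matches
    return found


# Usage on a throwaway directory with two target artifacts and one non-target.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "SKILL.md").write_text("# demo skill\n")
    hooks = root / ".cursor-plugin" / "hooks"
    hooks.mkdir(parents=True)
    (hooks / "pre.sh").write_text("#!/bin/sh\n")
    (root / "README.md").write_text("not a target\n")
    demo = inventory_surfaces(root)

print(demo)
```

An empty result is a quick signal that a candidate repo fails the "contains at least one current v0.1 target artifact" rule.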

Evaluation Workflow

For each shortlisted repo:

1. Clone and pin a commit SHA.
2. Inventory target surfaces.
3. Run:
   - lintai scan .
   - lintai scan . --format=json
4. Record:
   - stable findings
   - preview findings
   - findings by rule code
   - runtime parser errors
   - whether the repo stayed clean
5. Triage each finding as one of:
   - true_positive
   - false_positive
   - ambiguous
   - expected_but_preview_only
6. Record repo-level verdict as one of:
   - strong_fit
   - useful_but_noisy
   - low_signal
   - out_of_scope
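
The pin-and-run steps above could be scripted roughly as follows. This is a sketch, assuming `git` and `lintai` are on `PATH`; the helper names are hypothetical, only the two `lintai scan` invocations come from this plan, and nothing is assumed about lintai's exit codes or JSON schema.

```python
import subprocess


def scan_commands():
    """The two invocations listed in the workflow, as argv lists."""
    return [
        ["lintai", "scan", "."],
        ["lintai", "scan", ".", "--format=json"],
    ]


def pinned_sha(repo_dir):
    """Return the commit SHA currently checked out in repo_dir."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def run_scans(repo_dir):
    """Run both scans and keep raw output for later triage."""
    outputs = []
    for argv in scan_commands():
        # No check=True: a non-zero exit may simply mean findings were
        # reported, so the return code is recorded rather than interpreted.
        result = subprocess.run(argv, cwd=repo_dir, capture_output=True, text=True)
        outputs.append(
            {
                "argv": argv,
                "returncode": result.returncode,
                "stdout": result.stdout,
                "stderr": result.stderr,
            }
        )
    return outputs
```

Keeping stderr alongside stdout makes it easier to separate runtime parser errors from findings when filling in the ledger.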

Noise is judged primarily by Stable precision. Preview findings are evaluated for usefulness, not as release-blocking precision failures.
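
One way to compute a Stable precision figure from triaged findings is sketched below. The plan does not define the formula, so this sketch assumes precision = true_positive / (true_positive + false_positive) over Stable findings only, with ambiguous findings reported separately rather than counted. The `tier` and `triage` keys are hypothetical.

```python
from collections import Counter

TRIAGE_LABELS = {
    "true_positive",
    "false_positive",
    "ambiguous",
    "expected_but_preview_only",
}


def stable_precision(findings):
    """Return (precision or None, ambiguous_count) for Stable findings.

    Assumption: ambiguous findings are excluded from the denominator and
    Preview findings never affect the figure.
    """
    counts = Counter()
    for finding in findings:
        if finding["triage"] not in TRIAGE_LABELS:
            raise ValueError(f"unknown triage label: {finding['triage']}")
        if finding["tier"] == "stable":
            counts[finding["triage"]] += 1
    decided = counts["true_positive"] + counts["false_positive"]
    precision = counts["true_positive"] / decided if decided else None
    return precision, counts["ambiguous"]


# Hypothetical triage results for one repo.
example_findings = [
    {"tier": "stable", "triage": "true_positive"},
    {"tier": "stable", "triage": "true_positive"},
    {"tier": "stable", "triage": "true_positive"},
    {"tier": "stable", "triage": "false_positive"},
    {"tier": "stable", "triage": "ambiguous"},
    {"tier": "preview", "triage": "false_positive"},
]

print(stable_precision(example_findings))  # prints (0.75, 1)
```

The Preview false positive in the example does not lower the result, which matches the rule that Preview findings are judged for usefulness rather than as precision failures.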

Checked-In Artifacts

The external-validation package lives in:

repo-shortlist.toml is the canonical selected cohort.

ledger.toml is the machine-readable result ledger and must include:

repo identity + URL + pinned ref
category + subtype
surfaces present
stable/preview counts
stable/preview rule codes
repo verdict
stable precision notes
preview signal notes
false-positive notes
possible false-negative notes
follow-up action
runtime errors
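
A completeness check for ledger entries might look like the sketch below. The key names are hypothetical spellings of the fields listed above; the actual ledger.toml schema is not shown in this plan, so treat the names as placeholders.

```python
# Hypothetical key names, one per required ledger field listed in the plan.
REQUIRED_FIELDS = [
    "repo", "url", "pinned_ref",
    "category", "subtype",
    "surfaces_present",
    "stable_count", "preview_count",
    "stable_rule_codes", "preview_rule_codes",
    "verdict",
    "stable_precision_notes", "preview_signal_notes",
    "false_positive_notes", "possible_false_negative_notes",
    "follow_up_action",
    "runtime_errors",
]
CATEGORIES = {"mcp", "cursor_plugin", "skills"}
VERDICTS = {"strong_fit", "useful_but_noisy", "low_signal", "out_of_scope"}


def validate_entry(entry):
    """Return a list of problems found in one ledger entry (empty if none)."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in entry]
    if "category" in entry and entry["category"] not in CATEGORIES:
        errors.append(f"unknown category: {entry['category']}")
    if "verdict" in entry and entry["verdict"] not in VERDICTS:
        errors.append(f"unknown verdict: {entry['verdict']}")
    return errors


# Hypothetical entry for a control repo that stayed clean.
example_entry = {
    "repo": "example/clean-skill",
    "url": "https://example.com/example/clean-skill",
    "pinned_ref": "0000000000000000000000000000000000000000",
    "category": "skills",
    "subtype": "control",
    "surfaces_present": ["SKILL.md"],
    "stable_count": 0,
    "preview_count": 0,
    "stable_rule_codes": [],
    "preview_rule_codes": [],
    "verdict": "strong_fit",
    "stable_precision_notes": "",
    "preview_signal_notes": "",
    "false_positive_notes": "",
    "possible_false_negative_notes": "",
    "follow_up_action": "none",
    "runtime_errors": [],
}

print(validate_entry(example_entry))  # prints []
```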

Decision Policy

After each validation wave:

recurring Stable FP clusters -> prioritize precision hardening
clean Stable layer plus useful Preview signals -> prioritize structural rule expansion
many out_of_scope repos -> tighten positioning/docs before rule expansion
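
The three rules above can be read as a small decision function. This is an illustrative sketch only: the summary keys, the reading of "clean" as zero recurring Stable FP clusters, and the 25% threshold standing in for "many" are all assumptions, not values from this plan.

```python
def next_priorities(summary, out_of_scope_share=0.25):
    """Map a wave summary to an ordered list of follow-up priorities.

    summary keys (hypothetical): total_repos, out_of_scope_repos,
    recurring_stable_fp_clusters, useful_preview_signals.
    """
    priorities = []
    if summary["recurring_stable_fp_clusters"] > 0:
        priorities.append("precision_hardening")
    elif summary["useful_preview_signals"] > 0:
        # Stable layer is clean and Preview produced useful signals.
        priorities.append("structural_rule_expansion")

    total = summary["total_repos"]
    if total and summary["out_of_scope_repos"] / total >= out_of_scope_share:
        # Positioning/docs work comes before any rule work.
        priorities.insert(0, "tighten_positioning_docs")
    return priorities


# Hypothetical wave summaries.
clean_wave = {
    "total_repos": 24,
    "out_of_scope_repos": 2,
    "recurring_stable_fp_clusters": 0,
    "useful_preview_signals": 3,
}
misfit_wave = dict(clean_wave, out_of_scope_repos=8)

print(next_priorities(clean_wave))   # prints ['structural_rule_expansion']
print(next_priorities(misfit_wave))  # prints ['tighten_positioning_docs', 'structural_rule_expansion']
```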

The checked-in result of this first wave is documented in EXTERNAL_VALIDATION_REPORT.md.
