Metrics
Per-pack reliability measurements from the calibration harness. Every pack shipped with femto appears here, with or without a published measurement. Uncalibrated packs are marked; they are not hidden. No aggregate-across-packs statistic is published — per-pack numbers only, for reasons documented in the repo.
Scope
Shipped packs only. Org-local packs authored in adopter repos under .femto/packs/ do not appear here; adopters calibrate and publish on their own infrastructure. See Concepts § Three content tiers.
MULTI-TENANCY
alpha

| metric | value | disclosure |
|---|---|---|
| AUC aggregate | 1.00 (n=5) | grader_invocation=harness-delegated · reads_path=synthetic_or_empty · thin sample |
| AUC standard | 1.00 (n=2) | excludes adversarial turns · thin sample |
| AUC adversarial | 1.00 (n=3) | adversarial subset only · thin sample |
| FPR | 0.00 (n=5) | threshold 0.94 · thin sample |
| TPR | 1.00 (n=5) | meets_stable_bar · thin sample |
| delegation_failure_rate | 0.0% | ok |
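
For readers unfamiliar with the columns above, the following is a minimal sketch of how AUC, FPR, and TPR are conventionally computed from per-turn grader scores and ground-truth labels. It is not the calibration harness's code: the function names, the sample scores, and the choice to treat a score at or above the threshold as a positive are all assumptions made for illustration. Only the threshold value 0.94 comes from the table.

```python
from typing import Sequence, Tuple


def confusion_rates(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Return (TPR, FPR), counting score >= threshold as a positive prediction.

    The >= comparison is an assumption; the harness may use a strict bound.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    tpr = tp / pos if pos else float("nan")
    fpr = fp / neg if neg else float("nan")
    return tpr, fpr


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC: fraction of (positive, negative) pairs ordered correctly.

    Ties count as half a win. Returns NaN if either class is empty.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up scores for five turns; NOT the harness's measured data.
scores = [0.98, 0.96, 0.95, 0.30, 0.10]
labels = [1, 1, 1, 0, 0]

tpr, fpr = confusion_rates(scores, labels, threshold=0.94)
area = auc(scores, labels)
```

In this made-up sample there are only two negatives, so a single false positive would move FPR from 0.00 to 0.50. That sensitivity is the practical meaning of the "thin sample" disclosure on each row.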
Per-KC mastery
| KC | mean predicted | gating |
|---|---|---|
| ROW-LEVEL-SECURITY | 0.78 (n=3) · thin | required · threshold 0.60 · meets |
| TENANT-SCOPING | 0.73 (n=3) · thin | not required for gating |
2 test cases · 5 total turns · 3 adversarial · runtime 149.3s · grader model claude-opus-4-7
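
The gating column in the Per-KC mastery table can be read as a comparison between a KC's mean predicted mastery and its required threshold. The sketch below illustrates that reading under stated assumptions: the `KcResult` type, the `gating_status` function, the "below" label, the at-or-above comparison, and the per-turn predictions are all hypothetical. Only the KC names, the means, and the 0.60 threshold come from the table.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class KcResult:
    """Hypothetical container for one KC's per-turn predictions."""

    name: str
    predictions: Sequence[float]
    # None means the KC is not required for gating.
    required_threshold: Optional[float] = None


def mean_predicted(kc: KcResult) -> float:
    """Arithmetic mean of the per-turn predicted mastery values."""
    return sum(kc.predictions) / len(kc.predictions)


def gating_status(kc: KcResult) -> str:
    """Compare the mean against the threshold (at-or-above is assumed to pass)."""
    if kc.required_threshold is None:
        return "not required for gating"
    return "meets" if mean_predicted(kc) >= kc.required_threshold else "below"


# Per-turn values are invented so that the means match the table.
rls = KcResult("ROW-LEVEL-SECURITY", [0.70, 0.80, 0.84], required_threshold=0.60)
scoping = KcResult("TENANT-SCOPING", [0.70, 0.73, 0.76])

rls_status = gating_status(rls)
scoping_status = gating_status(scoping)
```

With n=3 per KC, a single low prediction can shift the mean substantially, which is why both rows carry the "thin" marker.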