Metrics

Per-pack reliability measurements from the calibration harness. Every pack shipped with femto appears here, with or without a published measurement. Uncalibrated packs are marked; they are not hidden. No aggregate-across-packs statistic is published — per-pack numbers only, for reasons documented in the repo.

Last site build: 2026-04-24T03:49:24.071Z. The metrics below come from the harness reports committed at build time; they are not live. Reports are regenerated when an operator runs bunx femto-harness calibrate <pack>, and they are explicitly added to the repo with git add -f.

Scope: shipped packs only. Org-local packs authored in adopter repos under .femto/packs/ do not appear here; adopters calibrate and publish on their own infrastructure. See Concepts § Three content tiers.

MULTI-TENANCY v0.1.0 · measurement v1 · measured 2026-04-24 · alpha

metric	value	disclosure
AUC aggregate	1.00 (n=5)	grader_invocation=harness-delegated · reads_path=synthetic_or_empty · thin sample
AUC standard	1.00 (n=2)	excludes adversarial turns · thin sample
AUC adversarial	1.00 (n=3)	adversarial subset only · thin sample
FPR	0.00 (n=5)	threshold 0.94 · thin sample
TPR	1.00 (n=5)	meets_stable_bar · thin sample
delegation_failure_rate	0.0%	ok
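
For readers unfamiliar with the metrics in this table, the sketch below shows one common way to compute AUC, TPR, and FPR from scored turns. It is not the harness's implementation: the ScoredTurn shape, the function names, the sample scores, and the "score >= threshold counts as a positive call" convention are all assumptions made for illustration. Only the 0.94 threshold comes from the table.

```typescript
// Hypothetical record shape: one graded turn with a score and a ground-truth label.
interface ScoredTurn {
  score: number;     // grader's predicted probability in [0, 1]
  positive: boolean; // ground-truth label for the turn
}

// AUC as the probability that a random positive outranks a random negative
// (Mann-Whitney formulation); ties count as half a win.
function auc(turns: ScoredTurn[]): number {
  const pos = turns.filter((t) => t.positive);
  const neg = turns.filter((t) => !t.positive);
  if (pos.length === 0 || neg.length === 0) return NaN; // undefined without both classes
  let wins = 0;
  for (const p of pos) {
    for (const n of neg) {
      if (p.score > n.score) wins += 1;
      else if (p.score === n.score) wins += 0.5;
    }
  }
  return wins / (pos.length * neg.length);
}

// TPR and FPR at a fixed decision threshold.
// Assumption: a score at or above the threshold counts as a positive call.
function rates(turns: ScoredTurn[], threshold: number): { tpr: number; fpr: number } {
  let tp = 0;
  let fp = 0;
  let positives = 0;
  let negatives = 0;
  for (const t of turns) {
    const called = t.score >= threshold;
    if (t.positive) {
      positives += 1;
      if (called) tp += 1;
    } else {
      negatives += 1;
      if (called) fp += 1;
    }
  }
  return {
    tpr: positives === 0 ? NaN : tp / positives,
    fpr: negatives === 0 ? NaN : fp / negatives,
  };
}

// Made-up scores for illustration only; not taken from any harness report.
const sampleTurns: ScoredTurn[] = [
  { score: 0.99, positive: true },
  { score: 0.97, positive: true },
  { score: 0.95, positive: true },
  { score: 0.4, positive: false },
  { score: 0.1, positive: false },
];

const sampleAuc = auc(sampleTurns);
const sampleRates = rates(sampleTurns, 0.94);
console.log(sampleAuc, sampleRates);
```

With only a handful of turns, a single relabeled or rescored turn moves all three numbers substantially, which is what the thin sample flag warns about.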

Per-KC mastery

KC	mean predicted	gating
ROW-LEVEL-SECURITY	0.78 (n=3) · thin	required · threshold 0.60 · meets
TENANT-SCOPING	0.73 (n=3) · thin	not required for gating
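
The gating column can be read as a comparison of a KC's mean predicted mastery against its required threshold. The sketch below illustrates that reading under stated assumptions. The KcGate shape, the evaluateKc function, the "mean >= threshold" comparison, and the thinBelow cutoff of 10 are all hypothetical, since this page does not state how the harness decides that a sample is thin or how it handles a mean exactly at the threshold. The predictions are made up, and only the 0.60 threshold is taken from the table.

```typescript
// Hypothetical input shape for one knowledge component (KC).
interface KcGate {
  kc: string;
  predictions: number[];      // per-turn predicted mastery in [0, 1]
  requiredThreshold?: number; // omitted when the KC is not required for gating
}

type GateStatus = "meets" | "below" | "not required";

interface KcResult {
  mean: number;
  n: number;
  thin: boolean;
  status: GateStatus;
}

// Assumptions: the gate passes when mean >= threshold, and a sample is
// "thin" when it has fewer than thinBelow predictions.
function evaluateKc(gate: KcGate, thinBelow: number): KcResult {
  const n = gate.predictions.length;
  const mean = n === 0 ? NaN : gate.predictions.reduce((a, b) => a + b, 0) / n;
  const thin = n < thinBelow;
  let status: GateStatus;
  if (gate.requiredThreshold === undefined) {
    status = "not required";
  } else {
    status = mean >= gate.requiredThreshold ? "meets" : "below";
  }
  return { mean, n, thin, status };
}

// Made-up predictions for illustration only; not taken from any harness report.
const requiredKc = evaluateKc(
  { kc: "EXAMPLE-REQUIRED-KC", predictions: [0.5, 0.75, 1.0], requiredThreshold: 0.6 },
  10,
);
const optionalKc = evaluateKc(
  { kc: "EXAMPLE-OPTIONAL-KC", predictions: [0.5, 0.75, 1.0] },
  10,
);
console.log(requiredKc, optionalKc);
```

Keeping the thin flag separate from the gate status mirrors the table, where a KC can meet its threshold and still be marked thin.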

2 test cases · 5 total turns · 3 adversarial · runtime 149.3s · grader model claude-opus-4-7