Metrics

Per-pack reliability measurements from the calibration harness. Every pack shipped with femto appears here, with or without a published measurement. Uncalibrated packs are marked; they are not hidden. No aggregate-across-packs statistic is published — per-pack numbers only, for reasons documented in the repo.

Scope Shipped packs only. Org-local packs authored in adopter repos under .femto/packs/ do not appear here — adopters calibrate and publish on their own infrastructure. See Concepts § Three content tiers.
MULTI-TENANCY v0.1.0 measurement v1 measured 2026-04-24
alpha
metric value disclosure
AUC aggregate 1.00 (n=5) grader_invocation=harness-delegated · reads_path=synthetic_or_empty thin sample
AUC standard 1.00 (n=2) excludes adversarial turns thin sample
AUC adversarial 1.00 (n=3) adversarial subset only thin sample
FPR 0.00 (n=5) threshold 0.94 thin sample
TPR 1.00 (n=5) meets_stable_bar thin sample
delegation_failure_rate 0.0% ok

Per-KC mastery

KC mean predicted gating
ROW-LEVEL-SECURITY 0.78 (n=3) thin required threshold 0.60 meets
TENANT-SCOPING 0.73 (n=3) thin not required for gating

2 test cases · 5 total turns · 3 adversarial · runtime 149.3s · grader model claude-opus-4-7