# Concepts
## Knowledge Components
A Knowledge Component (KC) is the unit of domain knowledge femto reasons about. The structure is deliberately narrow so contributors write KCs in the same shape and the probe / grader can consume them uniformly.
Every KC is a markdown file with six required sections:

- What the thing is, in domain terms. No code yet.
- What must be true for the concept to hold: the load-bearing properties the engineer has to preserve.
- How specific technologies implement it. PostgreSQL RLS, Supabase Auth, middleware patterns, etc.
- How it breaks in the wild. Named failure-mode patterns, not just “be careful.”
- What an expert looks for when troubleshooting. This is the hardest field to write and the one that carries most of the probe’s signal.
- An authoritative source per claim: RFC, vendor doc, OWASP cheat sheet, or peer-reviewed reference. Every bullet carries its own link; unsourced assertions fail schema validation.
Packs group related KCs (e.g., the multi-tenancy pack ships `row-level-security` and `tenant-scoping`) and declare which KCs gate code emission via `required_for_gating`.
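As a sketch of how a loader might treat that gating declaration — the manifest’s in-memory shape and every field name other than `required_for_gating` are assumptions, not femto’s actual schema:

```python
# Hypothetical in-memory shape of a pack manifest; only the
# required_for_gating field name comes from the docs above.
pack = {
    "id": "multi-tenancy",
    "kcs": ["row-level-security", "tenant-scoping"],
    "required_for_gating": ["row-level-security", "tenant-scoping"],
}

def gating_kcs(pack: dict) -> set[str]:
    """KCs that must reach mastery before code emission is allowed."""
    return set(pack["kcs"]) & set(pack["required_for_gating"])
```

A pack could ship KCs that are purely informational by leaving them out of `required_for_gating`; only the intersection gates emission.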
## The Socratic probe
The probe is pre-emission: it happens before the agent can edit or write. The engineer — not the agent — specifies the fix at domain level. The probe keeps asking even on correct answers until every required KC has hit its per-KC turn minimum.
The terminator is a checklist, not an LLM verdict. “The grader is satisfied” does not end the probe on its own; “every required KC has been touched enough times” does. That keeps the probe from drifting into single-mechanism gaming via framing attacks on the grader.
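A minimal sketch of that checklist-style terminator, with all names hypothetical:

```python
def probe_done(turns_per_kc: dict[str, int], required: dict[str, int]) -> bool:
    """The probe ends only when every required KC has hit its per-KC
    turn minimum; a satisfied-grader verdict alone never terminates it."""
    return all(turns_per_kc.get(kc, 0) >= minimum
               for kc, minimum in required.items())
```

Because the condition is a count over turns rather than a model judgment, a framing attack that inflates one grader response cannot shortcut the remaining KCs.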
## Probe, grader, and hook are separated
Three roles, three separate contexts — bridged only by files on disk:

- The probe (MCP server) drives the dialogue, tracks coverage, and serializes turns to `probe-log.md`. It does not decide mastery.
- The grader (separate subagent, different model) reads the serialized log plus the recorded reads, and writes `grader.md`. It never sees live dialogue state; it cannot be talked into a higher score mid-turn.
- The hook (PreToolUse) reads `grader.md` from disk and checks per-KC mastery against the pack threshold. It does not call an LLM. The emit-time decision is a file-system read, not a model prediction.
Files are the contract between layers. If any single layer is compromised or lies, the artifacts on disk still tell the truth, and every session ships `events.jsonl` as an append-only audit log.
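For illustration only, the hook’s emit-time decision reduces to a deterministic check over grader output. This sketch assumes, purely hypothetically, that the grader’s per-KC scores are available as a JSON object; femto’s real `grader.md` format is not specified here:

```python
import json

def emission_allowed(grader_text: str, required_kcs: list[str],
                     threshold: float) -> bool:
    """Emit-time gate sketch: a deterministic read of grader output,
    with no LLM call in the decision path."""
    scores = json.loads(grader_text)  # assumed score encoding
    return all(scores.get(kc, 0.0) >= threshold for kc in required_kcs)
```

In the real flow the text would come from reading `grader.md` on disk, which is what makes the gate auditable after the fact.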
## Three content tiers
Femto’s content lives in three tiers with distinct authorship and trust models. The middle tier is what you see on the Packs page; the outer tiers matter for real adoption.
1. **Library-shipped docs** (`llms.txt`, bundled markdown): Next.js, Auth0, Clerk, Supabase, Stripe, Better Auth. Femto orchestrates these on demand; it does not bundle or rehost them.
2. **Packs shipped with femto** under `packs/*/`. Operator-curated, community-PR-extended, harness-calibrated, with reliability published on Metrics. The Gate 1 strong-with-transparency commitment applies in full.
3. **Packs authored in the adopter’s own repository** at `.femto/packs/*/`. Same schema, same structural enforcement, but the adopter owns the content, calibration, and trust boundary. Femto makes no public reliability claim for these packs.
On session start the MCP server walks shipped packs first and org-local packs second. On id collision, shipped wins: an org-local pack cannot shadow a shipped one by reusing its id. Events log the source of every loaded pack, so the trust boundary is visible in `events.jsonl`.
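The precedence rule can be sketched as follows; the function shape and the `source` tag are assumptions for illustration, not femto’s API:

```python
def load_packs(shipped: dict[str, dict],
               org_local: dict[str, dict]) -> dict[str, dict]:
    """Walk shipped packs first, org-local second. An org-local pack
    cannot shadow a shipped one by reusing its id; each loaded pack is
    tagged with its source so the trust boundary stays visible."""
    loaded: dict[str, dict] = {}
    for source, packs in (("shipped", shipped), ("org-local", org_local)):
        for pack_id, pack in packs.items():
            if pack_id in loaded:
                continue  # shipped wins on id collision
            loaded[pack_id] = {**pack, "source": source}
    return loaded
```

The per-pack `source` tag is what an event log entry would carry, making a shadowing attempt detectable from `events.jsonl` alone.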
For compliance-grade work (HIPAA, PCI, SOC2, NIST SP 800 controls, internal engineering policies), this is the path: write the KCs against the pack schema, commit them under your repo’s `.femto/packs/`, and femto’s probe → grader → hook loop runs over your content exactly as it does over shipped packs. See Contributing for the authoring walkthrough.
## Gate 1 — strong with transparency
Femto does not claim guaranteed understanding. Three readings of that commitment were on the table:

- **Strict as guarantee.** Provably zero bypass, zero false passes. Unbuildable at the current state of the art: the best published LLM-based mastery detection sits around 76% AUC. Delivering a guarantee requires either a trick definition of “understanding” or research beyond what shipped tooling can honestly promise.
- **Strict as design discipline.** No bypass, stacked mechanisms, residual risk unmeasured. Ships as a product, but the claim is unfalsifiable from the outside; adopters can’t audit it.
- **Strong with transparency.** No designed-in bypass, stacked mechanisms, residual rate measured and published per domain. Ships as a product with real artifacts — harness, test cases, numbers — that adopters and regulators can inspect and replicate. This is what femto commits to.
Concretely, that means every pack has a harness (test cases with held-out ground truth), a published AUC / FPR / TPR, and disclosure labels sitting next to every number telling you whether the measurement was harness-path (synthetic or empty reads, delegated grader) or production-path. Uncalibrated packs appear on the metrics page with an explicit `not_yet_published` tag rather than an elided zero.
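A hypothetical shape for one published metrics record, showing how a disclosure label sits next to each number — field names here are illustrative, not femto’s actual schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PackMetrics:
    """Illustrative per-pack metrics record (assumed field names)."""
    pack_id: str
    auc: Optional[float]   # None until the calibration harness has run
    fpr: Optional[float]
    tpr: Optional[float]
    measurement_path: str  # "harness-path" or "production-path"

    def label(self) -> str:
        # Uncalibrated packs get an explicit tag, never an elided zero.
        return "not_yet_published" if self.auc is None else self.measurement_path
```

The point of the `measurement_path` field is that a number without its disclosure label is not publishable: the two travel together in every record.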
## Further reading
- Metrics — current per-pack numbers.
- Packs — shipped packs and their maturity tier.
- Contributing — KC schema and test-case contribution guide.
- docs/calibration.md — full harness run-flow and the three mandatory disclosure conditions.