
Cognitive JIT / Procedural Learning — Landscape Research

Date: 2026-03-30
Author: Sean Galliher (Architect)
Status: Complete — findings incorporated into AD-464 decomposition (AD-531–539)


The Gap

Every ProbOS agent invokes the LLM for every decision, regardless of whether it has solved an identical problem before. No mechanism exists to "compile" successful reasoning into replayable procedures. This wastes tokens, increases latency, and prevents the crew from building institutional expertise.

AD-430 (Action Memory, COMPLETE) closed the memory gap — agents now record every action as an episode. But episodes are raw experience, not distilled knowledge. The question: how do we get from "I remember doing this" to "I know how to do this"?

Landscape Survey

Seventeen projects were surveyed across five tiers:

Tier 1: Code-as-Skills (Most Relevant)

| Project | Approach | Strengths | Gaps |
| --- | --- | --- | --- |
| Voyager (NVIDIA, 2023) | JavaScript function library built by LLM. Agent writes code to solve Minecraft tasks, stores as named functions, composes hierarchically. | Compositional — simple skills build complex ones. Code IS the procedure — deterministic by definition. Curriculum-driven: easy skills first. | Single-agent only. No trust/governance. Task-specific (Minecraft). No failure evolution. No multi-agent collaboration. |
| Cradle (Tencent, 2024) | Pre-authored skill library per game environment. Agent selects and parameterizes skills. | Clean abstraction between decision and execution. | Skills are hand-authored, not learned. No LLM-to-deterministic compilation. Environment-specific. |

ProbOS differentiator: Voyager's compositional approach is powerful but operates in a single-agent, single-domain (Minecraft) context with no governance. ProbOS adds: multi-agent compound procedures, trust-gated promotion, observational learning across agents, and graduated compilation (not binary code/no-code).

Tier 2: Prompt/Pipeline Optimization

| Project | Approach | Strengths | Gaps |
| --- | --- | --- | --- |
| DSPy (Stanford, 2024) | Treats LLM pipelines as optimizable programs. Automatically tunes prompts, few-shot examples, and chain-of-thought structure via training data. | Systematic, reproducible optimization. Metric-driven. | Optimizes prompts, doesn't eliminate them. Still requires LLM at runtime. No zero-token replay. No agent identity. |

ProbOS connection: DSPy's optimization principles could enhance Level 2 (Guided) procedures — optimizing the hint prompts that guide LLM-assisted replay. Not a competitor; a complementary technique.

Tier 3: Memory Systems

| Project | Approach | Strengths | Gaps |
| --- | --- | --- | --- |
| Mengram (2025) | Three-tier memory: semantic (facts), episodic (events), procedural (workflows). Auto-clusters episodes by embedding similarity (≥3 similar episodes → extract procedure). Failure evolution via procedure_feedback(). NL workflow versioning, not code. | Failure evolution is novel — procedures learn from specific failure points. Apache 2.0 (~500 stars). | Single-agent. No trust governance. No compilation levels (procedures are always NL, never deterministic). No multi-agent collaboration. No observational learning. |
| Letta (MemGPT team, 2024) | Self-editing memory blocks. Agent can modify its own system prompt, core memories, and archival storage. | Novel approach: the agent controls its own memory. | Memory architecture, not procedural learning. No procedure extraction. No replay. |

ProbOS absorption from Mengram:
- Episode clustering threshold (cosine similarity, ≥3 episodes) → adopted in AD-531
- Failure evolution (procedure feedback with step-level failure tracking) → adopted in AD-532 negative procedure extraction
- NL workflow format → ProbOS uses a structured ProcedureStep schema instead, enabling Level 4 deterministic replay that Mengram can't achieve
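The adopted clustering heuristic can be sketched roughly as below. This is a minimal illustration, not the AD-531 implementation: the `Episode` type, the similarity threshold value, and the greedy single-pass strategy are all assumptions; only the cosine-similarity metric and the ≥3-episode extraction threshold come from the survey above.

```python
from dataclasses import dataclass

SIMILARITY_THRESHOLD = 0.85  # illustrative value; AD-531 defines the real one
MIN_CLUSTER_SIZE = 3         # Mengram's ">=3 similar episodes" heuristic

@dataclass
class Episode:
    id: str
    embedding: list[float]  # produced by whatever embedder the system uses

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def cluster_episodes(episodes: list[Episode]) -> list[list[Episode]]:
    """Greedy single-pass clustering: each episode joins the first
    cluster whose seed episode is similar enough, else starts its own."""
    clusters: list[list[Episode]] = []
    for ep in episodes:
        for cluster in clusters:
            if cosine(ep.embedding, cluster[0].embedding) >= SIMILARITY_THRESHOLD:
                cluster.append(ep)
                break
        else:
            clusters.append([ep])
    return clusters

def procedure_candidates(episodes: list[Episode]) -> list[list[Episode]]:
    """Only clusters of >=3 similar episodes become extraction candidates."""
    return [c for c in cluster_episodes(episodes) if len(c) >= MIN_CLUSTER_SIZE]
```

A real pipeline would likely use centroid comparison and a proper vector index rather than this O(n·k) first-member check, but the gating logic is the same.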

Tier 4: Reflection & Self-Improvement

| Project | Approach | Strengths | Gaps |
| --- | --- | --- | --- |
| Reflexion (2023) | Verbal reinforcement learning. Agent reflects on failures in natural language, stores reflections, uses them as guidance on retry. No gradient updates — pure text-based learning. | Simple, effective. | Reflections are prompt injection, not procedures. No compilation. No persistence. Session-only. |
| ExpeL (2023) | Experience-driven rules. Extracts "insights" (rules) from batches of experiences. Rules injected as system-prompt guidelines. | Explicit rule extraction from experience. Compositional with Reflexion. | Rules are guidelines, not executable procedures. Can't replay deterministically. No versioning. No trust. |

ProbOS connection: ExpeL's insight extraction is conceptually similar to AD-532 procedure extraction but stops at NL rules. ProbOS goes further: rules → procedures → deterministic replay → graduated compilation.

Tier 5: Frameworks (No Learning)

| Project | Approach | Gaps |
| --- | --- | --- |
| AutoGen, CrewAI, LangGraph, Semantic Kernel, Agency Swarm, Claude MCP | Orchestration/tooling frameworks | No learning, no memory, no procedure extraction |
| ChatDev, MetaGPT | Multi-agent code generation | Role-play, not sovereign identity. No memory. No learning. |

What No One Does

After surveying 17 projects, ProbOS's Cognitive JIT occupies a unique niche:

| Capability | Any Existing Project? | ProbOS (AD-464) |
| --- | --- | --- |
| LLM → deterministic compilation | No | Yes (AD-535, graduated levels) |
| Trust-gated procedure promotion | No | Yes (AD-536, dept chief + Captain) |
| Multi-agent compound procedures | No | Yes (AD-532, cross-agent extraction) |
| Observational learning (learn by watching) | No | Yes (AD-537, Ward Room observation) |
| Negative procedures (what NOT to do) | No | Yes (AD-532, from contradiction detection) |
| Procedure lifecycle management | No | Yes (AD-538, decay/re-validate/dedup) |
| Knowledge gap → training pipeline | No | Yes (AD-539, gap → Holodeck scenarios) |
| Graduated compilation (5 levels) | No (all binary) | Yes (AD-535, Novice→Expert) |
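One plausible shape for the five graduated levels is sketched below. Only Level 2 (Guided) and the fully deterministic top level are named in this document; the other level names, the numbering, and the `needs_llm` helper are illustrative assumptions mapped loosely onto the Dreyfus stages, not the AD-535 definitions.

```python
from enum import IntEnum

class CompilationLevel(IntEnum):
    """Hypothetical mapping of AD-535's five levels onto Dreyfus stages.
    Only GUIDED (Level 2) and the deterministic EXPERT level are
    attested in the research note; the rest are placeholders."""
    NOVICE = 0             # full LLM reasoning on every invocation
    ADVANCED_BEGINNER = 1  # LLM reasoning with retrieved episodes as context
    GUIDED = 2             # LLM-assisted replay steered by hint prompts
    PROFICIENT = 3         # mostly deterministic; LLM handles edge cases
    EXPERT = 4             # fully deterministic replay, zero tokens

def needs_llm(level: CompilationLevel) -> bool:
    """Every level below full compilation still invokes the LLM;
    this is what makes the scheme graduated rather than binary."""
    return level < CompilationLevel.EXPERT
```

The point of the ordering is the contrast drawn in the table: existing systems pick one end of this scale (all-LLM or hand-authored code), while graduated compilation lets a procedure climb it as evidence accumulates.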

Intellectual Lineage

| Theory | Author | ProbOS Mapping |
| --- | --- | --- |
| ACT-R: Declarative→Procedural Compilation | Anderson (1983) | Episodes (declarative) → Procedures (compiled) → Replay (automatic) |
| Dreyfus Skill Acquisition Model | Dreyfus & Dreyfus (1986) | 5 compilation levels map to Dreyfus Novice→Expert stages |
| Social Learning Theory | Bandura (1977) | AD-537 observational learning — agents learn by watching others |
| Zone of Proximal Development | Vygotsky (1978) | Graduated compilation = scaffolding. Level 2 (Guided) is LLM-as-scaffold for developing autonomy |
| Situated Cognition | Lave & Wenger (1991) | Procedures are context-bound (preconditions, invariants). Learning is participation in practice, not abstract knowledge transfer |

Key Decision: Ship's Records, Not KnowledgeStore

KnowledgeStore was the original intended backend for procedures. Analysis revealed KnowledgeStore has evolved into operational state persistence (trust snapshots, routing weights, agent source code) — not a shared knowledge library. The _store_strategies() path in dreaming.py writes JSON files that nothing ever reads.

Decision: Use Ship's Records (AD-434) as the procedure store backend. Rationale:
1. Git-backed — automatic version history and diffs for procedure evolution
2. YAML frontmatter — structured metadata alongside procedure content
3. Classification access control — procedures can be ship-wide or department-scoped
4. Already built and tested
5. Clean separation: KnowledgeStore = operational state, Ship's Records = institutional knowledge
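A minimal sketch of what a procedure record in this store might look like follows. The field names, directory layout, and function signature are assumptions for illustration, not the actual Ship's Records (AD-434) schema; only the YAML-frontmatter-plus-body shape and the ship-wide/department-scoped classification come from the rationale above.

```python
from pathlib import Path

def write_procedure_record(records_root: Path, name: str, body: str,
                           classification: str = "ship-wide",
                           version: int = 1) -> Path:
    """Persist a procedure as a markdown file with YAML frontmatter.
    Because the store is git-backed, committing this file yields
    version history and diffs of procedure evolution for free.
    NOTE: illustrative layout only, not the real AD-434 schema."""
    frontmatter = "\n".join([
        "---",
        f"procedure: {name}",
        f"version: {version}",
        f"classification: {classification}",  # ship-wide or department-scoped
        "---",
    ])
    path = records_root / "procedures" / f"{name}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(frontmatter + "\n\n" + body + "\n")
    return path
```

Bumping `version` and re-committing gives the evolution diff directly from `git log -p` on the record file.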

AD-531 replaces the dead extract_strategies() code path with cluster-based pattern detection that feeds into proper procedure extraction.

Cautionary Tale: Dead Strategy Extraction

DreamingEngine.extract_strategies() (AD-383) runs during dream cycles and writes JSON files to KnowledgeStore/strategies/. Nothing in the codebase reads these files. The REL_STRATEGY Hebbian relationship type exists but is never written to by production code. This is write-only dead code — a reminder that building the write side without the read side produces zero value.

AD-531 through AD-534 are designed read-first: the replay dispatch mechanism (AD-534) that consumes procedures is designed before the extraction pipeline that produces them, ensuring every piece of learned knowledge has a consumer.
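The read-first principle reduces to a small dispatch shape, sketched below. All names here (`lookup`, `replay`, `llm_solve`) are placeholders; the document does not specify the AD-534 interface, only that the consumer is designed before the producer.

```python
from typing import Callable, Optional

# Placeholder types — the real AD-534 dispatch interface is not
# specified in this document.
Procedure = dict
ProcedureLookup = Callable[[str], Optional[Procedure]]

def dispatch(task: str,
             lookup: ProcedureLookup,
             replay: Callable[[Procedure, str], str],
             llm_solve: Callable[[str], str]) -> str:
    """Read side first: consult the procedure store before spending tokens.
    The LLM fallback is also the path that produces new episodes, which
    extraction later compiles — so every written procedure has a reader."""
    procedure = lookup(task)
    if procedure is not None:
        return replay(procedure, task)  # zero- or low-token path
    return llm_solve(task)              # full-reasoning fallback
```

Contrast this with the dead `extract_strategies()` path above: there, the write side existed with no `dispatch`-equivalent ever calling `lookup`.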