Crew Capability Architecture¶
Date: 2026-04-04
Author: Architect (Sean Galliher)
Status: Design Document — connects existing ADs into a unified personnel management model
Triggered by: Gap analysis of how ProbOS defines agent Role, Skills, Tools, Qualifications, Duties, and Privileges
1. Problem Statement¶
ProbOS has built substantial infrastructure for agent identity, skills, memory, trust, and procedures, but these systems were designed in parallel and are not connected end-to-end. As a result, we cannot answer the fundamental workforce management question:
"What can this agent do, and why?"
A Navy personnel officer can look up any sailor and see: Rating (job), NECs (qualifications), PQS status (certifications), assigned equipment (tools), current watch station (duty), service record (history), and rank (authority). ProbOS has analogs for all of these, but no unified model connecting them.
The Fragmentation¶
| Capability Aspect | ProbOS Component | AD | Location | Connected To |
|---|---|---|---|---|
| Role definition | Standing Orders + Ontology Post | AD-339/429a | Config + YAML | ✅ compose_instructions() |
| Skill tracking | Skill Framework | AD-428 | skill_framework.py | ❌ Not wired to tool access |
| Qualification testing | Qualification Battery | AD-566 | qualification.py | ❌ Not wired to skill proficiency |
| Tool access | Tool capabilities (schema only) | AD-423 | resources.yaml | ❌ No runtime code |
| Duty assignment | Pool + Intent Bus | Core | agent_onboarding.py | ✅ Functional but static |
| Authority/Privileges | Earned Agency | AD-357 | earned_agency.py | ✅ Gates actions |
| Learned abilities | Executable Skills + Cognitive JIT | AD-531-539 | procedures.py + types.py | ✅ Procedural replay works |
| Personnel records | Episodic Memory + Ship's Records | AD-434/441 | episodic.py + identity.py | ✅ Separate systems |
| Crew roster | Agent Fleet startup | Core | agent_fleet.py | ❌ Hardcoded, no manifest |
What "Connected" Looks Like¶
When a new crew member is onboarded, the system should be able to:
1. Look up their Post (organization.yaml) → determines department, chain of command, authority
2. Load their Role Template (skills.yaml) → determines required skills and proficiency targets
3. Assign Tool Access (resources.yaml) → determines which tools they can use, gated by rank
4. Check Qualification Status (qualification.py) → determines if they've demonstrated competency
5. Load Procedures (procedure_store.py) → determines what they can do without LLM reasoning
6. Apply Standing Orders (standing_orders/) → determines behavioral guidance
7. Set Earned Agency (earned_agency.py) → determines action space based on trust
Currently, steps 1, 6, and 7 work. Steps 2-5 exist as independent systems with no connecting fabric.
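The gap can be made concrete with a small sketch. The step names, the `WIRED` set, and `onboarding_gaps()` are hypothetical labels invented for illustration; only the source files and the working/not-working split come from the list above.

```python
# Hypothetical step names paired with the sources listed in the onboarding steps.
ONBOARDING_STEPS = [
    ("post", "organization.yaml"),
    ("role_template", "skills.yaml"),
    ("tool_access", "resources.yaml"),
    ("qualification_status", "qualification.py"),
    ("procedures", "procedure_store.py"),
    ("standing_orders", "standing_orders/"),
    ("earned_agency", "earned_agency.py"),
]

# Steps reported as working today (1, 6, and 7).
WIRED = {"post", "standing_orders", "earned_agency"}


def onboarding_gaps() -> list[str]:
    """Return the steps that exist as systems but are not wired into onboarding."""
    return [name for name, _source in ONBOARDING_STEPS if name not in WIRED]
```

Each name returned by `onboarding_gaps()` corresponds to a connection this document proposes to build.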
2. The Navy Model¶
The U.S. Navy's personnel management system provides the organizing metaphor. ProbOS already uses naval terminology — this document formalizes the mapping.
Navy Personnel Management¶
┌─────────────────────────────────────────────────────────────┐
│ SAILOR'S SERVICE RECORD │
├─────────────────────────────────────────────────────────────┤
│ │
│ RATING (Job Category) │
│ ├─ e.g., MM (Machinist's Mate), ET (Electronics Tech) │
│ ├─ Determines: training pipeline, career path, billets │
│ └─ ProbOS: Agent Type + Ontology Post │
│ │
│ NECs (Navy Enlisted Classifications) │
│ ├─ Specific skill qualifications beyond rating │
│ ├─ Earned through: schools, OJT, demonstrated competency │
│ ├─ e.g., NEC 4234 (Nuclear Propulsion Plant Operator) │
│ └─ ProbOS: Skill Framework (AD-428) proficiency records │
│ │
│ PQS (Personnel Qualification Standards) │
│ ├─ Formal competency certification process │
│ ├─ Study → Practical Demo → Oral Board → Sign-off │
│ ├─ Required before standing a watch │
│ └─ ProbOS: Qualification Battery (AD-566) │
│ │
│ AUTHORIZED EQUIPMENT │
│ ├─ Ship's equipment the sailor is qualified to operate │
│ ├─ Gated by: NEC + PQS + rank + commanding officer │
│ ├─ e.g., can operate the lathe, cannot operate the reactor │
│ └─ ProbOS: Tool Registry (AD-423) — NOT YET BUILT │
│ │
│ WATCH STATION │
│ ├─ Current duty assignment on the watch bill │
│ ├─ Requires: PQS qualification for that station │
│ ├─ Rotation: 3-section, port/starboard, etc. │
│ └─ ProbOS: Pool + Intent Bus + Watch Manager │
│ │
│ PMS CARDS (Planned Maintenance System) │
│ ├─ Procedural checklists for maintenance tasks │
│ ├─ Define: steps, tools required, qualifications needed │
│ ├─ Tracked: completion, periodicity, deferred items │
│ └─ ProbOS: Cognitive JIT Procedures (AD-531-539) │
│ │
│ STANDING ORDERS │
│ ├─ Rules of engagement for the watch │
│ ├─ Hierarchy: OPNAV → fleet → ship → department → watch │
│ └─ ProbOS: Standing Orders (AD-339) 4-tier hierarchy │
│ │
│ RANK / RATE │
│ ├─ Authority level determining scope of action │
│ ├─ Earned through: time-in-rate + PQS + eval + board │
│ └─ ProbOS: Earned Agency (AD-357) trust-based rank │
│ │
│ SERVICE RECORD │
│ ├─ Complete career history │
│ ├─ Evaluations, awards, qualifications, assignments │
│ └─ ProbOS: Episodic Memory + Ship's Records + DIDs │
│ │
└─────────────────────────────────────────────────────────────┘
Key Navy Insight: Tools ≠ Personnel¶
In the Navy, a wrench has no personality. A Machinist's Mate is qualified to operate a lathe (NEC/PQS), and the lathe is ship's equipment (inventoried, maintained, tracked). The MM's qualification certifies they can use the tool safely. Different MMs may have different tool qualifications.
ProbOS's "everything is an agent" model conflated the sailor and the lathe. FileReaderAgent is both "the tool" and "the agent." The Three-Tier Architecture (AD-398) acknowledged this by classifying agents as infrastructure/utility/crew, and Asset Tags (AD-441c) formalized the identity split ("Even microwaves get serial numbers. But a serial number is not a birth certificate."). The next step is making infrastructure/utility agents into tools that crew agents can use.
3. The Hybrid Architecture¶
Design Decision: Keep Both, Connect Them¶
The "everything is an agent" model provides genuine architectural advantages at the infrastructure layer:
- Observability — every operation flows through the intent bus, auditable and interceptable
- Distributability — in federation, any agent could be on a different node
- Evolvability — infrastructure agents can get smarter over time
- Uniform protocol — IntentMessage/IntentResult for everything
These advantages are real and worth preserving. The solution is not to replace agents-as-tools, but to add a tool binding layer on top that gives crew agents composable capabilities while infrastructure agents continue doing the actual work.
Four Capability Tiers for Crew Agents¶
The original three-type model (Tools, Learned Skills, Cognitive Capabilities) is refined into four tiers that separate agent identity from task-specific cognitive skills. This distinction was driven by BF-146 (standing orders conflating identity and capabilities caused confabulation) and research into the AgentSkills.io ecosystem (AD-596).
┌─────────────────────────────────────────────────────────────┐
│ CREW MEMBER CAPABILITY PROFILE │
├─────────────────────────────────────────────────────────────┤
│ │
│ T1. STANDING ORDERS (identity + behavioral standards) │
│ ├─ Defined by: 4-tier .md hierarchy (federation/ship/ │
│ │ dept/agent) — WHO the agent IS │
│ ├─ Always loaded: full content in system prompt every cycle│
│ ├─ Not task-specific — defines role, personality, values, │
│ │ chain of command, communication protocols │
│ └─ Examples: federation.md, science.md, architect.md │
│ │
│ T2. COGNITIVE SKILLS (task-specific instruction-defined) │
│ ├─ Defined by: SKILL.md files (AgentSkills.io standard) │
│ ├─ Discovered by: description matching at intent time │
│ ├─ Loaded on-demand: descriptions in context, full │
│ │ instructions injected when skill activates │
│ ├─ Gated by: department, rank, proficiency (optional) │
│ ├─ Interoperable: standard format shared with 30+ tools │
│ ├─ Self-improving: usage feeds Cognitive JIT (T2→T3) │
│ └─ Examples: architecture-review, trust-analysis, │
│ threat-assessment — things requiring LLM judgment │
│ │
│ T3. EXECUTABLE SKILLS (deterministic procedures) │
│ ├─ Acquired through: Cognitive JIT (experience, AD-531+) │
│ ├─ Proficiency tracked: Skill Framework (Dreyfus levels) │
│ ├─ Execute: directly on agent (no bus, no LLM at L4+) │
│ ├─ Trust-gated: compilation level by rank │
│ └─ Examples: agent-specific procedures, optimized flows │
│ │
│ T4. ASSIGNED TOOLS (ship's equipment) │
│ ├─ Defined by: Role Template (ontology) + Rank │
│ ├─ Gated by: Trust tier (Earned Agency) + Qualification │
│ ├─ Execute via: Infrastructure Agents (intent bus) │
│ ├─ Registered in: Tool Registry (AD-423) │
│ └─ Examples: read_file, search_codebase, ward_room_post │
│ │
└─────────────────────────────────────────────────────────────┘
T2→T3 Self-Improvement Pathway¶
Cognitive skills (T2) feed into executable skills (T3) through the Cognitive JIT pipeline:
Agent uses T2 cognitive skill (LLM-mediated, SKILL.md instructions)
│
│ Cognitive JIT observes execution (AD-531)
▼
Procedure extracted (AD-532) → stored (AD-533)
│
│ Graduated compilation through Dreyfus levels (AD-535)
▼
At L4+ (Autonomous): becomes T3 executable skill (zero tokens)
│
│ If T3 replay fails (AD-534b):
▼
Fallback to T2 cognitive skill (LLM re-engagement)
This is the "skills that self-improve through use" pattern observed in Hermes Agent. ProbOS already has the pipeline (AD-531-539); AD-596 provides the T2 file format to feed it.
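A minimal sketch of the replay-then-fallback decision, assuming a procedure exposes its compilation level and that a failed replay is signalled by returning `None`. The function `run_skill` and its parameters are illustrative names, not part of the Cognitive JIT API.

```python
from typing import Callable, Optional

AUTONOMOUS_LEVEL = 4  # "L4+ (Autonomous)" in the pathway above


def run_skill(
    compilation_level: int,
    replay: Callable[[], Optional[str]],
    llm_skill: Callable[[], str],
) -> tuple[str, str]:
    """Try zero-token T3 replay at L4+; otherwise, or on failure, use the T2 skill."""
    if compilation_level >= AUTONOMOUS_LEVEL:
        result = replay()
        if result is not None:
            return ("T3", result)
    # Below L4, or the replay failed: re-engage the LLM with the SKILL.md instructions.
    return ("T2", llm_skill())
```

The returned tier label makes it easy to log how often a compiled procedure falls back to LLM reasoning.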
How They Interact¶
Intent arrives at agent
│
▼
┌─────────┐
│ T3 │──yes──► Executable Skill handler (no LLM, no bus)
│ Skill? │ Learned through Cognitive JIT
└────┬────┘
│ no
▼
┌─────────┐
│ T2 │──yes──► Load SKILL.md instructions → LLM decide()
│ Skill? │ Discovered by description, loaded on-demand
└────┬────┘
│ no
▼
┌─────────┐
│ T4 │──yes──► Tool binding → Infrastructure Agent (via bus)
│ Tool? │ Assigned by role, gated by rank
└────┬────┘
│ no
▼
┌─────────┐
│ Handled │──yes──► Full cognitive lifecycle (perceive/decide/act)
│ Intent? │ LLM-powered reasoning per Standing Orders (T1)
└────┬────┘
│ no
▼
Decline (return None)
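The dispatch order in the diagram can be sketched as a chain of checks. This is a simplification under stated assumptions: `route_intent` is a hypothetical name, and every tier is matched by exact intent name, whereas T2 skills are actually discovered by description matching.

```python
from typing import Optional


def route_intent(
    intent: str,
    executable_skills: set[str],
    cognitive_skills: set[str],
    tools: set[str],
    handled_intents: set[str],
) -> Optional[str]:
    """Return which path handles the intent, checking tiers in diagram order."""
    if intent in executable_skills:
        return "T3: executable skill handler"
    if intent in cognitive_skills:
        return "T2: SKILL.md instructions + LLM decide()"
    if intent in tools:
        return "T4: tool binding via intent bus"
    if intent in handled_intents:
        return "full cognitive lifecycle"
    return None  # decline
```

Because T3 is checked first, an intent that has both a compiled procedure and a cognitive skill takes the zero-token path.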
4. Tool Type Taxonomy¶
A "tool" in ProbOS is any capability a crew agent can invoke. The delivery mechanism is irrelevant; what matters is whether the agent has access and can execute. AD-422 defines 8 categories; this section maps those to the runtime abstraction the Tool Registry (AD-423) must support and lists Deterministic Functions (Executable Skills) as a ninth type alongside them.
Tool Types¶
| Type | Description | Execution Model | Examples |
|---|---|---|---|
| Onboard Utility Agent | ProbOS agents that provide capabilities as tools. No sovereign identity. | Intent bus → agent.handle_intent() | BuilderAgent, CodeReviewerAgent, SkillEngine |
| Infrastructure Service | Ship's Computer services. Shared substrate, always available. | Direct function call (same process) | CodebaseIndex, EpisodicMemory, TrustNetwork |
| MCP Server | Standardized JSON-RPC protocol tools. Hot-pluggable. | stdio/HTTP JSON-RPC per MCP spec | Filesystem MCP, Postgres MCP, Serena (LSP) |
| Remote API / LLM Gateway | Cloud services over HTTP/SDK. Token-metered. | HTTP request/response or SDK call | Copilot SDK, GitHub API, Anthropic API |
| Computer Use | Desktop/GUI automation. Highest risk category. | Vision model + input simulation | PyAutoGUI, Anthropic Computer Use |
| Browser Automation | Web interaction via DOM or accessibility tree. | Playwright/Puppeteer + browser DevTools | Page navigation, form filling, screen reading |
| Communication Channel | Outbound messaging to humans/external systems. | Protocol-specific adapter | Discord, Slack, Email, Webhooks |
| Federation Service | Capabilities sourced from other ProbOS ships. | Federation gossip protocol | Cross-ship queries, fleet-wide Hebbian |
| Deterministic Function | Pure code procedures with no LLM. Executable Skills from Cognitive JIT. | Direct function call (handler invocation) | Learned procedures at L4+, compiled skills |
Key Design Insight: Adapter Uniformity¶
From the crew member's perspective, all tool types look identical. The crew member says "search the codebase" — whether that invokes CodebaseIndex directly, an MCP server, or a federated query on another ship is an adapter implementation detail behind the Tool protocol (AD-423a).
The AgenticToolAdapter (AD-543/unified-tool-layer.md) is a specialization for tools that run their own agentic loop (e.g., BuilderAgent, Copilot SDK). A single ToolRegistry registers both simple tools and agentic tools — the difference is execution duration and token cost, not protocol.
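One way to picture adapter uniformity, as a sketch only: the `Tool` protocol shape, `DirectCallTool`, `BusTool`, and `search` are assumptions made for illustration, since the AD-423a protocol is not yet built.

```python
import asyncio
from typing import Any, Awaitable, Callable, Protocol


class Tool(Protocol):
    """Hypothetical uniform interface every adapter satisfies."""
    tool_id: str

    async def invoke(self, **params: Any) -> Any: ...


class DirectCallTool:
    """Adapter for an in-process service (direct function call)."""

    def __init__(self, tool_id: str, fn: Callable[..., Any]) -> None:
        self.tool_id = tool_id
        self._fn = fn

    async def invoke(self, **params: Any) -> Any:
        return self._fn(**params)


class BusTool:
    """Adapter that forwards the call through a bus-like async callable."""

    def __init__(self, tool_id: str, send: Callable[[str, dict], Awaitable[Any]]) -> None:
        self.tool_id = tool_id
        self._send = send

    async def invoke(self, **params: Any) -> Any:
        return await self._send(self.tool_id, params)


async def search(tool: Tool, query: str) -> Any:
    """Caller code is identical regardless of which adapter backs the tool."""
    return await tool.invoke(query=query)


async def fake_bus(tool_id: str, params: dict) -> str:
    """Stand-in for the intent bus, used only in this example."""
    return f"{tool_id}:{params['query']}"
```

The caller in `search` never learns whether the result came from a local function or a bus round trip, which is the property the section describes.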
Deterministic Functions (Executable Skills as Tools)¶
This is the OpenClaw pattern. Cognitive JIT (AD-531-539) extracts procedures from LLM-guided experience and compiles them into deterministic handlers. At Dreyfus Level 4+ (Autonomous), these execute with zero tokens — no LLM call, pure code replay.
These are both skills (tracked in Skill Framework for proficiency) and tools (registered in Tool Registry for access). The distinction:
- As a skill: Proficiency level, how it was learned, when it was last used, Ebbinghaus decay
- As a tool: Can it be invoked right now? Does this agent have permission? What inputs/outputs?
This dual registration is how the capability profile stays unified — one system tracks competency, another tracks access.
Cognitive Skills (AgentSkills.io Format — AD-596)¶
Cognitive skills are the T2 capability tier: task-specific, instruction-defined capabilities stored as SKILL.md files following the AgentSkills.io open standard. They fill the gap between standing orders (T1, always-loaded identity) and executable skills (T3, deterministic procedures).
AgentSkills.io standard format:
skill-name/
SKILL.md # Required: YAML frontmatter + markdown instructions
scripts/ # Optional: executable code
references/ # Optional: documentation
assets/ # Optional: templates, resources
Progressive disclosure model:
1. Discovery (~100 tokens per skill): Name + description loaded at startup into CognitiveSkillCatalog
2. Activation (<5000 tokens): Full SKILL.md instructions loaded when intent matches description
3. Resources (as needed): Supporting files loaded only when referenced
ProbOS metadata extensions (via standard metadata field — external skills work without these):
- probos-department — department scoping for discovery
- probos-skill-id — bridge to SkillRegistry (AD-428) proficiency tracking
- probos-min-proficiency — Dreyfus-level activation gate
- probos-min-rank — Earned Agency activation gate
- probos-intents — declares handled intents (replaces hardcoded _handled_intents)
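A sketch of how the two activation gates might be evaluated once the SKILL.md frontmatter has been parsed into a dictionary. `can_activate` is a hypothetical helper, and `RANK_ORDER` assumes only the three ranks named in this document, lowest first.

```python
# Assumption: only the ranks named in this document, lowest first.
RANK_ORDER = ["ensign", "lieutenant", "commander"]


def can_activate(metadata: dict[str, str], rank: str, proficiency: int) -> bool:
    """Apply probos-min-rank and probos-min-proficiency if the skill declares them."""
    min_rank = metadata.get("probos-min-rank")
    if min_rank is not None and RANK_ORDER.index(rank) < RANK_ORDER.index(min_rank):
        return False
    min_proficiency = metadata.get("probos-min-proficiency")
    if min_proficiency is not None and proficiency < int(min_proficiency):
        return False
    # External skills carry no ProbOS metadata, so neither gate applies to them.
    return True
```

Treating missing keys as "no gate" is what lets an imported skill activate without any ProbOS-specific fields.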
Interoperability: AgentSkills.io is adopted by 30+ tools including Claude Code, Cursor, GitHub Copilot, VS Code, Gemini CLI, OpenHands, Hermes Agent, Roo Code, OpenAI Codex, and JetBrains Junie. Skills authored for any of these tools can be consumed by ProbOS agents, and ProbOS-authored skills can be consumed by other platforms. The ProbOS metadata extensions are ignored by other platforms (they use the standard metadata field, which is explicitly extensible).
Validation: The AgentSkills.io ecosystem provides skills-ref — a Python library (Apache 2.0) with validate(), read_properties(), and to_prompt(). ProbOS extends this with domain-specific linting: ontology cross-references, callsign detection (BF-146 pattern), instruction staleness detection. This addresses the "Natural Language as Code" design principle — instruction-defined capabilities need structural validation, not just output evaluation.
Prior art comparison:
| System | Format | Discovery | Governance | Self-Improvement |
|---|---|---|---|---|
| AgentSkills.io | SKILL.md (YAML + markdown) | Description matching | File-based (git) | None natively |
| Microsoft Business Skills | Dataverse entity | Metadata query | Dataverse RBAC | None |
| Hermes Agent | AgentSkills.io | Skills Hub (644 skills) | 4 registries | Procedural memory evolution |
| ProbOS (AD-596) | AgentSkills.io + metadata | Description + ontology | Department/rank/proficiency | Cognitive JIT (T2→T3) |
5. Task → Skill → Tool Execution Flow¶
The Question¶
"What happens when an agent receives a task that requires a skill, and that skill requires a tool?"
This is the core execution flow that connects all the capability systems. Three scenarios:
Scenario 1: LLM Reasoning Needs Tool Mid-Thought¶
The agent is doing cognitive work (perceive/decide/act) and needs to invoke a tool as part of its reasoning.
Intent: "analyze module X for performance issues"
Agent: LaForge (Engineering Chief)
1. handle_intent() → no matching executable skill
2. decide() invoked → LLM begins reasoning
3. LLM identifies need: "I need to read the module source"
4. Tool call: read_file(path="src/probos/module_x.py")
├─ Check: Does LaForge have read_file tool assigned? → Yes (role: chief_engineer)
├─ Check: Does LaForge's rank permit this? → Yes (Lieutenant+ for filesystem read)
└─ Execute: FileReaderAgent via intent bus → returns file content
5. LLM continues reasoning with file content
6. LLM identifies need: "I need to check performance metrics"
7. Tool call: query_metrics(agent_id="module_x")
├─ Check: tool assigned? → Yes
├─ Check: rank permits? → Yes
└─ Execute: VitalsMonitor via intent bus → returns metrics
8. LLM completes analysis → act() returns IntentResult
This is the AD-543 (Native SWE Harness) model. The AgenticLoop manages the interleaved text + tool_use + tool_result conversation. ToolExecutor handles permission checks and execution dispatch. Each tool call goes through the Tool Registry for authorization.
Scenario 2: Learned Procedure Has Steps Requiring Tools¶
The agent has a Cognitive JIT procedure for this task, compiled to Level 3+ (Validated). The procedure steps reference tools.
Intent: "run standard health check on module X"
Agent: Chapel (Chief Medical)
1. handle_intent() → matches executable skill "standard_health_check"
2. Skill handler invoked (no LLM call)
3. ProcedureStep 1: collect_vitals
├─ required_tools: [vitals_query] ← NEW FIELD on ProcedureStep
├─ tool_context.has_tool("vitals_query") ← NEW: scoped tool access
└─ Execute: tool_context.invoke("vitals_query", params={...})
4. ProcedureStep 2: compare_baselines
├─ required_tools: [knowledge_query]
└─ Execute: tool_context.invoke("knowledge_query", params={...})
5. ProcedureStep 3: generate_report
├─ required_tools: [] (pure computation)
└─ Execute: handler logic only
6. Return result (zero LLM tokens consumed)
What needs to be added:
- ProcedureStep.required_tools: list[str] — declares which tools each step needs
- ToolContext — a scoped, permission-filtered view of the Tool Registry passed to skill handlers
- SkillHandler(intent, tool_context) — skill handlers receive tool access as a parameter
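A minimal sketch of the proposed `required_tools` field and a pre-flight check against scoped tool access. The dataclass layout and the `missing_tools` helper are assumptions; the real `ProcedureStep` in procedures.py may carry other fields.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ProcedureStep:
    """Illustrative shape: a step name plus the tools it declares it needs."""
    name: str
    required_tools: list[str] = field(default_factory=list)


def missing_tools(
    steps: list[ProcedureStep], has_tool: Callable[[str], bool]
) -> dict[str, list[str]]:
    """Map each step name to the required tools the agent currently lacks."""
    gaps: dict[str, list[str]] = {}
    for step in steps:
        lacking = [tool for tool in step.required_tools if not has_tool(tool)]
        if lacking:
            gaps[step.name] = lacking
    return gaps


# The health-check procedure from Scenario 2.
HEALTH_CHECK = [
    ProcedureStep("collect_vitals", ["vitals_query"]),
    ProcedureStep("compare_baselines", ["knowledge_query"]),
    ProcedureStep("generate_report"),
]
```

Passing `tool_context.has_tool` as the `has_tool` argument would let a skill handler detect a gap before it starts executing steps.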
Scenario 3: Skill Handler Can't Complete Without Tool — Fallback¶
The procedure exists but a required tool is unavailable (permission denied, tool offline, federation timeout).
Intent: "search for security vulnerabilities in module X"
Agent: Worf (Chief Security)
1. handle_intent() → matches executable skill "vulnerability_scan"
2. Skill handler invoked
3. ProcedureStep 1: read_module_source
├─ required_tools: [read_file]
└─ Execute: tool_context.invoke("read_file", params={...}) → SUCCESS
4. ProcedureStep 2: run_static_analysis
├─ required_tools: [static_analyzer]
└─ Execute: tool_context.invoke("static_analyzer") → TOOL UNAVAILABLE
5. FALLBACK CASCADE:
├─ Level 1: Retry with alternative tool (if registered)
├─ Level 2: Degrade to LLM-guided reasoning (re-enter decide() for this step)
└─ Level 3: Escalate via chain of command (report inability, request assistance)
6. Worf escalates step 2 to LaForge who has static_analyzer access
The Fallback Cascade¶
Executable Skill (zero tokens)
│
│ tool unavailable / step failure
▼
LLM-Guided (decide() with tool context, costs tokens)
│
│ LLM can't resolve / permission denied
▼
Chain of Command Escalation (request assistance from peer/superior)
│
│ no one can help / critical failure
▼
Report to Bridge (Captain/XO informed of capability gap)
This cascade means agents degrade gracefully rather than failing hard. A missing tool doesn't halt the agent — it triggers the same chain of command that a human crew would use: try yourself, ask for help, escalate to command.
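The cascade can be sketched as an ordered list of attempts, assuming each stage returns `None` when it cannot resolve the step. `run_cascade` and the stage labels are illustrative names, not existing ProbOS functions.

```python
from typing import Callable, Optional

Stage = tuple[str, Callable[[], Optional[str]]]


def run_cascade(stages: list[Stage]) -> tuple[str, Optional[str]]:
    """Try each stage in order; report to the Bridge if none resolves the step."""
    for name, attempt in stages:
        result = attempt()
        if result is not None:
            return (name, result)
    return ("report_to_bridge", None)
```

In Scenario 3, the first two stages would return `None` and the chain-of-command stage would return LaForge's result.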
ToolContext: The Permission-Filtered View¶
ToolContext is NOT the full Tool Registry. It is a scoped view created for each agent based on:
- Role template — which tools the agent's post is assigned (from resources.yaml/skills.yaml)
- Rank — Earned Agency level determines read/write/execute permissions per tool
- Qualification — PQS completion may unlock additional tool access (AD-566f bridge)
- Department — some tools are department-scoped (e.g., security tools only for Security dept)
- Captain overrides — explicit grants/denials from the Captain
# Conceptual API (AD-423a design target)
from __future__ import annotations  # ToolDescriptor/ToolResult are not defined yet

class ToolContext:
    """Scoped, permission-filtered tool access for a specific agent."""

    def available_tools(self) -> list[ToolDescriptor]: ...
    def has_tool(self, tool_id: str) -> bool: ...
    async def invoke(self, tool_id: str, **params) -> ToolResult: ...
    # invoke() checks permissions internally; there is no way to bypass it
The ToolContext is constructed at onboarding (wire_agent()) and updated when rank changes, qualifications are earned, or Captain overrides are applied. Crew agents never see the raw Tool Registry — they see their ToolContext.
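A toy in-memory version illustrates the "no bypass" property. Everything here is an assumption for illustration: `InMemoryToolContext`, `ToolNotPermitted`, and the idea that granted tools are plain async callables.

```python
import asyncio
from typing import Any, Awaitable, Callable


class ToolNotPermitted(Exception):
    """Raised when an agent invokes a tool outside its scoped view."""


class InMemoryToolContext:
    """Holds only the tools already granted to one agent."""

    def __init__(self, granted: dict[str, Callable[..., Awaitable[Any]]]) -> None:
        self._granted = dict(granted)

    def available_tools(self) -> list[str]:
        return sorted(self._granted)

    def has_tool(self, tool_id: str) -> bool:
        return tool_id in self._granted

    async def invoke(self, tool_id: str, **params: Any) -> Any:
        # The permission check lives inside invoke(), so callers cannot skip it.
        if not self.has_tool(tool_id):
            raise ToolNotPermitted(tool_id)
        return await self._granted[tool_id](**params)


async def read_file(path: str) -> str:
    """Stand-in tool used only in this example."""
    return f"contents of {path}"
```

Since the context never holds a reference to ungranted tools, a denied call fails the same way whether the tool is offline, unregistered, or simply not assigned.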
6. The Connection Map¶
What's Built (solid lines) vs What's Needed (dashed lines)¶
STANDING ORDERS (AD-339) ✅
┌─────────────────────┐
│ Federation → Ship │
│ → Dept → Agent │
│ Behavioral guidance │
└─────────┬───────────┘
│ shapes reasoning
▼
┌──────────────┐ ┌─────────────────────┐ ┌──────────────────┐
│ VESSEL │ │ CREW MEMBER │ │ EARNED AGENCY │
│ ONTOLOGY │───►│ │◄───│ (AD-357) ✅ │
│ (AD-429) ✅ │ │ Agent Type + Post │ │ Trust → Rank │
│ │ │ Department │ │ Rank → Actions │
│ 8 domains │ │ Chain of Command │ │ Gates tool use │
└──────┬───────┘ └────────┬────────────┘ └──────────────────┘
│ │
│ defines │ has
▼ ▼
┌──────────────┐ ┌─────────────────────┐ ┌──────────────────┐
│ ROLE │- - │ SKILL PROFILE │ │ QUALIFICATION │
│ TEMPLATE │ │ (AD-428) ✅ │◄ - │ BATTERY │
│ skills.yaml │ │ │ │ (AD-566) 🔧 │
│ ✅ schema │ │ PCCs + Role Skills │ │ Tests → scores │
│ ❌ not wired│ │ + Acquired Skills │ │ Baseline + drift│
│ at onboard │ │ Proficiency levels │ │ ❌ not wired to │
└──────┬───────┘ └────────┬────────────┘ │ skill proficiency│
│ │ └──────────────────┘
│ requires │ qualifies for
▼ ▼
┌──────────────┐ ┌─────────────────────┐ ┌──────────────────┐
│ TOOL │- - │ TOOL ACCESS │ │ COGNITIVE JIT │
│ REGISTRY │ │ (TOOL BINDINGS) │ │ (AD-531-539) ✅ │
│ (AD-423) ❌ │ │ │ │ │
│ Designed │ │ Which tools this │ │ Procedures → │
│ Not built │ │ agent can invoke │ │ Executable │
│ │ │ Gated by rank + │ │ Skills │
│ │ │ qualification │ │ Zero-token │
└──────┬───────┘ └────────┬────────────┘ │ replay at L4+ │
│ │ └──────────────────┘
│ provides │ executes via
▼ ▼
┌──────────────┐ ┌─────────────────────┐
│ INFRA │ │ INTENT BUS │
│ AGENTS │◄───│ (Core) ✅ │
│ (AD-398) ✅ │ │ │
│ │ │ Observable, │
│ FileReader │ │ distributable, │
│ FileWriter │ │ auditable │
│ Shell │ │ │
│ HttpFetch │ │ │
│ CodebaseIdx │ │ │
└──────────────┘ └─────────────────────┘
Legend: ✅ Built | 🔧 Building | ❌ Not built | - - Connection needed
The Six Missing Connections¶
| # | Connection | From → To | What It Does | Enabling AD |
|---|---|---|---|---|
| C1 | Role → Tool Assignment | skills.yaml role_template → Tool Registry | At onboarding, assign tools to agent based on role | AD-423 + new wiring |
| C2 | Qualification → Skill Proficiency | AD-566 test results → AD-428 AgentSkillService | Passing a qualification test updates skill proficiency | AD-566f (new) |
| C3 | Qualification → Tool Authorization | AD-566 completion → Tool Registry permissions | Completing PQS unlocks tool access | AD-423 + AD-566f |
| C4 | Skill Proficiency → Earned Agency | AD-428 proficiency levels → promotion requirements | Qualification paths gate rank advancement | AD-566/428 bridge (exists in AD-539 gap predictor, needs formalization) |
| C5 | Tool Registry → Onboarding | AD-423 → agent_onboarding.py | Wire tool bindings during wire_agent() | AD-423 |
| C6 | Crew Manifest → Unified Query | All systems → single API | "What can this agent do?" query | AD-513 |
7. The AD Landscape¶
Built ADs (Foundation Layer)¶
| AD | Name | What It Provides | Status |
|---|---|---|---|
| AD-339 | Standing Orders | 4-tier behavioral guidance hierarchy | ✅ Complete |
| AD-357 | Earned Agency | Trust-based rank gating action space | ✅ Complete |
| AD-398 | Three-Tier Architecture | Crew/Utility/Infrastructure classification | ✅ Complete |
| AD-422 | Tool Taxonomy | 8-category tool classification + design doc | ✅ Complete |
| AD-428 | Skill Framework | Competency tracking, proficiency, qualification paths | ✅ Complete |
| AD-429a-e | Vessel Ontology | 8-domain formal model (org, crew, skills, resources, etc.) | ✅ Complete |
| AD-441a-c | Persistent Identity | DIDs, birth certificates, asset tags | ✅ Complete |
| AD-496-498 | Workforce Scheduling | Work items, bookings, calendars, scrumban | ✅ Complete |
| AD-531-539 | Cognitive JIT | Procedure extraction, replay, gap detection, qualification triggering | ✅ Complete |
Building ADs (In Progress)¶
| AD | Name | What It Provides | Status |
|---|---|---|---|
| AD-566a | Qualification Harness | Test protocol, store, harness engine | ✅ Complete |
| AD-566b | Tier 1 Baseline Tests | Personality, memory, confabulation probes | 🔧 Builder |
| AD-566c | Drift Detection Pipeline | Scheduled testing, statistical thresholds, alerts | Planned |
| AD-566d | Tier 2 Domain Tests | Role-specific competency tests | Planned |
| AD-566e | Tier 3 Collective Tests | Crew-wide coordination measurement | Planned |
Unbuilt ADs (Connection Layer)¶
| AD | Name | What It Provides | Gap It Fills | Dependencies |
|---|---|---|---|---|
| AD-423 | Tool Registry | Runtime tool catalog, permissions, scoping, discovery | C1, C3, C5 | AD-422 (done), AD-398 (done) |
| AD-483 | Tool Layer — Instruments | Tool base class, ToolRegistry class, tool trust | C1 (programming model) | AD-423 (scope overlap — reconcile) |
| AD-438 | Ontology-Based Task Routing | Directed assignment using role + skills + tools | Routing optimization | AD-429 (done), AD-428 (done), AD-423 |
| AD-513 | Crew Manifest | Unified queryable crew roster | C6 | AD-429 (done), AD-441 (done), AD-357 (done) |
| AD-543-549 | Native SWE Harness | ToolCall protocol, agentic loop | Tool execution model | AD-423 |
New ADs Needed¶
| AD | Name | What It Provides | Gap It Fills |
|---|---|---|---|
| AD-566f (new) | Qualification → Skill Bridge | Test results update Skill Framework proficiency | C2 |
| AD-423+ (new scope) | Role-Based Tool Assignment | At onboarding, assign tools from role template | C1, C5 |
| AD-423+ (new scope) | Qualification-Gated Tool Access | PQS completion unlocks tool permissions | C3 |
8. AD-423 and AD-483: Reconciliation¶
AD-423 and AD-483 both define "Tool Registry" but from different angles:
| Aspect | AD-423 | AD-483 |
|---|---|---|
| Focus | Operational management | Programming abstraction |
| Scope | Permissions, scoping, lifecycle, discovery | Tool base class, ToolRegistry class, trust |
| Designed in | tool-taxonomy.md (detailed) | roadmap feature spec |
| Key features | CRUD+O permissions, dept scoping, Captain overrides | Tool base class, Beta-distribution trust, MCP compat |
| Overlapping | ToolRegistry class, tool registration | ToolRegistry class, tool registration |
Recommendation: Merge into a single AD sequence (AD-423a/b/c):
- AD-423a: Tool Foundation — Tool protocol + ToolRegistry class + registration schema (absorbs AD-483's programming model)
- AD-423b: Tool Permissions & Scoping — CRUD+O permissions, department scoping, Earned Agency gates, Captain overrides (AD-423's operational model)
- AD-423c: Role-Based Tool Assignment — Onboarding wiring, role template → tool bindings, qualification gates (the new connection layer)
This gives us the Tool abstraction, the permission model, and the onboarding wiring in a clean build sequence.
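A rough sketch of how the three pieces might fit together: registration (423a), rank and department gates (423b), and role-based bindings (423c). `ToolEntry`, `bindings_for`, and the `min_rank` field are hypothetical, and `RANK_ORDER` assumes only the ranks named in this document.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: only the ranks named in this document, lowest first.
RANK_ORDER = ["ensign", "lieutenant", "commander"]


@dataclass(frozen=True)
class ToolEntry:
    tool_id: str
    min_rank: str = "ensign"
    department: Optional[str] = None  # None means ship-wide


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, ToolEntry] = {}

    def register(self, entry: ToolEntry) -> None:
        if entry.tool_id in self._tools:
            raise ValueError(f"duplicate tool_id: {entry.tool_id}")
        self._tools[entry.tool_id] = entry

    def bindings_for(self, role_tools: list[str], rank: str, department: str) -> list[str]:
        """Filter a role template's tool list by registration, rank, and department."""
        granted = []
        for tool_id in role_tools:
            entry = self._tools.get(tool_id)
            if entry is None:
                continue  # role template names a tool that is not registered
            if RANK_ORDER.index(rank) < RANK_ORDER.index(entry.min_rank):
                continue
            if entry.department not in (None, department):
                continue
            granted.append(tool_id)
        return granted
```

The list returned by `bindings_for` is what onboarding would use to build an agent's scoped tool view; qualification gates and Captain overrides are omitted from this sketch.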
9. Build Order¶
The build order respects dependencies and maximizes value at each step:
Phase A: Complete Testing Infrastructure (current)¶
Establishes the measurement framework before making changes. Exit criterion: all testing infrastructure is in place.
Phase B: Tool Foundation¶
AD-423a (Tool protocol + ToolRegistry)
→ AD-423b (Permissions + scoping + Earned Agency gates)
→ AD-423c (Role-based assignment at onboarding)
The keystone. Gives crew agents composable tool access. Infrastructure agents become "ship's equipment" that crew agents can use through tool bindings.
Phase C: Connect the Dots¶
AD-566f (Qualification → Skill Bridge)
→ AD-513 (Crew Manifest — unified query)
→ AD-438 (Ontology-Based Task Routing)
Closes the remaining connections. Qualification results flow into skill proficiency. Crew Manifest provides the "what can this agent do?" query. Task routing uses the full capability profile for intelligent assignment.
Phase D: Memory Architecture (existing plan)¶
AD-567a → AD-567b (absorbs AD-462) → AD-567c → AD-567d
→ AD-566 re-run (measure impact)
→ AD-567e → AD-567f → AD-567g
Memory improvements, measured against the baselines established in Phase A.
Phase E: Advanced Integration¶
AD-543-549 (Native SWE Harness — ToolCall protocol + agentic loop)
AD-439 (Emergent Leadership Detection)
AD-440 (Chain of Command Delegation)
Higher-order capabilities that build on the connected fabric.
10. The Complete Crew Member Capability Profile¶
When all phases are complete, querying an agent's capability profile returns:
agent_id: "did:probos:ship-001:agent-meridian-uuid"
callsign: "Meridian"
agent_type: architect
post: first_officer
department: science
rank: commander
trust_score: 0.82
lifecycle_state: active
# Character (Personality)
personality:
  openness: 0.9
  conscientiousness: 0.8
  extraversion: 0.5
  agreeableness: 0.7
  neuroticism: 0.2
  drift_from_seed: 0.08  # Euclidean distance
# Standing Orders — T1 (Identity + Behavioral Guidance)
standing_orders:
  federation: "config/standing_orders/federation.md"
  ship: "config/standing_orders/ship.md"
  department: "config/standing_orders/science.md"
  personal: "config/standing_orders/architect.md"
# Cognitive Skills — T2 (AgentSkills.io, from CognitiveSkillCatalog)
cognitive_skills:
  - name: architecture-review
    origin: internal
    description: "Analyze proposed system designs against ProbOS architectural principles"
    department: science
    min_rank: lieutenant
    skill_id: architecture_review  # bridge to SkillRegistry
    proficiency: 5  # Advise
  - name: code-review
    origin: external  # imported from AgentSkills.io ecosystem
    description: "Review code changes for correctness, style, and security"
    # no ProbOS metadata — ungoverned external skill
# Handled Intents (derived from T2 cognitive skills + T3 executable skills)
handled_intents:
  - name: design_feature
    source: cognitive_skill  # from architecture-review SKILL.md
    tier: deep
    description: "Analyze codebase and produce architectural proposals"
# Assigned Tools (from role template + rank + qualification)
tools:
  - tool_id: codebase_query
    provider: codebase_index
    permission: ORW
    qualified: true
    qualification_date: "2026-04-01"
  - tool_id: ward_room_post
    provider: ward_room
    permission: ORW
    gated_by: earned_agency
    qualified: true
  - tool_id: ward_room_endorse
    provider: ward_room
    permission: ORW
    gated_by: lieutenant_plus
    qualified: true
  - tool_id: episodic_recall
    provider: episodic_memory
    permission: OR
    scope: own_shard_only
    qualified: true
  - tool_id: knowledge_query
    provider: knowledge_store
    permission: OR
    qualified: true
# Learned Skills (from Cognitive JIT)
executable_skills:
  - name: codebase_knowledge
    origin: built_in
    compilation_level: 3
  # ... additional learned skills from experience
# Skill Proficiency (from Skill Framework)
skill_profile:
  pccs:
    - skill_id: chain_of_command
      proficiency: 5  # Advise
    - skill_id: communication
      proficiency: 5
    - skill_id: collaboration
      proficiency: 4  # Enable
  role_skills:
    - skill_id: architecture_review
      proficiency: 5
    - skill_id: pattern_recognition
      proficiency: 4
  acquired_skills: []
# Qualification Status (from AD-566)
qualifications:
  baseline_set: true
  latest_battery:
    bfi2_personality_probe: { score: 0.87, passed: true }
    episodic_recall_accuracy: { score: 0.92, passed: true }
    confabulation_resistance: { score: 0.95, passed: true }
  qualification_path: lieutenant_to_commander
  path_progress: "4/5 requirements met"
# Procedures (from Cognitive JIT)
procedures:
  total: 12
  by_level:
    novice: 0
    guided: 3
    validated: 5
    autonomous: 3
    expert: 1
# Duty Assignment
current_duty:
  pool: architect
  watch: alpha
  intent_subscriptions: ["design_feature"]
# Service Record
service_record:
  birth_date: "2026-03-15T08:00:00Z"
  total_episodes: 847
  success_rate: 0.91
  trust_trajectory: improving
  promotions: ["ensign→lieutenant (2026-03-18)", "lieutenant→commander (2026-03-25)"]
This is the Crew Manifest (AD-513) query result — the unified view that connects every system.
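As a rough illustration of how a consumer might use this view to answer "what can this agent do, and why?", the sketch below walks the tools section of a manifest-shaped dict. The dict is a hand-copied subset of the example above; the function name explain_tool_access and the response shape are assumptions, not the AD-513 API. The only permission letter interpreted is W (write), which the validation scenario in section 15 names.

```python
from typing import Any

# Hand-copied subset of the manifest above. A real caller would obtain this
# from the Crew Manifest query (AD-513); the response shape is assumed here.
manifest: dict[str, Any] = {
    "tools": [
        {"tool_id": "codebase_query", "permission": "ORW", "qualified": True},
        {"tool_id": "ward_room_post", "permission": "ORW",
         "gated_by": "earned_agency", "qualified": True},
        {"tool_id": "episodic_recall", "permission": "OR",
         "scope": "own_shard_only", "qualified": True},
    ],
}


def explain_tool_access(manifest: dict[str, Any], needed: str) -> dict[str, str]:
    """Map each tool_id to a short reason why `needed` is or is not granted."""
    report: dict[str, str] = {}
    for tool in manifest.get("tools", []):
        permission = tool.get("permission", "")
        if not tool.get("qualified", False):
            reason = "denied: not qualified"
        elif needed not in permission:
            reason = f"denied: permission {permission!r} lacks {needed!r}"
        elif "gated_by" in tool:
            reason = f"allowed, subject to gate {tool['gated_by']!r}"
        else:
            reason = "allowed"
        report[tool["tool_id"]] = reason
    return report


report = explain_tool_access(manifest, "W")
for tool_id, reason in report.items():
    print(f"{tool_id}: {reason}")
```

The point of the sketch is that both the answer and its justification come from a single document, rather than from separate calls to the skill, qualification, and tool services.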
11. Relationship to Commercial Systems¶
Agent Capital Management (ACM)¶
ACM is the commercial financial backbone that builds on the OSS capability profile:
| OSS (Crew Capability Profile) | Commercial (ACM) |
|---|---|
| What the agent CAN do | What the agent COSTS |
| Skill proficiency | Billable skill rates |
| Tool access | Tool licensing costs |
| Qualification status | Certification revenue |
| Duty assignment | Project assignment |
| Service record | Performance evaluation |
Boundary rule: "How it works" → OSS. "How it makes money" → Commercial. The capability profile is OSS. Billing, scheduling optimization, capacity planning, and customer-facing management are commercial.
Agent Services Automation (ASA)¶
ASA (AD-C-010 through AD-C-015) is the commercial execution engine. It consumes the capability profile to:
- Match agents to work items based on skills + tools + availability
- Schedule agents across ProbOS instances (workforce mobility)
- Track billable time via BookingJournals
- Optimize resource allocation
The OSS capability profile is the input. ASA is the consumer.
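To make the input/consumer relationship concrete, here is a minimal sketch of the first task: matching agents to a work item on skills, tools, and availability. AgentProfile, WorkItem, and match_agents are hypothetical names invented for illustration; ASA's actual matching logic is not described in this document.

```python
from dataclasses import dataclass, field


@dataclass
class AgentProfile:
    """Hypothetical, flattened view of a capability profile."""
    name: str
    skills: dict[str, int]            # skill_id -> proficiency level
    tools: set[str] = field(default_factory=set)
    available: bool = True


@dataclass
class WorkItem:
    """Hypothetical work item: minimum skill levels plus required tools."""
    required_skills: dict[str, int]
    required_tools: set[str]


def match_agents(item: WorkItem, agents: list[AgentProfile]) -> list[str]:
    """Return names of available agents meeting every skill and tool requirement."""
    matches = []
    for agent in agents:
        has_skills = all(
            agent.skills.get(skill, 0) >= level
            for skill, level in item.required_skills.items()
        )
        has_tools = item.required_tools <= agent.tools  # subset check
        if agent.available and has_skills and has_tools:
            matches.append(agent.name)
    return matches


agents = [
    AgentProfile("architect", {"architecture_review": 5}, {"codebase_query"}),
    AgentProfile("trainee", {"architecture_review": 2}, {"codebase_query"}),
    AgentProfile("busy", {"architecture_review": 5}, {"codebase_query"}, available=False),
]
item = WorkItem({"architecture_review": 4}, {"codebase_query"})
matched = match_agents(item, agents)
```

Everything the matcher reads (proficiency, tool bindings) lives in the OSS profile; only the availability flag and the matching policy would belong to the commercial layer.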
12. Design Principles¶
- Tools are not personnel. Infrastructure agents remain agents (for observability, distributability, evolvability). But crew agents get tool bindings: composable capabilities assigned by role.
- Qualification gates access. Completing PQS (AD-566) updates skill proficiency (AD-428), which unlocks tool permissions (AD-423). No shortcuts. The pathway is: learn → test → qualify → access.
- Rank modulates scope. Same tool, different permission level by rank. Ensigns get read-only. Commanders get full access. Seniors get cross-department access. Captain overrides everything.
- Skills are dual-path. Assigned skills come from role templates (what you're EXPECTED to know). Learned skills come from Cognitive JIT (what you FIGURED OUT). Both are tracked in the same Skill Framework.
- The ontology is the schema of truth. All capability definitions live in YAML (organization.yaml, skills.yaml, resources.yaml). Runtime code reads from the ontology. Changes to capability structure are ontology changes, not code changes.
- Crew Manifest is the query surface. One API call returns the complete capability profile. No agent should need to query 8 different services to understand itself. AD-513 assembles the view.
- Preserve Society of Mind at infrastructure layer. The intent bus, infrastructure agents, and uniform protocol are genuine architectural advantages. The tool binding layer is additive, not a replacement.
- Tools are delivery-mechanism agnostic. A tool can be an onboard utility agent, an MCP server, a remote API, a desktop automation session, a browser, a federation service, or a deterministic function. The crew member doesn't know or care. The Tool protocol and ToolContext abstract the delivery mechanism. What matters is: can I use it, and does it work?
- Graceful degradation through chain of command. When a tool is unavailable or a skill can't complete, the agent doesn't fail; it falls back to LLM reasoning, then escalates via chain of command. The fallback cascade mirrors how a human crew handles missing equipment: try yourself, ask for help, report to command.
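The fallback cascade in the last principle (skill → LLM → escalate) can be sketched as a short chain of handlers. This is an illustration under stated assumptions, not the AD-534 implementation: run_with_fallback and the Handler type are hypothetical, and "could not complete" is modeled as either returning None or raising an exception.

```python
from typing import Callable, Optional

# Each handler returns a result string, or None if it cannot complete the task.
Handler = Callable[[str], Optional[str]]


def run_with_fallback(task: str, skill: Handler, llm: Handler,
                      escalate: Callable[[str], str]) -> tuple[str, str]:
    """Try the skill, then LLM reasoning, then escalate. Returns (stage, result)."""
    for stage, handler in (("skill", skill), ("llm", llm)):
        try:
            result = handler(task)
        except Exception:
            result = None  # treat a raised error like "could not complete"
        if result is not None:
            return stage, result
    return "escalated", escalate(task)


def broken_skill(task: str) -> Optional[str]:
    raise RuntimeError("tool unavailable")


def llm_reasoning(task: str) -> Optional[str]:
    return f"LLM draft for {task}"


def stuck_llm(task: str) -> Optional[str]:
    return None


def escalate(task: str) -> str:
    return f"{task} reported up the chain of command"


first = run_with_fallback("design_feature", broken_skill, llm_reasoning, escalate)
second = run_with_fallback("design_feature", broken_skill, stuck_llm, escalate)
```

Returning the stage alongside the result keeps the degradation visible to callers, so a fallback can be audited rather than silently absorbed.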
13. Prior Work Absorbed¶
| Source | What Was Absorbed |
|---|---|
| U.S. Navy personnel management (BUPERS/PQS/NEC/3-M) | Organizing metaphor for the entire architecture |
| AD-422 tool-taxonomy.md | 8-category tool classification, CRUD+O permissions, department scoping |
| AD-423 roadmap spec | Tool Registry design, registration schema, discovery flow |
| AD-483 unified-tool-layer.md | AgenticToolAdapter protocol, skill-to-tool binding concept |
| AD-429 Vessel Ontology (all 8 domains) | Formal model connecting org, crew, skills, resources |
| AD-428 Skill Framework | Three-category taxonomy, Dreyfus proficiency, role templates |
| AD-441c Asset Tags | Two-tier identity (birth certificates vs serial numbers) |
| AD-531-539 Cognitive JIT | Learned procedures as Executable Skills |
| AD-566a Qualification Harness | Test protocol, store, comparison API |
| MOISE+ (organizational specification) | Structural specification for agent organizations |
| O*NET / ESCO (competency taxonomies) | Skill taxonomy design (referenced in ontology research) |
| Fokoue et al. (2026) | Same LLM + different contexts = different analytical lenses |
| Ge et al. (2026, IRT) | Decomposing LLM ability from scaffold ability |
| Jeong (2026, MTI) | Behavioral profiling for drift detection |
| Claude Code / OpenClaw analysis | Plug-and-play skill/tool composition patterns |
| AD-422 tool-taxonomy.md (8 categories) | Tool type diversity — 8 categories from utility agents to federation services |
| AD-543 unified-tool-layer.md | AgenticToolAdapter protocol, ToolContext concept, adapter uniformity |
| AD-543 ToolCall protocol spec (roadmap) | ToolCallRequest/ToolCallResult/ContentBlock wire format, ToolExecutor, agentic loop |
| AgentSkills.io (Anthropic, open standard) | SKILL.md format, progressive disclosure (description→instructions→resources), skills-ref validation library |
| Claude Code Skills | Description-based auto-discovery, on-demand loading, $ARGUMENTS substitution, three storage scopes, context: fork subagent execution |
| Microsoft Business Skills (Dataverse) | "Natural-language instructions that capture how your organization gets work done." Metadata-for-discovery / instructions-for-execution pattern. RBAC governance. |
| Hermes Agent Skills Hub (NousResearch) | 644 skills across 4 registries, category taxonomy (16 categories), AgentSkills.io adoption, skills as procedural memory that self-improve through use |
| Anthropic anthropics/skills repository | Reference skill implementations (document skills, development skills), template structure for skill authoring |
| BF-146 (standing order confabulation) | Concrete demonstration that instruction-defined capabilities have same defect surface as code — stale references cause confabulation |
| Design Principle: "Natural Language as Code" | Instruction validation (structural, pre-execution) distinct from output evaluation (stochastic, post-execution evals). Three capability types need different validation. |
14. Deferred Items and Future ADs¶
| Item | Proposed AD | Dependencies | Notes |
|---|---|---|---|
| AD-423/483 reconciliation into AD-423a/b/c | AD-423a/b/c | AD-422 (done) | Merge scope, build in sequence |
| Qualification → Skill proficiency bridge | AD-566f | AD-566a-e, AD-428 | New AD needed |
| Role-based tool assignment at onboarding | Part of AD-423c | AD-423a/b | Wiring in agent_onboarding.py |
| Qualification-gated tool authorization | Part of AD-423c | AD-423a/b, AD-566 | PQS completion → tool unlock |
| Crew Manifest unified query | AD-513 | AD-423, AD-429, AD-441 | Already planned |
| Ontology-Based Task Routing | AD-438 | AD-423, AD-428, AD-429 | Already planned |
| Emergent Leadership Detection | AD-439 | AD-429, Hebbian | Already planned |
| Chain of Command Delegation | AD-440 | AD-429 | Already planned |
| Business Process Execution | AD-618 (Bill System) | AD-423, AD-496, AD-438, AD-434 | Multi-agent SOPs with role-based assignment, BPMN-vocabulary decision points, Cognitive JIT bridge. Research: docs/research/standard-operating-procedures.md |
| Personal Ontology (from research) | Future | AD-429 | Agent's self-model of own capabilities |
| ToolContext scoped view construction | Part of AD-423c | AD-423a/b | Permission-filtered tool view per agent, wired at onboarding |
| ProcedureStep.required_tools field | Part of AD-423c or AD-539+ | AD-423a, AD-531-539 | Declares which tools each procedure step needs |
| Fallback cascade (skill→LLM→escalate) | Part of AD-534b+ | AD-534 (done), AD-423 | Graceful degradation when tools unavailable |
| Cognitive Skill Registry (AgentSkills.io) | AD-596a-e | AD-428, AD-429, AD-339 | T2 cognitive skills as SKILL.md files, progressive disclosure, external skill import, validation |
| External skill interop | Part of AD-596d | AD-596a | Consume AgentSkills.io skills from Claude Code, Hermes, etc. |
| Instruction linting | Part of AD-596e | AD-596a | Structural validation for instruction-defined capabilities (BF-146 class of defects) |
All deferred items now have AD assignments, except Personal Ontology, which remains marked Future.
15. Canonical Validation Scenario: Social Media Marketing Agent¶
This is the end-to-end test case that proves the Crew Capability Architecture works. Every layer must execute for this flow to complete. If this scenario passes, the system is wired correctly.
The Flow¶
┌─────────────────────────────────────────────────────────────────┐
│ VALIDATION SCENARIO: LinkedIn Post Creation │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Step 1: CREATE AGENT │
│ ├─ New crew agent instantiated │
│ ├─ Gets DID, birth certificate, sovereign identity │
│ ├─ wire_agent() in agent_onboarding.py │
│ └─ Systems: Identity (AD-441), Onboarding (Core) │
│ │
│ Step 2: ASSIGN ROLE │
│ ├─ Post: "social_media_marketer" (from ontology) │
│ ├─ Department: Communications (or Operations) │
│ ├─ Role template loaded from skills.yaml │
│ ├─ Standing Orders composed: Federation → Ship → Dept → Agent │
│ └─ Systems: Ontology (AD-429), Standing Orders (AD-339) │
│ │
│ Step 3: ASSIGN SKILLS FROM ROLE TEMPLATE │
│ ├─ Role template defines required skills: │
│ │ - content_creation (PCC: communication) │
│ │ - social_media_post_writing (ROLE skill) │
│ │ - audience_analysis (ROLE skill) │
│ │ - brand_voice_adherence (ROLE skill) │
│ ├─ commission_agent() assigns skills at FOLLOW proficiency │
│ ├─ Prerequisite DAG enforced (post_writing requires │
│ │ content_creation) │
│ └─ Systems: Skill Framework (AD-428), Ontology (AD-429) │
│ │
│ Step 4: SKILL → INTENT MAPPING │
│ ├─ social_media_post_writing skill handles intent: │
│ │ "create_linkedin_post" │
│ ├─ IntentDescriptor registered for this intent │
│ ├─ Intent bus routes "create_linkedin_post" → this agent │
│ ├─ Could be executable skill (Cognitive JIT L4+) or │
│ │ LLM-mediated (cognitive lifecycle) │
│ └─ Systems: Intent Bus (Core), Cognitive JIT (AD-531-539) │
│ │
│ Step 5: TOOL REQUIRED — LINKEDIN API │
│ ├─ Skill execution needs tool: "linkedin_post_api" │
│ ├─ ToolContext.has_tool("linkedin_post_api") → checked │
│ ├─ Permission check: agent rank allows W (write) on this tool │
│ ├─ Tool type: Remote API (or MCP Server) │
│ ├─ ToolContext.invoke("linkedin_post_api", { │
│ │ content: "...", audience: "...", hashtags: [...] │
│ │ }) → ToolResult │
│ └─ Systems: Tool Registry (AD-423a), Permissions (AD-423b), │
│ ToolContext (AD-423c) │
│ │
│ Step 6: EXECUTION + AUDIT │
│ ├─ Tool adapter invokes LinkedIn API │
│ ├─ Result flows back through intent bus │
│ ├─ Episode recorded in EpisodicMemory │
│ ├─ Skill exercise recorded (AgentSkillService.record_exercise) │
│ ├─ Tool usage audited in Tool Registry │
│ ├─ Trust updated based on outcome │
│ └─ Systems: EpisodicMemory, Skill Framework (AD-428), │
│ TrustNetwork, Tool Registry audit │
│ │
└─────────────────────────────────────────────────────────────────┘
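Step 3 of the flow enforces a prerequisite DAG when skills are commissioned. One way to picture that check is a topological sort over the role template, sketched below with the standard-library graphlib module. The PREREQUISITES map and the commissioning_order function are hypothetical; only the single edge (post writing requires content creation) comes from the scenario, and the real commission_agent() behavior may differ.

```python
from graphlib import TopologicalSorter

# Hypothetical prerequisite map: skill -> skills that must be held first.
# Only the post_writing -> content_creation edge is taken from the scenario.
PREREQUISITES: dict[str, set[str]] = {
    "content_creation": set(),
    "social_media_post_writing": {"content_creation"},
    "audience_analysis": set(),
    "brand_voice_adherence": set(),
}


def commissioning_order(template: list[str],
                        prerequisites: dict[str, set[str]]) -> list[str]:
    """Order a role template so each prerequisite is assigned before its dependents."""
    wanted = set(template)
    for skill in template:
        missing = prerequisites.get(skill, set()) - wanted
        if missing:
            raise ValueError(f"{skill} requires {sorted(missing)}, not in template")
    # TopologicalSorter expects a mapping of node -> predecessors.
    graph = {skill: prerequisites.get(skill, set()) for skill in template}
    return list(TopologicalSorter(graph).static_order())


template = [
    "social_media_post_writing",
    "audience_analysis",
    "brand_voice_adherence",
    "content_creation",
]
order = commissioning_order(template, PREREQUISITES)
```

A template that lists social_media_post_writing without content_creation is rejected up front, and a cyclic prerequisite map would surface as graphlib.CycleError.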
What Each Step Validates¶
| Step | Validates | Currently |
|---|---|---|
| 1. Create Agent | Identity + onboarding pipeline | ✅ Works |
| 2. Assign Role | Ontology role templates → Standing Orders composition | 🔧 Ontology schema exists, not wired at onboarding |
| 3. Assign Skills | Role template → skill commissioning from config (not hardcoded Python) | 🔧 Commissioning works, but templates are Python dict, not YAML |
| 4. Skill → Intent | Skill maps to intent, intent bus routes correctly | ✅ Works (executable skills dispatch in handle_intent) |
| 5. Tool Access | ToolContext permission check + tool invocation | ❌ AD-423a/b/c not built |
| 6. Audit Trail | Exercise recorded, trust updated, episode stored | 🔧 Partial (episode + trust work, skill exercise not wired) |
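Step 5 is the one layer with no runtime code yet, so a toy model may help fix the intended shape. The sketch below borrows the names ToolContext, has_tool, invoke, and ToolResult from the flow diagram, but everything else is an assumption: the grant method, the needs parameter, the permission-letter check, and the fake_linkedin_post stand-in are all invented for illustration and are not the AD-423a/b/c design.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToolResult:
    ok: bool
    output: Any = None
    error: str = ""


class ToolContext:
    """Toy permission-filtered tool view for one agent (not the AD-423c design)."""

    def __init__(self) -> None:
        # tool_id -> (granted permission letters, callable that does the work)
        self._tools: dict[str, tuple[str, Callable[..., Any]]] = {}

    def grant(self, tool_id: str, permission: str, fn: Callable[..., Any]) -> None:
        self._tools[tool_id] = (permission, fn)

    def has_tool(self, tool_id: str) -> bool:
        return tool_id in self._tools

    def invoke(self, tool_id: str, params: dict[str, Any], needs: str = "W") -> ToolResult:
        if not self.has_tool(tool_id):
            return ToolResult(ok=False, error=f"no such tool: {tool_id}")
        permission, fn = self._tools[tool_id]
        if needs not in permission:
            return ToolResult(ok=False, error=f"{tool_id}: {needs!r} not permitted")
        return ToolResult(ok=True, output=fn(**params))


def fake_linkedin_post(content: str, audience: str, hashtags: list[str]) -> str:
    """Stand-in for the remote API; it only formats a string."""
    return f"posted to {audience}: {content} {' '.join(hashtags)}"


ctx = ToolContext()
ctx.grant("linkedin_post_api", "ORW", fake_linkedin_post)
ctx.grant("knowledge_query", "OR", lambda **kwargs: "unused")

posted = ctx.invoke("linkedin_post_api", {
    "content": "Hello", "audience": "engineers", "hashtags": ["#probos"],
})
denied = ctx.invoke("knowledge_query", {}, needs="W")
absent = ctx.invoke("salesforce_api", {})
```

Because the agent only ever sees its own ToolContext, whether the tool is a remote API or an MCP server stays hidden behind the callable.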
Gap: Role Templates Must Come From Config¶
Currently ROLE_SKILL_TEMPLATES is a Python dict in skill_framework.py — hardcoded for 7 agent types (security_officer, engineering_officer, etc.). For this scenario to work with dynamic roles like "social_media_marketer", role templates must be loaded from config/ontology/skills.yaml at commission time. The Python dict becomes the default fallback for built-in agent types; custom roles are defined in YAML.
This is critical for Nooplex commercial use cases where customers define their own agent roles. A consulting firm needs "data_analyst", "project_manager", "client_liaison" — roles that don't exist in the OSS built-in types.
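The lookup order described above (YAML first, built-in dict as fallback) might look like the following sketch. It assumes skills.yaml has already been parsed into a mapping of role name to skill list; that file's real schema is not shown in this document. BUILTIN_ROLE_TEMPLATES is a stand-in for ROLE_SKILL_TEMPLATES with an illustrative entry, and resolve_role_template is a hypothetical helper.

```python
from typing import Mapping

# Stand-in for the built-in Python dict. The real ROLE_SKILL_TEMPLATES
# contents are not shown in this document, so this entry is illustrative.
BUILTIN_ROLE_TEMPLATES: dict[str, list[str]] = {
    "security_officer": ["threat_assessment"],
}


def resolve_role_template(role: str,
                          yaml_templates: Mapping[str, list[str]]) -> list[str]:
    """Prefer a template from parsed skills.yaml; fall back to the built-in dict."""
    if role in yaml_templates:
        return list(yaml_templates[role])
    if role in BUILTIN_ROLE_TEMPLATES:
        return list(BUILTIN_ROLE_TEMPLATES[role])
    raise KeyError(f"no role template for {role!r}")


# Assumed result of parsing config/ontology/skills.yaml (schema is a guess).
yaml_templates = {
    "social_media_marketer": [
        "content_creation",
        "social_media_post_writing",
        "audience_analysis",
        "brand_voice_adherence",
    ],
}

custom = resolve_role_template("social_media_marketer", yaml_templates)
builtin = resolve_role_template("security_officer", yaml_templates)
```

Checking YAML first also lets a deployment override a built-in role without touching Python, which is the config-only property the success criteria ask for.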
Success Criteria¶
The validation scenario passes when:
- A new agent with type social_media_marketer can be created with no code changes (config only)
- The agent receives skills from a YAML role template, not hardcoded Python
- The agent can handle a create_linkedin_post intent
- The intent execution invokes a LinkedIn tool through ToolContext with permission checks
- The skill exercise is recorded, trust is updated, and the episode is stored
- The full chain is queryable via Crew Manifest (/api/ontology/crew-manifest): agent → role → skills → tools → recent activity
Nooplex Commercial Extension¶
This scenario is the foundation for Nooplex's "crew-for-hire" model:
- Customer defines role → YAML role template (skills, tools, standing orders)
- Nooplex assigns agent → DID-portable agent, Clean Room memory policy
- Agent works → uses customer's tools (LinkedIn, Salesforce, HubSpot via MCP)
- ACM tracks → billable hours, skill utilization, tool costs
- ASA schedules → work items, capacity planning, auto-escalation
The OSS capability architecture makes all of this possible. The commercial layer adds billing, scheduling, and multi-tenant management.