Research Coverage System
Tracks analysis coverage across the codebase at a file and concept level. Enables systematic discovery of which parts of the codebase need investigation, testing, or documentation improvements.
The system discovers source files via git, seeds coverage records for each file+concept pair, detects file modifications via hash tracking, and selects coverage targets to guide where agent research should focus next.
Why it matters: In a multi-agent orchestration system, agents need guidance on what areas of the codebase need attention. Rather than random exploration, this system provides data-driven coverage prioritization.
Key source file: server/orchestrator/researchCoverage.ts
How It Works
1. File Discovery
Files are discovered using git ls-tree, not filesystem scanning:
// server/orchestrator/researchCoverage.ts:47-70
export function discoverFiles(projectPath: string): DiscoveredFile[] {
  const output = gitExecSafe(
    ['ls-tree', '-r', 'HEAD', '--format=%(objectname)\t%(path)'],
    { cwd: projectPath },
  )
  // Filter by SOURCE_EXTENSIONS (.ts, .vue, .md)
  // Skip SKIP_DIRS (node_modules, .git, dist, data, .nuxt, .output, etc.)
}
- Queries git's object database directly (git ls-tree -r HEAD)
- Includes git object hash for each file (enables stale-file detection later)
- Filters to source files: .ts, .vue, .md
- Skips build artifacts and dependencies
Discovered files include folder derivation (e.g., server/orchestrator/dispatcher.ts → folder server/orchestrator).
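The parsing and filtering step can be sketched as a pure function over the ls-tree output. This is a minimal illustration, not the code in researchCoverage.ts: the ParsedFile shape and the parseLsTree name are hypothetical, and SKIP_DIRS contains only the directories named above (the real list is longer).

```typescript
// Hypothetical shape for illustration; the real DiscoveredFile type may differ.
interface ParsedFile {
  hash: string
  path: string
}

const SOURCE_EXTENSIONS = ['.ts', '.vue', '.md']
// Only the directories named in this document; assumed incomplete.
const SKIP_DIRS = new Set(['node_modules', '.git', 'dist', 'data', '.nuxt', '.output'])

// Parses lines of the form "<objectname>\t<path>", one file per line.
function parseLsTree(output: string): ParsedFile[] {
  const files: ParsedFile[] = []
  for (const line of output.split('\n')) {
    const tab = line.indexOf('\t')
    if (tab === -1) continue // blank or malformed line
    const hash = line.slice(0, tab)
    const path = line.slice(tab + 1)
    // Directory segments are everything except the file name
    const dirs = path.split('/').slice(0, -1)
    if (dirs.some((dir) => SKIP_DIRS.has(dir))) continue
    if (!SOURCE_EXTENSIONS.some((ext) => path.endsWith(ext))) continue
    files.push({ hash, path })
  }
  return files
}
```

Keeping the parser separate from the git call makes the filter rules testable without a repository.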
2. Coverage Seeding
When a project is detected or synced, the system seeds coverage rows for each file+concept combo:
// server/orchestrator/researchCoverage.ts:89-126
export function seedCoverage(
  db: AppDatabase,
  projectId: string,
  files: DiscoveredFile[],
): number {
  const concepts = getKnownConcepts(db, projectId)
  // For each file, insert a row for each concept (if not already present)
  // Status: 'uncovered' (initial state)
}
Seed Concepts (6 total):
- test_coverage: Unit/integration tests, test patterns, coverage %
- error_handling: Exception handling, error boundaries, recovery
- type_safety: TypeScript definitions, type inference, generics
- performance: Optimization, bottlenecks, memory/CPU usage
- security: Injection, authentication, authorization, secrets
- file_size: Complexity metrics, modularity, SRP adherence
Seeding runs per project, so every new project starts with all of its file+concept rows uncovered.
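The "insert if not already present" rule can be sketched in memory, without a database. The names planSeedRows, seedKey, and SeedRow are hypothetical, and the set of existing keys stands in for whatever lookup the real seedCoverage performs against the research_coverage table.

```typescript
const SEED_CONCEPTS = [
  'test_coverage', 'error_handling', 'type_safety',
  'performance', 'security', 'file_size',
]

// Hypothetical row shape: only the fields needed to show the seeding rule.
interface SeedRow {
  projectId: string
  filePath: string
  concept: string
  status: 'uncovered'
}

// Key used to detect an existing file+concept pair within one project.
function seedKey(filePath: string, concept: string): string {
  return `${filePath}|${concept}`
}

// Returns the rows that still need inserting: one per file+concept pair
// that is not already present.
function planSeedRows(
  projectId: string,
  filePaths: string[],
  concepts: string[],
  existingKeys: Set<string>,
): SeedRow[] {
  const seen = new Set(existingKeys) // copy so the caller's set is not mutated
  const rows: SeedRow[] = []
  for (const filePath of filePaths) {
    for (const concept of concepts) {
      const key = seedKey(filePath, concept)
      if (seen.has(key)) continue
      seen.add(key)
      rows.push({ projectId, filePath, concept, status: 'uncovered' })
    }
  }
  return rows
}
```

With two files and the six seed concepts, a first run plans 12 rows; a second run with those keys already present plans none.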
3. Stale Coverage Detection
Files are tracked by git hash. When a file changes, its coverage is reset:
// server/orchestrator/researchCoverage.ts:132-167
export function resetStaleCoverage(
  db: AppDatabase,
  projectId: string,
  currentFiles: DiscoveredFile[],
): number {
  // For each covered row, compare stored fileHash to current git hash
  // If mismatch: reset to 'uncovered', clear coveredAt + fileHash
}
Workflow:
- Agent researches a file, marks coverage as covered + stores fileHash
- Developer commits changes to that file
- Next sync: resetStaleCoverage() detects hash change, resets to uncovered
- File re-enters the research queue
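The hash comparison in this workflow can be sketched over in-memory rows. The names below are hypothetical. The sketch also leaves rows untouched when their file no longer appears in the current listing; that is an assumption, since this document does not say how deleted files are handled.

```typescript
// Hypothetical in-memory versions of the stored row and the discovered file.
interface StoredRow {
  filePath: string
  status: 'uncovered' | 'covered' | 'has_issues'
  fileHash: string | null
  coveredAt: string | null
}

interface CurrentFile {
  path: string
  hash: string
}

// Resets covered rows whose stored hash differs from the current git hash.
// Returns the number of rows reset.
function resetStaleRows(rows: StoredRow[], currentFiles: CurrentFile[]): number {
  const currentHashes = new Map<string, string>()
  for (const file of currentFiles) currentHashes.set(file.path, file.hash)

  let resetCount = 0
  for (const row of rows) {
    if (row.status !== 'covered') continue
    const currentHash = currentHashes.get(row.filePath)
    // Assumption: a file missing from the listing is left alone here.
    if (currentHash === undefined || currentHash === row.fileHash) continue
    row.status = 'uncovered'
    row.coveredAt = null
    row.fileHash = null
    resetCount++
  }
  return resetCount
}
```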
4. Coverage Targeting
The orchestrator selects which file+concept combinations need research next:
// server/orchestrator/researchCoverage.ts:180-225
export function selectCoverageTargets(
  db: AppDatabase,
  projectId: string,
  maxFallbacks: number = 4,
): { primary: CoverageTarget; fallbacks: CoverageTarget[] } {
  // Group rows by (folder + concept)
  // Calculate coverage % for each group
  // Sort by ascending coverage % (lowest first)
  // Return primary (worst coverage) + fallbacks (next-worst)
}
Selection Strategy:
- Groups coverage by folder + concept (not individual files)
- Prioritizes lowest coverage first (e.g., if server/orchestrator has 0% test coverage, it ranks above app/components with 50%)
- Returns primary target (lowest coverage group) and fallback targets (next 4 worst)
Example: If server/orchestrator + test_coverage is 0% covered (all files uncovered), it becomes the primary target. The scanner agent can then explore this folder+concept combination.
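The grouping and ranking can be sketched over in-memory rows. All names below are hypothetical, and the sketch makes two assumptions this document does not confirm: only rows with status covered count toward the percentage, and an empty input returns null rather than a primary target.

```typescript
// Hypothetical shapes; the real CoverageTarget type may carry more fields.
interface GroupRow {
  folder: string
  concept: string
  status: string
}

interface RankedTarget {
  folder: string
  concept: string
  coveragePct: number
}

function rankTargets(
  rows: GroupRow[],
  maxFallbacks: number = 4,
): { primary: RankedTarget; fallbacks: RankedTarget[] } | null {
  // Group rows by (folder, concept) and count total vs. covered
  const groups = new Map<string, { folder: string; concept: string; total: number; covered: number }>()
  for (const row of rows) {
    const key = `${row.folder}|${row.concept}`
    let group = groups.get(key)
    if (!group) {
      group = { folder: row.folder, concept: row.concept, total: 0, covered: 0 }
      groups.set(key, group)
    }
    group.total++
    // Assumption: only 'covered' counts as covered
    if (row.status === 'covered') group.covered++
  }

  // Lowest coverage first; the sort is stable, so ties keep first-seen order
  const ranked = Array.from(groups.values())
    .map((g) => ({ folder: g.folder, concept: g.concept, coveragePct: (g.covered / g.total) * 100 }))
    .sort((a, b) => a.coveragePct - b.coveragePct)

  const [primary, ...rest] = ranked
  if (!primary) return null // assumption: nothing to target when there are no rows
  return { primary, fallbacks: rest.slice(0, maxFallbacks) }
}
```

Ranking groups rather than individual files keeps the target list short even in large repositories, at the cost of not saying which file inside the folder to start with.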
Key Components
| Component | File | Responsibility |
|---|---|---|
| discoverFiles | researchCoverage.ts:47 | Query git for source files, filter by extension, derive folders |
| seedCoverage | researchCoverage.ts:89 | Insert uncovered rows for new file+concept pairs |
| resetStaleCoverage | researchCoverage.ts:132 | Detect modified files (via hash), reset coverage to uncovered |
| selectCoverageTargets | researchCoverage.ts:180 | Group by folder+concept, prioritize by lowest coverage %, return primary + fallbacks |
| getKnownConcepts | researchCoverage.ts:76 | Fetch distinct concepts for a project (or return SEED_CONCEPTS) |
| deriveFolder | researchCoverage.ts:27 | Derive logical folder name from file path (e.g., server/orchestrator) |
| researchCoverage table | schema.ts:126 | Store coverage status per file+concept+project |
Data Model
researchCoverage Table
export const researchCoverage = sqliteTable('research_coverage', {
  id: text('id').primaryKey(),
  projectId: text('project_id').notNull(), // FK: projects
  folder: text('folder').notNull(), // Derived folder (e.g., 'server/orchestrator')
  filePath: text('file_path').notNull(), // Full path (e.g., 'server/orchestrator/dispatcher.ts')
  concept: text('concept').notNull(), // One of SEED_CONCEPTS
  status: text('status').default('uncovered'), // uncovered | covered | has_issues
  coveredAt: text('covered_at'), // When marked as covered
  fileHash: text('file_hash'), // Git hash at time of coverage
  ticketId: text('ticket_id'), // Optional: ticket doing the research
  notes: text('notes'), // Optional: research notes or findings
  createdAt: text('created_at'),
  updatedAt: text('updated_at'),
}, (table) => [
  unique('idx_research_coverage_unique')
    .on(table.projectId, table.filePath, table.concept),
])
Unique constraint: One row per (projectId, filePath, concept). Prevents duplicates.
Design Decisions
Git-Based File Discovery
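Why: More reliable than filesystem scanning. git ls-tree lists only files committed at HEAD, so ignored and untracked files never appear, and each entry carries its git object hash.
Trade-off: Requires a git repo; untracked files and uncommitted changes aren't discovered until they are committed.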
Folder Derivation Logic
Why: Logical grouping (e.g., server/orchestrator, not individual files). Balances granularity.
- Deep folders for server/app: two-segment derivation (server/orchestrator) for meaningful grouping
- Shallow for others: one-segment (prompts, tests) to avoid over-fragmentation
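The two rules above can be sketched as a small pure function. This is an illustration, not the deriveFolder in researchCoverage.ts: the function name is hypothetical, and the handling of files at the repository root (returned as '.') and of files directly under server or app (returned as the one-segment folder) are assumptions.

```typescript
// Top-level directories that get two-segment folders, per the rules above.
const DEEP_ROOTS = new Set(['server', 'app'])

function deriveFolderSketch(filePath: string): string {
  // Directory segments are everything except the file name
  const dirs = filePath.split('/').slice(0, -1)
  if (dirs.length === 0) return '.' // assumption: root-level files share one group
  const root = dirs[0] ?? ''
  const depth = DEEP_ROOTS.has(root) ? 2 : 1
  // slice() tolerates depth exceeding dirs.length, e.g. 'server/index.ts'
  return dirs.slice(0, depth).join('/')
}
```

For example, server/orchestrator/dispatcher.ts maps to server/orchestrator, while prompts/agents/scanner.md maps to prompts.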
Hash-Based Stale Detection
Why: Detects actual code changes without polling metadata or timestamps.
Mechanism: Store fileHash when marking as covered. On next sync, compare current git hash; if mismatch, reset to uncovered.
Seed Concepts (Fixed Set)
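Why: Stable concepts ensure consistent targeting across research cycles.
Extensibility: getKnownConcepts() returns the distinct concepts already recorded for a project and falls back to SEED_CONCEPTS otherwise, so a project can carry custom concepts beyond the seed set.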
Lowest-Coverage Prioritization
Why: Ensures systematic coverage. Prevents "hot spots" from monopolizing research.
Algorithm: Group by (folder, concept), calculate %, sort ascending. Primary = lowest %.