Research Coverage System

Tracks analysis coverage across the codebase at a file and concept level. Enables systematic discovery of which parts of the codebase need investigation, testing, or documentation improvements.

The system discovers source files via git, seeds coverage records for each file+concept pair, detects file modifications via hash tracking, and selects coverage targets to guide where agent research should focus next.

Why it matters: In a multi-agent orchestration system, agents need guidance on what areas of the codebase need attention. Rather than random exploration, this system provides data-driven coverage prioritization.

Key source file: server/orchestrator/researchCoverage.ts


How It Works

1. File Discovery

Files are discovered using git ls-tree, not filesystem scanning:

typescript
// server/orchestrator/researchCoverage.ts:47-70
export function discoverFiles(projectPath: string): DiscoveredFile[] {
  const output = gitExecSafe(
    ['ls-tree', '-r', 'HEAD', '--format=%(objectname)\t%(path)'],
    { cwd: projectPath },
  )
  // Filter by SOURCE_EXTENSIONS (.ts, .vue, .md)
  // Skip SKIP_DIRS (node_modules, .git, dist, data, .nuxt, .output, etc.)
}

  • Queries git's object database directly (git ls-tree -r HEAD)
  • Includes git object hash for each file (enables stale-file detection later)
  • Filters to source files: .ts, .vue, .md
  • Skips build artifacts and dependencies

Each discovered file also carries a derived folder (e.g., server/orchestrator/dispatcher.ts → folder server/orchestrator).
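The parsing and filtering step can be sketched as a pure function over the git ls-tree output. This is a minimal illustration, not the code in researchCoverage.ts: the DiscoveredFile fields, the parseLsTree name, and the rule that a skip directory matches at any depth are assumptions, and the SKIP_DIRS list only contains the entries named above.

```typescript
// Hypothetical shape; the real DiscoveredFile may carry more fields (e.g. folder).
interface DiscoveredFile {
  path: string
  hash: string
}

const SOURCE_EXTENSIONS = ['.ts', '.vue', '.md']
// Only the directories named in this document; the real list is longer.
const SKIP_DIRS = new Set(['node_modules', '.git', 'dist', 'data', '.nuxt', '.output'])

// Parses output of: git ls-tree -r HEAD --format=%(objectname)<TAB>%(path)
// Each line is expected to look like "<hash>\t<path>".
function parseLsTree(output: string): DiscoveredFile[] {
  const files: DiscoveredFile[] = []
  for (const line of output.split('\n')) {
    const tab = line.indexOf('\t')
    if (tab === -1) continue // blank or malformed line

    const hash = line.slice(0, tab)
    const path = line.slice(tab + 1)

    // Directory segments only, so a file named "data.ts" is not skipped.
    const dirs = path.split('/').slice(0, -1)
    const inSkippedDir = dirs.some((dir) => SKIP_DIRS.has(dir))
    const isSource = SOURCE_EXTENSIONS.some((ext) => path.endsWith(ext))

    if (isSource && !inSkippedDir) files.push({ path, hash })
  }
  return files
}
```

Keeping the parser separate from the git call makes the filter rules easy to unit test without a repository.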

2. Coverage Seeding

When a project is detected or synced, the system seeds one coverage row for each file+concept combination:

typescript
// server/orchestrator/researchCoverage.ts:89-126
export function seedCoverage(
  db: AppDatabase,
  projectId: string,
  files: DiscoveredFile[],
): number {
  const concepts = getKnownConcepts(db, projectId)
  // For each file, insert a row for each concept (if not already present)
  // Status: 'uncovered' (initial state)
}

Seed Concepts (6 total):

  1. test_coverage — Unit/integration tests, test patterns, coverage %
  2. error_handling — Exception handling, error boundaries, recovery
  3. type_safety — TypeScript definitions, type inference, generics
  4. performance — Optimization, bottlenecks, memory/CPU usage
  5. security — Injection, authentication, authorization, secrets
  6. file_size — Complexity metrics, modularity, SRP adherence

Coverage is seeded per project, so each new project starts with every file+concept pair marked uncovered.
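The "insert if not already present" behaviour can be illustrated with an in-memory map standing in for the database table. This is a sketch under assumptions: seedRows, the CoverageRow shape, and the idea that the returned number counts newly inserted rows are all hypothetical, since the body of seedCoverage is not shown here.

```typescript
type CoverageStatus = 'uncovered' | 'covered' | 'has_issues'

// Hypothetical in-memory row; the real table has more columns.
interface CoverageRow {
  filePath: string
  concept: string
  status: CoverageStatus
  fileHash: string | null
}

const SEED_CONCEPTS = [
  'test_coverage',
  'error_handling',
  'type_safety',
  'performance',
  'security',
  'file_size',
]

// The map key mirrors the (filePath, concept) part of the unique constraint
// for a single project. Returns how many rows were newly inserted (assumed meaning).
function seedRows(
  rows: Map<string, CoverageRow>,
  filePaths: string[],
  concepts: string[] = SEED_CONCEPTS,
): number {
  let inserted = 0
  for (const filePath of filePaths) {
    for (const concept of concepts) {
      const key = `${filePath}::${concept}`
      if (rows.has(key)) continue // already seeded, leave its status alone
      rows.set(key, { filePath, concept, status: 'uncovered', fileHash: null })
      inserted += 1
    }
  }
  return inserted
}
```

Because existing keys are skipped, running the seeding step on every sync is safe: only files or concepts that appeared since the last run produce new rows.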

3. Stale Coverage Detection

Files are tracked by git hash. When a file changes, its coverage is reset:

typescript
// server/orchestrator/researchCoverage.ts:132-167
export function resetStaleCoverage(
  db: AppDatabase,
  projectId: string,
  currentFiles: DiscoveredFile[],
): number {
  // For each covered row, compare stored fileHash to current git hash
  // If mismatch: reset to 'uncovered', clear coveredAt + fileHash
}

Workflow:

  1. Agent researches a file, marks coverage as covered + stores fileHash
  2. Developer commits changes to that file
  3. Next sync: resetStaleCoverage() detects hash change, resets to uncovered
  4. File re-enters the research queue
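The workflow above can be sketched against plain objects instead of database rows. The names resetStale and TrackedRow are hypothetical, and leaving rows untouched when their file no longer exists at HEAD is an assumption; the real resetStaleCoverage may handle deleted files differently.

```typescript
// Hypothetical subset of a coverage row.
interface TrackedRow {
  filePath: string
  status: 'uncovered' | 'covered' | 'has_issues'
  fileHash: string | null
  coveredAt: string | null
}

// currentHashes maps file path -> git object hash at HEAD.
// Mutates stale rows in place and returns how many were reset.
function resetStale(rows: TrackedRow[], currentHashes: Map<string, string>): number {
  let resetCount = 0
  for (const row of rows) {
    if (row.status !== 'covered') continue // only covered rows can go stale

    const currentHash = currentHashes.get(row.filePath)
    // Assumption: a file missing from HEAD is left for separate cleanup.
    if (currentHash === undefined || currentHash === row.fileHash) continue

    row.status = 'uncovered'
    row.fileHash = null
    row.coveredAt = null
    resetCount += 1
  }
  return resetCount
}
```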

4. Coverage Targeting

The orchestrator selects which file+concept combinations need research next:

typescript
// server/orchestrator/researchCoverage.ts:180-225
export function selectCoverageTargets(
  db: AppDatabase,
  projectId: string,
  maxFallbacks: number = 4,
): { primary: CoverageTarget; fallbacks: CoverageTarget[] } {
  // Group rows by (folder + concept)
  // Calculate coverage % for each group
  // Sort by ascending coverage % (lowest first)
  // Return primary (worst coverage) + fallbacks (next-worst)
}

Selection Strategy:

  • Groups coverage by folder + concept (not individual files)
  • Prioritizes lowest coverage first (e.g., if server/orchestrator is 0% covered for test_coverage, it ranks above app/components at 50%)
  • Returns primary target (lowest coverage group) and fallback targets (next 4 worst)

Example: If server/orchestrator + test_coverage is 0% covered (all files uncovered), it becomes the primary target. The scanner agent can then explore this folder+concept combination.
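The grouping and ranking can be sketched over an in-memory list of rows. This is an illustration, not the real selectCoverageTargets: rankTargets and its types are hypothetical, only rows with status covered count toward the percentage (has_issues handling is an assumption), and the sketch returns null for an empty input where the real function's behaviour is not documented here.

```typescript
type RowStatus = 'uncovered' | 'covered' | 'has_issues'

interface GroupedRow {
  folder: string
  concept: string
  status: RowStatus
}

interface GroupTarget {
  folder: string
  concept: string
  coveragePct: number
}

function rankTargets(
  rows: GroupedRow[],
  maxFallbacks = 4,
): { primary: GroupTarget; fallbacks: GroupTarget[] } | null {
  // Tally total and covered rows per (folder, concept) group.
  const groups = new Map<string, { folder: string; concept: string; total: number; covered: number }>()
  for (const row of rows) {
    const key = `${row.folder}::${row.concept}`
    const group = groups.get(key) ?? { folder: row.folder, concept: row.concept, total: 0, covered: 0 }
    group.total += 1
    if (row.status === 'covered') group.covered += 1
    groups.set(key, group)
  }

  // Lowest coverage first.
  const ranked = Array.from(groups.values())
    .map((g): GroupTarget => ({
      folder: g.folder,
      concept: g.concept,
      coveragePct: (100 * g.covered) / g.total,
    }))
    .sort((a, b) => a.coveragePct - b.coveragePct)

  const primary = ranked[0]
  if (primary === undefined) return null
  return { primary, fallbacks: ranked.slice(1, 1 + maxFallbacks) }
}
```

Ranking groups rather than individual files means one target gives an agent a whole folder to work through for a single concept.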


Key Components

| Component | File | Responsibility |
| --- | --- | --- |
| discoverFiles | researchCoverage.ts:47 | Query git for source files, filter by extension, derive folders |
| seedCoverage | researchCoverage.ts:89 | Insert uncovered rows for new file+concept pairs |
| resetStaleCoverage | researchCoverage.ts:132 | Detect modified files (via hash), reset coverage to uncovered |
| selectCoverageTargets | researchCoverage.ts:180 | Group by folder+concept, prioritize by lowest coverage %, return primary + fallbacks |
| getKnownConcepts | researchCoverage.ts:76 | Fetch distinct concepts for a project (or return SEED_CONCEPTS) |
| deriveFolder | researchCoverage.ts:27 | Derive logical folder name from file path (e.g., server/orchestrator) |
| researchCoverage table | schema.ts:126 | Store coverage status per file+concept+project |

Data Model

researchCoverage Table

typescript
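// Drizzle ORM SQLite helpers used by the table definition
import { sqliteTable, text, unique } from 'drizzle-orm/sqlite-core'

export const researchCoverage = sqliteTable('research_coverage', {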
  id: text('id').primaryKey(),
  projectId: text('project_id').notNull(),           // FK: projects
  folder: text('folder').notNull(),                  // Derived folder (e.g., 'server/orchestrator')
  filePath: text('file_path').notNull(),             // Full path (e.g., 'server/orchestrator/dispatcher.ts')
  concept: text('concept').notNull(),                // A seed concept or a project-specific concept
  status: text('status').default('uncovered'),       // uncovered | covered | has_issues
  coveredAt: text('covered_at'),                     // When marked as covered
  fileHash: text('file_hash'),                       // Git hash at time of coverage
  ticketId: text('ticket_id'),                       // Optional: ticket doing the research
  notes: text('notes'),                              // Optional: research notes or findings
  createdAt: text('created_at'),
  updatedAt: text('updated_at'),
}, (table) => [
  unique('idx_research_coverage_unique')
    .on(table.projectId, table.filePath, table.concept),
])

Unique constraint: One row per (projectId, filePath, concept). Prevents duplicates.


Design Decisions

Git-Based File Discovery

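Why: More reliable than filesystem scanning. git ls-tree lists only files committed at HEAD, so ignored and untracked files never appear, and every entry comes with its git object hash.

Trade-off: Requires a git repository; untracked files and uncommitted changes are not discovered.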

Folder Derivation Logic

Why: Coverage is grouped by logical folder (e.g., server/orchestrator) rather than by individual file, which keeps targets coarse enough to act on while still pointing at a specific area.

  • Deep for server/ and app/: two-segment derivation (server/orchestrator) for meaningful grouping
  • Shallow for everything else: one-segment derivation (prompts, tests) to avoid over-fragmentation
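The two rules above can be sketched as a small path function. The name deriveFolderSketch is hypothetical, and two cases are assumptions because the document does not cover them: a file directly under server/ or app/ maps to the one-segment folder, and a file at the repository root maps to '.'.

```typescript
// Top-level directories that get two-segment folders.
const DEEP_ROOTS = new Set(['server', 'app'])

function deriveFolderSketch(filePath: string): string {
  // Drop the file name, keep only directory segments.
  const dirs = filePath.split('/').slice(0, -1)
  if (dirs.length === 0) return '.' // assumption: label for root-level files

  const depth = DEEP_ROOTS.has(dirs[0] ?? '') ? 2 : 1
  // slice() tolerates depth > dirs.length, so "server/index.ts" yields "server".
  return dirs.slice(0, depth).join('/')
}
```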

Hash-Based Stale Detection

Why: Detects actual content changes without relying on file metadata or timestamps.

Mechanism: Store fileHash when a row is marked covered. On the next sync, compare it with the current git hash; on a mismatch, reset the row to uncovered.

Seed Concepts (Fixed Set)

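Why: A stable set of concepts keeps targeting consistent across research cycles.

Extensibility: Projects can add custom concepts. getKnownConcepts() reads the distinct concepts recorded for a project (falling back to SEED_CONCEPTS), so seeding picks up whatever concepts the project already uses.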
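The lookup with a seed fallback can be sketched over an in-memory list of rows. This is illustrative only: knownConceptsSketch is a hypothetical name, and falling back to the seeds exactly when a project has no rows yet is an assumption about how getKnownConcepts behaves.

```typescript
const DEFAULT_CONCEPTS: readonly string[] = [
  'test_coverage',
  'error_handling',
  'type_safety',
  'performance',
  'security',
  'file_size',
]

// Returns the distinct concepts already recorded for a project,
// or a copy of the seed list when none exist yet.
function knownConceptsSketch(projectRows: { concept: string }[]): string[] {
  const distinct = Array.from(new Set(projectRows.map((row) => row.concept)))
  return distinct.length > 0 ? distinct : DEFAULT_CONCEPTS.slice()
}
```

Under this reading, a custom concept becomes "known" as soon as one row uses it, and later seeding runs extend it to every file.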

Lowest-Coverage Prioritization

Why: Ensures systematic coverage and prevents "hot spots" from monopolizing research.

Algorithm: Group rows by (folder, concept), compute the coverage % of each group, and sort ascending. The primary target is the group with the lowest %.