Artificial Intelligence

AI Agents for Legacy Code Analysis: A Practical Guide

Dec 17, 2025 - 9 min

Palahepitiya Gamage Amila

AEO Summary Box

How can AI agents analyse legacy code?

AI agents can systematically analyse legacy codebases by combining code parsing with large language model reasoning. The most effective approach involves three phases: automated discovery (mapping files, dependencies, and entry points), responsibility mapping (identifying what each component does and why), and knowledge synthesis (generating documentation, diagrams, and migration recommendations). Our agents have processed 200k+ line codebases in hours, producing documentation that would take weeks manually. The key is structuring agent prompts to focus on business logic understanding, not just code description.

The Documentation Gap

Every legacy system has the same problem: the people who built it aren't around to explain it.

Documentation, if it exists, is outdated. Comments describe what the code did three versions ago. Architecture diagrams reflect intentions rather than reality. The only accurate documentation is the code itself—and understanding that code requires reading thousands of files.

This matters because legacy modernisation projects fail when teams don't understand what they're replacing. They build new systems that miss edge cases, break integrations, or fail to replicate business logic that users depend on.

We built AI agents to close this gap. Not to generate code, but to understand it.

What AI Agents Do Well

AI agents excel at tasks that require processing large volumes of text while maintaining context. Legacy code analysis fits this perfectly:

Pattern recognition across files. An agent can identify that the same business rule is implemented in five different places, or that a particular database table is accessed by forty different functions.

Natural language synthesis. Agents can describe what code does in terms that business stakeholders understand, rather than merely translating the syntax into technical prose.

Consistency at scale. Unlike human reviewers who fatigue after hours of code reading, agents maintain consistent attention across the entire codebase.

Cross-referencing. With appropriate chunking strategies, agents can reason across an entire codebase, identifying connections that would require extensive manual searching.

Our Agent Architecture

We use a multi-phase approach:

Phase 1: Automated Discovery

The agent first maps the codebase structure:

  • File and directory organisation

  • Language and framework identification

  • Dependency analysis (internal and external)

  • Entry points and API surfaces

  • Database schema and data models

  • Configuration and environment dependencies

This phase is mostly deterministic—parsing ASTs, reading package manifests, analysing imports. AI helps with ambiguous cases (is this file dead code or a critical utility?) but most discovery uses traditional tooling.
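
To make the deterministic part of this pass concrete, here is a minimal sketch in Python. The file extensions, manifest names, and entry-point hints are illustrative assumptions, not a description of our production tooling.

```python
# Minimal discovery sketch: map languages by extension, collect candidate entry
# points, and read declared dependencies from any package manifests it finds.
import json
from collections import Counter
from pathlib import Path

LANGUAGE_BY_EXT = {".cs": "C#", ".py": "Python", ".js": "JavaScript", ".ts": "TypeScript", ".sql": "SQL"}
ENTRY_POINT_HINTS = {"Program.cs", "Startup.cs", "main.py", "index.js"}  # illustrative

def discover(repo_root: str) -> dict:
    root = Path(repo_root)
    languages: Counter = Counter()
    entry_points: list[str] = []
    dependencies: dict[str, dict] = {}

    for path in root.rglob("*"):
        if not path.is_file():
            continue
        if (lang := LANGUAGE_BY_EXT.get(path.suffix)) is not None:
            languages[lang] += 1
        if path.name in ENTRY_POINT_HINTS:
            entry_points.append(str(path.relative_to(root)))
        if path.name == "package.json":  # manifests for other ecosystems handled similarly
            dependencies[str(path.relative_to(root))] = json.loads(path.read_text()).get("dependencies", {})

    return {
        "languages": dict(languages),       # rough language mix by file count
        "entry_points": entry_points,       # candidates for the agent (or a human) to confirm
        "external_dependencies": dependencies,
    }
```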

Phase 2: Responsibility Mapping

This is where AI provides the most value. For each major component, the agent determines:

What it does. Not a line-by-line description, but a functional summary. "This service handles user authentication, including OAuth integration, session management, and permission checking."

Why it exists. Business context that explains design decisions. "The complex permission model supports multi-tenant access where users can belong to multiple organisations with different roles in each."

What it depends on. Runtime dependencies, data requirements, and integration points. "Requires Redis for session storage, calls the billing API for subscription checks, and queries the organisations table for permission resolution."

What depends on it. Downstream consumers and their expectations. "Used by all authenticated API endpoints and the admin dashboard. Session tokens are also validated by the background job processor."

The agent works through the codebase systematically, building up this responsibility map. Each component's analysis informs understanding of related components.
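
A minimal sketch of how that responsibility map can be represented and prompted for, assuming an agent that returns JSON; the field names and prompt wording below are illustrative rather than our exact production prompts.

```python
# Sketch: one record per component, filled in by an agent that answers the four
# questions above and returns JSON. Field names and wording are illustrative.
from dataclasses import dataclass, field

@dataclass
class ComponentResponsibility:
    name: str
    what_it_does: str = ""                                    # functional summary, not line-by-line
    why_it_exists: str = ""                                   # business context behind the design
    depends_on: list[str] = field(default_factory=list)       # runtime, data, and integration dependencies
    depended_on_by: list[str] = field(default_factory=list)   # downstream consumers and their expectations

def responsibility_prompt(component_name: str, source_code: str, prior_summaries: str) -> str:
    # prior_summaries holds summaries of components already analysed, so each new
    # analysis is informed by what the agent has learned about related components.
    return (
        f"You are analysing the {component_name} component of a legacy system.\n"
        f"Summaries of related components analysed so far:\n{prior_summaries}\n\n"
        f"Source code:\n{source_code}\n\n"
        "Respond with JSON using the keys what_it_does, why_it_exists, depends_on, "
        "depended_on_by. Describe business purpose, not a line-by-line account of the syntax."
    )
```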

Phase 3: Knowledge Synthesis

Raw analysis becomes useful documentation:

Architecture documentation. System overview, component relationships, data flow diagrams. Written for someone joining the team, not someone already familiar with the code.

Integration guides. How to call each service, what data formats to expect, error handling patterns. Extracted from actual code behaviour, not theoretical design docs.

Migration recommendations. Which components are tightly coupled, where the natural seams exist, what order makes sense for incremental modernisation using the strangler fig pattern.

Risk assessment. Code quality issues, security concerns, performance bottlenecks. Not a style guide critique, but genuine risks that affect modernisation planning.
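
As one small example of synthesis, the dependency edges collected during responsibility mapping can be rendered straight into a Mermaid diagram for the architecture documentation. The component names below are invented for illustration.

```python
# Sketch: render the responsibility map's dependency edges as a Mermaid graph
# that can be pasted straight into Markdown documentation.
def to_mermaid(depends_on: dict[str, list[str]]) -> str:
    lines = ["graph TD"]
    for component, deps in depends_on.items():
        for dep in deps:
            lines.append(f"    {component} --> {dep}")
    return "\n".join(lines)

# Example with invented component names:
print(to_mermaid({
    "AuthService": ["Redis", "BillingAPI"],
    "AdminDashboard": ["AuthService"],
}))
# graph TD
#     AuthService --> Redis
#     AuthService --> BillingAPI
#     AdminDashboard --> AuthService
```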

Real Results

We deployed this approach on a .NET Framework monolith—approximately 200,000 lines of code, built over 12 years by multiple teams.

Time investment: 6 hours of agent runtime, plus 4 hours of human review and refinement.

Output:

  • Complete service catalogue (47 major components identified)

  • Data model documentation with relationship mapping

  • API surface documentation (312 endpoints)

  • Dependency graph showing component coupling

  • Migration roadmap with recommended phases

Comparison: Manual analysis of a similar codebase previously took 6-8 weeks with a senior engineer working full-time.

The agent output wasn't perfect—human review caught misunderstandings and added business context that wasn't in the code. But it provided a foundation that dramatically accelerated the documentation process.

Prompt Engineering for Code Analysis

The effectiveness of AI code analysis depends heavily on how you prompt the agent. Key principles:

Focus on Intent, Not Syntax

Weak prompt: "Describe what this function does."

Strong prompt: "Explain the business purpose of this function. What user problem does it solve? Why might it have been implemented this way?"

The weak prompt gets you a paraphrase of the code. The strong prompt gets you understanding.

Provide Domain Context

Agents perform better when they understand the business domain. Front-loading context about the industry, user types, and business model helps the agent interpret ambiguous code correctly.

Example context: "This is a B2B SaaS application for managing logistics operations. Users are dispatchers who coordinate delivery drivers. Key concepts include routes (planned sequences of deliveries), manifests (paperwork for each delivery), and proof-of-delivery (confirmation that items were received)."

Request Structured Output

Ask for specific output formats that can be processed programmatically:

  • JSON for data that will feed into other tools

  • Markdown with consistent heading structure for documentation

  • Mermaid or PlantUML for diagrams

Structured output is easier to review, refine, and integrate into existing documentation systems.
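
A sketch of what this can look like with the Anthropic Python SDK; the model name is a placeholder and the schema is illustrative, not the exact one we use in production.

```python
# Sketch: ask for JSON matching a small schema so the result can feed other tools.
# The model name is a placeholder; validate the parsed output before trusting it downstream.
import json
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SCHEMA_HINT = (
    "Respond with JSON only, matching:\n"
    '{"component": str, "what_it_does": str, "depends_on": [str], "risks": [str]}'
)

def analyse_component(source_code: str) -> dict:
    message = client.messages.create(
        model="claude-sonnet-4-20250514",   # placeholder model name
        max_tokens=1024,
        messages=[{"role": "user", "content": f"{SCHEMA_HINT}\n\nAnalyse this component:\n{source_code}"}],
    )
    return json.loads(message.content[0].text)
```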

Iterate with Clarifying Questions

Configure agents to ask clarifying questions when they encounter ambiguity. "I found two implementations of order validation. Is this intentional duplication, or should one be considered deprecated?"

This surfaces decisions that require human judgment rather than having the agent guess.
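
One way to configure this, sketched below with illustrative wording, is to reserve an explicit field for open questions so ambiguity is reported rather than papered over.

```python
# Sketch: an instruction appended to analysis prompts that gives the agent a place
# to report ambiguity instead of guessing. Wording and field name are illustrative.
CLARIFICATION_INSTRUCTION = (
    "If anything is ambiguous (duplicated logic, apparently dead code, unclear ownership), "
    "do not guess. Add an entry to an open_questions list describing what you found "
    "and what decision a human needs to make."
)
```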

Limitations and Mitigations

Context Window Constraints

Even large context windows can't hold an entire enterprise codebase. Our approach:

  • Hierarchical analysis: understand the system structure first, then dive into components

  • Smart chunking: group related files rather than arbitrary splits

  • Summary propagation: pass summaries of analysed components as context for analysing dependent components (see the sketch below this list)
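
A minimal sketch of summary propagation, assuming a dependency map from the discovery phase and an analyse function that wraps the agent call:

```python
# Sketch: analyse components in dependency order, feeding each finished summary
# into the context for the components that depend on it. The analyse callable
# wraps the agent call and is assumed here rather than shown.
def analyse_with_summary_propagation(depends_on: dict[str, list[str]], analyse) -> dict[str, str]:
    summaries: dict[str, str] = {}
    remaining = dict(depends_on)

    while remaining:
        # components whose internal dependencies have all been summarised are ready
        ready = [name for name, deps in remaining.items()
                 if all(d in summaries for d in deps if d in depends_on)]
        if not ready:
            ready = list(remaining)  # dependency cycle: analyse the rest with partial context
        for name in ready:
            context = "\n".join(summaries[d] for d in remaining[name] if d in summaries)
            summaries[name] = analyse(name, context)
            del remaining[name]
    return summaries
```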

Missing Business Context

Code doesn't capture why decisions were made. The agent can see that there's a special case for customer ID 47, but not that customer 47 is an enterprise client with a custom contract.

Mitigation: combine agent analysis with stakeholder interviews. Use agent output as a starting point for conversations, not a replacement for them.

Hallucination Risk

Agents sometimes confidently describe functionality that doesn't exist, or misinterpret code behaviour.

Mitigation: always verify agent claims against the actual code. Use the agent for discovery and hypothesis generation, but validate before acting on analysis.
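
One cheap verification pass, offered as a sketch rather than a full review process, is to check that identifiers the agent mentions actually exist somewhere in the source tree before a claim goes into documentation.

```python
# Sketch: flag agent claims that reference identifiers which don't appear anywhere
# in the source tree. A crude filter, but it catches a class of hallucinations early.
from pathlib import Path

def missing_identifiers(claimed: list[str], repo_root: str) -> list[str]:
    corpus = "\n".join(
        p.read_text(errors="ignore")
        for p in Path(repo_root).rglob("*")
        if p.is_file() and p.suffix in {".cs", ".py", ".js", ".ts", ".sql"}
    )
    return [name for name in claimed if name not in corpus]

# e.g. missing_identifiers(["ValidateOrder", "LegacyDiscountService"], "./src")
# returns any names the agent referenced that never appear in the code.
```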

Outdated Training Data

Agents may not understand newer frameworks or language features if their training data predates them.

Mitigation: provide relevant documentation in context, or use agents with more recent training cutoffs for modern codebases.

When to Use This Approach

AI-assisted code analysis is most valuable when:

  • The codebase is large (50k+ lines) and manual review is prohibitively time-consuming

  • Original authors are unavailable to explain design decisions

  • Documentation is absent or outdated

  • You're planning modernisation and need to understand what you're working with

  • New team members need onboarding faster than traditional knowledge transfer allows

It's less valuable for small codebases (where reading the code directly is faster), actively maintained systems (where the team already has context), or when you just need to fix a specific bug (where targeted debugging is more efficient).

Frequently Asked Questions

Can agents generate migration code automatically?

We don't recommend it. Agent analysis informs migration planning, but the actual code should be written by engineers who understand the business context and can make judgment calls. Agent-generated code requires such extensive review that writing it directly is often faster.

What about proprietary or sensitive codebases?

We run analysis using the Claude API with appropriate data handling agreements. For highly sensitive codebases, on-premises deployment options exist but require different infrastructure.

How does this compare to static analysis tools?

Static analysis tools are excellent for specific technical checks (security vulnerabilities, code style, type errors). AI agents excel at semantic understanding—what the code means in business terms. They're complementary, not competing approaches.

What's the cost?

Token costs for analysing a large codebase typically run GBP 50-200, depending on codebase size and analysis depth. The time savings versus manual analysis make this economical for any serious modernisation effort. For more details on managing AI costs, see our guide on AI production cost control.

WireApps uses AI-assisted code analysis as part of our legacy modernisation engagements. If you're facing a documentation gap in your legacy systems, we can help you understand what you're working with before committing to modernisation decisions.

Palahepitiya Gamage Amila

Founder & CTO

AI-first engineering agency for scale-ups. Fractional CTO services, dedicated engineering pods, and production AI agents.

© 2018 - 2025 Wire Apps LTD.