How to Audit a Legacy Rails Codebase Without Getting Lost

The most useful question in a legacy audit is not "How many lines of code are there?" It is closer to "What are you afraid to change?"

That question turns the audit from a static inspection into an operational map. If three developers independently mention the same checkout flow, billing job, import pipeline, or admin screen, that is not gossip. That is a hotspot. It may be beautifully written code. It may even have tests. But if the team cannot safely explain it, deploy it, or recover from changes to it, the risk is real.

This is where many audits get too mechanical. They produce a spreadsheet of outdated gems and large files, then quietly miss the fact that the team has stopped promising a feature to customers because nobody trusts the implementation. The spreadsheet is not wrong. It is just not yet useful.

Before cloning the repository, ask when the team last deployed on a Friday. Ask what broke in production during the last quarter. Ask which area only one person understands. Ask what feature has been delayed for a year because every estimate turns into a negotiation with the past.

The answers tell you where to spend your attention.

Read the Rails map before running the machines

The Gemfile tells you what responsibilities the app has outsourced, duplicated, or accumulated. Two authorization gems usually mean history. Two upload systems usually mean migration scar tissue. A payment gem pinned to a version from years ago may be more than dependency neglect. It may be a sign that money flow is too risky to touch.
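A quick way to spot overlapping responsibilities is to scan the Gemfile for more than one gem in the same category. The sketch below is only an illustration: the `OVERLAP_GROUPS` table and both helper methods are my own hypothetical names, the gem lists are incomplete, and the regex only understands plain `gem "name"` lines.

```ruby
# Hypothetical grouping of well-known gems by responsibility.
# This list is an assumption for illustration; extend it for your app.
OVERLAP_GROUPS = {
  authorization: %w[pundit cancancan action_policy],
  uploads: %w[carrierwave paperclip shrine dragonfly],
  background_jobs: %w[sidekiq resque delayed_job good_job]
}.freeze

# Extracts gem names from Gemfile source text.
# Only simple `gem "name"` lines are recognized; commented lines are skipped.
def declared_gems(gemfile_source)
  gemfile_source.scan(/^\s*gem\s+["']([^"']+)["']/).flatten
end

# Returns only the groups where the Gemfile declares more than one gem.
def overlapping_responsibilities(gemfile_source, groups: OVERLAP_GROUPS)
  gems = declared_gems(gemfile_source)
  groups.each_with_object({}) do |(responsibility, candidates), overlaps|
    present = candidates & gems
    overlaps[responsibility] = present if present.size > 1
  end
end

sample_gemfile = <<~GEMFILE
  gem "rails", "~> 6.1"
  gem "pundit"
  gem "cancancan"
  gem "carrierwave"
GEMFILE

p overlapping_responsibilities(sample_gemfile)
```

On a real app you would pass `File.read("Gemfile")`. An overlap is a question to ask the team, not a verdict.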

The schema tells an even plainer story. A table with eighty nullable columns is rarely just ugly. It often means several business concepts were pressed into one record because that was the fastest way to ship. A transactions table with columns for card payments, bank transfers, refunds, reconciliation, and legacy provider IDs is not a table anymore. It is a museum.
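To find those wide, mostly nullable tables without opening a database console, you can count nullable columns straight from `db/schema.rb`. This is a rough sketch that assumes the standard Ruby schema dump format; the method name and the sample schema are made up for illustration, and it will not understand `structure.sql`.

```ruby
# Calls inside create_table blocks that are not columns.
NON_COLUMN_CALLS = %w[index check_constraint].freeze

# Returns [table_name, nullable_column_count] pairs, widest offenders first.
# Tables with no nullable columns are omitted.
def nullable_columns_by_table(schema_source)
  counts = Hash.new(0)
  current_table = nil

  schema_source.each_line do |line|
    if (match = line.match(/^\s*create_table\s+"([^"]+)"/))
      current_table = match[1]
    elsif line.match?(/^\s*end\b/)
      current_table = nil
    elsif current_table && (match = line.match(/^\s*t\.(\w+)\s+"[^"]+"(.*)$/))
      next if NON_COLUMN_CALLS.include?(match[1])
      # A column is nullable unless the dump says `null: false`.
      counts[current_table] += 1 unless match[2].include?("null: false")
    end
  end

  counts.sort_by { |_table, count| -count }
end

sample_schema = <<~SCHEMA
  create_table "transactions", force: :cascade do |t|
    t.string "card_token"
    t.string "bank_reference"
    t.integer "amount_cents", null: false
    t.index ["card_token"], name: "index_transactions_on_card_token"
  end

  create_table "users", force: :cascade do |t|
    t.string "email", null: false
    t.string "nickname"
  end
SCHEMA

nullable_columns_by_table(sample_schema).each do |table, count|
  puts "#{table}: #{count} nullable columns"
end
```

The count itself is not the finding. It tells you which tables deserve the "which business concepts live in here?" conversation.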

Routes tell you whether the app still thinks in resources or whether every workflow became its own special door. Custom routes are not inherently bad. Hundreds of one-off endpoints are a signal that the domain model and the user workflows may no longer agree.

I like doing this pass before running automated tools because it gives the tools context. Without context, rubycritic says "This file is complex." With context, you can say "This file is complex, changes every week, has no coverage around refunds, and the business cannot invoice customers without it."

That second sentence is an audit finding.

Build a tiny audit ledger

During the first pass, I prefer a simple ledger over a long report. Every finding should connect a code signal to a business or delivery risk. If it cannot do that, it may still be true, but it probably does not belong in the first conversation.

Here is the shape I mean:

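# Data.define requires Ruby 3.2 or newer. On older Rubies,
# Struct.new(..., keyword_init: true) gives a similar shape.
AuditFinding = Data.define(
  :area,
  :signal,
  :risk,
  :next_step,
  :time_horizon
)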

findings = [
  AuditFinding.new(
    area: "Billing",
    signal: "Payment has 47 callbacks and no direct model specs",
    risk: "Small changes can create production-only money bugs",
    next_step: "Characterize the current behavior before extracting anything",
    time_horizon: "this quarter"
  ),
  AuditFinding.new(
    area: "Deploys",
    signal: "Rollback exists in theory but has not been practiced",
    risk: "Every release has avoidable operational risk",
    next_step: "Run a rollback rehearsal in staging",
    time_horizon: "this week"
  )
]

findings.group_by(&:time_horizon)

The point is not the Ruby object. The point is the discipline. "Large model" is not a finding. "Payment has 47 callbacks and no direct model specs, so small edits can create production-only money bugs" is a finding.

That framing also keeps the audit from becoming performative. Legacy Rails apps can always produce enough lint, coverage, and complexity noise to fill a document. The job is to decide what matters first.

Run tools after you have a thesis

Once you know what you are looking for, run the mechanical checks hard.

For security, I still want bundle audit and Brakeman early. A critical advisory in authentication, file upload, or payment code changes the priority stack immediately. Brakeman warnings also need judgment. One high-confidence warning in a public controller can matter more than a long tail of low-confidence warnings in internal screens.
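One way to apply that judgment is to sort the Brakeman JSON report (`brakeman -f json -o tmp/brakeman.json`) so high-confidence warnings in public-facing code come first. This is a sketch under assumptions: it relies on the report having a `warnings` array whose entries carry `confidence` and `file` keys, and it treats anything outside an `admin/` controller namespace as public, which may not match your app. The method name and path pattern are hypothetical.

```ruby
require "json"

# Lower rank sorts first. Unknown confidence values go last.
CONFIDENCE_RANK = { "High" => 0, "Medium" => 1, "Weak" => 2 }.freeze

# Orders warnings by confidence, then by whether the file looks public-facing.
# The default pattern is an assumption: controllers outside admin/ are public.
def triage_brakeman(report_json, public_path_pattern: %r{\Aapp/controllers/(?!admin/)})
  warnings = JSON.parse(report_json).fetch("warnings", [])
  warnings.sort_by do |warning|
    public_facing = warning["file"].to_s.match?(public_path_pattern) ? 0 : 1
    [CONFIDENCE_RANK.fetch(warning["confidence"], 3), public_facing]
  end
end

sample_report = JSON.generate({
  "warnings" => [
    { "warning_type" => "Mass Assignment", "confidence" => "Weak",
      "file" => "app/controllers/admin/reports_controller.rb" },
    { "warning_type" => "SQL Injection", "confidence" => "High",
      "file" => "app/controllers/checkout_controller.rb" }
  ]
})

triage_brakeman(sample_report).each do |warning|
  puts "#{warning['confidence']}: #{warning['warning_type']} in #{warning['file']}"
end
```

The sort only orders the reading list. Each warning still needs a human to decide whether it is exploitable.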

For dependencies, I want the Rails version, Ruby version, EOL status, pinned gems, and the upgrade distance. "Old" is vague. "Rails 6.1 is past end of life and this app depends on gems that block Rails 7.2" is something leadership can act on.

For complexity, I care less about the raw number and more about churn. A 900-line model that nobody touches is a different problem from a 400-line model edited every sprint. The first may be ugly sediment. The second is an active source of delivery risk.
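Churn is cheap to measure from git history. The sketch below counts how many commits touched each file and pairs that with file length. The method names are my own, the six-month window is an arbitrary assumption, and the sample data just mirrors the two models described above.

```ruby
# Counts commits per file from the output of:
#   git log --since="6 months ago" --name-only --pretty=format: -- app/
# That command prints one file path per line, with blank lines between commits.
def churn_counts(git_log_output)
  git_log_output.each_line.map(&:strip).reject(&:empty?).tally
end

# Pairs churn with line counts (path => number of lines) and returns the
# most frequently changed files first, using size as a tie-breaker.
def hotspots(churn, line_counts, limit: 10)
  churn
    .map { |path, commits| { path: path, commits: commits, lines: line_counts.fetch(path, 0) } }
    .sort_by { |row| [-row[:commits], -row[:lines]] }
    .first(limit)
end

sample_log = <<~LOG
  app/models/order.rb

  app/models/order.rb
  app/models/legacy_report.rb

  app/models/order.rb
LOG

sample_line_counts = {
  "app/models/order.rb" => 400,
  "app/models/legacy_report.rb" => 900
}

hotspots(churn_counts(sample_log), sample_line_counts).each do |row|
  puts "#{row[:path]}: #{row[:commits]} commits, #{row[:lines]} lines"
end
```

In a real repository you would feed `churn_counts` the actual `git log` output and build the line counts with something like `File.foreach(path).count`, skipping paths that have since been deleted.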

The command sequence can stay boring:

bundle audit check --update
bundle exec brakeman --format html -o tmp/brakeman.html
bundle outdated
cloc app/
bundle exec rubycritic app/
bundle exec rails routes | wc -l
time bundle exec rspec
COVERAGE=true bundle exec rspec
git log --oneline -1 -- spec/

Do not let the tools decide the story. Use them to confirm, sharpen, or disprove the story you started forming from interviews and Rails structure.

Coverage is not confidence

Coverage numbers are famously seductive. An app with 82 percent coverage can still have no meaningful tests around the code that keeps the business alive. A low-coverage app can still have a few excellent characterization tests around the riskiest flows.

In an audit, I want to know what the tests protect. If checkout, subscription renewal, data imports, authorization, and background jobs are thinly tested, the global percentage is trivia. If the test suite takes forty minutes and fails randomly, the team will stop running it locally. At that point, the suite exists more as ceremony than feedback.

Look for the files with high churn and low coverage. Look for commented-out specs. Look for factories that create half the application just to build one record. Look for tests that assert implementation details while the important business outcome is left implied.

That is where refactoring should start. Not with the prettiest extraction, but with the place where a small test can buy the team confidence.
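To shortlist high-churn, low-coverage files, you can join per-file commit counts with SimpleCov results. Treat this as a sketch: it assumes SimpleCov has written `coverage/.resultset.json` with per-file `lines` arrays, which is what recent versions produce as far as I know, and the thresholds, method names, and sample paths are all made up for illustration.

```ruby
require "json"

# Percentage of executable lines hit at least once.
# Non-integer entries (such as null for non-executable lines) are ignored.
def line_coverage(hits)
  relevant = hits.grep(Integer)
  return nil if relevant.empty?
  (relevant.count(&:positive?) * 100.0 / relevant.size).round(1)
end

# Maps relative file path => coverage percentage.
# If several suites report the same file, the last one wins.
def coverage_by_file(resultset_json, root:)
  JSON.parse(resultset_json).each_with_object({}) do |(_suite, data), result|
    data.fetch("coverage", {}).each do |path, entry|
      hits = entry.is_a?(Hash) ? entry.fetch("lines", []) : entry
      result[path.delete_prefix("#{root}/")] = line_coverage(hits)
    end
  end
end

# Files changed often and covered thinly. Files missing from the coverage
# report are treated as uncovered. Thresholds are arbitrary starting points.
def risky_files(churn, coverage, min_commits: 5, max_coverage: 50.0)
  churn.select do |path, commits|
    commits >= min_commits && coverage.fetch(path, 0.0).to_f <= max_coverage
  end.keys
end

sample_resultset = JSON.generate({
  "RSpec" => {
    "coverage" => {
      "/srv/app/app/models/payment.rb" => { "lines" => [1, 0, nil, 0, 0] },
      "/srv/app/app/models/user.rb" => { "lines" => [1, 1, nil, 1] }
    },
    "timestamp" => 1_700_000_000
  }
})

sample_churn = { "app/models/payment.rb" => 12, "app/models/user.rb" => 9 }
coverage = coverage_by_file(sample_resultset, root: "/srv/app")

puts risky_files(sample_churn, coverage)
```

The resulting list is a set of candidates for characterization tests, not a ranking of code quality.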

AI can accelerate the audit, but it cannot own it

AI is useful in this work when the scope is narrow. Give it one model and ask for distinct responsibilities. Give it a callback-heavy class and ask for side effects. Give it a controller and ask which branches deserve request specs.

That can save real time. It can also create the illusion that the audit is more complete than it is.

The model does not know which complexity is accidental and which is load-bearing. It does not know that the ugly import job exists because a hospital partner sends malformed CSVs every Tuesday. It does not know that the strange conditional in checkout is the only thing preventing duplicate invoices for a legacy enterprise customer.

Use AI to speed up reading. Do not outsource judgment.

Deliver one page, not a museum

The first-week deliverable should not be an encyclopedia of everything wrong with the codebase. That kind of document feels thorough, but it gives the prioritization work back to the client.

I prefer a short triage with three real sections.

The first section is what to fix this week. These are security, compliance, rollback, data integrity, and production visibility risks. They are not always hard, but they are urgent.

The second section is what to fix this quarter. These are architecture and testability problems that slow delivery every sprint. The work belongs on the roadmap because ignoring it has a compounding cost.

The third section is what not to worry about yet. This may be the most valuable section. It gives the team permission to stop feeling guilty about every ugly corner. Some code is ugly and stable. Some code is weird because the business is weird. Some code can wait.
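If the findings already live in a ledger, the three sections can be generated rather than hand-assembled. The sketch below is self-contained and uses a small `Struct` instead of the earlier ledger object. The `"not yet"` horizon label, the section titles, and `render_triage` are my own assumptions, not a fixed format.

```ruby
# Maps a finding's time horizon to the section it belongs in.
# The "not yet" label is an assumed convention for deferred items.
SECTIONS = {
  "this week" => "Fix this week",
  "this quarter" => "Fix this quarter",
  "not yet" => "Do not worry about yet"
}.freeze

Finding = Struct.new(:area, :signal, :risk, :time_horizon, keyword_init: true)

# Renders findings as a plain-text triage with one block per section.
# Findings with an unknown horizon are left out on purpose.
def render_triage(findings)
  grouped = findings.group_by(&:time_horizon)
  SECTIONS.map do |horizon, title|
    items = grouped.fetch(horizon, []).map do |finding|
      "  - #{finding.area}: #{finding.signal}. Risk: #{finding.risk}."
    end
    items = ["  (nothing yet)"] if items.empty?
    [title, *items].join("\n")
  end.join("\n\n")
end

sample_findings = [
  Finding.new(area: "Deploys", signal: "Rollback has not been practiced",
              risk: "Every release has avoidable operational risk",
              time_horizon: "this week"),
  Finding.new(area: "Reports", signal: "Legacy report model is large but untouched",
              risk: "Low while nobody needs to change it",
              time_horizon: "not yet")
]

puts render_triage(sample_findings)
```

Keeping the empty sections visible is deliberate: a blank "this quarter" is itself a prompt to go back and look.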

The strongest audit ends with an opinion: if this team could only fix one thing this year, what should it be?

That is the sentence people remember. Not the number of routes. Not the coverage percentage. Not the giant list of TODOs. The useful thing is the judgment that turns a legacy Rails app from a haunted house into a map.

Happy auditing!