Code review processes usually start as "quality control" and end as "ego battles." As teams grow, two traps become clear: PRs languish for weeks waiting for feedback, or every comment lands as personal criticism. Both stem from the same root cause—rules that aren't measurable. Working with 15+ people across different disciplines at Roibase for 8 years has taught us something simple: unless you anchor review culture in numerical criteria, personal judgment becomes unavoidable. When you systematize metrics like time-to-review, comment density, and PR size, quality rises and conflict falls.
Review Speed: Time-to-Review SLA
Every PR has a lifecycle. The span from opening to the first comment—time-to-first-review—is the team's first signal of discipline. At Roibase, we capped this at 4 hours maximum (during business hours). Why 4 hours? It's the sweet spot in async work models: protects deep work blocks while keeping feedback loops tight.
The rule: within 4 hours of PR opening, at least one assigned reviewer must provide initial feedback. The enforcement mechanism isn't a Slack reminder—it's a GitHub Actions workflow. When a PR opens, it gets auto-tagged; 4 hours later, assigned reviewers get a Slack mention. This soft nudge eliminates "forgotten" reviews.
The time-to-merge metric cuts deeper. The span from PR opening to merge into main—say, max 24 hours for backend changes. Frontend changes get 48 hours. Why the difference? Backend merges usually need less visual sign-off; they can deploy behind feature flags. Frontend involves design QA and cross-device testing, which take time.
Metric Dashboard: Linear Integration
We tie GitHub to Linear so every PR auto-maps to a Linear ticket. The ticket status updates as the PR moves through its lifecycle. End of sprint: we look at average time-to-merge. If the team average breaks 36 hours, that's a retrospective conversation—usually pointing to PR size or reviewer load.
PR Size: The 400-Line Rule
Large PRs can't be reviewed properly. It's industry consensus, rarely codified. Roibase standard: max 400 lines of change (additions + deletions combined). Where does this number come from? It's what a single reviewer can hold in their head for a focused 30-minute review.
To enforce it, we use a GitHub branch protection rule: PRs over 400 lines auto-label as "needs-split" and can't merge. There are exceptions—dependency updates, migration scripts. Those need manual override with a GitHub comment justification.
How do large refactors work? Stacked PRs. First PR: interface change. Second: implementation. Third: old code removal. Each under 400 lines, each independently reviewable. Does this take longer? Yes. Does merge conflict risk spike? Slightly. But review quality multiplies—because the reviewer actually has the mental bandwidth to think through each change.
# GitHub Actions — PR size check
name: PR Size Check
on: pull_request
jobs:
size_check:
runs-on: ubuntu-latest
steps:
- name: Check PR size
run: |
ADDITIONS=$(jq '.pull_request.additions' "$GITHUB_EVENT_PATH")
DELETIONS=$(jq '.pull_request.deletions' "$GITHUB_EVENT_PATH")
TOTAL=$((ADDITIONS + DELETIONS))
if [ $TOTAL -gt 400 ]; then
echo "PR too large: $TOTAL lines"
gh pr edit --add-label needs-split
exit 1
fi
Comment Density: The Nitpick Threshold
Not every comment weighs equally. "This could be refactored" is different from "This causes a null pointer exception." Our review template categorizes comments by severity:
| Category | Label | Example |
|---|---|---|
| Blocker | 🔴 BLOCKER | Security vulnerability, runtime crash |
| Major | 🟠 MAJOR | Performance regression, logic error |
| Minor | 🟡 MINOR | Naming convention, test coverage gap |
| Nitpick | 🔵 NITPICK | Preference, subjective taste |
The rule: nitpicks should not exceed 30% of comments. If a PR has 10 comments, 3 can be nitpicks; the rest must be blocker/major/minor. Why? Nitpick-heavy reviews tank author motivation and make reviewers look unnecessarily pedantic.
Comment density metric: average comments per PR. At Roibase, that's 3–5. Double digits usually means the PR should've been split. Zero comments signals rubber-stamp review—unwanted.
Template Discipline
Every reviewer starts from a GitHub PR template:
## Review Checklist
- [ ] Code logic correct?
- [ ] Test coverage above 80%?
- [ ] Breaking change? (CHANGELOG updated?)
- [ ] Performance impact measured? (benchmarks/)
## Comments
**🔴 BLOCKER:**
-
**🟠 MAJOR:**
-
**🟡 MINOR:**
-
**🔵 NITPICK:**
-
This template serves two purposes: forces reviewers to categorize, lets authors instantly identify which comments are blocking.
Async Review: The Sync Meeting Trap
Code review should never happen in a sync meeting. There's no "review call" at Roibase—all review is async on GitHub. Why? Our team spans 3 time zones; protecting deep work blocks is critical.
Async review discipline works like this: a reviewer examines a PR during their own focused window (usually 9 AM–12 PM). They write comments, approve or request changes. Author gets notified on their own calendar, makes fixes, re-requests review. This cycle repeats 2–3 times on average.
Exception: review deadlock—author and reviewer can't align after 3 rounds. Then a 15-minute sync call opens. But this happens 5–6 times a year, purely exceptional. Even Roibase's branding reflects async-first culture—documentation-first, meeting-last.
Ownership vs. Gatekeeping
Code review ensures quality but shouldn't become gatekeeping. We require minimum 1, maximum 2 reviewers per PR. Why 2 as the ceiling? Because waiting for 3+ approvals costs more time than the quality gains.
Reviewer selection isn't automatic—the author picks. Rule: at least one from CODEOWNERS, the other anyone. This keeps ownership with the author. The question "who should approve?" stays the author's responsibility, not leadership's.
CODEOWNERS looks like this:
# Backend
/backend/ @backend-team
/api/ @backend-team
# Frontend
/web/ @frontend-team
/mobile/ @mobile-team
# Infrastructure
/terraform/ @devops-team
/.github/ @devops-team
Every file change needs review from the relevant team—but the author still picks the person.
Retrospective: Review Metrics
Every sprint (bi-weekly), we review the numbers. Linear dashboard shows:
- Average time-to-merge (target: 36 hours)
- PR size distribution (target: 90% under 400 lines)
- Comment density (target: 3–5 per PR)
- Nitpick ratio (target: <30%)
- Review bottleneck (which reviewer has the longest queue?)
These numbers come up in retro, but without personal blame. Not "Ali's reviews are slow" but "Backend PRs average 48 hours—should we expand the reviewer pool?"
Pulling code review culture out of personal judgment and into systemic discipline isn't hard—it just needs measurable rules. Time-to-review SLAs, the 400-line rule, comment categories, async-first practice—these are the concrete tools that let Roibase maintain quality while growing for 8 years. If your review process still runs on "gut feel" and "depends," put numbers to it. Make it systematic. Quality will rise as conflict falls.