January 23, 2026 · By Jenny Guanni Qu

Who Writes the Bugs? A Deeper Look at 125,000 Kernel Vulnerabilities

Abstract

Part 2 of our kernel vulnerability analysis. Part 1 covered bug lifetimes and VulnBERT. This post digs into the human side: who introduces vulnerabilities, when they do it, and what we can do about it.

Keywords

Linux, kernel, security, vulnerabilities, VulnBERT


In Part 1, I analyzed 125,000 kernel bugs and found they hide for 2.1 years on average, with race conditions surviving more than twice that long. I built VulnBERT to catch them at commit time.

But that analysis treated bugs as abstract data points. This time I asked different questions: Who writes the buggy code? When do they write it? And who are the super-reviewers who catch bugs faster than anyone else?

The answers surprised me. Weekend commits are actually less likely to introduce vulnerabilities, but the bugs they do introduce take 45% longer to fix. Intel introduces the most bugs because it contributes the most code of any company. And 117 "super-reviewers" catch bugs nearly twice as fast as everyone else.

The findings point toward concrete process improvements that could reduce average bug lifetime by an estimated 35%.

Key findings at a glance
117: Super-reviewers who catch bugs 47% faster than average
0.88 years: Self-fix lifetime (vs. 2.59 years for cross-fixes)
-8%: Weekend commits are LESS likely to be vulnerable
+45%: But weekend bugs take longer to fix (review coverage)
5.0 years: Average lifetime for race conditions (2x deadlocks)
~35%: Estimated reduction possible with process improvements

The Super-Reviewers

Not all bug-fixers are created equal. When I ranked everyone who had fixed at least 50 bugs, a clear pattern emerged: some people consistently find bugs faster than others.

I defined a super-reviewer as someone who:

  • Has fixed 100+ bugs
  • Has an average fix lifetime 20%+ below the global average

117 people meet these criteria, and the impact is dramatic:

Metric | Value
Global average bug lifetime | 2.1 years
Super-reviewer average | 1.1 years
Difference | 47% shorter lifetime (bugs found ~1.9x faster)
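The super-reviewer definition reduces to two filters over fix records. The Python sketch below is a minimal illustration, assuming each Fixes:-linked bug has already been reduced to a (fixer_name, lifetime_years) pair and reading "20%+ below the global average" as a per-fixer mean at or below 0.8 × the per-bug global mean. The function name and input shape are my own, not the analysis pipeline's.

```python
from collections import defaultdict
from statistics import mean


def find_super_reviewers(fixes, min_fixes=100, speedup=0.20):
    """Return {fixer: mean_lifetime} for fixers meeting both criteria.

    fixes: iterable of (fixer_name, lifetime_years) pairs, one per
    Fixes:-linked bug (assumed input shape).
    """
    fixes = list(fixes)
    # Global average is taken over bugs, not over people.
    global_avg = mean(lifetime for _, lifetime in fixes)
    cutoff = global_avg * (1 - speedup)  # "20%+ below the global average"

    by_fixer = defaultdict(list)
    for fixer, lifetime in fixes:
        by_fixer[fixer].append(lifetime)

    result = {}
    for fixer, lifetimes in by_fixer.items():
        avg = mean(lifetimes)
        if len(lifetimes) >= min_fixes and avg <= cutoff:
            result[fixer] = avg
    return result
```

Computing the global mean over bugs rather than over people matches how the 2.1-year figure is reported; a per-person average would give prolific fixers the same weight as one-off contributors.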

[Figure: Top 20 bug fixers] Green = super-reviewers (100+ fixes, 20% faster than average). The scatter plot shows volume vs. speed: super-reviewers cluster in the bottom-right (high volume, low lifetime).

The top 10 super-reviewers:

Rank | Name | Bugs Fixed | Avg Lifetime
1 | Arnd Bergmann | 2,786 | 1.2 years
2 | Dan Carpenter | 2,707 | 1.7 years
3 | Chris Wilson | 1,242 | 0.5 years
4 | Geert Uytterhoeven | 1,089 | 1.5 years
5 | Colin Ian King | 787 | 1.5 years
6 | Takashi Iwai | 762 | 1.3 years
7 | Johannes Berg | 603 | 1.4 years
8 | Jakub Kicinski | 546 | 1.5 years
9 | Nathan Chancellor | 544 | 1.5 years
10 | Ville Syrjälä | 522 | 0.8 years

Chris Wilson is extraordinary: 1,242 fixes at 0.5 years average. He's finding bugs 4x faster than the global average. Ville Syrjälä is similar at 0.8 years.

A note on Dan Carpenter: Beyond his 2,707 bug fixes, Dan invented the Fixes: tag convention that makes this entire analysis possible. When a kernel developer fixes a bug, they add a line like Fixes: abc123def456 ("original commit subject") to link the fix back to the introducing commit. This simple convention, now standard practice, created the 125,000-pair dataset we're analyzing. Dan didn't just fix more bugs than almost anyone; he built the system that lets us track them.
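The Fixes: convention is what turns commit history into (fix, introducing commit) pairs; the kernel's patch-submission guidelines ask for at least the first 12 characters of the SHA-1 plus the quoted subject. A minimal Python sketch of pulling those pairs out of a commit message follows. The regex and `parse_fixes_tags` are illustrative assumptions: real trailers are messier (wrapped subjects, missing quotes, typos), and resolving each abbreviated SHA to a commit date, which the lifetime calculation needs, is left to git.

```python
import re

# Matches trailer lines such as:  Fixes: 1234567890ab ("subsys: subject")
# The quoted subject is optional so that sloppy tags are still captured.
FIXES_RE = re.compile(
    r'^Fixes:\s+([0-9a-f]{7,40})\b(?:\s+\("(.*)"\))?\s*$',
    re.IGNORECASE | re.MULTILINE,
)


def parse_fixes_tags(message):
    """Return (abbreviated_sha, subject_or_None) pairs from a commit message."""
    return [
        # findall() yields '' for an unmatched optional group; map it to None.
        (sha.lower(), subject or None)
        for sha, subject in FIXES_RE.findall(message)
    ]
```

The greedy `(.*)` lets subjects that themselves contain parentheses, such as `"net: add foo (v2)"`, be captured whole.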

What makes Chris Wilson and Ville Syrjälä so fast? Looking at their fix patterns:

  • Both focus heavily on GPU/i915, a subsystem with strong fuzzing infrastructure
  • Both are deeply embedded in their subsystems (Chris Wilson was a lead i915 developer)
  • They fix bugs they didn't introduce. They're actively hunting, not just cleaning up their own messes

[Figure: Super-reviewers analysis] Super-reviewers (green) fix bugs from more unique authors, focus on drivers/gpu, and their coverage has increased from 15% to 30% over the past decade.

The coverage trend is encouraging: In 2010, super-reviewers fixed ~15% of bugs. By 2020-2024, that's risen to ~30%. More super-reviewers = faster bug discovery for the whole kernel.


Self-Fixes Are 3x Faster

Here's a finding that should change how we think about code ownership:

Fix Type | Average Lifetime
Self-fixed (same author) | 0.88 years
Fixed by someone else | 2.59 years

Bugs fixed by the original author are found 3x faster.

[Figure: Author analysis] Bottom-left: Self-fixes (green) have dramatically shorter lifetimes than cross-fixes (red). Bottom-right: Top cross-company fix relationships.

The self-fix rate is 29.7%, so nearly a third of bugs are eventually fixed by the person who introduced them. These developers have institutional knowledge: they understand the code's invariants, they recognize when something feels wrong, and they're watching their subsystems.

Implication: Code ownership matters. Encouraging developers to maintain long-term ownership of their code (rather than fire-and-forget contributions) could significantly reduce bug lifetimes.
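Splitting bugs into self-fixes and cross-fixes only needs the author of each commit in a pair. A minimal sketch, assuming each pair is a dict with hypothetical `intro_author`, `fix_author`, and `lifetime_years` keys and that authors are compared by lowercased email address. In practice developers change addresses over time, so mapping identities through the kernel's .mailmap first would reduce false cross-fixes.

```python
from statistics import mean


def self_fix_stats(pairs):
    """Return (self_fix_rate, mean_self_lifetime, mean_cross_lifetime).

    pairs: iterable of dicts with hypothetical keys 'intro_author',
    'fix_author' (email addresses) and 'lifetime_years'.
    Assumes both groups are non-empty (mean() raises otherwise).
    """
    self_lifetimes, cross_lifetimes = [], []
    for p in pairs:
        # Naive identity check: same email, ignoring case and whitespace.
        same = p["intro_author"].strip().lower() == p["fix_author"].strip().lower()
        (self_lifetimes if same else cross_lifetimes).append(p["lifetime_years"])
    total = len(self_lifetimes) + len(cross_lifetimes)
    return (
        len(self_lifetimes) / total,
        mean(self_lifetimes),
        mean(cross_lifetimes),
    )
```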


The Corporate Landscape: Who Really Builds the Kernel?

Before we talk about who introduces bugs, let's look at who writes the kernel in the first place. I analyzed 1.4 million commits to map the corporate landscape.

[Figure: Corporate dashboard] The Linux kernel's corporate ecosystem. Intel dominates with 8.4% of commits, but independent contributors still account for 50% of all development.

The numbers are striking:

Organization | Commits | Share
Independent/Personal | 706,224 | 50.1%
Intel | 118,073 | 8.4%
Red Hat | 71,891 | 5.1%
Linux Foundation | 49,696 | 3.5%
Linaro | 43,192 | 3.1%
SUSE | 42,018 | 3.0%
AMD | 41,250 | 2.9%
Google | 39,118 | 2.8%

Across the kernel's full history, half of all commits come from individuals: people using gmail.com, personal domains, or university emails. The "corporate takeover" narrative is overstated. Companies contribute heavily, but the kernel remains a genuinely collaborative project.

[Figure: Corporate vs. independent] Corporate share has grown from ~25% in 2005 to ~55% today, but independent contributors still matter enormously.

The trend is real: corporate contributions have doubled as a share of commits since 2005. But this reflects growth in total kernel development, not displacement of volunteers. Independent contributor counts have remained relatively stable.

[Figure: Corporate market share] Intel's dominance peaked around 2016 at 11% and has stabilized around 9-10%. AMD's share has grown dramatically since 2020.

Intel's journey is interesting. Their share of commits peaked at 11% around 2016, driven by i915 GPU driver development. Since then, it has stabilized around 9-10%, while AMD has grown from near-zero to ~5%, reflecting AMD's CPU and GPU resurgence.


Intel Introduces the Most Bugs (Because They Write the Most Code)

When I mapped the authors of bug-introducing commits to organizations by email domain:

Organization | Bugs Introduced
Intel | 14,000
Independent (gmail.com) | 10,000
Red Hat | 7,000
kernel.org maintainers | 6,000
Linaro | 5,500
Google | 5,000
AMD | 4,000

Intel's dominance reflects contribution volume, not code quality. They maintain massive driver subsystems (i915 GPU, networking, platform code). More code = more bugs, even at the same defect rate.

In fact, Intel's bug rate is roughly in line with their commit share. They contribute 8.4% of commits and ~11% of bugs, about 1.3x what commit share alone would predict: elevated, but not dramatically. The difference may reflect the complexity of driver code vs. simpler patches.
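Attributing bugs to organizations, and checking whether a share is proportional, can be sketched as follows. `DOMAIN_TO_ORG` is a hypothetical, deliberately tiny lookup table (a real mapping needs many more domains and still misclassifies people who use personal addresses), and the ratio uses the ~125,000 bug total as the denominator. With the numbers above, Intel's bug share is 14,000 / 125,000 ≈ 11.2%, and 11.2% / 8.4% ≈ 1.33.

```python
# Hypothetical, deliberately tiny domain table; a real one needs many more entries.
DOMAIN_TO_ORG = {
    "intel.com": "Intel",
    "redhat.com": "Red Hat",
    "amd.com": "AMD",
    "google.com": "Google",
    "gmail.com": "Independent",
}


def org_for_email(email):
    """Map an author email to an organization, walking up subdomains."""
    parts = email.rsplit("@", 1)[-1].lower().split(".")
    # Try "linux.intel.com", then "intel.com" (stop before the bare TLD).
    for i in range(len(parts) - 1):
        org = DOMAIN_TO_ORG.get(".".join(parts[i:]))
        if org:
            return org
    return "Unknown"


def over_representation(bugs, total_bugs, commit_share):
    """Ratio above 1.0 means more bugs than the commit share predicts."""
    return (bugs / total_bugs) / commit_share


# Intel: 14,000 of ~125,000 bugs vs. 8.4% of commits.
intel_ratio = over_representation(14_000, 125_000, 0.084)
```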

The cross-company fix patterns are revealing:

Relationship (introducer → fixer) | Fixes
Mellanox → NVIDIA | 600+
Independent → kernel.org | 500+
Intel → Red Hat | 420+
Red Hat → Google | 400+

Mellanox → NVIDIA makes sense. NVIDIA acquired Mellanox, so they inherited the codebase. Independent → kernel.org shows maintainers cleaning up community contributions. The Intel ↔ Red Hat relationship reflects their collaboration on shared infrastructure (virtualization, networking).


The Weekend Effect: Fewer Bugs, But They Hide Longer

I expected late-night commits to be buggier. The data told a more nuanced story.

[Figure: Vulnerability rate analysis] Vulnerability introduction RATE by time. Weekend commits are actually LESS likely to be vulnerable. But when they are, the bugs take longer to find.

The counterintuitive finding: Weekend commits have a lower vulnerability introduction rate:

Period | Vulnerability Rate
Weekday average | 8.94%
Weekend average | 8.25%
Difference | -8% (relative)

But the weekend average hides a split. Here's the full day-of-week breakdown:

Day | Vuln Rate
Monday | 9.14%
Tuesday | 8.96%
Wednesday | 8.37%
Thursday | 9.11%
Friday | 9.15%
Saturday | 9.55% (highest)
Sunday | 6.99% (lowest)

Saturday is actually the most dangerous day, while Sunday is remarkably safe. The weekend average only looks good because Sunday's low rate (6.99%) pulls down Saturday's high rate (9.55%).
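Per-day rates like these come from a group-by over commit timestamps. The sketch below assumes each commit is a (unix_timestamp, introduced_vuln) pair and buckets by UTC weekday; the post doesn't say whether UTC or each author's local time was used, and that choice moves commits across day boundaries. The helper also shows that an unweighted mean of the published daily rates gives about 8.95% for weekdays and 8.27% for weekends, close to but not exactly the averages above, which may have been weighted by commit volume.

```python
from collections import Counter
from datetime import datetime, timezone

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def vuln_rate_by_weekday(commits):
    """commits: iterable of (unix_timestamp, introduced_vuln) pairs.

    Buckets by UTC weekday (an assumption) and returns {day: rate in %}.
    """
    totals, vulns = Counter(), Counter()
    for ts, is_vuln in commits:
        day = DAYS[datetime.fromtimestamp(ts, tz=timezone.utc).weekday()]
        totals[day] += 1
        vulns[day] += int(is_vuln)
    return {d: 100 * vulns[d] / totals[d] for d in DAYS if totals[d]}


def period_averages(rates):
    """Unweighted weekday and weekend means of per-day rates."""
    weekday = sum(rates[d] for d in DAYS[:5]) / 5
    weekend = sum(rates[d] for d in DAYS[5:]) / 2
    return weekday, weekend


# Daily rates from the table above, in percent.
PUBLISHED = {"Monday": 9.14, "Tuesday": 8.96, "Wednesday": 8.37,
             "Thursday": 9.11, "Friday": 9.15,
             "Saturday": 9.55, "Sunday": 6.99}
weekday_avg, weekend_avg = period_averages(PUBLISHED)
```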

Why the difference? Saturday may capture rushed end-of-week pushes, with developers trying to finish something before their weekend truly begins. Sunday contributors, by contrast, may be doing deliberate, focused work with no deadline pressure: they're choosing to code on a Sunday, not racing against a Friday cutoff.

But here's the catch: when weekend commits do introduce bugs, those bugs take much longer to fix:

Period | Bug Lifetime
Monday-Friday | 1.97 years
Saturday-Sunday | 2.87 years
Difference | +45% longer

[Figure: Weekend vulnerability analysis] Weekend commits are 8% less likely to introduce vulnerabilities, but the bugs they do introduce take 45% longer to fix.

What's going on? Three effects may be at work:

  1. Selection effect: People who work weekends may skew toward experienced contributors doing careful, thoughtful work, who introduce fewer bugs per commit.
  2. Deadline effect: Saturday commits may represent rushed end-of-week pushes, explaining Saturday's elevated rate despite being a "weekend" day.
  3. Review coverage effect: When weekend commits do introduce bugs, there are likely fewer reviewers around to catch them quickly. Monday morning reviewers are triaging a backlog, not scrutinizing each weekend patch.

The peak danger zone isn't weekends; it's 5 AM UTC. That's when the vulnerability rate hits 32%, more than 6x the 11 AM low of 5.1%. This corresponds to late-night coding in the Americas or very early morning in Europe.
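Hourly bucketing depends on normalizing timezones. Git records each author date with the author's UTC offset, and `git log --format='%H %aI'` prints it in strict ISO 8601. A minimal sketch follows, assuming the analysis used author dates rather than committer dates (the post doesn't say). Returning the commit count alongside each rate makes it easy to check whether a spike like the 5 AM one rests on a thin bucket.

```python
from collections import Counter
from datetime import datetime, timezone


def utc_hour(iso_date):
    """Hour of day in UTC for a strict ISO 8601 date with offset (git's %aI)."""
    # Older Python versions reject a trailing "Z", so spell out the offset.
    if iso_date.endswith("Z"):
        iso_date = iso_date[:-1] + "+00:00"
    return datetime.fromisoformat(iso_date).astimezone(timezone.utc).hour


def vuln_rate_by_hour(commits):
    """commits: iterable of (iso_author_date, introduced_vuln) pairs.

    Returns {utc_hour: (rate_percent, commit_count)}, sorted by hour.
    """
    totals, vulns = Counter(), Counter()
    for iso_date, is_vuln in commits:
        hour = utc_hour(iso_date)
        totals[hour] += 1
        vulns[hour] += int(is_vuln)
    return {h: (100 * vulns[h] / totals[h], totals[h]) for h in sorted(totals)}
```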

[Figure: Temporal patterns] Top-left: Hourly patterns show late-night/early-morning commits have the highest vulnerability rates. Monthly patterns show April merge window peaks.

The monthly pattern is also interesting: April has the highest vulnerability rate (9.64%). This may reflect merge-window timing: large feature branches land once a merge window opens, and the rush to get code merged before the window closes may lead to more bugs.


Race Conditions: The Hardest Bugs to Find

This finding from Part 1 becomes even more striking with the full visualization:

[Figure: Bug type analysis] Right panel: Bug types ranked by average lifetime. Race conditions (5.0 years) are dramatically harder to find than deadlocks (2.4 years).

Bug Type | Count | Avg Lifetime
race-condition | 1,194 | 5.0 years
info-leak | 71 | 4.5 years
integer-overflow | 299 | 4.2 years
use-after-free | 2,978 | 3.3 years
memory-leak | 2,860 | 3.2 years
null-deref | 4,969 | 2.5 years
deadlock | 1,691 | 2.4 years

Race conditions survive twice as long as deadlocks. This might be because deadlocks eventually hang the system and someone notices. Race conditions might corrupt state silently, triggering failures far from the actual bug.

This supports the case for the ML approach: traditional fuzzers struggle with race conditions because races are non-deterministic and may not trigger on any given run. A model trained on code structure can flag suspicious patterns before they ever execute.

Also notable: the "unknown" category, not shown in the table, dominates the count (68,000 bugs). Our bug-type classifier is leaving signal on the table. Improving this classification could enable better-targeted detection.
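The post doesn't describe how its bug-type classifier works. A keyword-rule baseline like the hypothetical one below shows how a large "unknown" bucket can arise: any commit message that matches none of the rules falls through. The rules and their order are illustrative assumptions; more specific patterns (info-leak) must come before broader ones (memory-leak) or they never fire.

```python
import re

# Hypothetical keyword rules, checked in order; the first match wins.
# Specific patterns come before broad ones (info-leak before memory-leak).
BUG_TYPE_RULES = [
    ("race-condition", r"\brac(?:es?|y|ing)\b|data race"),
    ("use-after-free", r"use[- ]after[- ]free|\buaf\b"),
    ("deadlock", r"dead-?lock|lock inversion"),
    ("null-deref", r"null[- ]?(?:ptr|pointer)?[- ]?deref|null pointer"),
    ("info-leak", r"info(?:rmation)? leak|uninit(?:ialized)? (?:memory|data)"),
    ("memory-leak", r"memory leak|\bleak(?:s|ed|ing)?\b"),
    ("integer-overflow", r"integer overflow|\boverflow\b"),
]
COMPILED = [(name, re.compile(p, re.IGNORECASE)) for name, p in BUG_TYPE_RULES]


def classify_bug_type(message):
    """Return the first matching bug type, or 'unknown'."""
    for name, pattern in COMPILED:
        if pattern.search(message):
            return name
    return "unknown"
```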


The Forgotten Corners

Some subsystems are bustling cities with active maintainers. Others are ghost towns:

[Figure: Subsystem analysis] Left: Top subsystems by count. Right: Subsystems by average lifetime (n ≥ 100). Red = longer-lived, green = shorter-lived.

Subsystem | Avg Lifetime | Sample Size
drivers/can | 4.2 years | 447
networking/sctp | 4.0 years | 279
fs/ext4 | 3.7 years | 405
networking/ipv4 | 3.4 years | 1,687
usb | 3.3 years | 2,519

CAN bus drivers, used in automotive and industrial systems, top the list: critical infrastructure with few maintainers watching. SCTP, a niche networking protocol, shows similar dynamics.

Meanwhile, gpu/i915 (Intel graphics) has bugs found in ~1.4 years on average. They have dedicated fuzzing infrastructure and active super-reviewers like Chris Wilson and Ville Syrjälä.

The implication: We should prioritize scanning low-attention subsystems. A vulnerability in drivers/can that sits for 4 years is more dangerous than one in gpu/i915 that gets caught in roughly a third of that time.
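Ranking subsystems by lifetime takes a path-to-subsystem mapping plus a minimum-sample filter (the figure uses n ≥ 100). The labels in the table are not raw kernel paths (CAN drivers live under drivers/net/can, SCTP under net/sctp), so this sketch uses a hypothetical longest-prefix table and assumes each bug is attributed to one primary file path.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical prefix table: real kernel paths on the left, report labels on the right.
PREFIX_TO_SUBSYSTEM = {
    "drivers/net/can/": "drivers/can",
    "net/sctp/": "networking/sctp",
    "net/ipv4/": "networking/ipv4",
    "fs/ext4/": "fs/ext4",
    "drivers/usb/": "usb",
    "drivers/gpu/drm/i915/": "gpu/i915",
}


def subsystem_for_path(path):
    """Longest matching prefix wins; unmatched paths go to 'other'."""
    matches = [p for p in PREFIX_TO_SUBSYSTEM if path.startswith(p)]
    return PREFIX_TO_SUBSYSTEM[max(matches, key=len)] if matches else "other"


def lifetime_by_subsystem(bugs, min_samples=100):
    """bugs: iterable of (primary_file_path, lifetime_years) pairs.

    Returns [(subsystem, avg_lifetime, n)] for n >= min_samples,
    longest-lived first.
    """
    groups = defaultdict(list)
    for path, lifetime in bugs:
        groups[subsystem_for_path(path)].append(lifetime)
    rows = [(s, mean(v), len(v)) for s, v in groups.items() if len(v) >= min_samples]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```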


Subsystem-Specific Models: 5-15% Improvement Potential

Different subsystems have different bug patterns. Should we train specialized models?

[Figure: Subsystem-specific analysis] Top-left: Bug type distribution varies by subsystem. Top-right: Distinctiveness score (divergence from global distribution). Higher = more different = may benefit from specialized model.

Subsystem | Distinctiveness | Recommendation
arch/arm64 | 0.46 | Specialized model
networking | 0.37 | Specialized model
tools | 0.33 | Specialized model
gpu/i915 | 0.27 | General model OK
drivers/net | 0.20 | General model OK

arch/arm64 and networking have the most distinct bug patterns. Their bug type distributions diverge significantly from the global average, suggesting a specialized VulnBERT trained on networking-only data might outperform the general model on networking commits.

The bottom-left panel shows lifetime variance by subsystem. Networking has both high average lifetime AND high variance: it's unpredictable. A specialized model could help.

Estimated improvement: 5-15% recall increase in specialized subsystems.
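The post defines distinctiveness only as "divergence from global distribution" without naming the measure. One plausible choice, assumed here rather than taken from the analysis, is the Jensen-Shannon divergence between a subsystem's bug-type counts and the global counts; with base-2 logarithms it ranges from 0 (identical) to 1 (disjoint). Whether this reproduces the 0.46 for arch/arm64 depends on the actual metric and bug-type buckets used.

```python
from math import log2


def js_divergence(p_counts, q_counts):
    """Jensen-Shannon divergence (base 2, in [0, 1]) between two count dicts."""
    keys = set(p_counts) | set(q_counts)
    p_total, q_total = sum(p_counts.values()), sum(q_counts.values())
    p = {k: p_counts.get(k, 0) / p_total for k in keys}
    q = {k: q_counts.get(k, 0) / q_total for k in keys}
    m = {k: (p[k] + q[k]) / 2 for k in keys}  # mixture distribution

    def kl(a, b):
        # Terms with a[k] == 0 contribute nothing; b[k] > 0 whenever a[k] > 0.
        return sum(a[k] * log2(a[k] / b[k]) for k in keys if a[k] > 0)

    return (kl(p, m) + kl(q, m)) / 2


# Hypothetical counts, for illustration only.
subsystem = {"null-deref": 40, "race-condition": 30, "memory-leak": 30}
global_counts = {"null-deref": 50, "use-after-free": 30, "memory-leak": 20}
score = js_divergence(subsystem, global_counts)
```

Unlike plain KL divergence, this stays finite when a bug type appears in one distribution but not the other, which matters for small subsystems with empty categories.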


Clustering Reveals Hidden Structure

I embedded 10,000 commit messages using a sentence transformer and visualized with UMAP:

[Figure: Semantic clusters (UMAP)] Commits about similar things cluster together. Top-right: Colored by lifetime. Some semantic regions have systematically longer-lived bugs (red).

Without being given any lifetime information, the embedding grouped commits into clusters whose bug lifetimes differ systematically:

Cluster | Avg Lifetime | Sample Commits
Cluster 2 (longest) | 2.4 years | "afs: Handle lock rpc ops failing...", "NFSv4: Fix free of uninitialized..."
Cluster 8 (shortest) | 1.2 years | "drm/i915/gt: Cancel the preemption timeout...", "drm/radeon: Fix spurious unplug..."

Commits mentioning "race", "refcount", "use-after-free" cluster together and have longer lifetimes. Commits about "typo", "warning", "build" cluster separately and are fixed quickly.

This suggests commit message content is predictive. A model that understands "this sounds like a race condition fix" can infer the original bug was probably hard to find.
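Once every commit message has a cluster label, the cluster table reduces to a per-cluster mean lifetime plus a look at characteristic words. The embedding, UMAP, and clustering steps need third-party libraries and a model download, and the post doesn't say which clustering algorithm assigned the IDs, so this sketch assumes labels already exist and shows only the aggregation. A real pass would also drop stopwords such as "fix", which otherwise dominate every cluster.

```python
import re
from collections import Counter, defaultdict
from statistics import mean


def summarize_clusters(labels, lifetimes, messages, top_k=3):
    """Return [(label, mean_lifetime, top_words)], longest-lived cluster first.

    labels: cluster ID per commit (from any embedding + clustering step).
    """
    rows = defaultdict(list)
    for label, lifetime, message in zip(labels, lifetimes, messages):
        rows[label].append((lifetime, message))

    summary = []
    for label, items in rows.items():
        # Count lowercase words (2+ letters, hyphens allowed) across the cluster.
        words = Counter(
            word
            for _, message in items
            for word in re.findall(r"[a-z][a-z-]+", message.lower())
        )
        top_words = [w for w, _ in words.most_common(top_k)]
        summary.append((label, mean(lt for lt, _ in items), top_words))
    return sorted(summary, key=lambda s: s[1], reverse=True)
```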


Putting It Together: 35% Reduction in Bug Lifetime

╔══════════════════════════════════════════════════════════════════════════════╗
║            KERNEL VULNERABILITY ANALYSIS: KEY RECOMMENDATIONS                ║
╚══════════════════════════════════════════════════════════════════════════════╝

┌──────────────────────────────────────────────────────────────────────────────┐
│ 1. SUPER-REVIEWER PROGRAM                                                    │
├──────────────────────────────────────────────────────────────────────────────┤
│  FINDING: 117 super-reviewers catch bugs 47% faster than average.            │
│  ACTION: Route high-risk commits to super-reviewers first.                   │
│  EXPECTED IMPACT: 20-30% reduction in bug lifetime for reviewed commits.     │
└──────────────────────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────────────────────┐
│ 2. COMMIT MESSAGE QUALITY GATE                                               │
├──────────────────────────────────────────────────────────────────────────────┤
│  FINDING: Sparse commits (<30 chars) have ~20% longer-lived bugs.            │
│  ACTION: Flag commits with quality_score < 40 for mandatory extra review.    │
│  EXPECTED IMPACT: Catch 10-15% more bugs before merge.                       │
└──────────────────────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────────────────────┐
│ 3. SUBSYSTEM-SPECIFIC MODELS                                                 │
├──────────────────────────────────────────────────────────────────────────────┤
│  FINDING: Subsystems have distinct bug patterns (divergence up to 0.46).     │
│  ACTION: Train specialized VulnBERT for: networking, arch/arm64              │
│  EXPECTED IMPACT: 5-15% recall improvement in specialized subsystems.        │
└──────────────────────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────────────────────┐
│ 4. TEMPORAL CI/CD TUNING                                                     │
├──────────────────────────────────────────────────────────────────────────────┤
│  FINDING: Weekend commits have +45% longer lifetimes.                        │
│  ACTION: Dynamic thresholds - stricter during merge windows and weekends.    │
│  EXPECTED IMPACT: 15-20% reduction in escaped bugs during high-risk periods. │
└──────────────────────────────────────────────────────────────────────────────┘

╔══════════════════════════════════════════════════════════════════════════════╗
║ COMBINED EXPECTED IMPACT                                                     ║
╠══════════════════════════════════════════════════════════════════════════════╣
║  Current average bug lifetime:    2.1 years                                  ║
║  With all recommendations:        ~1.4 years (estimated 35% reduction)       ║
╚══════════════════════════════════════════════════════════════════════════════╝

Combining all findings:

Recommendation | Finding | Expected Impact
Super-reviewer routing | 117 super-reviewers catch bugs 47% faster | 20-30% reduction for routed commits
Commit quality gate | Sparse commits have ~20% longer-lived bugs | Catch 10-15% more bugs before merge
Subsystem-specific models | Distinctiveness up to 0.46 | 5-15% recall improvement
Temporal CI tuning | Weekend commits +45% longer lifetimes | 15-20% reduction during high-risk periods

Combined estimate: Average bug lifetime from 2.1 years → ~1.4 years (35% reduction).


Concrete Actions

For kernel maintainers:

  1. Route high-risk commits to super-reviewers first
  2. Require extra review for weekend commits
  3. Increase CI aggressiveness during merge windows (the two weeks after each mainline release, which recur roughly every 9-10 weeks)
  4. Flag commits with quality_score < 40 for extra scrutiny
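The four maintainer actions above can be combined into one per-commit review policy. The sketch below is hypothetical throughout: the `Commit` fields, the 0.5 base threshold, the 0.1 tightening, and the per-subsystem super-reviewer table are assumptions, and the post doesn't define how quality_score is computed; only the quality_score < 40 cutoff comes from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Commit:
    subsystem: str
    authored: datetime       # timezone-aware author date
    quality_score: float     # hypothetical 0-100 commit-message quality score
    risk: float              # hypothetical model score in [0, 1]
    in_merge_window: bool


def review_plan(commit, super_reviewers, base_threshold=0.5, tightening=0.1):
    """Return review actions for one commit (illustrative policy only)."""
    actions = []
    weekend = commit.authored.astimezone(timezone.utc).weekday() >= 5
    # Stricter risk threshold during weekends and merge windows.
    high_risk_period = weekend or commit.in_merge_window
    threshold = base_threshold - (tightening if high_risk_period else 0)

    if weekend:
        actions.append("extra review: weekend commit")
    if commit.quality_score < 40:
        actions.append("extra review: quality_score < 40")
    if commit.risk >= threshold:
        reviewers = super_reviewers.get(commit.subsystem, [])
        if reviewers:
            actions.append(f"route to super-reviewer: {reviewers[0]}")
        else:
            actions.append("high risk: no super-reviewer on file")
    return actions
```

Lowering the threshold in high-risk periods, rather than adding a separate rule, keeps one risk score as the single routing signal.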

For security teams:

  1. Prioritize scanning low-attention subsystems (drivers/can, networking/sctp)
  2. Focus on race condition detection: races hide longest (5.0 years on average, about twice as long as deadlocks)
  3. Use VulnBERT as a triage tool to prioritize human review
  4. Pay extra attention to code from the 5 AM UTC danger zone

For researchers:

  1. Train subsystem-specific models for networking and arch/arm64
  2. Improve bug-type classification (68K "unknown" bugs)
  3. Study super-reviewer patterns for replication
  4. Investigate the late-night vulnerability spike

Limitations: What Git Doesn't Capture

This analysis is based on git commit history, but kernel development actually happens in a much older system: email.

The Linux Kernel Mailing List (LKML) receives roughly 1,400 messages per day: patches, reviews, debates, and flame wars. Developers submit patches by email (often using git send-email), maintainers review them inline, and only the final accepted version gets committed to git. The review discussion, multiple patch iterations, and rejected submissions live entirely in the mailing list archives at lore.kernel.org.

This means our dataset captures outcomes, not process:

What we see | What we don't see
Final committed patches | Rejected patch attempts
Bug-fix commits | Review discussions that caught issues
Fixes: tag relationships | Multiple iterations before acceptance
Author/committer timestamps | Time spent in email review

Implications for this analysis:

  1. Super-reviewer metrics are understated. Our data shows who commits fixes, but reviewers who catch bugs before they're merged don't appear in git history. The real impact of good review practices is likely much larger than what we measure.
  2. Bug introduction rates may reflect review coverage, not just author skill. A patch that gets heavy LKML scrutiny before merge is less likely to contain bugs than one that slips through with minimal review but both look identical in git.
  3. Some security fixes are deliberately quiet. Researchers have found that some commits never appeared on public mailing lists before being merged, often because they were discussed on the private kernel security list. Our dataset likely includes stealth security fixes we can't distinguish from routine patches.

The mailing list workflow is sometimes criticized as outdated compared to GitHub-style pull requests, but it scales remarkably well. The kernel has accepted over 1.4 million commits from thousands of contributors over 20+ years. For researchers, lore.kernel.org provides a searchable archive of the full development history, though correlating email threads to git commits remains challenging.

Future work could mine the mailing list archives directly to measure review intensity and its effect on bug rates.
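One partial bridge between git and the mailing list is the Link: trailer that many maintainers add, pointing at the patch's thread on lore.kernel.org (the https://lore.kernel.org/r/<message-id> form redirects to the thread). A minimal sketch of extracting those Message-IDs from a commit message; the regex is an assumption covering the common /r/, /all/, and /<list>/ URL shapes, and commits without a Link: trailer would still need fuzzier matching on subject and author.

```python
import re

# Link: trailers pointing at lore.kernel.org, e.g.
#   Link: https://lore.kernel.org/r/<message-id>
#   Link: https://lore.kernel.org/<list>/<message-id>/
LORE_LINK_RE = re.compile(
    r"^Link:\s+https?://lore\.kernel\.org/(?:[\w.-]+/)?([^/\s]+)/?\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def lore_message_ids(message):
    """Return Message-IDs from lore.kernel.org Link: trailers, in order."""
    return LORE_LINK_RE.findall(message)
```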

Thanks to Nikolai Kondrashov (Red Hat CKI) for helpful discussions on the mailing list workflow.


The Dataset

All analysis scripts and visualizations are available:

Dataset: huggingface.co/datasets/pebblebed/kernel-vuln-dataset

Repository: github.com/quguanni/kernel-archaeology


Key Takeaways

  1. 117 super-reviewers catch bugs 47% faster. Route high-risk commits to them.
  2. Self-fixes are 3x faster. Code ownership matters.
  3. Weekend commits are 8% LESS likely to be vulnerable but take 45% longer to fix. Saturday is actually the riskiest day (9.55%); Sunday is the safest (6.99%). Review coverage likely matters more than individual vigilance.
  4. Race conditions hide about twice as long as deadlocks. Traditional fuzzers often miss them, which strengthens the case for ML-based detection.
  5. Subsystems have distinct patterns. Specialized models could improve recall by 5-15%.
  6. Combined impact: an estimated 35% reduction in average bug lifetime may be achievable with process improvements.

Part 1: Kernel bugs hide for 2 years on average. Some hide for 20.

Questions? jenny@pebblebed.com