# Kernel bugs hide for 2 years on average. Some hide for 20.

*January 7, 2026 · By Jenny Guanni Qu*

> A mining expedition through two decades of Linux kernel commits to understand how long bugs really hide.

There are bugs in your kernel right now that won't be found for years. I know because I analyzed 125,183 of them, every bug with a traceable `Fixes:` tag in the Linux kernel's 20-year git history.

The average kernel bug lives **2.1 years** before discovery. But some subsystems are far worse: CAN bus drivers average **4.2 years**, SCTP networking **4.0 years**. The longest-lived bug in my dataset, a buffer overflow in ethtool, sat in the kernel for **20.7 years**. The one I'll dissect in detail, a refcount leak in netfilter, lasted **19 years**.

I built a tool that catches 92% of historical bugs in a held-out test set at commit time. Here's what I learned.

| Key findings at a glance | |
|---|---|
| **125,183** | Bug-fix pairs with traceable `Fixes:` tags |
| **123,696** | Valid records after filtering (0 < lifetime < 27 years) |
| **2.1 years** | Average time a bug hides before discovery |
| **20.7 years** | Longest-lived bug (ethtool buffer overflow) |
| **0% → 69%** | Bugs found within 1 year (2010 vs 2022) |
| **92.2%** | Recall of VulnBERT on held-out 2024 test set |
| **1.2%** | False positive rate (vs 48% for vanilla CodeBERT) |

## The initial discovery

I started by mining the most recent 10,000 commits with `Fixes:` tags from the Linux kernel. After filtering out invalid references (commits that pointed to hashes outside the repo, malformed tags, or merge commits), I had **9,876 valid vulnerability records**. For the lifetime analysis, I excluded 27 same-day fixes (bugs introduced and fixed within hours), leaving **9,849 bugs** with meaningful lifetimes.

The results were striking:

| Metric | Value |
|---|---|
| Bugs analyzed | 9,876 |
| Average lifetime | **2.8 years** |
| Median lifetime | 1.0 year |
| Maximum | 20.7 years |

Almost 20% of bugs had been hiding for **5+ years**. The networking subsystem looked particularly bad at **5.1 years** average. I found a refcount leak in netfilter that had been in the kernel for **19 years**.

![Initial Bug Lifetime Distribution](/blog/kernel-bugs/bug_lifetime_histogram.png)
*Initial findings: Half of bugs found within a year, but 20% hide for 5+ years.*

But something nagged at me: my dataset only contained fixes from 2025. Was I seeing the full picture, or just the tip of the iceberg?

## Going deeper: Mining the full history

I rewrote my miner to capture **every** `Fixes:` tag since Linux moved to git in 2005. Six hours later, I had 125,183 vulnerability records, more than 12x the size of my initial dataset.

The numbers changed significantly:

| Metric | 2025 Only | Full History (2005-2025) |
|---|---|---|
| **Bugs analyzed** | 9,876 | **125,183** |
| **Average lifetime** | 2.8 years | **2.1 years** |
| **Median lifetime** | 1.0 year | **0.7 years** |
| **5+ year bugs** | 19.4% | **13.5%** |
| **10+ year bugs** | 6.6% | **4.2%** |

![Full Dataset Bug Lifetime Distribution](/blog/kernel-bugs/bug_lifetime_histogram_full.png)
*Full history: 57% of bugs found within a year. The long tail is smaller than it first appeared.*

**Why the difference?** My initial 2025-only dataset was biased. Fixes in 2025 include:

- **New bugs** introduced recently and caught quickly
- **Ancient bugs** that finally got discovered after years of hiding

The ancient bugs skewed the average upward. When you include the full history with all the bugs that were introduced AND fixed within the same year, the average drops from 2.8 to 2.1 years.

## The real story: We're getting faster (but it's complicated)

The most striking finding from the full dataset: **bugs introduced in recent years appear to get fixed much faster**.

| Year Introduced | Bugs | Avg Lifetime | % Found <1yr |
|---|---|---|---|
| 2010 | 1,033 | 9.9 years | 0% |
| 2014 | 3,991 | 3.9 years | 31% |
| 2018 | 11,334 | 1.7 years | 54% |
| 2022 | 11,090 | 0.8 years | 69% |

Bugs introduced in 2010 took nearly 10 years to find, while bugs introduced in 2024 are found in 5 months. At first glance, that looks like a 20x improvement!

But here's the catch: **this data is right-censored**. Bugs introduced in 2022 *can't* have a 10-year lifetime yet since we're only in 2026. We might find more 2022 bugs in 2030 that bring the average up.

The fairer comparison is "% found within 1 year" and that IS improving: from 0% (2010) to 69% (2022). That's real progress, likely driven by:

- Syzkaller (released 2015)
- KASAN, KMSAN, KCSAN sanitizers
- Better static analysis
- More contributors reviewing code
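
The "% found within 1 year" comparison can be sketched roughly as follows. This is a minimal illustration, not the miner's actual code: the function name `found_within_one_year`, the `(introduced, fixed)` record format, and the toy records are all assumptions. Bugs introduced less than a year before the mining date are skipped because their one-year window is still open. Note that the denominator only contains bugs that have been fixed so far.

```python
from collections import defaultdict
from datetime import date

def found_within_one_year(records, mined_on):
    """Share of bugs per introduction year that were fixed within 365 days.

    records: iterable of (introduced, fixed) date pairs (hypothetical format).
    Only bugs that already have a fix appear in the data, so the denominator
    is "bugs fixed so far", not "all bugs introduced".
    """
    totals = defaultdict(int)
    quick = defaultdict(int)
    for introduced, fixed in records:
        # Skip bugs whose one-year window had not closed by the mining date.
        if (mined_on - introduced).days < 365:
            continue
        totals[introduced.year] += 1
        if (fixed - introduced).days <= 365:
            quick[introduced.year] += 1
    return {year: quick[year] / totals[year] for year in sorted(totals)}

# Toy records, invented for illustration only.
sample = [
    (date(2010, 3, 1), date(2019, 6, 1)),
    (date(2010, 7, 1), date(2021, 1, 15)),
    (date(2022, 2, 1), date(2022, 9, 1)),
    (date(2022, 5, 1), date(2024, 5, 1)),
    (date(2025, 11, 1), date(2025, 12, 1)),  # window still open: skipped
]
rates = found_within_one_year(sample, mined_on=date(2026, 1, 6))
print(rates)
```

Because every cohort gets the same one-year observation window, the metric is comparable across introduction years in a way that average lifetime is not.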

**But there's a backlog.** When I look at just the bugs fixed in 2024-2025:

- 60% were introduced in the last 2 years (new bugs, caught quickly)
- 18% were introduced 5-10 years ago
- **6.5% were introduced 10+ years ago**

We're simultaneously catching new bugs faster AND slowly working through ~5,400 ancient bugs that have been hiding for over 5 years.

## The methodology

The kernel has a convention: when a commit fixes a bug, it includes a `Fixes:` tag pointing to the commit that introduced the bug.

```
commit de788b2e6227
Author: Florian Westphal <fw@strlen.de>
Date:   Fri Aug 1 17:25:08 2025 +0200

    netfilter: ctnetlink: fix refcount leak on table dump

    Fixes: d205dc40798d ("netfilter: ctnetlink: ...")
```

I wrote a miner that:

1. Runs `git log --grep="Fixes:"` to find all fixing commits
2. Extracts the referenced commit hash from the `Fixes:` tag
3. Pulls dates from both commits
4. Classifies subsystem from file paths (70+ patterns)
5. Detects bug type from commit message keywords
6. Calculates the lifetime

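```python
import re

# Match "Fixes: <hash>", where the hash is abbreviated to 12-40 hex characters
fixes_pattern = r'Fixes:\s*([0-9a-f]{12,40})'
match = re.search(fixes_pattern, commit_message)
if match:
    introducing_hash = match.group(1)
    # fixing_date / introducing_date: datetimes pulled from the two commits
    lifetime_days = (fixing_date - introducing_date).days
```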
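
Steps 4 and 5 could look something like the sketch below. The pattern tables are hypothetical stand-ins (the real miner uses 70+ path patterns that aren't reproduced here), and the function names `classify_subsystem` and `detect_bug_type` are assumptions made for illustration.

```python
import re

# Hypothetical path patterns, most specific first; the real miner uses 70+.
SUBSYSTEM_PATTERNS = [
    (r"^drivers/net/can/", "drivers/can"),
    (r"^net/sctp/", "networking/sctp"),
    (r"^net/netfilter/", "netfilter"),
    (r"^net/", "networking"),
    (r"^drivers/gpu/", "gpu"),
]

# Hypothetical keyword patterns matched against the fixing commit's message.
BUG_TYPE_KEYWORDS = [
    (r"use[- ]after[- ]free|\buaf\b", "use-after-free"),
    (r"refcount|reference count", "refcount"),
    (r"\brace\b|\bracy\b", "race-condition"),
    (r"null (pointer )?deref", "null-deref"),
    (r"memory leak|memleak", "memory-leak"),
]

def classify_subsystem(paths):
    """Return the label of the first pattern that matches any touched file."""
    for path in paths:
        for pattern, name in SUBSYSTEM_PATTERNS:
            if re.search(pattern, path):
                return name
    return "other"

def detect_bug_type(message):
    """Return the first bug type whose keywords appear in the message."""
    lowered = message.lower()
    for pattern, name in BUG_TYPE_KEYWORDS:
        if re.search(pattern, lowered):
            return name
    return "unknown"

print(classify_subsystem(["net/netfilter/nf_conntrack_netlink.c"]))
print(detect_bug_type("netfilter: ctnetlink: fix refcount leak on table dump"))
```

Ordering matters in both tables: specific patterns have to come before general ones, otherwise every netfilter file would be labeled plain "networking".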

**Dataset details:**

| Parameter | Value |
|---|---|
| Kernel version | v6.19-rc3 |
| Mining date | January 6, 2026 |
| Fixes mined since | 2005-04-16 (git epoch) |
| Total records | 125,183 |
| Unique fixing commits | 119,449 |
| Unique bug-introducing authors | 9,159 |
| With CVE ID | 158 |
| With Cc: stable | 27,875 (22%) |

**Coverage note:** The kernel has ~448,000 commits mentioning "fix" in some form, but only ~124,000 (28%) use proper `Fixes:` tags. My dataset captures the well-documented bugs, that is, the ones where maintainers traced the root cause.

## It varies by subsystem

Some subsystems have bugs that persist far longer than others:

| Subsystem | Bug Count | Avg Lifetime |
|---|---|---|
| drivers/can | 446 | **4.2 years** |
| networking/sctp | 279 | **4.0 years** |
| networking/ipv4 | 1,661 | **3.6 years** |
| usb | 2,505 | 3.5 years |
| tty | 1,033 | 3.5 years |
| netfilter | 1,181 | 2.9 years |
| networking | 6,079 | 2.9 years |
| memory | 2,459 | 1.8 years |
| gpu | 5,212 | 1.4 years |
| bpf | 959 | **1.1 years** |

![Bug Lifetime by Subsystem](/blog/kernel-bugs/subsystem_comparison_full.png)
*CAN bus and SCTP bugs persist longest. BPF and GPU bugs get caught fastest.*

CAN bus drivers and SCTP networking have the longest-lived bugs, probably because both are niche protocols with less testing coverage. GPU (especially Intel i915) and BPF bugs get caught fastest, probably thanks to dedicated fuzzing infrastructure.

**Interesting finding from comparing 2025-only vs full history:**

| Subsystem | 2025-only Avg | Full History Avg | Difference |
|---|---|---|---|
| networking | 5.2 years | 2.9 years | **-2.3 years** |
| filesystem | 3.8 years | 2.6 years | -1.2 years |
| drivers/net | 3.3 years | 2.2 years | -1.1 years |
| gpu | 1.4 years | 1.4 years | 0 years |

Networking looked terrible in the 2025-only data (5.2 years!) but is actually closer to average in the full history (2.9 years). The 2025 fixes were catching a backlog of ancient networking bugs. GPU looks the same either way: those bugs get caught consistently fast.

## Some bug types hide longer than others

Race conditions are the hardest to find, averaging **5.1 years** to discovery:

| Bug Type | Count | Avg Lifetime | Median |
|---|---|---|---|
| race-condition | 1,188 | **5.1 years** | 2.6 years |
| integer-overflow | 298 | **3.9 years** | 2.2 years |
| use-after-free | 2,963 | 3.2 years | 1.4 years |
| memory-leak | 2,846 | 3.1 years | 1.4 years |
| buffer-overflow | 399 | 3.1 years | 1.5 years |
| refcount | 2,209 | 2.8 years | 1.3 years |
| null-deref | 4,931 | 2.2 years | 0.7 years |
| deadlock | 1,683 | 2.2 years | 0.8 years |

Why do race conditions hide so long? They're non-deterministic and only trigger under specific timing conditions that might occur once per million executions. Even sanitizers like KCSAN can only flag races they observe.

**30% of bugs are self-fixes** where the same person who introduced the bug eventually fixed it. I guess code ownership matters.

## Why some bugs hide longer

**Less fuzzing coverage.** Syzkaller excels at syscall fuzzing but struggles with stateful protocols. Fuzzing netfilter effectively requires generating valid packet sequences that traverse specific connection tracking states.

**Harder to trigger.** Many networking bugs require:

- Specific packet sequences
- Race conditions between concurrent flows
- Memory pressure during table operations
- Particular NUMA topologies

**Older code with fewer eyes.** Core networking infrastructure like `nf_conntrack` was written in the mid-2000s. It works, so nobody rewrites it. But "stable" means fewer developers actively reviewing.

## Case study: 19 years in the kernel

One of the oldest networking bugs in my dataset was introduced in **August 2006** and fixed in **August 2025**:

```c
// ctnetlink_dump_table() - the buggy code path
if (res < 0) {
    nf_conntrack_get(&ct->ct_general);  // increments refcount
    cb->args[1] = (unsigned long)ct;
    break;
}
```

**The irony:** Commit `d205dc40798d` was itself a fix: "[NETFILTER]: ctnetlink: fix deadlock in table dumping". Patrick McHardy was fixing a deadlock by removing a `_put()` call. In doing so, he introduced a refcount leak that would persist for 19 years.

The bug: the code doesn't check if `ct == last`. If the current entry is the same as the one we already saved, we've now incremented its refcount twice but will only decrement it once. The object never gets freed.

```c
// What should have been checked:
if (res < 0) {
    if (ct != last)  // <-- this check was missing for 19 years
        nf_conntrack_get(&ct->ct_general);
    cb->args[1] = (unsigned long)ct;
    break;
}
```

**The consequence:** Memory leaks accumulate. Eventually `nf_conntrack_cleanup_net_list()` waits forever for the refcount to hit zero. The netns teardown hangs. If you're using containers, this blocks container cleanup indefinitely.

**Why it took 19 years:** You had to run `conntrack_resize.sh` in a loop for ~20 minutes under memory pressure. The fix commit says: "This can be reproduced by running conntrack_resize.sh selftest in a loop. It takes ~20 minutes for me on a preemptible kernel." Nobody ran that specific test sequence for two decades.

## Incomplete fixes are common

Here's a pattern I keep seeing: someone notices undefined behavior, ships a fix, but the fix doesn't fully close the hole.

**Case study: netfilter set field validation**

| Date | Commit | What happened |
|---|---|---|
| Jan 2020 | `f3a2181e16f1` | Stefano Brivio adds support for sets with multiple ranged fields. Introduces `NFTA_SET_DESC_CONCAT` for specifying field lengths. |
| Jan 2024 | `3ce67e3793f4` | Pablo Neira notices the code doesn't validate that field lengths sum to the key length. Ships a fix. Commit message: "I did not manage to crash nft_set_pipapo with mismatch fields and set key length so far, but this is UB which must be disallowed." |
| Jan 2025 | `1b9335a8000f` | Security researcher finds a bypass. The 2024 fix was incomplete—there were still code paths that could mismatch. Real fix shipped. |

The 2024 fix was an acknowledgment that something was wrong, but Pablo couldn't find a crash, so the fix was conservative. A year later, someone found the crash.

**This pattern suggests a detection opportunity:** commits that say things like "this is undefined behavior" or "I couldn't trigger this but..." are flags. The author knows something is wrong but hasn't fully characterized the bug. These deserve extra scrutiny.
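
A first pass at flagging such commits could be a simple phrase matcher. The sketch below is only an illustration of the idea: the phrase list is hypothetical, loosely modeled on the commit message quoted in the table, and a real detector would need a much broader and better-tuned list.

```python
import re

# Hypothetical phrase list, loosely based on the commit message quoted above.
HEDGE_PATTERNS = [
    r"\bundefined behaviou?r\b",
    r"\bUB\b",
    r"\b(did not|didn't|could not|couldn't) (manage to )?(crash|trigger|reproduce)\b",
    r"\bso far\b",
]

def hedging_phrases(message):
    """Return the hedging patterns that match a commit message."""
    return [p for p in HEDGE_PATTERNS if re.search(p, message, re.IGNORECASE)]

msg = ("I did not manage to crash nft_set_pipapo with mismatch fields and "
       "set key length so far, but this is UB which must be disallowed.")
hits = hedging_phrases(msg)
print(len(hits), "hedging patterns matched")
```

A commit that matches several patterns at once, as this one does, is a candidate for a second review rather than proof that the fix is incomplete.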

## The anatomy of a long-lived bug

Looking at the bugs that survive 10+ years, I see common patterns:

**1. Reference counting errors**

```c
kref_get(&obj->ref);
// ... error path returns without kref_put()
```

These don't crash immediately. They leak memory slowly. In a long-running system, you might not notice until months later when the OOM killer starts firing.

**2. NULL checks that come after the dereference**

```c
struct foo *f = get_foo();
f->bar = 1;              // dereference happens first
if (!f) return -EINVAL;  // check comes too late
```

The compiler might optimize away the NULL check since you already dereferenced. These survive because the pointer is rarely NULL in practice.

**3. Integer overflow in size calculations**

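```c
size_t total = n_elements * element_size;  // can overflow and wrap to a small value
buf = kmalloc(total, GFP_KERNEL);          // allocates the wrapped size
for (i = 0; i < n_elements; i++)           // loop still trusts the unwrapped count
    memcpy(buf + i * element_size, src + i * element_size, element_size);
```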

If `n_elements` comes from userspace, an attacker can cause allocation of a small buffer followed by a large copy.

**4. Race conditions in state machines**

```c
spin_lock(&lock);
ready = (state == READY);
spin_unlock(&lock);
if (ready) {
    // window here where another thread can change state
    do_operation();  // assumes state is still READY
}
```

These require precise timing to hit. They might manifest as rare crashes that nobody can reproduce.

## Can we catch these bugs automatically?

Every day a bug lives in the kernel is another day millions of devices are vulnerable. Android phones, servers, embedded systems, cloud infrastructure, all running kernel code with bugs that won't be found for years.

I built VulnBERT, a model that predicts whether a commit introduces a vulnerability.

**Model evolution:**

| Model | Recall | FPR | F1 | Notes |
|---|---|---|---|---|
| Random Forest | 76.8% | 15.9% | 0.80 | Hand-crafted features only |
| CodeBERT (fine-tuned) | 89.2% | 48.1% | 0.65 | High recall, unusable FPR |
| **VulnBERT** | **92.2%** | **1.2%** | **0.95** | Best of both approaches |

**The problem with vanilla CodeBERT:** I first tried fine-tuning CodeBERT directly. Results: 89% recall but **48% false positive rate** (measured on the same test set). That's unusable: it flags nearly half of all safe commits.

Why so bad? CodeBERT learns shortcuts: "big diff = dangerous", "lots of pointers = risky". These correlations exist in training data but don't generalize. The model pattern-matches on surface features, not actual bug patterns.

**The VulnBERT approach:** Combine neural pattern recognition with human domain expertise.

```
┌─────────────────────────────────────────────────────────────────────┐
│                            INPUT: Git Diff                          │
└───────────────────────────────┬─────────────────────────────────────┘
                                │
                ┌───────────────┴───────────────┐
                ▼                               ▼
┌───────────────────────────┐   ┌───────────────────────────────────┐
│   Chunked Diff Encoder    │   │   Handcrafted Feature Extractor   │
│   (CodeBERT + Attention)  │   │   (51 engineered features)        │
└─────────────┬─────────────┘   └─────────────────┬─────────────────┘
              │ [768-dim]                         │ [51-dim]
              └───────────────┬───────────────────┘
                              ▼
              ┌───────────────────────────────┐
              │     Cross-Attention Fusion    │
              │     "When code looks like X,  │
              │      feature Y matters more"  │
              └───────────────┬───────────────┘
                              ▼
              ┌───────────────────────────────┐
              │        Risk Classifier        │
              └───────────────────────────────┘
```

**Three innovations that drove performance:**

**1. Chunked encoding for long diffs.** CodeBERT's 512-token limit truncates most kernel diffs (often 2000+ tokens). I split into chunks, encode each, then use learned attention to aggregate:

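```python
import torch.nn as nn
import torch.nn.functional as F

hidden_size = 768  # width of the CodeBERT embedding

# Learnable attention over chunks
chunk_attention = nn.Sequential(
    nn.Linear(hidden_size, hidden_size // 4),
    nn.Tanh(),
    nn.Linear(hidden_size // 4, 1)
)
# chunk_embeddings: [batch, num_chunks, hidden_size]
attention_weights = F.softmax(chunk_attention(chunk_embeddings), dim=1)
pooled = (attention_weights * chunk_embeddings).sum(dim=1)  # [batch, hidden_size]
```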

The model learns which chunks matter: the one with `spin_lock` but no `spin_unlock`, not the boilerplate.
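
The splitting step that feeds the attention pooling can be sketched in a few lines. This is a simplified illustration with a hypothetical `chunk_tokens` helper: it assumes plain consecutive chunks with no overlap, and it ignores the special tokens a real tokenizer adds to each chunk, which would reduce the usable length slightly.

```python
def chunk_tokens(token_ids, max_len=512):
    """Split a token sequence into consecutive chunks of at most max_len."""
    return [token_ids[i:i + max_len] for i in range(0, len(token_ids), max_len)]

diff_tokens = list(range(2000))  # stand-in for a tokenized 2000-token diff
chunks = chunk_tokens(diff_tokens)
print([len(c) for c in chunks])  # [512, 512, 512, 464]
```

Each chunk is then encoded separately, and the attention layer decides how much each chunk contributes to the pooled representation.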

**2. Feature fusion via cross-attention.** Neural networks miss domain-specific patterns. I extract 51 handcrafted features using regex and AST-like analysis of the diff:

| Category | Features |
|---|---|
| **Basic (4)** | `lines_added`, `lines_removed`, `files_changed`, `hunks_count` |
| **Memory (3)** | `has_kmalloc`, `has_kfree`, `has_alloc_no_free` |
| **Refcount (5)** | `has_get`, `has_put`, `get_count`, `put_count`, `unbalanced_refcount` |
| **Locking (5)** | `has_lock`, `has_unlock`, `lock_count`, `unlock_count`, `unbalanced_lock` |
| **Pointers (4)** | `has_deref`, `deref_count`, `has_null_check`, `has_deref_no_null_check` |
| **Error handling (6)** | `has_goto`, `goto_count`, `has_error_return`, `has_error_label`, `error_return_count`, `has_early_return` |
| **Semantic (13)** | `var_after_loop`, `iterator_modified_in_loop`, `list_iteration`, `list_del_in_loop`, `has_container_of`, `has_cast`, `cast_count`, `sizeof_type`, `sizeof_ptr`, `has_arithmetic`, `has_shift`, `has_copy`, `copy_count` |
| **Structural (11)** | `if_count`, `else_count`, `switch_count`, `case_count`, `loop_count`, `ternary_count`, `cyclomatic_complexity`, `max_nesting_depth`, `function_call_count`, `unique_functions_called`, `function_definitions` |

The key bug-pattern features:

```python
'unbalanced_refcount': 1,    # kref_get without kref_put → leak
'unbalanced_lock': 1,        # spin_lock without spin_unlock → deadlock
'has_deref_no_null_check': 0,# *ptr without if(!ptr) → null deref
'has_alloc_no_free': 0,      # kmalloc without kfree → memory leak
```

Cross-attention learns *conditional* relationships. When CodeBERT sees locking patterns AND `unbalanced_lock=1`, that's HIGH risk. Neither signal alone is sufficient; it's the combination that matters.

```python
# Feature fusion via cross-attention
feature_embedding = feature_projection(handcrafted_features)  # 51 → 768
attended, _ = cross_attention(
    query=code_embedding,      # What patterns does the code have?
    key=feature_embedding,     # What do the hand-crafted features say?
    value=feature_embedding
)
fused = fusion_layer(torch.cat([code_embedding, attended], dim=-1))
```

**3. Focal loss for hard examples.** The training data is imbalanced: most commits are safe. Standard cross-entropy wastes gradient updates on easy examples. Focal loss shrinks their contribution:

```
Standard loss when p=0.95 (easy):  0.05
Focal loss when p=0.95:            0.000125  (400x smaller)
```

The model focuses on ambiguous commits: the hard 5% that matter.
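
The arithmetic behind those numbers can be checked with a small sketch. It assumes the common focal loss form `(1 - p)^gamma * cross_entropy` with `gamma = 2` and no class weighting; the value of gamma is not stated above, but it is the one that yields a 400x reduction at p=0.95.

```python
import math

def cross_entropy(p):
    """Loss for a correctly labelled example predicted with probability p."""
    return -math.log(p)

def focal_loss(p, gamma=2.0):
    """Cross-entropy scaled by (1 - p)^gamma, which shrinks easy examples."""
    return (1.0 - p) ** gamma * cross_entropy(p)

for p in (0.95, 0.6):
    ce, fl = cross_entropy(p), focal_loss(p)
    print(f"p={p}: cross-entropy={ce:.4f} focal={fl:.6f} ratio={ce / fl:.1f}x")
```

At p=0.95 the scaling factor is (0.05)^2 = 0.0025, which is where the 400x comes from. At p=0.6 the factor is only 0.16, so uncertain examples keep most of their weight.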

**Impact of each component** (estimated from ablation experiments):

| Component | F1 Score |
|---|---|
| CodeBERT baseline | ~76% |
| + Focal loss | ~80% |
| + Feature fusion | ~88% |
| + Contrastive learning | ~91% |
| **Full VulnBERT** | **95.4%** |

Note: Individual component impacts are approximate; interactions between components make precise attribution difficult.

The key insight: neither neural networks nor hand-crafted rules alone achieve the best results. The combination does.

**Results on temporal validation** (train ≤2023, test 2024):

| Metric | Target | Result |
|---|---|---|
| Recall | 90% | **92.2%** ✓ |
| FPR | <10% | **1.2%** ✓ |
| Precision | — | 98.7% |
| F1 | — | 95.4% |
| AUC | — | 98.4% |

*What these metrics mean:*

- **Recall (92.2%)**: Of all actual bug-introducing commits, we catch 92.2%. Missing 7.8% of bugs.
- **False Positive Rate (1.2%)**: Of all safe commits, we incorrectly flag 1.2%. Low FPR = fewer false alarms.
- **Precision (98.7%)**: Of commits we flag as risky, 98.7% actually are. When we raise an alarm, we're almost always right.
- **F1 (95.4%)**: Harmonic mean of precision and recall. Single number summarizing overall performance.
- **AUC (98.4%)**: Area under ROC curve. Measures ranking quality—how well the model separates bugs from safe commits across all thresholds.
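
To make the definitions concrete, here is how the threshold-based metrics follow from a confusion matrix. The counts are hypothetical: they are chosen so that recall and FPR match the reported values, under the assumption of a balanced test set of 1,000 bug-introducing and 1,000 safe commits (the actual test set size and balance are not given here).

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute recall, false positive rate, precision and F1 from raw counts."""
    recall = tp / (tp + fn)        # share of real bugs that were flagged
    fpr = fp / (fp + tn)           # share of safe commits that were flagged
    precision = tp / (tp + fp)     # share of flagged commits that were real bugs
    f1 = 2 * precision * recall / (precision + recall)
    return {"recall": recall, "fpr": fpr, "precision": precision, "f1": f1}

# Hypothetical counts: 1,000 bug-introducing and 1,000 safe commits.
m = classification_metrics(tp=922, fp=12, tn=988, fn=78)
print({name: round(value, 3) for name, value in m.items()})
```

With these counts, precision and F1 land close to the reported 98.7% and 95.4%. Precision depends on class balance, so on a stream where safe commits vastly outnumber buggy ones, the same recall and FPR would give a lower precision.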

The model correctly differentiates the **same bug** at different stages:

| Commit | Description | Risk |
|---|---|---|
| `acf44a2361b8` | **Fix** for UAF in xe_vfio | 12.4% LOW ✓ |
| `1f5556ec8b9e` | **Introduced** the UAF | 83.8% HIGH ✓ |

### What the model sees: The 19-year bug

When analyzing the bug-introducing commit `d205dc40798d`:

```diff
-    if (ct == last) {
-        nf_conntrack_put(&last->ct_general);  // removed!
-    }
+    if (ct == last) {
+        last = NULL;
         continue;
     }
     if (ctnetlink_fill_info(...) < 0) {
         nf_conntrack_get(&ct->ct_general);  // still here
```

**Extracted features:**

| Feature | Value | Signal |
|---|---|---|
| `get_count` | 1 | `nf_conntrack_get()` present |
| `put_count` | 0 | `nf_conntrack_put()` was removed |
| `unbalanced_refcount` | **1** | Mismatch detected |
| `has_lock` | 1 | Uses `read_lock_bh()` |
| `list_iteration` | 1 | Uses `list_for_each_prev()` |

**Model prediction:** 72% risk: HIGH

The `unbalanced_refcount` feature fires because `_put()` was removed but `_get()` remains. Classic refcount leak pattern.
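
A regex-based extractor for these three refcount features might look like the sketch below. It is an assumption-laden illustration, not VulnBERT's actual extractor: the `refcount_features` name and the `*_get(` / `*_put(` patterns are hypothetical, and it assumes calls are counted in the code as it looks after the patch (context and added lines, with removed lines ignored), which is what the table above implies.

```python
import re

GET_CALL = re.compile(r"\b\w+_get\s*\(")
PUT_CALL = re.compile(r"\b\w+_put\s*\(")

def refcount_features(diff_text):
    """Count *_get()/*_put() calls in the code as it looks after the patch."""
    # Assumption: context and added lines count, removed ("-") lines do not.
    kept = [line for line in diff_text.splitlines() if not line.startswith("-")]
    code = "\n".join(kept)
    get_count = len(GET_CALL.findall(code))
    put_count = len(PUT_CALL.findall(code))
    return {
        "get_count": get_count,
        "put_count": put_count,
        "unbalanced_refcount": int(get_count != put_count),
    }

diff = """\
-    if (ct == last) {
-        nf_conntrack_put(&last->ct_general);
-    }
+    if (ct == last) {
+        last = NULL;
         continue;
     }
     if (ctnetlink_fill_info(...) < 0) {
         nf_conntrack_get(&ct->ct_general);
"""
features = refcount_features(diff)
print(features)
```

A count mismatch inside one diff is a noisy signal on its own, since the matching `_put()` often lives in another function, which is why it is fed to the model as a feature rather than used as a rule.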

## Limitations

**Dataset limitations:**

- Only captures bugs with `Fixes:` tags (~28% of fix commits). Selection bias: well-documented bugs tend to be more serious.
- Mainline only, doesn't include stable-branch-only fixes or vendor patches
- Subsystem classification is heuristic-based (regex on file paths)
- Bug type detection is based on keyword matching in commit messages, and many bugs end up as "unknown" type
- Lifetime calculation uses author dates, not commit dates, so rebasing can skew timestamps
- Some "bugs" may be theoretical (comments like "fix possible race" without confirmed trigger)

**Model limitations:**

- 92.2% recall is on a **held-out 2024 test set**, not a guarantee for future bugs
- Can't catch semantic bugs (logic errors with no syntactic signal)
- Cross-function blind spots (bug spans multiple files)
- Training data bias (learns patterns from bugs that *were found*, novel patterns may be missed)
- False positives on intentional patterns (init/cleanup in different commits)
- Tested only on Linux kernel code, may not generalize to other codebases

**Statistical limitations:**

- Right-censoring in year-over-year comparisons (recent bugs can't have long lifetimes yet)
- Correlation ≠ causation for subsystem/bug-type lifetime differences

**What this means:** VulnBERT is a triage tool, not a guarantee. It catches 92% of bugs with recognizable patterns. The remaining 8% and novel bug classes still need human review and fuzzing.

## What's next

92.2% recall with 1.2% FPR is production-ready. But there's more to do:

- **RL-based exploration**: Instead of static pattern matching, train an agent to explore code paths and find bugs autonomously. The current model predicts risk; an RL agent could *generate* triggering inputs.
- **Syzkaller integration**: Use fuzzer coverage as a reward signal. If the model flags a commit and Syzkaller finds a crash in that code path, that's strong positive signal.
- **Subsystem-specific models**: Networking bugs have different patterns than driver bugs. A model fine-tuned on netfilter might outperform the general model on netfilter commits.

The goal isn't to replace human reviewers but to point them at the 10% of commits most likely to be problematic, so they can focus attention where it matters.

## Reproducing this

The dataset extraction uses the kernel's `Fixes:` tag convention. Here's the core logic:

```python
import re
import subprocess
from typing import Optional

def extract_fixes_tag(commit_msg: str) -> Optional[str]:
    """Extract the commit ID from a Fixes: tag"""
    pattern = r'Fixes:\s*([a-f0-9]{12,40})'
    match = re.search(pattern, commit_msg, re.IGNORECASE)
    return match.group(1) if match else None

# Mine all Fixes: tags from git history (run inside a kernel checkout)
fixing_commits = subprocess.run(
    ["git", "log", "--since=2005-04-16", "--grep=Fixes:", "--format=%H"],
    capture_output=True, text=True, check=True,
).stdout.split()

# For each fixing commit:
#   - Extract introducing commit hash
#   - Get dates from both commits
#   - Calculate lifetime
#   - Classify subsystem from file paths
```

Full miner code and dataset: [github.com/quguanni/kernel-vuln-data](https://github.com/quguanni/kernel-vuln-data)

---

## TL;DR

- **125,183 bugs** analyzed from 20 years of Linux kernel git history (123,696 with valid lifetimes)
- **Average bug lifetime:** 2.1 years (2.8 years in the 2025-only data, which over-represents long-hidden bugs that were finally fixed)
- **0% → 69%** of bugs found within 1 year (2010 vs 2022), a real improvement likely driven by better tooling
- **13.5% of bugs hide for 5+ years** (these are the dangerous ones)
- Race conditions hide longest (**5.1 years** average)
- **VulnBERT catches 92.2%** of bugs on held-out 2024 test set with only 1.2% FPR (98.4% AUC)
- **Dataset:** [github.com/quguanni/kernel-vuln-data](https://github.com/quguanni/kernel-vuln-data)

---

*If you're working on kernel security, vulnerability detection, or ML for code analysis, I'd love to talk: jenny@pebblebed.com*

---

**Keywords:** Linux, kernel, security