June 10, 2026


The Future of Visual Field Loss Staging Is Data-Driven — And We Have the Data to Build It

Jeremy Barlow

Glaucoma is the world’s leading cause of irreversible blindness, and the visual field test is how we measure the vision it steals. So here’s a question worth sitting with: if two glaucoma specialists look at the same visual field, will they agree on how severe it is? Often, they won’t. We built something to fix that, then pointed it at more than three million real-world tests to see what it would tell us. Some of what we found surprised even us.

The problem we kept running into

Visual field testing is the gold standard for measuring functional vision loss. But the way we classify that loss into “mild,” “moderate,” or “advanced” has stayed stubbornly subjective. Most clinicians have relied on the Hodapp-Parrish-Anderson framework since 1993, but its thresholds were set by expert consensus, not derived from data. They reflect what a group of specialists agreed on three decades ago, not how vision loss actually distributes across real patients.

That subjectivity has real consequences: the same test can produce different staging depending on who reads it. This leads to inconsistent referrals, makes progression harder to track, and makes it nearly impossible to compare outcomes across clinics or clinical studies. Layer on rising disease rates and a shortage of specialists, and the variability stops being an academic concern and becomes a real barrier to patient care. And notably, the legacy framework starts at “mild.” It has no concept of “normal,” so it was never built to help catch disease early or to give clinicians a clean baseline to monitor from.

Why we built a Severity Index

We wanted a way to measure visual field loss severity that gives the same answer no matter who is reading the test or where it was performed. It is worth noting what this is not: a replacement for clinical judgment. A diagnosis still depends on intraocular pressure, imaging, and the full exam. The Carrot Severity Index is a consistent, objective, automatic label that travels with the result: something a clinician can trust to mean the same thing on every visit, for every patient, in every office.

Just as important, we wanted it to start at normal. If a tool can recognize a healthy field with the same rigor it recognizes an advanced one, it can help optometrists rule out early disease with more confidence, flag patients trending toward loss before they cross a threshold, and give everyone a cleaner baseline for tracking change over time.

How we built it — the science, in plain language

We took a deliberately hands-off approach: let the data define the categories, instead of telling the data what we expected to find. It’s a two-stage machine learning method with zero human labeling.

First, we analyzed 50,000 visual fields from 50,000 eyes captured on the Carrot platform and let an unsupervised algorithm (K-means) group them by their underlying patterns of loss. We didn’t decide in advance how many groups there should be. Four groups emerged, and that grouping held up under objective quality checks and was 99% reproducible when we re-ran it on random slices of the data. The four groups lined up cleanly with what clinicians would call normal, early, moderate, and advanced disease.
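For readers who like to see the idea in code, here is a minimal, standard-library Python sketch of this kind of clustering step. It is an illustration under stated assumptions, not the production pipeline: the two-number feature vectors are synthetic, the helper names (`farthest_first`, `kmeans`, `silhouette`) are made up for this post, and comparing candidate group counts with the silhouette score is simply one common quality check, assumed here for demonstration.

```python
import random
from statistics import mean

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def farthest_first(points, k):
    """Deterministic seeding: start at the first point, then keep adding
    the point that is farthest from every center chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, iters=100):
    """Plain K-means: assign each point to its nearest center, then move
    each center to the mean of its members, until nothing changes."""
    centers = farthest_first(points, k)
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(tuple(mean(axis) for axis in zip(*members)) if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return labels

def silhouette(points, labels):
    """Mean silhouette score: near 1 means tight, well-separated groups."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        a = sum(dist(p, q) for q in own) / (len(own) - 1)  # cohesion
        b = min(mean(dist(p, q) for q in other)            # separation
                for other_lab, other in clusters.items() if other_lab != lab)
        scores.append((b - a) / max(a, b))
    return mean(scores)

# Synthetic stand-in data (assumption): four loose blobs of 40 "eyes" each.
rng = random.Random(0)
blob_centers = [(0.0, 1.0), (-6.0, 4.0), (-13.0, 8.0), (-21.0, 11.0)]
points = [(cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1))
          for cx, cy in blob_centers for _ in range(40)]

# Try several group counts and keep the one with the best score.
scores = {k: silhouette(points, kmeans(points, k)) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 2))
```

On this toy data the score should peak at four groups, because the blobs were built that way. Real visual fields are far noisier, which is why reproducibility checks on random subsets matter as much as any single score.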

Second, we used a Bayes minimum-error classifier to find the mathematically optimal cut-points between those groups, landing on mean-deviation thresholds of −2.3, −9.6, and −17.3 dB. The result is a fixed, deterministic algorithm: the same test always produces the same stage. No variability, no “generative AI,” no hallucinations — just math calibrated to our device. Reassuringly, our thresholds land close to what clinicians have used by intuition for years, which tells us we didn’t reinvent the wheel. We just replaced the guesswork underneath it.
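To make the second stage concrete, here is a small Python sketch of both halves: finding a minimum-error cut-point between two neighboring groups, and then applying fixed thresholds as a lookup. The three thresholds are the ones quoted above. Everything else is an assumption for illustration: modeling each group's mean deviation as a normal distribution, the example priors and distributions, the function names `bayes_cut` and `stage`, and the choice to send a value sitting exactly on a threshold to the milder stage.

```python
from statistics import NormalDist

def bayes_cut(prior_a, dist_a, prior_b, dist_b, tol=1e-9):
    """Minimum-error boundary between two 1-D classes: the point between
    the two means where the prior-weighted densities are equal.
    Found by bisection; assumes a single crossing between the means."""
    (p_lo, d_lo), (p_hi, d_hi) = sorted(
        [(prior_a, dist_a), (prior_b, dist_b)], key=lambda pair: pair[1].mean)
    lo, hi = d_lo.mean, d_hi.mean
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if p_lo * d_lo.pdf(mid) > p_hi * d_hi.pdf(mid):
            lo = mid   # still on the lower-mean class's side
        else:
            hi = mid   # already on the higher-mean class's side
    return (lo + hi) / 2

# Hypothetical groups, NOT fitted values: equal priors and equal spread,
# so the boundary falls at the midpoint of the two means.
example_cut = bayes_cut(0.5, NormalDist(-5.0, 2.0), 0.5, NormalDist(0.0, 2.0))

# Mean-deviation thresholds (dB) quoted in the post.
THRESHOLDS_DB = (-2.3, -9.6, -17.3)
STAGES = ("normal", "early", "moderate", "advanced")

def stage(md_db):
    """Deterministic staging: same mean deviation in, same stage out.
    Assumption: a value exactly on a threshold gets the milder stage."""
    for threshold, name in zip(THRESHOLDS_DB, STAGES):
        if md_db >= threshold:
            return name
    return STAGES[-1]

print(round(example_cut, 3))
print([stage(md) for md in (-1.0, -5.0, -12.0, -20.0)])
```

The lookup is the whole runtime algorithm. The statistical work happens once, up front, when the cut-points are derived, and that is what makes the output repeatable.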

What three million tests told us

Once the index was in place, we ran one of the largest real-world visual-field analyses we’re aware of: more than 3 million reliable tests from over 965,000 patients and 1.9 million eyes. A few findings stood out.

55% of eyes were already abnormal at their very first test. And 1 in 13 were already at the advanced stage before we ever saw them.

This is the number I can’t stop thinking about. More than half of eyes were already showing measurable loss at their first test, and 7.4% were already advanced, climbing to 10.4% of patients when you look at their worse eye. People are arriving late. That’s the strongest argument I’ve seen for making visual field screening more accessible and more routine.

Severity stage at patients’ first recorded visual field (1.9 million eyes).
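Why is the patient-level number higher than the eye-level one? A patient counts as advanced if either eye is, so grouping by worse eye can only push the rate up. The short Python sketch below shows the arithmetic on made-up records. The record layout, the `advanced_rates` name, and the four example patients are all hypothetical.

```python
SEVERITY_RANK = {"normal": 0, "early": 1, "moderate": 2, "advanced": 3}

def advanced_rates(first_tests):
    """first_tests: list of (patient_id, eye, stage) at the first test.
    Returns (share of eyes advanced, share of patients whose worse eye
    is advanced)."""
    eye_rate = sum(stage == "advanced" for _, _, stage in first_tests) / len(first_tests)

    # Keep the most severe stage seen for each patient.
    worse_eye = {}
    for patient_id, _eye, stage in first_tests:
        current = worse_eye.get(patient_id)
        if current is None or SEVERITY_RANK[stage] > SEVERITY_RANK[current]:
            worse_eye[patient_id] = stage

    patient_rate = sum(stage == "advanced" for stage in worse_eye.values()) / len(worse_eye)
    return eye_rate, patient_rate

# Made-up example: 4 patients, 8 eyes, one advanced eye.
records = [
    ("p1", "OD", "normal"),   ("p1", "OS", "normal"),
    ("p2", "OD", "early"),    ("p2", "OS", "normal"),
    ("p3", "OD", "advanced"), ("p3", "OS", "moderate"),
    ("p4", "OD", "normal"),   ("p4", "OS", "early"),
]
eye_rate, patient_rate = advanced_rates(records)
print(eye_rate, patient_rate)  # 0.125 of eyes, 0.25 of patients
```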

Late presentation gets worse with age, which is what you’d expect, but the scale is stark. Advanced disease at first test rose from 4.5% of eyes in patients under 40 to 13% in those 80 and older. We also saw real geographic spread: in some states, 12-13% of eyes presented at the advanced stage, versus around 7% nationally. This is a reminder that access and referral patterns vary enormously by region.

Because the index also tracks change over time, we could watch how severity moves across hundreds of thousands of eyes with repeat testing. One clinically intuitive pattern fell right out of the data: the more severe an eye’s disease, the more frequently it was tested. Median time between tests tightened from about 360 days for normal eyes to roughly 200 days for advanced ones. A consistent severity label simply makes that triage easier to do well.
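A calculation like this is straightforward once every test carries a stage. Here is a minimal Python sketch on invented data. The record layout and the `median_gap_by_stage` name are hypothetical, and attributing each gap to the stage recorded at the earlier of the two tests is an assumption, since the analysis could reasonably be set up other ways.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import median

def median_gap_by_stage(tests):
    """tests: iterable of (eye_id, test_date, stage).
    Returns the median number of days between consecutive tests,
    keyed by the stage at the earlier test of each pair (assumption)."""
    by_eye = defaultdict(list)
    for eye_id, test_date, stage in tests:
        by_eye[eye_id].append((test_date, stage))

    gaps = defaultdict(list)
    for visits in by_eye.values():
        visits.sort()  # chronological order within each eye
        for (earlier, stage), (later, _) in zip(visits, visits[1:]):
            gaps[stage].append((later - earlier).days)

    return {stage: median(days) for stage, days in gaps.items()}

# Invented example: one normal eye tested twice, one advanced eye three times.
start = date(2024, 1, 1)
tests = [
    ("eye-A", start, "normal"),
    ("eye-A", start + timedelta(days=360), "normal"),
    ("eye-B", start, "advanced"),
    ("eye-B", start + timedelta(days=200), "advanced"),
    ("eye-B", start + timedelta(days=400), "advanced"),
]
gap_by_stage = median_gap_by_stage(tests)
print(gap_by_stage)
```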

Where this goes next

None of this replaces a clinician. The severity index is one input in a broader picture that still includes pressure, imaging, and the exam itself. But consistency is a quiet superpower in medicine. When a measurement means the same thing every time, you can compare it across visits, across providers, and across years. You can finally have an honest conversation about whether a patient is getting better, holding steady, or slipping.

We set out to take the subjectivity out of visual field loss tracking. Millions of tests later, the data is telling us something bigger: too many people are being caught too late. If an objective, automatic severity label can help even a fraction of them get seen sooner, that’s a problem worth solving.