Brain Atrophy: Moving Beyond Visual Assessment

Vidal Laura

Author

On the screen, the image is sharp. A sliced brain, gray and white, silent, almost elegant in its geometry. The radiologist zooms in, compares, steps back, comes back. He has seen thousands of MRIs, he knows the pitfalls, the variants, the “false friends.” And yet he hesitates over a word. “Mild atrophy.” “Moderate atrophy.” “Appearance compatible with age.” One adjective too many can alarm a family. One adjective too few can delay care. In the medicine of the brain, that hesitation is not a stylistic detail: it is the heart of the problem.

Because brain atrophy is not an object one points to like a fracture. It is an erasure. A loss of volume that sets in, often slowly, sometimes in patches, sometimes diffusely. And in neurodegenerative diseases, what matters is not only “what one sees” at a given moment, but the trajectory: where it begins, how quickly it progresses, and according to what pattern.

For a long time, imaging served first and foremost to exclude: a tumor, a hematoma, hydrocephalus, a stroke. Today, it is increasingly called upon to qualify, monitor, and objectify. European recommendations and clinical practice converge on one point: structural imaging (CT or MRI, and preferably MRI when available) is recommended at least once in the work-up of a patient presenting with cognitive impairment. But that “at least once” conceals a difficulty: when imaging becomes central, one can no longer be satisfied with impressionistic language.

That is where the question arises head-on: can we go on judging brain atrophy “by eye,” the way one might gauge the weather by looking at the sky? Or must we, as other disciplines have done, move beyond the adjective and enter into measurement?

The reign of the gaze and its gray zones

Visual reading is not archaic. It is fast. It is embedded in the medical act. It rests on expertise that, in many cases, is remarkable: an experienced neuroradiologist recognizes suggestive atrophy patterns, distinguishes mesial temporal atrophy from ordinary aging, spots asymmetric frontal involvement, posterior predominance, a cortex thinning in places.

To make that expertise more shareable, medicine has created visual scales: grids that transform “impression” into a score. The most famous is probably the Scheltens scale for medial temporal atrophy (often used when Alzheimer’s disease is suspected), which proposes a multi-level rating based on coronal images. This kind of tool has a simple ambition: reduce variability in descriptions and allow different readers to speak a common language.

But visual scales do not abolish human variability. They frame it, without always resolving it.

As early as the 1990s, studies were measuring agreement between readers on these scales. In Scheltens’ foundational 1995 study on the visual rating of medial temporal atrophy, full agreement among four raters was achieved for only a fraction of the scans, and the very aim of the article was to estimate that inter-observer variability. A few years later, another study from the same group, evaluating a qualitative atrophy scale in several regions, concluded that reproducibility between raters was globally insufficient in an elderly population that included demented and non-demented patients.

Put more simply: even with a grid, two specialists may not tell exactly the same story from the same image. And it is precisely in early, subtle, borderline forms that these discrepancies matter most.
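For the quantitatively inclined, this kind of inter-reader agreement is classically summarized with Cohen’s kappa, which corrects raw agreement for the agreement expected by chance. A minimal sketch, using entirely made-up MTA-style scores (0–4) from two hypothetical readers over ten scans:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of scans where both readers gave the same score
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if each reader scored independently at their own marginal rates
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Purely illustrative scores, not from any published study
reader_1 = [0, 1, 1, 2, 2, 3, 1, 0, 2, 4]
reader_2 = [0, 1, 2, 2, 1, 3, 1, 1, 2, 4]
print(round(cohens_kappa(reader_1, reader_2), 2))  # → 0.6
```

A kappa of 1 would mean perfect agreement; values in the 0.4–0.6 range, common in early atrophy rating studies, mean two specialists often do not tell the same story from the same image.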

Why atrophy is so hard to “see”

The human eye is very good at recognizing patterns, and less good at measuring fine differences when they are spread across space and time.

The first trap is normal aging. On average, the brain loses volume with age. Sulci become more marked, ventricles appear wider. This evolution is not identical for everyone: genetics, vascular factors, level of education, general health – everything plays a role. There is therefore a vast gray zone between “normal” and “pathological,” especially before symptoms become obvious.

The second trap is implicit comparison. When a radiologist writes “moderate atrophy,” that wording necessarily compares the image with something: the radiologist’s experience, other patients, cases already seen. That is valuable knowledge, but it is not necessarily stable from one reader to another, nor from one center to another.

The third trap is time. Atrophy is often more meaningful in “difference” than in a “snapshot.” Yet in real life, scans are not always performed on the same machine, with the same protocol, at regular intervals. Visually comparing two MRIs taken several years apart can be misleading: a change in sequence, a slight slice offset, a different image quality can create the illusion of progression or conceal real progression.

Finally, the fourth trap is the coexistence of lesions. Many older patients have, in addition to possible neurodegeneration, vascular damage: white-matter hyperintensities (often described by the Fazekas score), micro-infarcts, micro-hemorrhages. These elements can influence the overall appearance, complicate interpretation, and above all contribute to cognitive impairment. In real patients, the brain is often “mixed,” not textbook.

All of this makes one thing obvious: visual assessment is not “wrong.” It is simply limited, because it rests on a material reality that increasingly demands measurement.

The uncomfortable finding: our reports are often vaguer than our tools

Another dimension, discussed less publicly, concerns the radiology report itself. In many hospitals, brain MRI still results in a narrative, free-form description mentioning what appears important. That format has its advantages: it allows context, nuance, and prioritization. But it has a structural flaw: it is not standardized.

Recent work has studied precisely the gap between “narrative” reports, structured visual scales, and quantification software. A 2025 study published in Diagnostics suggests that everyday reports may diverge significantly from standardized assessments, particularly in early or subtle presentations, and that visual scales, though more standardized, may still underestimate atrophy compared with quantitative tools.

This is not an accusation against radiologists. It is the consequence of a system: we are asking a literary description to carry information that, from now on, ought to be quantified, comparable, and trackable.

The promise of measurement: when MRI becomes a “biology of the brain”

Measuring atrophy does not mean reducing a brain to a number. It means objectifying volumes, thicknesses, asymmetries, and placing them relative to a reference.

Concretely, automated volumetry often uses a 3D anatomical MRI (for example a T1 sequence) and segmentation algorithms that identify different structures: hippocampus, ventricles, lobes, cortex. The software then calculates volumes, sometimes cortical thickness, and compares those values with a normative database, taking age and often sex into account. This yields results such as: “hippocampus at the 3rd percentile” or “ventricular volume at the 90th percentile.” In other words, one no longer says merely “small” or “large,” but “smaller than 97% of people the same age,” or the reverse.

The idea of percentiles is not some exotic invention: it is exactly what pediatrics has long done with growth curves. But applying it to the brain profoundly changes the clinical conversation.

A 2021 study on norms for automatic estimation of hippocampal atrophy also recalls the common use of a threshold (for example below the 5th percentile) to consider hippocampal volume abnormal relative to a reference population.
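The percentile logic is simple enough to sketch. Assuming the normative distribution for a given age and sex group is approximately Gaussian, with a hypothetical mean hippocampal volume of 3.4 mL and standard deviation of 0.4 mL (these numbers are invented for illustration, not taken from any real database):

```python
from statistics import NormalDist

def volume_percentile(volume_ml, norm_mean_ml, norm_sd_ml):
    """Place a measured volume within a (hypothetical) Gaussian normative distribution."""
    z = (volume_ml - norm_mean_ml) / norm_sd_ml
    return 100 * NormalDist().cdf(z)

# Illustrative reference values only: mean 3.4 mL, SD 0.4 mL for this age/sex group
pct = volume_percentile(2.7, norm_mean_ml=3.4, norm_sd_ml=0.4)
print(f"hippocampus at the {pct:.0f}th percentile")  # prints "hippocampus at the 4th percentile"
flagged = pct < 5  # the common convention cited above: below the 5th percentile is considered abnormal
```

Real tools rely on measured normative cohorts rather than a textbook Gaussian, but the conversation changes in the same way: “small” becomes “smaller than roughly 96% of people the same age.”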

Volumetry is not perfect, but it introduces something the eye alone does not possess: the ability to place a patient within a distribution, and then track movement within that distribution over time.

Visual rating versus volumetry: a rivalry that need not exist

The temptation is strong to frame the debate as a duel: the radiologist against the algorithm, the eye against the machine. Reality is more interesting: volumetry has long served to validate, refine, and frame clinical observation, not to replace it.

As early as 1999, studies were comparing visual rating of medial temporal atrophy with volumetric measurements, seeking to establish the validity of visual assessment against “absolute” volumes. More recently, longitudinal studies have compared the evolution of visually rated atrophy with the evolution measured by volumetry over several years. A 2020 study, for example, compared visual rating and volumetry over a period of about five years in an elderly population, focusing on what each method reveals about aging and pathology.

And in 2025, an exploratory study published in Scientific Reports asked precisely the question of “progression”: is expert visual assessment of worsening atrophy reflected by automated volumetric analyses on successive MRIs under real-life conditions? The simple fact that this question is being studied “in the real world” says a great deal: we are no longer dealing with a laboratory debate, but with the issue of routine clinical use.

What these studies suggest is less an opposition than a complementarity: measurement can confirm a pattern, or reveal a subtle deviation where the eye hesitates. And the eye remains essential for interpretation, because a number without context can lie.

When the visual scale becomes a world: the example of medial temporal atrophy

Alzheimer’s disease is often the entry point into this debate, because it has a relatively well-described pattern of involvement: a frequent beginning in the hippocampus and medial temporal structures, before spreading to other regions.

The MTA scale (Scheltens) was designed precisely to make that pattern readable. But its use reveals both the strength and the limitation of the visual approach.

Its strength is that it makes reading more systematic. Its use has been evaluated for reliability and validity in memory-clinic cohorts by comparing the score with quantitative hippocampal volumes and with clinical diagnoses.

Its limitation is that even with a scale, a part of the reader’s style remains. A 2022 study, for instance, showed that radiologists with varying levels of experience could achieve satisfactory overall agreement using the MTA scale on different modalities, but that performance varied between readers, possibly because of different rating styles.

That nuance is crucial: the scale improves standardization, but it does not abolish the human factor. And in neurodegeneration, the challenge is to reduce noise as much as possible, because the signal may be faint.

Moving beyond “moderate”: what quantification changes in concrete terms

This change is less spectacular than a new drug, but it transforms three key moments in the pathway: diagnosis, follow-up, and decision-making.

At the moment of diagnosis, quantification makes it possible to situate atrophy instead of merely qualifying it. It answers, at least in part, the question families ask themselves: is this “more than age”?

At the moment of follow-up, it makes it possible to objectify progression. In practice, many patients live in an in-between state: cognitive complaints, tests that are sometimes borderline, variable symptoms, anxiety, fatigue. Physicians must decide whether follow-up should be intensified, whether to investigate further (biomarkers, PET, lumbar puncture), whether reassurance is appropriate. A number does not decide the matter, but it helps move beyond approximation.
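Objectifying progression can be as simple as annualizing the volume change between two scans. A minimal sketch, with made-up volumes and dates; real longitudinal pipelines must also correct for scanner, protocol, and segmentation differences before such a rate is trustworthy:

```python
from datetime import date

def annualized_change_pct(vol_t0_ml, vol_t1_ml, date_t0, date_t1):
    """Annualized percentage volume change between two scans (negative = loss)."""
    years = (date_t1 - date_t0).days / 365.25
    return 100 * (vol_t1_ml - vol_t0_ml) / vol_t0_ml / years

# Hypothetical follow-up: hippocampus at 3.2 mL, then 3.0 mL two years later
rate = annualized_change_pct(3.2, 3.0, date(2021, 3, 1), date(2023, 3, 1))
print(f"{rate:.1f}% per year")  # prints "-3.1% per year"
```

Placed next to the slow loss expected from normal aging, a rate like this is exactly the kind of number that helps a clinician decide whether to intensify follow-up or investigate further.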

At the moment of decision-making, finally, it becomes a stratification tool. Not only for organizing follow-up, but also for access to certain research pathways, or even, in some countries, to certain treatments. In recent years, research frameworks have moreover evolved toward a more “biological” definition of Alzheimer’s disease based on biomarkers (amyloid, tau, neurodegeneration), which places structural imaging as one element among others in categorization.

Even if this biomarker framework remains more relevant in research than in routine care, it illustrates a broader movement: atrophy is no longer merely a clinical impression, but a measurable variable within a body of evidence.

A brain is not a growth curve: the pitfalls of quantification

Here again, moving beyond visual assessment does not mean falling into the illusion of the number. Volumetry carries risks of misinterpretation that must be clearly named, especially when addressing a non-specialist audience.

The first risk is technical. Measurements depend on MRI quality: patient motion, artifacts, resolution, sequences. They also depend on the software: different segmentation methods do not yield exactly the same volumes. At the individual level, small differences can be normal.

The second risk concerns longitudinal comparison. Comparing two MRIs obtained under different conditions can produce false differences. This is precisely why robustness “in the real world” is itself a research topic, as shown by the 2025 study on agreement between visual progression and volumetric progression in non-ideal follow-up settings.

The third risk is the risk of the norm. A normative database is never neutral: it depends on who was included, on how representative the populations are, on health criteria, on demographic factors. The percentile is useful, but it must be interpreted as a reference point, not a verdict.

The fourth risk is human, paradoxically. A number impresses. It can create the illusion of certainty. Yet atrophy is only one element among others. One person may have relatively marked atrophy and function well, while another may be deeply impaired with imaging that is relatively unremarkable. The brain is plastic; the human being is more complex than a volume.

To avoid these pitfalls, the best approach is not “visual or quantitative.” It is visual and quantitative, framed by clinical judgment.

What visual assessment misses most often: the beginnings, asymmetries, atypical profiles

There are three situations in which visual assessment is put particularly to the test.

The first is the beginnings. The earliest changes may be too subtle to be described with confidence. The radiologist senses something, but hesitates to write it down. It is often there that reports become cautious, and therefore vague.

The second is asymmetry. Some diseases (such as certain frontotemporal dementias) may begin asymmetrically, with one hemisphere more affected. The eye may perceive the asymmetry, but quantification can make it much more explicit.

The third is mixed profiles. An older patient with a cognitive complaint may combine moderate hippocampal atrophy, vascular white-matter hyperintensities, and micro-lacunes. In that mosaic, the eye may be captured by what stands out most and underweight the rest. Resorting to several structured markers, including visual scores and volumetric measurements, helps make the mosaic more readable.

It is no accident that multiple scales exist (medial temporal atrophy, posterior parietal atrophy, frontal atrophy, vascular damage). Some recent methods are in fact seeking to automate not only volumes, but also “visual scores,” in order to benefit from the framework of scales while reducing human variability. One well-known example is AVRA, proposed in 2019, which aims to produce automatic atrophy ratings comparable to those of radiologists.

The real challenge: standardize without dehumanizing

It would be easy to conclude: “let’s put volumetry everywhere.” But medicine is not deployed through slogans. It unfolds within constraints of time, budget, training, and responsibility.

The on-call radiologist does not always have access to a volumetric tool. The neurologist does not always have a harmonized MRI protocol. The patient moves. The machines are old. The results arrive at the wrong time. The report has to be signed out.

What is really happening resembles what has happened in other fields: a transition toward more standardized imaging, in which the report becomes less a piece of prose and more a decision-making tool.

Publications and practice feedback converge: the shift from free narration to more structured elements (scales, measurements, normative comparisons) increases consistency and aids interpretation, without removing the need for human synthesis.

The goal is not to produce incomprehensible reports, but on the contrary to make information easier to share. A patient struggles to make sense of “moderate atrophy,” but understands far better “this volume is lower than what we would expect for your age,” provided it is said with tact and caution.

Moving beyond visual assessment means accepting a new responsibility

There is finally an ethical dimension. When we quantify, we change the status of the information. An adjective leaves room for interpretation. A number, by contrast, can become a label. It can be copied, interpreted out of context, become a source of anxiety.

That is why quantification must not be a “raw result.” It must be accompanied by an explanation: what it measures, what it does not measure, how robust it is, how it fits into the clinical story. Best practices also emphasize the need for imaging performed according to protocols suited to the cognitive work-up, and for high-quality clinical information provided at the time of the request, in order to guide interpretation.

Put plainly: the more we measure, the more we must explain. Standardization is not a simplification of care; it is a stronger demand.

A discreet but decisive transition

If we look at it coolly, visual assessment of atrophy is a historical solution to a modern problem: how, with limited means, to make a complex image speak in an uncertain clinical context.

But as neurodegenerative diseases become a major public-health challenge, as pathways become more structured, as medicine calls for trajectories rather than snapshots, the adjective is no longer enough.

Visual scales were a first step: they gave grammar to the gaze. Quantification is a next step: it gives that grammar a measurement, and above all a possibility of follow-up.

Moving beyond visual assessment does not mean abandoning clinical observation. It means framing it, making it shareable, and extending it through objective reference points. In a medicine where the brain fades by nuances, this is not a luxury. It is a condition for speaking accurately, deciding better, and sometimes for avoiding leaving patients alone with a word that is too vague.

For any suggestion or recommendation, please feel free to contact us at this email address:

communication@olea-medical.com