The Allure and Limits of “H&E → Transcriptome” Models
Why Tissue Imaging Must Move Beyond Predicting What’s Already Known
Recent years have seen a surge of papers claiming that deep learning on routine H&E-stained slides can reliably predict underlying gene expression or even immune phenotypes. Models like HE2RNA or HistoTME train on paired histology and transcriptomics to infer dozens of immune and tumor markers from morphology alone. These approaches are impressive at rediscovering known expression patterns (e.g. virtual CD3/CD20 maps). But this trend has a catch: it repackages existing transcriptomic information rather than revealing new biology. H&E images are used to regress gene-signature levels, yet tissue morphology harbors far richer signals (cell contacts and shapes, textures, spatial organization) that go beyond bulk RNA profiles. In practice, even the best “H&E→gene” models achieve only modest correlations (e.g. a Pearson correlation of ~0.5 on held-out data), which suggests that much of the tumor microenvironment remains hidden when we limit our analysis to gene expression alone.
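To make the "~0.5 Pearson on held-out data" figure concrete, here is a minimal sketch of how such a score is usually computed gene by gene. The function names and the toy numbers are invented for illustration; they are not taken from HE2RNA, HistoTME, or any cited study.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when either vector is constant
    return cov / math.sqrt(vx * vy)

def per_gene_pearson(measured, predicted):
    """Map gene -> Pearson r across held-out samples.

    measured / predicted: dict of gene name -> list of values, one per sample.
    """
    return {gene: pearson(measured[gene], predicted[gene]) for gene in measured}

# Toy held-out data (made up): one gene predicted well, one predicted poorly.
measured = {"CD8A": [1.0, 2.0, 3.0, 4.0], "CXCL9": [2.0, 1.0, 4.0, 3.0]}
predicted = {"CD8A": [1.1, 1.9, 3.2, 3.8], "CXCL9": [3.0, 3.0, 3.0, 3.5]}
scores = per_gene_pearson(measured, predicted)
```

A single headline correlation is usually an average or median over such per-gene scores, so it can hide genes that the model barely predicts at all.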
Missing the Biological Big Picture
H&E slides are invaluable for pathology, but they cannot distinguish many immune subsets or states. For example, CD4+ T cells, CD8+ T cells and B cells are virtually indistinguishable by routine morphology. In a recent study pairing same-slide H&E and high-plex IF, researchers showed that “CD4+ and CD8+ T cell and B cell lineages look similar by H&E but were distinguishable by IF”. Many functionally critical features (T cell exhaustion, macrophage polarization, follicular dendritic niches) simply leave no telltale pattern in H&E. Likewise, spatial context is lost when inferring genes: knowing that a region has high CXCL9 or PDCD1 expression tells us nothing about where in the tumor immune cells are clustering or which cells they are interacting with. In short, H&E-based gene prediction ignores the cellular neighborhoods and morphological phenotypes that define the immune microenvironment.
Immune cell morphology: Neutrophils, eosinophils, mitotic figures and stromal cells can be spotted by H&E, but lymphocyte subtypes and activation states cannot.
Spatial interactions: Immune infiltrates can form clusters or niches (e.g. tertiary lymphoid structures, immunosuppressive perivascular cuffs) that have prognostic importance. H&E alone can show that such aggregates exist but cannot resolve their cellular makeup, whereas multiplex imaging readily identifies the cell types and states involved.
Microenvironment context: The tumor-stroma boundary is a dynamic interface, but “cold” versus “hot” tumors cannot be fully characterized without marker data. Simply predicting gene levels from morphology omits how cells are organized into functional units in space.
Because of these limits, an H&E-based model may correctly infer that IFNG or CD8A are elevated in a sample with many lymphocytes, but it misses the reason: for example, that those T cells are exhausted and sequestered away from tumor nests (as in the CODEX study discussed below). In effect, these models often rediscover known correlates of gene signatures instead of uncovering new biology. As one recent benchmarking study notes, image-based gene prediction “highlight[s] areas that can be addressed to support the advancement of this emerging field,” which implies that existing tools are still immature.
Lessons from High-Plex Imaging
To unlock hidden signals in the tissue, researchers are now turning to high-plex spatial methods that directly capture protein or RNA markers in situ. Technologies like CODEX (iterative barcoded fluorescence), MIBI and IMC (multiplexed ion beam imaging and imaging mass cytometry), and spatial transcriptomics reveal dozens to hundreds of markers at single-cell resolution. These methods preserve the where and who of each cell, exposing the microenvironment’s architecture. For example, Goltsev et al. (Cell 2018) used CODEX to map mouse spleens in lupus, finding “a profound impact of the cellular neighborhood on the expression of protein receptors on immune cells.” In other words, local cell–cell interactions dramatically changed marker expression; such patterns are impossible to infer from H&E alone. They observed “extensive and previously uncharacterized splenic cell-interaction dynamics in the healthy versus diseased state” (Fig. 1), demonstrating that multiplex imaging can uncover biology invisible to standard stains.
Similarly, MIBI was introduced by Angelo et al. (Nat. Med. 2014) precisely to “provide new insights by integrating tissue microarchitecture with highly multiplexed protein expression patterns”. In early studies, MIBI imaged ~10–40 markers on breast tumors and clearly delineated how immune cells were arranged relative to tumor nests. For example, Keren et al. (Commun. Biol. 2021) used imaging mass cytometry on triple-negative breast cancer to profile 36 proteins. They found that B cells co-occur with CD4+ and CD8+ T cells and that cells self-segregate into “tumor–tumor” vs “immune–immune” neighborhoods. Such compartmentalization echoes findings from spatial transcriptomics, but here it was directly visualized at subcellular scale. In hepatocellular carcinoma, Sheng et al. applied a 36-plex IMC panel to discover distinct tumor–stroma–immune clusters: Kupffer cells (liver-resident macrophages) accumulated at the tumor edge and created a local immunosuppressive niche (high PD-1 on adjacent T cells). These are exactly the sort of “hidden layers” one can only see with multiplex imaging.
Figure: Multiplexed imaging pipelines convert high-dimensional tissue scans into spatial features. (Adapted from Keren et al.) In a MIBI study, each slide yields dozens of “channels” (one per antibody). Image stacks (a) are segmented into single cells (b), tessellated into Voronoi plots (c), and used to compute cell–cell interaction matrices (e–f) for downstream analysis.
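The last step of that pipeline, turning segmented cells into a cell–cell interaction matrix, can be sketched in a few lines. This is an illustrative simplification, not the method of any cited study: it swaps the Voronoi adjacency for a plain centroid-distance threshold, uses a brute-force pairwise loop, and the function name, cell-type labels and coordinates are all hypothetical.

```python
import math
from collections import Counter

def interaction_counts(cells, radius):
    """Count pairs of cell types whose centroids lie within `radius`.

    cells: list of (x, y, cell_type) tuples, e.g. derived from a segmentation mask.
    Returns a Counter keyed by an alphabetically sorted (type_a, type_b) pair.
    Note: a distance threshold stands in for Voronoi adjacency here.
    """
    counts = Counter()
    for i in range(len(cells)):
        xi, yi, ti = cells[i]
        for j in range(i + 1, len(cells)):
            xj, yj, tj = cells[j]
            if math.hypot(xi - xj, yi - yj) <= radius:
                counts[tuple(sorted((ti, tj)))] += 1
    return counts

# Toy tissue (made up): a tumor nest and a separate immune aggregate.
cells = [
    (0.0, 0.0, "tumor"), (1.0, 0.0, "tumor"), (0.0, 1.0, "tumor"),
    (10.0, 10.0, "CD8_T"), (11.0, 10.0, "B"), (10.0, 11.0, "CD4_T"),
]
counts = interaction_counts(cells, radius=1.5)
```

In this toy layout every contact is either tumor–tumor or immune–immune, the kind of compartmentalized pattern described above. Real analyses typically compare such counts against a permutation null to decide which pairings are enriched.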
These examples underscore that context matters. In a 2024 Cancer Research study, Marchesi et al. combined AI-based H&E segmentation with high-plex IMC on lung cancer. While their AI accurately picked out tumor vs stroma, the multiplex data revealed new immune features (dozens of macrophage subtypes and spatial distance measures) that were “previously unexplored”. They point out that tools like CODEX and IMC allow “multiple markers on the same slide and [let us] appreciate a previously unexplored immune heterogeneity” (e.g. local density, distance metrics). In other words, only by staining for many proteins simultaneously do we learn that, for instance, PD-1-high T cells cluster in specific niches where they interact with myeloid cells. That is knowledge H&E cannot provide.
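The "local density" and "distance metrics" mentioned here are simple to compute once each cell has a position and a marker-defined type. The following is a minimal sketch under assumed inputs; the function names, coordinates and radius are hypothetical and not taken from Marchesi et al.

```python
import math

def nearest_distance(source, targets):
    """Distance from one (x, y) point to the closest point in `targets`."""
    if not targets:
        return math.inf  # no target cells present in this field of view
    sx, sy = source
    return min(math.hypot(sx - tx, sy - ty) for tx, ty in targets)

def local_density(center, points, radius):
    """Number of `points` within `radius` of `center`, per unit area."""
    cx, cy = center
    inside = sum(1 for px, py in points if math.hypot(cx - px, cy - py) <= radius)
    return inside / (math.pi * radius ** 2)

# Toy coordinates (made up), in arbitrary length units.
tumor_cells = [(3.0, 4.0), (6.0, 8.0)]
macrophages = [(1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
t_cell = (0.0, 0.0)

dist_to_tumor = nearest_distance(t_cell, tumor_cells)
macrophage_density = local_density(t_cell, macrophages, radius=2.0)
```

Computed per cell and summarized per patient, features like these are what allow spatial arrangement, rather than abundance alone, to enter a statistical model.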
Spatial Transcriptomics: Genes in Their Place
Protein imaging isn’t the only new frontier. Spatial transcriptomics (e.g. 10x Visium, MERFISH, seqFISH) maps RNA in place, at resolutions ranging from multi-cell spots to single molecules, marrying gene and image data. By retaining location, it answers questions no bulk assay can. In oral and breast cancers, spatial transcriptomics has uncovered conserved “core” vs “edge” expression programs, revealed pockets of immune suppression at the tumor boundary, and linked spatial gene patterns to patient outcomes. These studies show that the same gene signature can have a different meaning depending on its location. Importantly, many of the top transcripts driving prognosis are immune-related: for example, one study found gradients of CXCL10 and HLA genes peaking at tumor–immune borders. None of that spatial nuance would be captured by simply predicting CXCL10 from an H&E texture.
Recent benchmarks also highlight that spatial information adds value. A 2025 review noted that while dozens of methods now infer “spatial gene expression” from histology, their translational utility depends on incorporating location-specific biology. In practice, many spatial inference models still struggle to predict survival or treatment response, suggesting that novel biology lies beyond what models can learn from H&E alone. By contrast, true spatial assays are beginning to reveal entirely new cell states and interactions. For example, multiplex IF of cutaneous T-cell lymphomas (56 markers) discovered regulatory T cell neighborhoods undetectable by any bulk signature. Comprehensive multi-modal atlases are now cataloging how immune, stromal, and tumor cells cohabit; such resources could not be built from H&E images alone.
Implications for Drug Discovery and Precision Medicine
All of the above matters for translational research. If we rely only on H&E-derived gene surrogates, we risk overlooking therapeutic targets and biomarkers tied to spatial cell behaviors. High-plex imaging has already identified candidate immune checkpoints and cytokine niches that guide therapy. For instance, Patwa et al. showed that spatial patterns of PD-1/PD-L1 and IDO interactions, quantified by MIBI, predicted outcomes in TNBC beyond any single marker. Multiplex platforms are also revealing drug-resistance pathways: one CODEX study linked exhausted T cell clusters to therapy-resistant tumor subclones. In early-phase trials, companies are using spatial proteogenomics to find combinatorial signatures of response that H&E cannot suggest.
In short, the tumor microenvironment’s complexity can inform drug discovery in ways that go well beyond gene lists. It tells us where to hit: which cell niches to target with bispecific antibodies, where to deliver cytokines, or how to remodel suppressive regions. The founders of MIBI even envisioned it as “valuable for basic research, drug discovery and clinical diagnostics” because it links microarchitecture with molecular expression. That vision is coming true: pharma pipelines increasingly leverage multiplex imaging to stratify responders or find new checkpoint combinations. For example, imaging mass cytometry revealed that responders to certain immunotherapies had higher densities of a macrophage subset in direct contact with T cells, a detail that bulk RNA-seq would entirely miss.
Moving Beyond Old Paradigms
The bottom line for immunologists and biotech leaders is that H&E-based AI should be a stop-gap, not an end goal. Predicting gene expression from morphology is an interesting exercise, but it merely scratches the surface. The real breakthroughs will come from assays that generate richer phenotypic readouts. That means investing in multiplex IHC/IF, imaging mass cytometry, spatial transcriptomics, and even emerging metabolomic imaging. These technologies demand more upfront effort than feeding a digitized slide into a neural net, but they yield fundamentally new insights.
Indeed, a consortium effort recently proposed that spatial multi-omics studies and deep cell phenotyping are “key to understanding molecular principles in health and disease”. As one leader notes, “ultra-deep cell phenotyping” can uncover “clinically relevant tumor components” that drive immunosuppression or metastasis. Leveraging these methods will position biotech R&D at the forefront of precision oncology, for example, by discovering a novel T-cell exhaustion signature or spatial biomarker that guides combination therapy. In contrast, clinging to H&E-to-gene tricks risks re-using old data in new packaging without real added value.
In summary, while AI on H&E is a hot topic, the field must pivot from indirectly inferring biology to directly measuring it. Multiplex spatial profiling is revealing layers of immune context that are invisible in standard pathology. By focusing on those rich signals, from cellular neighborhoods to multi-gene spatial programs, scientists will unlock deeper immune insights and drive more effective drug discovery.
Sources: Recent literature in spatial biology and digital pathology (Schmauch et al. 2020, Patkar et al. 2024, Marchesi et al. 2024, Angelo et al. 2018, Angelo et al. 2014, Keren et al. 2021, Scheuermann et al. 2024, among others).


