Key morphologic features leading to discrepancies between AI and radiologist recall in breast cancer screening

Product: Transpara
Company: ScreenPoint Medical


An Exploration of Discrepant Recalls Between AI and Human Readers of Malignant Lesions in Digital Mammography Screening

Diagnostics, 2025

Abstract

Background

The integration of artificial intelligence (AI) in digital mammography (DM) screening holds promise for early breast cancer detection, potentially enhancing accuracy and efficiency. However, AI performance is not identical to that of human observers. We aimed to identify common morphological image characteristics of true cancers that are missed by either AI or human readers when their interpretations are discrepant.

Methods

Twenty-six breast cancer-positive cases, identified from a large retrospective multi-institutional digital mammography dataset based on discrepant AI and human interpretations, were included in a reader study. Ground truth was confirmed by histopathology or ≥1-year follow-up. Fourteen radiologists assessed lesion visibility, morphological features, and likelihood of malignancy. AI performance was evaluated using receiver operating characteristic (ROC) analysis and area under the curve (AUC). The reader study results were analyzed using interobserver agreement measures and descriptive statistics.
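The abstract does not include the analysis code. As a rough illustration of the kind of evaluation it describes (an AUC with a confidence interval, plus agreement across multiple readers), the Python sketch below uses scikit-learn and statsmodels on synthetic stand-in data. All variable names, the percentile bootstrap for the CI, and the choice of Fleiss' kappa are assumptions for illustration, not the authors' actual pipeline.

```python
# Illustrative sketch only: synthetic data standing in for AI scores,
# ground-truth labels, and 14 readers' categorical ratings.
import numpy as np
from sklearn.metrics import roc_auc_score
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

rng = np.random.default_rng(0)

# --- AI discrimination: AUC with a percentile bootstrap 95% CI ---
n_cases = 500
y_true = rng.integers(0, 2, size=n_cases)            # 1 = cancer, 0 = normal
ai_score = rng.normal(loc=y_true * 1.5, scale=1.0)   # higher score = more suspicious

auc = roc_auc_score(y_true, ai_score)
boot = []
for _ in range(2000):
    idx = rng.integers(0, n_cases, size=n_cases)
    if len(np.unique(y_true[idx])) < 2:               # resample must contain both classes
        continue
    boot.append(roc_auc_score(y_true[idx], ai_score[idx]))
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])
print(f"AUC {auc:.3f} (95% CI {ci_low:.3f}-{ci_high:.3f})")

# --- Inter-rater agreement: Fleiss' kappa across 14 readers ---
# ratings[i, j] = category chosen by reader j for case i (e.g., a 5-point suspicion scale)
n_readers, n_discrepant = 14, 26
ratings = rng.integers(1, 6, size=(n_discrepant, n_readers))
table, _ = aggregate_raters(ratings)                  # cases x categories count table
kappa = fleiss_kappa(table, method="fleiss")
print(f"Fleiss' kappa = {kappa:.2f}  (<0.40 ~ poor-to-fair agreement)")
```

The bootstrap is only one common way to attach a confidence interval to an AUC, and Fleiss' kappa is only one of several multi-rater agreement measures; the paper does not state which specific methods were used.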

Results

AI demonstrated high discriminative capability in the full dataset, with AUCs ranging from 0.903 (95% CI: 0.862-0.944) to 0.946 (95% CI: 0.896-0.996). Cancers missed by AI had a significantly smaller median size (9.0 mm, IQR 6.5-12.0) than those missed by human readers (21.0 mm, IQR 10.5-41.0) (p = 0.0014). Cancers in discrepant cases were often described as having 'low visibility', 'indistinct margins', or 'irregular shape'. Calcifications were more frequent among human-missed cancers (42/154; 27%) than among AI-missed cancers (38/210; 18%) (p = 0.396). A very high likelihood of malignancy was assigned to 50/154 (32.5%) of human-missed cancers compared with 41/210 (19.5%) of AI-missed cancers (p = 0.8). Overall inter-rater agreement on the items assessed during the reader study was poor to fair (<0.40), suggesting that interpretation of the selected images was challenging.
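For readers who want to see what group comparisons of this kind look like in practice, here is a minimal, hypothetical sketch: a Mann-Whitney U test for the size difference and Fisher's exact test for the calcification proportions. The lesion-size arrays are invented placeholders, and the plain 2x2 test ignores the clustering of 14 readings per case, so the outputs are not expected to reproduce the paper's p-values.

```python
# Hypothetical illustration of the two group comparisons reported above.
import numpy as np
from scipy.stats import mannwhitneyu, fisher_exact

# Placeholder lesion sizes in mm (the study's raw per-case sizes are not published here).
ai_missed_sizes = np.array([6, 7, 8, 9, 9, 10, 11, 12, 13])
human_missed_sizes = np.array([9, 12, 18, 21, 25, 33, 41, 45])

u_stat, p_size = mannwhitneyu(ai_missed_sizes, human_missed_sizes, alternative="two-sided")
print(f"Median AI-missed {np.median(ai_missed_sizes):.1f} mm vs "
      f"human-missed {np.median(human_missed_sizes):.1f} mm, Mann-Whitney p = {p_size:.4f}")

# Calcification frequency, using the reading-level counts quoted in the abstract
# (42/154 human-missed vs 38/210 AI-missed). Treating each reading as independent
# is a simplification the study's own analysis may not have made.
table = [[42, 154 - 42],
         [38, 210 - 38]]
odds_ratio, p_calc = fisher_exact(table)
print(f"Fisher's exact: OR = {odds_ratio:.2f}, p = {p_calc:.3f}")
```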

Conclusions

Lesions missed by AI were smaller and less often calcified than cancers missed by human readers. Cancers missed by AI tended to show lower levels of suspicion than those missed by human readers. While definitive conclusions are premature, the findings highlight the complementary roles of AI and human readers in mammographic interpretation.
