【Academic Sharing No. 1】Deep learning on non-small cell lung cancer histopathology images

2022/04/07

江丰生物 智慧病理


Results

A deep-learning framework for the automatic analysis of histopathology images

The purpose of this study was to develop a deep-learning model for the automatic analysis of tumor slides using publicly available whole-slide images from TCGA (Grossman, R. L. et al. Toward a shared vision for cancer genomic data. N. Engl. J. Med. 375, 1109–1112 (2016)) and to subsequently test our models on independent cohorts collected at our institution. The TCGA dataset characteristics and our overall computational strategy are summarized in Fig. 1. We used 1,634 whole-slide images from the Genomic Data Commons database: 1,176 tumor tissues and 459 normal tissues (Fig. 1a). The 1,634 whole-slide images were split into three sets: training, validation and testing (Fig. 1b). Importantly, this ensures that our model is never trained and tested on tiles obtained from the same tumor sample. Because whole-slide images are too large to be used as direct input to a neural network (Fig. 1c), the network was instead trained, validated and tested on 512×512-pixel tiles obtained from non-overlapping patches of the whole-slide images. This resulted in tens to thousands of tiles per slide, depending on the original size (Fig. 1d).
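As a rough illustration of this tiling step, the sketch below cuts a whole-slide image into non-overlapping 512×512 tiles and discards mostly blank tiles. It assumes an OpenSlide-readable slide whose base level corresponds to 20× magnification; the output paths and the background threshold are arbitrary choices for illustration, not the authors' code.

```python
# Minimal sketch: cut a whole-slide image into non-overlapping 512x512 tiles,
# keeping only tiles that contain enough tissue. Assumes openslide-python and
# numpy, and a slide whose level 0 corresponds to 20x magnification.
import os
import numpy as np
import openslide

TILE = 512        # tile edge in pixels, as described above
MAX_BG = 0.5      # assumed threshold: skip tiles that are more than 50% background

def tile_slide(slide_path: str, out_dir: str) -> int:
    os.makedirs(out_dir, exist_ok=True)
    slide = openslide.OpenSlide(slide_path)
    width, height = slide.dimensions               # level-0 (full-resolution) size
    n_saved = 0
    for y in range(0, height - TILE + 1, TILE):    # non-overlapping grid
        for x in range(0, width - TILE + 1, TILE):
            tile = slide.read_region((x, y), 0, (TILE, TILE)).convert("RGB")
            gray = np.asarray(tile.convert("L"))
            if (gray > 220).mean() > MAX_BG:       # mostly white, i.e. background
                continue
            tile.save(os.path.join(out_dir, f"{x}_{y}.jpeg"))
            n_saved += 1
    slide.close()
    return n_saved
```

Depending on the slide dimensions, such a grid yields anywhere from tens to thousands of tiles per slide, consistent with the counts described above.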


Based on the computational strategy outlined in Fig. 1, we present two main results. First, we develop classification models that classify whole-slide images into normal lung, LUAD or LUSC with an accuracy significantly higher than previous work (AUC of 0.97, compared to 0.75 and 0.83 in previous work) and comparable to results from pathologists (note: the closer an AUC is to 1.0, the more accurate the method). Unlike previous work [12,13], the performance of our classification models was tested on several independent datasets: biopsies and surgical resection specimens prepared either as frozen sections or as FFPE tissue sections. Second, starting with the LUAD regions predicted by the LUAD-versus-LUSC-versus-normal classification model, we use the same computational pipeline (Fig. 1) to train a new model that predicts the mutational status of frequently mutated genes in lung adenocarcinoma using whole-slide images as the only input. The entire workflow of our computational analysis is summarized in Supplementary Fig. 1.
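The two results amount to a simple cascade: a first network labels tiles as normal, LUAD or LUSC, and only tiles called LUAD are fed to a second network that predicts mutation status. The sketch below illustrates that flow under stated assumptions; `classify_tile`, `predict_mutations` and the gene list are hypothetical stand-ins, not the paper's implementation.

```python
# Sketch of the two-stage pipeline described above. `classify_tile` is assumed
# to return probabilities for (normal, LUAD, LUSC); `predict_mutations` is
# assumed to return one mutation probability per gene for a LUAD tile.
from typing import Callable, Dict, Sequence
import numpy as np

GENES = ["STK11", "EGFR", "KRAS", "TP53"]   # example genes frequently mutated in LUAD

def analyze_slide(
    tiles: Sequence[np.ndarray],
    classify_tile: Callable[[np.ndarray], np.ndarray],
    predict_mutations: Callable[[np.ndarray], np.ndarray],
) -> Dict[str, float]:
    # Stage 1: keep only tiles whose most likely class is LUAD (index 1).
    luad_tiles = [t for t in tiles if int(np.argmax(classify_tile(t))) == 1]
    if not luad_tiles:
        return {}
    # Stage 2: average per-tile mutation probabilities into a per-slide call.
    per_gene = np.mean([predict_mutations(t) for t in luad_tiles], axis=0)
    return dict(zip(GENES, per_gene.tolist()))
```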

Deep-learning models generate accurate diagnoses from lung histopathology images

Using the computational pipeline of Fig. 1, we first trained inception v3 (Google's Inception convolutional neural network, version 3) to distinguish tumor from normal tissue. To assess accuracy on the test set, the per-tile classification results were aggregated on a per-slide basis, either by averaging the probabilities obtained on each tile or by counting the percentage of tiles classified as positive, thus generating a per-slide classification. The former approach yielded an AUC of 0.990 and the latter 0.993 for normal-versus-tumor classification, outperforming the AUC of ~0.85 achieved by the feature-based approach of Yu et al. [12] and the ~0.94 achieved by plasma DNA analysis [14], and comparable to or better than molecular profiling data.
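The two aggregation rules and the AUC evaluation can be expressed in a few lines; the sketch below assumes scikit-learn, per-tile tumor probabilities grouped by slide, and a 0.5 cutoff for counting a tile as positive (all variable names are illustrative).

```python
# Sketch: aggregate per-tile tumor probabilities into per-slide scores, either
# by averaging the probabilities or by counting the fraction of positive tiles,
# then compute the per-slide ROC AUC with scikit-learn.
import numpy as np
from sklearn.metrics import roc_auc_score

def per_slide_auc(tile_probs: dict, labels: dict) -> tuple:
    """tile_probs: slide_id -> array of per-tile tumor probabilities;
    labels: slide_id -> 1 for tumor, 0 for normal."""
    slides = sorted(labels)
    y = [labels[s] for s in slides]
    avg_score = [float(np.mean(tile_probs[s])) for s in slides]        # average tile probability
    pct_score = [float(np.mean(tile_probs[s] > 0.5)) for s in slides]  # % positively classified tiles
    return roc_auc_score(y, avg_score), roc_auc_score(y, pct_score)
```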

Next, we tested the performance of our approach on the more challenging task of distinguishing LUAD from LUSC. To do this, we first tested whether convolutional neural networks can outperform the published feature-based approach even when plain transfer learning is used. For this purpose, the values of the last layer of inception v3 (previously trained on the ImageNet dataset to identify 1,000 different classes) were initialized randomly and then trained for our classification task. After aggregating the statistics on a per-slide basis (Supplementary Fig. 2b), this process resulted in an AUC of 0.847 (Supplementary Table 1), i.e., a gain of ~0.1 in AUC compared to the best results obtained by Yu et al. [12] using image features combined with a random forest classifier. The performance can be further improved by fully training inception v3, leading to an AUC of 0.950 when the aggregation is done by averaging the per-tile probabilities (Supplementary Fig. 2c). These AUC values improve by another 0.002 when tiles previously classified as 'normal' by the first classifier are excluded from the aggregation process (Supplementary Table 1).
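The distinction between plain transfer learning and full training comes down to which inception v3 weights are allowed to change. A minimal torchvision sketch is shown below for illustration only (the paper used inception v3 directly; this PyTorch version is not the authors' code), assuming three output classes for normal versus LUAD versus LUSC.

```python
# Sketch: load Inception v3 pretrained on ImageNet, randomly re-initialize the
# final classification layers, and either freeze the pretrained layers
# (transfer learning) or leave them trainable (full training).
import torch.nn as nn
from torchvision import models

def build_inception(num_classes: int = 3, transfer_only: bool = True) -> nn.Module:
    model = models.inception_v3(weights=models.Inception_V3_Weights.IMAGENET1K_V1)
    if transfer_only:
        for param in model.parameters():   # keep the ImageNet features fixed
            param.requires_grad = False
    # Replace the 1,000-class ImageNet heads with freshly initialized layers.
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    model.AuxLogits.fc = nn.Linear(model.AuxLogits.fc.in_features, num_classes)
    return model
```

With `transfer_only=True` only the new final layers are optimized, mirroring the transfer-learning setting; `transfer_only=False` roughly corresponds to the fully trained network discussed above.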

We further evaluated the performance of the deep-learning model by training and testing the network on a direct three-way classification into the three types of images (normal, LUAD, LUSC). This approach resulted in the highest performance, with all AUCs improved to at least 0.968 (Supplementary Fig. 2d and Supplementary Table 1). In addition to working with tiles at 20× magnification, we investigated the impact of the magnification and field of view of the tiles on the performance of our models. As low-resolution features (nests of cells, circular patterns) may also be useful for classifying lung cancer type, we used slides showing a larger field of view to train the model by creating 512×512-pixel tiles at 5× magnification. The binary and three-way networks trained on such slides led to similar results (Supplementary Fig. 2e,f and Supplementary Table 1).
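A larger field of view at the same tile size can be produced by reading a wider region and downsampling it: a 2,048×2,048 region taken at 20× and rescaled to 512×512 corresponds to roughly 5× magnification. The helper below is a sketch along those lines, assuming openslide-python and Pillow; coordinates and tile size are illustrative.

```python
# Sketch: build a 512x512 tile covering a 4x-wider field of view, giving an
# effective ~5x magnification when the slide's base level is 20x.
import openslide
from PIL import Image

def tile_at_5x(slide: openslide.OpenSlide, x: int, y: int, tile: int = 512) -> Image.Image:
    region = slide.read_region((x, y), 0, (4 * tile, 4 * tile)).convert("RGB")
    return region.resize((tile, tile), Image.LANCZOS)
```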

Visualization of features identified by the three-way classifier in high-confidence tiles

We present examples of LUSC and LUAD slides, together with heatmaps generated by our algorithm, in which the color of each tile corresponds to the class assigned by the algorithm (LUAD, LUSC or normal) and the color shade is proportional to the classification probability. The LUSC image shows most of its tiles with a strong true-positive probability for the LUSC class, while in the LUAD image the largest regions indeed show strong LUAD features, with normal cells on the side (as confirmed by our pathologist), and some light blue tiles indicating the existence of LUSC-like features in this tumor.
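Heatmaps of this kind can be rendered by coloring each tile according to its predicted class and scaling the shade by the class probability. The matplotlib sketch below is illustrative only; blue is used for LUSC to match the light-blue LUSC tiles mentioned above, while the colors for LUAD and normal are assumptions rather than the paper's exact scheme.

```python
# Sketch: build an RGB heatmap in which each grid cell takes the color of its
# predicted class and the shade scales with the predicted probability.
# `probs` has shape (rows, cols, 3) with class order (normal, LUAD, LUSC).
import numpy as np
import matplotlib.pyplot as plt

CLASS_COLORS = np.array([
    [0.6, 0.6, 0.6],   # normal: gray (assumed)
    [1.0, 0.0, 0.0],   # LUAD: red (assumed)
    [0.0, 0.3, 1.0],   # LUSC: blue, as in the light-blue tiles described above
])

def class_heatmap(probs: np.ndarray) -> np.ndarray:
    winner = probs.argmax(axis=-1)                    # predicted class per tile
    confidence = probs.max(axis=-1, keepdims=True)    # probability of that class
    base = CLASS_COLORS[winner]                       # (rows, cols, 3)
    # Blend toward white for low-confidence tiles so the shade tracks probability.
    return 1.0 - confidence * (1.0 - base)

if __name__ == "__main__":
    demo = np.random.dirichlet(np.ones(3), size=(20, 30))  # fake per-tile probabilities
    plt.imshow(class_heatmap(demo))
    plt.axis("off")
    plt.show()
```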

For tiles associated with LUSC, we note a predominance of areas of keratinization and dyskeratotic cells, as well as rare foci of cells with prominent intercellular bridging. Among the tiles denoted LUAD, the predominant feature is the presence of distinct gland-forming histological patterns, including well-differentiated patterns (lepidic and acinar) as well as poorly differentiated types (micropapillary). At the center of the t-SNE plot, regions that cannot be clearly associated with either LUAD or LUSC are composed of tiles with conspicuous preservation artifact, minute foci of tumor, or areas of interstitial/septal fibrosis. The area designated as normal is composed of tiles showing benign lung parenchyma, focal fibrosis or inflammation, as well as rare LUAD with preservation artifacts. Interestingly, the area with tiles that could not be designated normal, LUAD or LUSC with high confidence shows both benign and malignant lung tissue in a background of dense fibrosis and/or inflammation.
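The t-SNE layout referred to here places each tile according to a two-dimensional embedding of its learned features, so tiles with similar morphology cluster together. The scikit-learn sketch below assumes a matrix of per-tile feature vectors (for example, the activations feeding the final classification layer) and their predicted labels; both inputs are hypothetical placeholders.

```python
# Sketch: 2-D t-SNE embedding of per-tile feature vectors, colored by the
# predicted class, to approximate the kind of map discussed above.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_tile_tsne(features: np.ndarray, predicted: np.ndarray) -> None:
    coords = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)
    for label, name in enumerate(["normal", "LUAD", "LUSC"]):
        mask = predicted == label
        plt.scatter(coords[mask, 0], coords[mask, 1], s=4, label=name)
    plt.legend()
    plt.axis("off")
    plt.show()
```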

【Follow the subscription account KFBIO2021 for more academic content. The "Academic Sharing" column is provided for non-commercial exchange only and does not represent the position of this account.】
