Academic Sharing | Large-Scale Gastric Cancer Screening and Localization Using a Multi-Task Deep Neural Network

2022/04/24

江丰生物智慧病理

01 Introduction

Gastric cancer remains an important cancer worldwide. It was responsible for over 1,000,000 new cases in 2018 and an estimated 783,000 deaths (equating to 1 in every 12 deaths globally), making it the fifth most frequently diagnosed cancer and the third leading cause of cancer death. Biopsy of the gastric mucosa is one of the most effective methods for early detection of gastric cancer. It is estimated that hundreds of millions of gastric biopsy slides need to be examined in China each year, while the number of certified pathologists is only about 10,000, which places an excessive workload on these pathologists.

To reduce the workload of pathologists, extensive studies have focused on pathology image analysis. In recent years, deep neural network techniques have achieved remarkable performance on a wide range of computer vision tasks, such as image classification, object detection, and semantic segmentation. These techniques have been applied to automated pathology image analysis in the past few years. Unlike natural images, digital pathology images, known as whole-slide images (WSIs), are extremely large: their width and height often exceed 100,000 pixels. In addition, histological diagnosis requires high accuracy, since it is commonly considered the gold standard. As a result, some studies focus on selected regions of interest (ROIs), while there have been several attempts at analyzing the whole WSI.
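
Because a WSI of this size cannot be fed to a network in one piece, a common workaround is to cut it into fixed-size tiles. The following is a minimal sketch of that idea only, not the authors' pipeline; the function name `tile_origins`, the 2,048-pixel tile size, and the 100,000 x 100,000 slide size are assumptions chosen for illustration.

```python
from typing import Iterator, List, Tuple


def tile_origins(width: int, height: int, tile: int, stride: int) -> Iterator[Tuple[int, int]]:
    """Yield the (x, y) top-left corner of each tile covering a width x height image.

    The last tile in each row and column is shifted inward so that every
    tile lies fully inside the image and no pixel is left uncovered.
    """
    if tile > width or tile > height:
        raise ValueError("tile must fit inside the image")

    def starts(length: int) -> List[int]:
        positions = list(range(0, length - tile + 1, stride))
        # Add a final position flush with the edge if the stride left a gap.
        if positions[-1] != length - tile:
            positions.append(length - tile)
        return positions

    for y in starts(height):
        for x in starts(width):
            yield x, y


# Hypothetical sizes: a 100,000 x 100,000 pixel slide cut into 2,048-pixel tiles.
origins = list(tile_origins(100_000, 100_000, tile=2048, stride=2048))
print(len(origins))  # 49 positions per axis -> 2401 tiles
```

Even with non-overlapping tiles, a single slide of this hypothetical size yields thousands of tiles, which is one reason why processing whole WSIs is costly.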

Besides the difficulty of applying deep neural networks to gigapixel images, the main challenge in examining WSIs is that, in most publicly available datasets, the diagnostic results labeled by pathologists are at the slide level, while the lesion regions that draw the pathologists’ attention are extremely small compared with the size of the WSI. It is difficult to train a deep neural network to locate those regions and make the correct decision using only slide-level labels such as “positive/negative”. Therefore, we collect a large dataset that carries not only slide-level annotation but also lesion region annotation, and we design a framework that leverages this detailed supervision.

To the best of our knowledge, there have been no studies on automated pathology image analysis with lesion region annotation for gastric cancer. We propose an automated screening framework that not only provides the screening result, i.e., positive/negative, but also shows the suspicious areas to pathologists for further reference.

Our main contributions:

1 We collect a large-scale dataset for gastric cancer screening and develop a semi-automated annotation system to help obtain the detailed lesion region annotation.

2 We take advantage of the region annotation by proposing a multi-task network structure which could provide the classification label (screening result) as well as the segmentation mask (suspicious region) simultaneously.

3 We design a practical framework consisting of 3 networks to process the high-resolution WSIs, and employ the deformable convolution operation based on the observation of the characteristics of the pathology images.
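
The multi-task idea in the second contribution can be illustrated with a combined training objective. The text does not state which loss the authors use, so the sketch below assumes one common formulation: a binary cross-entropy term for the slide-level label plus a weighted mean per-pixel binary cross-entropy term for the mask. The names `multi_task_loss` and `seg_weight` are hypothetical.

```python
import math
from typing import Sequence


def binary_cross_entropy(prob: float, target: int, eps: float = 1e-7) -> float:
    """Cross-entropy between a predicted probability and a 0/1 target."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(prob) + (1 - target) * math.log(1.0 - prob))


def multi_task_loss(
    cls_prob: float,
    cls_label: int,
    mask_probs: Sequence[float],
    mask_labels: Sequence[int],
    seg_weight: float = 1.0,
) -> float:
    """Classification loss plus the weighted mean per-pixel segmentation loss.

    This is an assumed formulation for illustration, not the loss
    reported by the authors.
    """
    if not mask_probs or len(mask_probs) != len(mask_labels):
        raise ValueError("mask_probs and mask_labels must be non-empty and equal in length")
    cls_loss = binary_cross_entropy(cls_prob, cls_label)
    seg_loss = sum(
        binary_cross_entropy(p, t) for p, t in zip(mask_probs, mask_labels)
    ) / len(mask_probs)
    return cls_loss + seg_weight * seg_loss


# Toy example: a positive patch whose flattened 4-pixel mask marks two lesion pixels.
good = multi_task_loss(0.9, 1, [0.9, 0.8, 0.1, 0.2], [1, 1, 0, 0])
bad = multi_task_loss(0.2, 1, [0.1, 0.2, 0.9, 0.8], [1, 1, 0, 0])
print(good < bad)  # predictions that agree with both labels incur the lower loss
```

Sharing one objective in this way lets the region annotation supervise the same features that drive the screening decision, which is the motivation given for the multi-task structure.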

Evaluation in a real-world scenario

We also test our best model on a large-scale real-world dataset collected from 4 medical centers. Table 2 shows the number of images in the collected data.

All the training images with lesion region annotation are from SGH in the year 2018. Besides those training images, we further collect 3,207 images from that year, 3,670 images from 2019 as the most recent samples, and 2,083 images from 2015 as older samples, since few automated devices were used for fixation, sectioning, and staining at that time. Moreover, to test the generalization ability of our proposed model, we collect 1,356 images from 3 other hospitals, i.e., SHSSD, SDCH, and ZPPH. The devices and procedures used to prepare histology slides differ among these hospitals, which may affect the final WSIs. Overall, we have 10,315 WSIs from 4 hospitals across several years, and the positive ratio is less than 5%.

We apply our best model to these data and present the sensitivity and specificity in Table 3. We achieve a sensitivity of 100.00% on 5 testing sets and 99.67% on the other, and the specificity is around 75% in most cases.
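
For reference, sensitivity and specificity follow directly from the slide-level confusion counts. The sketch below uses made-up counts (40 positive and 960 negative slides), chosen only to mirror the reported pattern of 100% sensitivity and about 75% specificity at a positive ratio below 5%; they are not the values in Table 3, and the class name `ConfusionCounts` is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # positive slides flagged as positive
    fn: int  # positive slides missed
    tn: int  # negative slides correctly cleared
    fp: int  # negative slides flagged as positive

    @property
    def sensitivity(self) -> float:
        """Fraction of positive slides that the model flags."""
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        """Fraction of negative slides that the model clears."""
        return self.tn / (self.tn + self.fp)

    @property
    def cleared_fraction(self) -> float:
        """Fraction of all slides predicted negative."""
        total = self.tp + self.fn + self.tn + self.fp
        return (self.tn + self.fn) / total


# Hypothetical counts, not taken from Table 3.
counts = ConfusionCounts(tp=40, fn=0, tn=720, fp=240)
print(counts.sensitivity, counts.specificity, counts.cleared_fraction)
```

With these illustrative counts, no positive slide is missed while roughly 72% of all slides are predicted negative, which shows why a high-sensitivity screen with moderate specificity can still remove a large share of the reading workload when positives are rare.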

However, the data distribution in the real-world scenario differs from that of our training and validation datasets: there are fewer positive samples and more outliers, such as out-of-focus samples.

Moreover, the slide preparation procedures also cause differences. The sectioned tissues from SDCH are thicker than the others and are stained much darker, which leads to the lowest specificity. WSIs from SHSSD are more bluish than those from SGH. Like SGH, ZPPH uses machines for automatic sectioning and staining, so the performance on ZPPH data is higher than on data from the other two institutes.
