Implementing and Experimenting with the Multiscale Multi-head Self-attention Ensemble Network for Cancer Detection
Introduction
Breast cancer is the most frequently diagnosed cancer and one of the leading causes of cancer-related death among women. To obtain a definitive diagnosis, a tissue sample is taken from the patient. The sample is then digitized at very high resolution through a process called digital pathology scanning, which yields Whole Slide Images (WSIs). Skilled physicians can make a diagnosis by examining these images. Deep learning techniques can assist or even automate this process. To this end, R. Ge et al. propose the Multiscale Multi-head Self-attention Ensemble Network, a heterogeneous deep ensemble learning approach. The intermediate feature vectors produced by VGG16 and DenseNet121, both pretrained on the ImageNet1k dataset, are combined using a self-attention layer, followed by global average pooling. Deep ensemble learning techniques tend to generalize better than individual models.
The model is trained on the PCam benchmark dataset, a binary classification task in which an image is labeled 1 if cancerous tissue is present and 0 otherwise. The dataset is derived from 400 WSIs from Radboud University Medical Center (RUMC) and University Medical Center Utrecht (UMCU). The slides underwent expert pathological analysis for the extraction and labeling of diagnostic patches.
Model Architecture
Model Training and Evaluation
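Binary cross-entropy (BCE) is employed as the loss function. Given a model $f_\theta$ and data points $(x_i, y_i)$, $i = 1, \dots, N$, with labels $y_i \in \{0, 1\}$, the loss function is computed as follows:
$$\mathcal{L}(\theta) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log f_\theta(x_i) + (1 - y_i) \log\left(1 - f_\theta(x_i)\right) \right]$$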
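The binary cross-entropy loss described above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it assumes the model outputs a probability for the positive class, that the loss is averaged over the batch, and that probabilities are clipped by a small constant `eps` to keep the logarithm finite. The function name and the value of `eps` are hypothetical choices.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy over a batch.

    y_true: sequence of labels in {0, 1}
    y_prob: sequence of predicted probabilities for the positive class
    eps:    clipping constant (an assumption) that keeps log() finite
    """
    if len(y_true) == 0 or len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must be non-empty and equally long")
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)
```

A confident correct prediction yields a small loss, while a confident wrong prediction is penalized heavily; an uninformative prediction of 0.5 gives a loss of log 2 regardless of the label.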
To update the weights of the model, the Adam optimizer is employed, with gradients computed using backpropagation. The authors propose a custom learning rate schedule
where , , is the current epoch and is the overall number of epochs.
To assess the performance of the model, seven metrics are employed. The following four quantities are central to their calculation.
The true positives (TP), that is, the number of positive samples in the test set that have been correctly classified as positive by the model.
The true negatives (TN), the number of negative samples that have been correctly classified as negative.
The false positives (FP), the number of negative samples that have been incorrectly classified as positive by the model.
The false negatives (FN), the number of positive samples that have been incorrectly classified as negative.
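The four quantities above can be counted directly from the ground-truth labels and the model's binary predictions. The following sketch assumes the predictions have already been thresholded to 0 or 1; the function name and the toy data are hypothetical and serve only as an illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels and predictions in {0, 1}."""
    tp = tn = fp = fn = 0
    for y, p in zip(y_true, y_pred):
        if y == 1 and p == 1:
            tp += 1  # positive sample, predicted positive
        elif y == 0 and p == 0:
            tn += 1  # negative sample, predicted negative
        elif y == 0 and p == 1:
            fp += 1  # negative sample, predicted positive
        else:
            fn += 1  # positive sample, predicted negative
    return tp, tn, fp, fn


# Toy example with eight samples (made-up values).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
print(confusion_counts(y_true, y_pred))  # (3, 3, 1, 1)
```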
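ROC-AUC, the area under the receiver operating characteristic curve, which plots the true positive rate against the false positive rate over all classification thresholds.
Precision, the proportion of positive predictions that are actually correct: $\mathrm{Precision} = \frac{TP}{TP + FP}$
Sensitivity, the proportion of positive samples that are correctly identified: $\mathrm{Sensitivity} = \frac{TP}{TP + FN}$
Specificity, the proportion of negative samples that are correctly identified: $\mathrm{Specificity} = \frac{TN}{TN + FP}$
F1-score, the harmonic mean of precision and sensitivity: $F_1 = \frac{2\,TP}{2\,TP + FP + FN}$
B-acc, the balanced accuracy, that is, the mean of sensitivity and specificity: $\mathrm{B\text{-}acc} = \frac{1}{2}\left(\frac{TP}{TP + FN} + \frac{TN}{TN + FP}\right)$
MCC, the Matthews correlation coefficient: $\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$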
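As a rough sketch, the threshold-based metrics listed above can be computed from TP, TN, FP and FN using their standard definitions, and ROC-AUC can be computed from raw scores as the fraction of positive/negative pairs that are ranked correctly (ties count one half). The function names are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention, not something taken from the paper.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Standard metrics from confusion counts; 0.0 on a zero denominator (assumed convention)."""
    def safe_div(num, den):
        return num / den if den else 0.0

    sensitivity = safe_div(tp, tp + fn)
    specificity = safe_div(tn, tn + fp)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "precision": safe_div(tp, tp + fp),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),
        "b_acc": (sensitivity + specificity) / 2,
        "mcc": safe_div(tp * tn - fp * fn, mcc_den),
    }


def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a positive sample outscores a negative one."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative sample")
    # Count correctly ordered pairs; a tie contributes one half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(classification_metrics(tp=3, tn=3, fp=1, fn=1))
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise ROC-AUC computation is quadratic in the number of samples, which is fine for illustration; for a full test set, a sorting-based implementation or an established library routine would be the more practical choice.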