Hybrid Convolutional Neural Network-Based Diagnosis System for Intracranial Hemorrhage

Early diagnosis of intracranial hemorrhage significantly reduces mortality. Hemorrhage is diagnosed with various imaging methods, of which computed tomography (CT) is the most time-efficient. However, interpreting CT scans accurately requires time, diligence, and experience. Computer-aided diagnosis methods are therefore valuable for treatment because they facilitate early diagnosis of intracranial hemorrhage. Deep learning can provide effective outcomes here through automated diagnosis. Unlike in most known solutions, however, the diagnosis of five different hemorrhage subtypes remains a critical problem to be solved. This study focused on deep learning methods and employed cranial computed tomography scans to detect intracranial hemorrhage and its five subtypes. The EfficientNet-B3 and Inception-ResNet-V2 architectures were used for diagnosis, and a hybrid method combining the two architectures was also proposed. The findings of the hybrid method were evaluated comparatively. Results showed that the newly designed hybrid method was effective in increasing the classification rates for intracranial hemorrhage subtypes. An accuracy of 98.5%, which is higher than those of EfficientNet-B3 and Inception-ResNet-V2 individually, was obtained with the developed hybrid method.


Introduction
Intracranial hemorrhage is a type of bleeding caused by the rupture of a blood vessel in the brain, which results in blood leaking into the brain tissue or the space between the cortex and the skull. It can have different causes (trauma, stroke, aneurysm, vascular malformations, high blood pressure, illegal drugs, and blood clotting disorders), and it is a serious health problem that requires rapid and intensive treatment because it may result in severe complications and even death (Gardner et al., 2012). There are five subtypes of intracranial hemorrhage: intraventricular (bleeding into the ventricles inside the brain), intraparenchymal (bleeding into the brain tissue), subarachnoid (bleeding into the space between two of the membranes surrounding the brain), subdural (bleeding into the space between the outermost meninx and the arachnoid), and epidural (bleeding into the space between the outermost meninx and the skull) (Gardner et al., 2012).
Intracranial hemorrhage accounts for approximately ten percent of all strokes in the U.S. Stroke is the fifth-leading cause of death in the U.S., resulting in 129,000 deaths each year (Jauch et al., 2013). The essential imaging methods used in the diagnosis of intracranial hemorrhage are positron emission tomography (PET), cerebral angiography, computed tomography angiography (CT-A), magnetic resonance imaging (MRI), and magnetic resonance angiography (MRA) (Muir et al., 2006). Despite this variety, all of these methods take a considerably long time. For example, cerebral angiography takes about three hours. Cranial computed tomography is therefore the best method for the diagnosis of acute intracranial hemorrhage because it takes only 1 to 5 minutes (Phong et al., 2017). Computed tomography can detect hemorrhage in more than 98% of patients within the first two days of hemorrhage. However, the duration of diagnosis depends on how long it takes the expert to complete and interpret a cranial CT scan (Medicine Hospital, n.d.).
In recent years, deep learning has become very popular in the classification and segmentation of medical images, where it has proven to be very effective. Accordingly, there is a growing body of research on the use of deep learning in the classification of intracranial computed tomography scans (Web of Science, n.d.). Arbabshirani et al. (2018) (Szegedy et al., 2017) deep learning architectures for intracranial hemorrhage detection and its five subtypes. This study applied the latest deep neural network architectures to a large dataset and, eventually, proposed a two-architecture hybrid method. The developed hybrid method was compared with alternative architectures, and the results showed that the newly designed method was more successful than the latest deep neural network architectures.

Dataset
The dataset employed in this study was obtained from Kaggle, where it is provided openly by the RSNA. The dataset consists of 752,803 head CT scans with six labels: no hemorrhage and five subtypes of hemorrhage (subdural, epidural, subarachnoid, intraparenchymal, and intraventricular) (RSNA Intracranial Hemorrhage Detection, n.d.). Figure 1 shows some CT scans from the dataset and the distribution of the classes. The dataset was divided into two groups: training (90%) and test (10%). However, as seen in Figure 1, the dataset has an imbalanced distribution, and epidural images are very few. For this reason, another test was conducted by reducing the number of images of

Image Preprocessing
The image preprocessing phase had several steps to prepare the scans for training the deep learning architectures. First, the scans were resized to 300x300 for the EfficientNet-B3 (Tan & Le, 2019) model and 299x299 for the Inception-ResNet-V2 (Szegedy et al., 2017) model, in line with the input layer dimensions of the architectures. Next, as suggested by Chilamkurthy et al. (2018), the entire dynamic range of CT densities was represented by three separate windows: brain (level: 40, width: 80), subdural (level: 80, width: 200), and soft tissue (level: 40, width: 380). Several windows are used because a fracture in a bone window indicates the presence of an extra hemorrhage in a brain window, or a fracture in a subdural window indicates the presence of a hemorrhage that is indistinguishable in a skull and normal brain window. Finally, before the architectures were trained, the training images were flipped horizontally and vertically for data augmentation.
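The windowing and flipping steps described above can be sketched roughly as follows. This is a minimal illustration rather than the authors' code: it assumes the input is already a 2-D array of Hounsfield units, it reads the two numbers given for each window as window level and window width, and the function names (`apply_window`, `to_three_channels`, `flip_augment`) are hypothetical.

```python
import numpy as np

# (level, width) pairs in Hounsfield units, taken from the text above.
# Reading them as window level / window width is an assumption.
WINDOWS = {
    "brain": (40, 80),
    "subdural": (80, 200),
    "soft_tissue": (40, 380),
}


def apply_window(hu, level, width):
    """Clip HU values to [level - width/2, level + width/2] and rescale to [0, 1]."""
    low = level - width / 2.0
    high = level + width / 2.0
    clipped = np.clip(hu, low, high)
    return (clipped - low) / (high - low)


def to_three_channels(hu):
    """Stack the brain, subdural, and soft-tissue views into an H x W x 3 array."""
    channels = [apply_window(hu, lvl, wid) for lvl, wid in WINDOWS.values()]
    return np.stack(channels, axis=-1)


def flip_augment(image):
    """Return the image together with its horizontal and vertical flips."""
    return [image, np.flip(image, axis=1), np.flip(image, axis=0)]


# Tiny hypothetical slice: four pixels with different densities.
hu_slice = np.array([[0.0, 40.0], [80.0, 1000.0]])
three_channel = to_three_channels(hu_slice)
augmented = flip_augment(three_channel)
```

Resizing to 300x300 or 299x299 is left out here because it depends on the image library in use.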

Deep Learning with Transfer Learning
Deep learning is a branch of machine learning that runs on algorithms resembling the deep hierarchical architecture of the brain. It is based on deep artificial neural networks (Şeker et al., 2017), which have numerous layers and even layers within layers, hence the name deep. There is a growing body of research on deep learning methods in the field of biomedicine (Figure 2) because they are more time- and cost-effective and also better in the detection and diagnosis of diseases than state-of-the-art methods (Web of Science, n.d.). In this study, two ready-to-use CNN-based deep neural network models that had been pretrained on ImageNet (n.d.) for object classification were employed. These deep CNN architectures were applied to the problem through transfer learning. Transfer learning focuses on storing the knowledge gained while solving one problem and applying it to a different but related problem. In other words, it is a deep learning method by which an artificial neural network model trained for one task is adapted to a different but related task. It allows for faster training and better performance, and it is particularly beneficial here given the size of the dataset. In order to classify brain CT scans, this study employed two recent CNN models trained on ImageNet (n.d.), which is one of the largest image databases: EfficientNet-B3 (Tan & Le, 2019) and Inception-ResNet-V2 (Szegedy et al., 2017). After the training phase, the classification performance of the two models was evaluated separately, and then the class-based average values of their classification probabilities were evaluated.

EfficientNet-B3
At the International Conference on Machine Learning (ICML) in 2019, Google introduced the CNN-based EfficientNet neural network architecture with a new structural approach (Tan & Le, 2019). A convolutional neural network (CNN) is a powerful class of deep neural networks, especially in image processing applications. A CNN architecture consists of input and output layers and the intermediate layers located between them, which are also known as hidden layers. The main intermediate layers in a CNN architecture are convolution, pooling, and fully connected layers. The convolution layer generates an activation map from the input data by sliding filters over it and applying multiplication and addition operations. In other words, it is the layer where feature extraction takes place (Stanford.edu, 2018). The pooling layer performs nonlinear subsampling and reduces the number of parameters to produce a simpler output (Stanford.edu, 2018). The fully connected layer converts the data from the previous layer into a one-dimensional vector so that the data are fully connected to all neurons in the next layer (Stanford.edu, 2018). The fully connected layer generally precedes the classification layer, which is the last layer of a CNN architecture. To date, the depth, that is, the number of layers, has typically been increased to improve the performance of such architectures. However, this increases the cost of computing, and because accuracy saturates after a certain point, adding further layers no longer improves the success level.
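To make the three layer types described above concrete, the following NumPy sketch runs one filter, one pooling step, and one fully connected layer on a toy input. It is only an illustration under simplifying assumptions (a single channel, no padding, stride 1, hand-picked weights) and is unrelated to the actual EfficientNet implementation; all function names are hypothetical.

```python
import numpy as np


def conv2d(image, kernel):
    """Slide the filter over the image and sum the elementwise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that do not fit are dropped."""
    h = feature_map.shape[0] // size
    w = feature_map.shape[1] // size
    trimmed = feature_map[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))


def dense(features, weights, bias):
    """Flatten the feature map and connect it to every output neuron."""
    return features.reshape(-1) @ weights + bias


# Toy 6x6 "image" and a 2x2 filter that responds to diagonal intensity change.
image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[-1.0, 0.0], [0.0, 1.0]])

activation = np.maximum(conv2d(image, kernel), 0.0)  # ReLU nonlinearity
pooled = max_pool2d(activation)                      # 5x5 -> 2x2
logits = dense(pooled, np.ones((pooled.size, 6)), np.zeros(6))  # six outputs
```

The six outputs mirror the six labels of the dataset; in a trained network the filter and dense weights would be learned rather than fixed.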
The EfficientNet structure improves performance without relying on depth alone. Its main feature is the method referred to as compound scaling, which scales not only the depth but also the width and resolution of the network (Tan & Le, 2019). Figure 3 compares the architecture with other CNN architectures and shows that EfficientNet (Tan & Le, 2019) performed better despite having fewer parameters. EfficientNet (Tan & Le, 2019) consists of eight models from B0 to B7, with each subsequent model number indicating higher accuracy. Figure 4 shows the EfficientNet-B0 (Tan & Le, 2019) architecture. The model of choice in this study was EfficientNet-B3 (300x300) (Tan & Le, 2019) because its input dimensions are similar to those of the other CNN architecture, Inception-ResNet-V2 (299x299) (Szegedy et al., 2017), which allows the effect of resolution to be ignored when comparing the architectures. The EfficientNet-B3 (Tan & Le, 2019) model differs from the B0 model in its input size and number of parameters. Both models have seven blocks, each of which has a different number of mobile inverted bottleneck convolution (MBConv) layers. As a result, the parameter size of the B0 model is 5.3M while that of the B3 model is 12M. In the training phase of the EfficientNet-B3 (Tan & Le, 2019) model, different parameter values were tried, and the most successful ones were chosen. Adam was used as the optimization algorithm with a learning rate of 0.0001, a batch size of 16, and binary cross-entropy as the loss function, and the sigmoid function was used as the activation function in the final classification layer. 10-fold cross-validation was also used in this process.
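For reference, the compound scaling rule mentioned above can be written as follows. The formulation and symbols are those of Tan & Le (2019), not of this study: $\phi$ is the user-chosen compound coefficient, and $\alpha$, $\beta$, $\gamma$ are constants found by a small grid search on the baseline network.

```latex
\begin{aligned}
\text{depth:}      \quad & d = \alpha^{\phi} \\
\text{width:}      \quad & w = \beta^{\phi} \\
\text{resolution:} \quad & r = \gamma^{\phi} \\
\text{subject to}  \quad & \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2,
  \qquad \alpha \ge 1,\ \beta \ge 1,\ \gamma \ge 1
\end{aligned}
```

Under this constraint, increasing $\phi$ by one roughly doubles the computational cost, which is how the family grows from B0 towards B7.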

Inception-ResNet-V2
Inception-ResNet-V2 (Szegedy et al., 2017) is a CNN architecture that combines the Inception structure with residual connections and was trained on more than a million images from the ImageNet (n.d.) database. The network is 164 layers deep and, thanks to the diversity of the training set, comes with rich learned feature representations for different images. The input image size for the model is 299x299. Figure 5 shows the basic architecture of Inception-ResNet-V2 (Szegedy et al., 2017). In the Inception-ResNet-V2 (Szegedy et al., 2017) blocks, convolution filters of multiple sizes are combined with residual connections. Residual connections not only prevent the degradation caused by deep structures but also shorten the training time. The parameters selected for training the Inception-ResNet-V2 model were similar to those of the EfficientNet-B3 model. Adam was used as the optimization algorithm with a learning rate of 0.0001 and a batch size of 32, with binary cross-entropy as the loss function and the sigmoid as the activation function in the final classification layer. 10-fold cross-validation was also used in this process.
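The pairing of a sigmoid output layer with a binary cross-entropy loss, as used for both models, can be illustrated with a small NumPy sketch. It assumes each of the six labels is scored independently; the label order, the logits, and the clipping constant `eps` are hypothetical choices made for the example.

```python
import numpy as np


def sigmoid(logits):
    """Map raw scores to independent per-label probabilities."""
    return 1.0 / (1.0 + np.exp(-logits))


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over all samples and labels."""
    p = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    losses = -(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))
    return float(np.mean(losses))


# One hypothetical image carrying two hemorrhage subtypes at once.
y_true = np.array([[1.0, 0.0, 0.0, 0.0, 1.0, 0.0]])

uninformed = sigmoid(np.zeros((1, 6)))  # every label at probability 0.5
confident = sigmoid(np.array([[6.0, -6.0, -6.0, -6.0, 6.0, -6.0]]))

loss_uninformed = binary_cross_entropy(y_true, uninformed)
loss_confident = binary_cross_entropy(y_true, confident)
```

Because each label gets its own sigmoid, several labels can be active for the same scan, which a softmax output would not allow.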

Proposed Inception-ResNet-V2 and EfficientNet-B3 Based Hybrid Model
Both architectures were trained separately with the same training set. The mean of the probability values produced for each class by the final sigmoid classification layers of the two architectures was then calculated to obtain new probability values. Figure 6 shows the flow chart for the proposed hybrid architecture, and Figure 7 shows the block diagram for the proposed hybrid model. As seen in Figure 7, EfficientNet-B3 has seven blocks, each with a different number of mobile inverted bottleneck convolution layers, while Inception-ResNet-V2 has 3 Inception blocks and 2 Reduction blocks. The input size of EfficientNet-B3 is 300x300, while that of Inception-ResNet-V2 is 299x299. Table 1 compares the results obtained for the three methods of the study. Because 10-fold cross-validation was used, the reported results are the lowest, highest, and average accuracy values over the 10 folds for each model. Accuracy (equation 1), specificity (equation 2), sensitivity (equation 3), precision (equation 4), and F1 score (equation 5) were used to measure the success of the proposed hybrid architecture.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{2}$$

$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{3}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{4}$$

$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}} \tag{5}$$
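The averaging step of the hybrid model can be sketched as below. This is a minimal illustration, assuming both models output one sigmoid probability per class for the same images; the function names, the example probabilities, and the 0.5 decision threshold are assumptions and are not stated in the study.

```python
import numpy as np


def hybrid_probabilities(p_efficientnet, p_inception_resnet):
    """Average the per-class sigmoid outputs of the two models."""
    p1 = np.asarray(p_efficientnet, dtype=float)
    p2 = np.asarray(p_inception_resnet, dtype=float)
    if p1.shape != p2.shape:
        raise ValueError("both models must score the same images and classes")
    return (p1 + p2) / 2.0


def predict_labels(probabilities, threshold=0.5):
    """Turn probabilities into 0/1 labels (the threshold is an assumption)."""
    return (np.asarray(probabilities) >= threshold).astype(int)


# Hypothetical outputs for one image and three of the classes.
p_b3 = [[0.9, 0.2, 0.6]]
p_irv2 = [[0.7, 0.4, 0.3]]

p_hybrid = hybrid_probabilities(p_b3, p_irv2)
labels = predict_labels(p_hybrid)
```

In the third class the two models disagree, and the averaged probability falls below the assumed threshold, which shows how averaging can temper an overconfident single model.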

Findings and Discussion
TP (true positive) denotes a person with the disease who is classified as a patient, and FP (false positive) denotes a healthy person who is classified as a patient. TN (true negative) denotes a healthy person who is classified as healthy, and FN (false negative) denotes a person with the disease who is classified as healthy. Accuracy is the ratio of the number of correctly classified images to the total number of images. Precision is the proportion of images assigned to a class that actually belong to that class. Specificity is the correct classification rate of 'no hemorrhage' images. Sensitivity is the correct classification rate of 'hemorrhage' images. Based on these criteria, the deep learning models were applied to 75,280 CT scans with six labels: no hemorrhage and subdural, epidural, subarachnoid, intraparenchymal, and intraventricular hemorrhage. Table 1 shows the results of the three models, while Figure 8 and Figure 9 show the corresponding bar graphs for Test Dataset 1 and Test Dataset 2, respectively. The EfficientNet-B3 (Tan & Le, 2019) and Inception-ResNet-V2 (Szegedy et al., 2017) based hybrid model provided the best results, with accuracy, F1 score, specificity, sensitivity, and precision of 0.9859, 0.8732, 0.9952, 0.8314, and 0.9119, respectively, for Test Dataset 1; it also had the best results for Test Dataset 2 (Figure 9). Sensitivities were lower than the other values, which was expected because there are five hemorrhage subtypes and an image in the dataset can have more than one kind of hemorrhage. The bar graph in Figure 11 supports these results. The lower non-hemorrhage accuracy value in Figure 11 indicates that the sensitivity value is low. Specificities were higher than the other metrics because the sizes of the classes in the dataset were not equal.
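Using the definitions of TP, FP, TN, and FN given above, the five criteria can be computed as in the following sketch. The standard textbook formulas are assumed, and the counts in the example are invented to show the effect of class imbalance; they are not results from the study.

```python
def safe_divide(numerator, denominator):
    """Return 0.0 instead of failing when a class has no relevant samples."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, specificity, sensitivity, precision, and F1 score."""
    accuracy = safe_divide(tp + tn, tp + fp + tn + fn)
    specificity = safe_divide(tn, tn + fp)
    sensitivity = safe_divide(tp, tp + fn)
    precision = safe_divide(tp, tp + fp)
    f1 = safe_divide(2 * precision * sensitivity, precision + sensitivity)
    return {
        "accuracy": accuracy,
        "specificity": specificity,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical imbalanced case: 90 positive and 910 negative images.
metrics = classification_metrics(tp=80, fp=10, tn=900, fn=10)
```

With these invented counts, accuracy and specificity both come out near 0.98 while sensitivity is about 0.89, which is the same pattern the text attributes to unequal class sizes.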
There were quite small differences between the minimum and maximum values, indicating that the models are consistent. Furthermore, in Test Dataset 2, sensitivity, F1 score, and precision increased while accuracy and specificity decreased. No changes were made in Test Dataset 2 other than balancing the data distribution, so this change in values is due to the more balanced data distribution. Confusion matrices were created for each class to analyze the class-based performance of the proposed hybrid model. A confusion matrix is a table that reports the 'Actual' and 'Predicted' class labels (presented in Figure 10). The results of the proposed hybrid model are presented separately for each class in Table 2, and the bar graph showing the accuracy values for each class is given in Figure 11. The confusion matrix values in Table 2 correspond to the average of 10 folds. The results for Test Dataset 1 in Table 2 and Figure 11 show that the hybrid model performs worse on patients without hemorrhage, while its estimation for patients with epidural hemorrhage is close to 100%. This can be attributed to the number of images without hemorrhage being much higher than the others, while the number of epidural test images is low. Evaluated in general for Test Dataset 1, the model can diagnose non-hemorrhage images with 97.2% accuracy (Figure 11). Among the hemorrhage classes, epidural has the best accuracy. However, since the number of epidural images is low, it would not be correct to include the epidural performance in this comparison. The non-epidural hemorrhage classes have a more balanced data distribution. When they are compared with each other, the accuracy and sensitivity values are, in descending order, those of intraventricular, intraparenchymal, subdural, and subarachnoid hemorrhage. The fact that all of these accuracy values are over 98% is a remarkable indicator of success.
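A per-class confusion matrix of the kind described above can be derived from multi-label predictions as in this sketch. It assumes the actual and predicted labels are stored as 0/1 arrays with one column per class; the class names and the four example rows are hypothetical.

```python
import numpy as np


def per_class_confusion(y_true, y_pred, class_names):
    """Count TP, FP, TN, and FN separately for each class column."""
    actual = np.asarray(y_true).astype(bool)
    predicted = np.asarray(y_pred).astype(bool)
    result = {}
    for k, name in enumerate(class_names):
        t, p = actual[:, k], predicted[:, k]
        result[name] = {
            "TP": int(np.sum(t & p)),    # actual positive, predicted positive
            "FP": int(np.sum(~t & p)),   # actual negative, predicted positive
            "TN": int(np.sum(~t & ~p)),  # actual negative, predicted negative
            "FN": int(np.sum(t & ~p)),   # actual positive, predicted negative
        }
    return result


# Four hypothetical images scored for two of the classes.
classes = ["epidural", "subdural"]
actual_labels = [[1, 0], [0, 1], [0, 1], [0, 0]]
predicted_labels = [[1, 0], [0, 1], [0, 0], [1, 0]]

confusion = per_class_confusion(actual_labels, predicted_labels, classes)
```

Each class yields its own 2x2 table, so a rare class such as epidural can be inspected without being swamped by the much larger non-hemorrhage class.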
The results in Table 3 and Figure 12 show the hybrid model's performance for Test Dataset 2. As mentioned above, Test Dataset 1 has an imbalanced distribution, which prevents objective evaluation. For this reason, the system was tested again on a dataset with a more balanced distribution. Test Dataset 2 consists of 314 epidural, 500 non-epidural hemorrhage, and 2500 non-hemorrhage images. The Test Dataset 2 results show an average accuracy that is 2.8% lower. However, when the performance is compared among classes, the same ranking is obtained as with Test Dataset 1. Sorted from the highest to the lowest accuracy, the order is again epidural, intraventricular, intraparenchymal, subdural, subarachnoid, and non-hemorrhage. The best estimation is for epidural hemorrhage, with 98.15% accuracy. On the other hand, testing with a more balanced dataset increased the sensitivity values. In some aspects, this study is superior to previous ones (Table 4). For example, many classification studies use images only to diagnose the presence of intracranial hemorrhage, and very few studies focus on diagnosing different subtypes of hemorrhage. Most of these studies used a different dataset from ours, so it would not be correct to compare the performance results of all studies directly. However, Salehinejad et al. (2021) and Burduja et al. (2020) used the same dataset in their studies. Salehinejad et al. (2021) developed a machine learning model using SEResNeXt-50 and SEResNeXt-101. They achieved 98.3%, 98.8%, and 98.0% for accuracy, sensitivity, and specificity, respectively. Compared with these results, the accuracy and specificity values of our proposed method are better. Burduja et al. (2020) developed a hybrid model using SEResNeXt-101 and Bidirectional LSTM. Their model achieved average performance values of 94.7% accuracy, 75.6% sensitivity, and 97.2% specificity.
The class-based performance values in their study were averaged for comparison. Compared with these results, the accuracy, sensitivity, and specificity values of our proposed method are better.
Evaluated in general, the hybrid model proposed in this study is among the leading approaches for diagnosing brain hemorrhage.

Conclusion
Intracranial hemorrhage is an important public health problem that leads to high rates of death all over the world. Early detection is therefore critical for reducing the mortality rate. Analysis of CT scans plays a crucial role in the diagnosis of intracranial hemorrhage. This study, therefore, proposed a system that uses CT scans to detect intracranial hemorrhage and its subtypes. The system is a hybrid model that combines EfficientNet-B3 and Inception-ResNet-V2. The proposed hybrid model has an accuracy of 98.5%, which is higher than those of EfficientNet-B3 and Inception-ResNet-V2 individually. The size of the dataset (752,803 images) was also instrumental in the success of the CNN architectures. The literature comparison also shows that the proposed method is highly successful.
In sum, the proposed deep learning method is successful and consistent, and the dataset used for its training is large. It is therefore a promising emergency diagnostic tool that could help healthcare professionals overcome clinical problems. This can be an essential step in solving the problem of early detection of intracranial hemorrhage and can ease the burden of CT reporting on specialists. On the other hand, the positive results have encouraged the authors to consider future work that builds on this study. In this context, alternative hybrid model formations that may improve the obtained results will be explored. The current method will also be applied to alternative datasets for a deeper analysis of the solution. Finally, the possibility of including the method in an Internet of Health Things (IoHT) system setup will be considered in the context of a wider medical project.

Compliance With Ethical Standards
Funding: Funding information is not applicable / No funding was received.
Conflict of Interest: Author Murat Saribas declares that he has no conflict of interest. Author Beyza Guluzar Ciltas declares that she has no conflict of interest. Author Sezin Barin declares that she has no conflict of interest. Author Gur Emre Guraksin declares that he has no conflict of interest. Author Utku Kose declares that he has no conflict of interest.
Ethical Approval: This article does not contain any studies with human participants performed by any of the authors. The used dataset is an open-source material provided in the context of RSNA Intracranial Hemorrhage Detection | Kaggle.