The Role of Big Data and Machine Learning in COVID-19

Abstract: The rapid growth of digital data has created many opportunities, especially for corporations, institutions and firms, and has made it possible to mine data regardless of its field or area. Countries have benefited greatly from big data (BD) analysis in confronting epidemics and diseases, especially COVID-19, since BD is now available everywhere around us, from official reports to scientific studies in virology and epidemiology. The general aim of this study is to clarify how the combination of BD and machine learning (ML) has changed data science and strongly influenced applications in many fields, chiefly those related to COVID-19. The method used in this study is the 'relevance tree', applied to identify papers related to ML and BD, especially in the context of COVID-19. The results show that using reinforcement learning to analyze BD is highly effective, although it faces many challenges and restrictions, which are explained in detail in this study. The results also show that, during the pandemic, most countries moved towards smart cities that depend entirely on smart applications based on BD analysis using ML, and that one of the most widely used applications worldwide was the global positioning system (GPS). In addition, data privacy was found to be one of the most important challenges facing data analysis. Consequently, future researchers are recommended to focus on the challenges faced by ML in analyzing medical data in the COVID-19 era.


Introduction
In today's era of information and technology, data are growing explosively and spreading everywhere, through websites, social applications and smartphones. This growing role of data forces us to think about online learning algorithms and their frameworks. However, picking an appropriate tool is a sensitive and difficult task; for example, such a learning system needs a lot of preparation and requires many methods. The world now depends on data, which have become the main source of knowledge and experience for people and countries. This new type of information has caused a digital war, especially over how to organize and protect it. Therefore, the concept of big data (BD) has become very popular for describing both how data are organized and how abundantly and irregularly they arise. The complexity of BD obliges us to confront these difficulties. Its basic characteristics can be described by four Vs: velocity, variety, volume and veracity. Compared with traditional computing approaches, BD raises many problems, as discussed, for example, by De Mauro et al. (2016) and Barker and Ward (2013). The topic has undeniably attracted researchers' attention, although problems remain in accessing, processing and storing the data. For more detail, readers can consult Gandomi and Haider (2015), Khan et al. (2014) and Oussous et al. (2018).
New technological devices have given rise to new approaches, such as machine learning (ML) and new file systems. To illustrate, the Hadoop file system (Shvachko et al., 2010) makes it easier to apply ML through external libraries such as scikit-learn, which helps in controlling and mastering data. In these libraries, most ML methods are classical algorithms that are not suited to BD. Nevertheless, other methods, such as deep learning, have proved their ability to learn well from BD.
BD is a natural field of application for ML research (Al-Jarrah et al., 2015; Ma et al., 2014; Zhou et al., 2017). Both belong to computer science and data science, and they work side by side and complement each other. The main goal of BD is to collect, store and analyze data in order to discover hidden patterns and support decision-making. ML, in turn, analyzes BD with algorithms that learn from previous experience, as noted by Zhou et al. (2017), and interest in ML has grown with the support of BD.
The COVID-19 catastrophe is an emergency that requires help from both national and international sectors. According to world reports, by May 2020 it had caused more than 5 million infections and about 300,000 deaths in almost a hundred cities (NHC, 2020; WHO, 2020). The epidemic has seriously affected both social and economic development. In addition, on 28 February, UN Secretary-General Guterres asked governments all over the world to take steps to control the virus, especially through BD analysis (New. Cn, 2020). Therefore, to keep life moving, many companies, universities and research teams built information systems such as 'Fever Clinical Queries', 'Passenger Information Queries' and 'Epidemic Map Displays'. These systems were built on commercial software and contributed significantly to stopping the virus, mainly by relying on artificial intelligence techniques such as ML (CAICT, 2020).
The huge increase in digital data has created a huge number of opportunities: firms and institutions can mine data regardless of its area or field. The integration of ML and BD has made a big difference in data science and has made many applications in many fields easier to build. This paper reviews recent research papers related to this topic. The aim of this review is to clarify the difficulties faced by ML methods, in order to arrive at an affordable framework that fits BD analysis. In addition, it clarifies the role of BD, and the importance of using ML to analyze it, in defeating COVID-19. We hope that the study conveys the significance of ML in BD analysis and helps to overcome the difficulties it faces.

Related Studies
ML is the best choice for solving problems in artificial intelligence, especially learning without explicit programming (Goodfellow et al., 2016; Murphy, 2012; Shalev-Shwartz & Ben-David, 2014). This multidisciplinary field now touches every aspect of our lives. ML algorithms are usually grouped into four types: supervised, semi-supervised, unsupervised and reinforcement learning. Its methods rely on BD to contribute to other sciences such as engineering (Yi et al., 2014). It has attracted increasing attention from researchers since 2012, especially in mathematics, computer science and statistics, and the number of publications rose in 2016. Such scientific research is studied with bibliometric tools (Cadez, 2013; Sweileh et al., 2017; Van Eck & Waltman, 2009). This method has proved efficient in helping decision-makers, researchers and other professionals to get a clear view of all the aspects of interest. Useful bibliometric studies have been carried out on big data (Singh et al. 2015; Mishra et al. 2016; Sivarajah et al. 2017). Other fields have also been covered, such as Industry 4.0 (Muhuri et al., 2019), IoT publications (Nobre & Tavares 2017), technology maps (Gerdsri et al., 2013), cloud computing (Heilig & Voß 2014), IT (Khaparde & Pawar, 2013), data mining and big data (Tseng et al., 2016), and cybersecurity and machine learning (Makawana & Jhaveri 2018). Several studies have clarified the importance of using all kinds of ML to analyse BD in health, scientific, economic and political fields; to support our paper, previous studies related to both BD analysis and ML are presented below.

August, 2020 Artificial Intelligence and Neuroscience Volume 11, Issue 2, Supplementary 1
Table 1 provides a summary of the existing studies regarding ML for data analysis in BD.

Authors and Year: Alloghani et al. (2020)
Aim of the Study: This systematic review analysed research papers published between 2015 and 2018 in order to identify the most suitable applications of ML strategies for solving various modelling problems.
Results of the Study: The reviewed papers show that decision trees, support vector machines and Naïve Bayes algorithms are the most widely applied and discussed supervised learners in BD. In addition, k-means, hierarchical clustering and principal component analysis are the most popular unsupervised learners.

Authors and Year: Qiu et al. (2020)
Aim of the Study: The SE schemes are given a new, practical definition for securely outsourced electrocardiogram (ECG) data, in a setting where the untrusted BSN environment relies on ML.
Results of the Study: Existing SE schemes are not considered the right models; they must be designed around a specific level of protection for the ECG data, in order to guard the data against unlawful attacks. This kind of protection is essential for the patient's privacy. Many tests were carried out to demonstrate the efficiency and practicability of the SE model.

Authors and Year: Calderón et al. (2019)
Aim of the Study: This study discusses the relevance of supervised sentiment analysis to communication and audience research, and presents an application of distributed supervised sentiment analysis.
Results of the Study: This tool addresses the drawbacks of other approaches to the computational handling of small data, especially in digital environments such as social media. It also gives communication and audience research a fair view of the cutting-edge technology that can be applied in computational social science.

Authors and Year: Cavalcante et al. (2019)
Aim of the Study: This study develops a hybrid method that combines ML and simulation. Its application is examined for data-driven decision-making support in resilient supplier selection.
Results of the Study: The main result is a better understanding of how and when ML and simulation can be combined. The combination is used to create digital supply chain twins, which in turn improve resilience.

Authors and Year: Sughasiny & Rajeshwari (2018)
Aim of the Study: This study provides a full understanding of the significance of feature selection techniques, supervised ML tools, unsupervised ML tools and BD for healthcare.
Results of the Study: Based on a survey of many research papers, a new model is proposed to predict the severity of illnesses using BD, data science and ML.

Authors and Year: Mohammadi & Al-Fuqaha (2018)
Aim of the Study: This paper highlights the difficulties of using ML on the BD generated in smart cities. It also describes how unlabelled data are wasted.
Results of the Study: The study proposes a semi-supervised deep learning model to address these difficulties and highlights applications in different fields. It also presents many challenges, supported by the most active areas, that merit further research on ML for new smart city services.

Authors and Year: Kibria et al. (2018)
Aim of the Study: This paper covers the basic drivers of BD analysis and shows how ML, computational intelligence and AI are applied. These play a very significant role in data analysis, especially for recent models of wireless networks.
Results of the Study: The advantages and difficulties of using AI, ML and BD analysis in wireless networks are discussed.

Authors and Year: Chang et al. (2018)
Aim of the Study: This paper was created for the ML techniques in order to increase their solitude capacity.
Results of the Study: Numerical studies were carried out to describe the various levels of performance that can be verified.

The Method
In this section, the literature is reviewed to achieve the purpose of this paper. Papers related to ML and BD were identified in order to determine the challenges and obstacles that call for further study. The method used in this study is the 'relevance tree' (Saunders et al., 2009), which helps to determine which keywords are relevant to the research objectives and question (Saunders et al., 2009). The following databases were searched: Google Scholar, Scopus, Web of Science and Wiley. The keyword search option was used with advanced search settings derived from the research questions and goals. The search term 'ML' was combined with 'BD', 'artificial intelligence', 'Unsupervised ML', 'Supervised ML', 'Reinforcement ML' and 'COVID-19'.
The search was restricted to the 'title, abstract and keywords' fields of scientific journal papers. The last year has produced a plethora of studies on ML and BD that debate definitions, scope, advantages, disadvantages and challenges. In studying this research, the focus was on three topics: the definition of ML and BD, BD analysis, and the kinds of ML.
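The keyword combination described above can be sketched in a few lines of Python. This is only an illustration: the terms come from the text, but the `AND` operator, the quoting and the helper name `build_queries` are assumptions, since each database has its own advanced-search syntax.

```python
# Terms taken from the method description; the boolean syntax is an assumption.
CORE_TERM = "ML"
COMBINED_TERMS = [
    "BD",
    "artificial intelligence",
    "Unsupervised ML",
    "Supervised ML",
    "Reinforcement ML",
    "COVID-19",
]
SEARCH_FIELDS = ["title", "abstract", "keywords"]


def build_queries(core, others):
    """Pair the core term with each other term using a generic AND operator."""
    return [f'"{core}" AND "{term}"' for term in others]


queries = build_queries(CORE_TERM, COMBINED_TERMS)
for query in queries:
    # Each query would be run against the fields listed in SEARCH_FIELDS.
    print(f"{query}  [fields: {', '.join(SEARCH_FIELDS)}]")
```

Each generated string would then be adapted by hand to the search form of the database in question.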

Results
In this section, the research questions were answered in detail.

Machine Learning and Big Data Analysis
The definition of BD evolved in several steps. It was first characterized by 'volume'; 'velocity' and 'variety' were then added. Later, 'veracity' (Fan & Bifet, 2012) and 'value' (Fan & Bifet, 2012; Demchenko et al., 2013) were added as well. Defining BD is not easy and takes considerable effort, because the definition must account for processes such as facilitating discovery, supporting decision-making and processing data. The term 'value', however, has properties related to the desired goal, which is mainly mastering BD (Uddin et al., 2014). The goal behind the use of BD has become well known, but its results depend on developing old or new ways of controlling the data.
As a part of artificial intelligence, ML includes two steps: 'training' and 'testing' (Al-Jarrah et al., 2015). The first step learns a model from the known properties of the datasets. The second step makes predictions about unknown properties based on what was learned in the first step. These two processes, 'training' and 'testing', are also known as 'learning' and 'predicting'. In essence, ML uses a specific algorithm to learn from a sample of data and builds a model to be used for prediction.
Thus, the whole process becomes a matter of prediction (Kolisetty & Rajput, 2020). Recently, many researchers have explained the difficulties of ML related to BD (Najafabadi et al., 2015; Qiu et al., 2016), whereas others have explained difficulties that are due to a specific technique (Najafabadi et al., 2015). ML algorithms support many types of learning strategies, such as 'Rule Learning', 'Instance-based Learning', 'Decision Tree Learning' and 'Collective Learning' (Qiu et al., 2016). Taken together, these algorithms reflect how far the field has advanced.
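The two steps described above can be illustrated with a minimal sketch of instance-based learning, one of the strategies listed. The tiny dataset, the labels and the function names `train` and `predict` are invented for illustration and are not taken from the reviewed papers.

```python
import math


def train(samples, labels):
    """Learning step: an instance-based learner simply stores the labelled data."""
    return list(zip(samples, labels))


def predict(model, x):
    """Predicting step: return the label of the closest stored sample."""
    nearest = min(model, key=lambda pair: math.dist(pair[0], x))
    return nearest[1]


# Hypothetical two-feature records with known labels.
train_x = [(1.0, 1.0), (1.2, 0.8), (4.0, 4.2), (4.5, 3.9)]
train_y = ["low", "low", "high", "high"]

model = train(train_x, train_y)
prediction = predict(model, (4.1, 4.0))  # a record with an unknown label
print(prediction)
```

The model is built only from records whose properties are known, and is then asked about a record it has not seen, which is the learning and predicting split in its simplest form.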

Big Data Analysis
Business intelligence is a tool used mainly to take advantage of BD strategies. These strategies already affect current practice, for instance through their suitability for identifying class features, parameter features and supervision. All these aspects can help to address problems facing this technique. Assunção et al. (2015) described the development of this strategy and discussed how best to apply BD on cloud computing platforms.
They classified BD analysis solutions according to customers' previous models, existing data models and other models, with the result of supporting decision-making.
Both personalized agreements and the lack of agreement may create many challenges in BD. Every agreement strongly affects BD, which causes problems of acceptance across the three dimensions of the data space. Moreover, scaling BD across classes requires dividing the data into categories, which makes the process very complicated. Any growth in the number of these classes depends on users' knowledge and experience. As a result, the division of BD into classes cannot be predicted, which makes ML, and especially its algorithms, very hard to apply (Kolisetty & Rajput, 2020).
In addition, the agreement of properties contributes to the difficulty of BD. It is handled by dividing the class types in order to reduce this difficulty, especially as the dimensionality of the data increases.
These are basic principles that can address the scalability of BD and its conformity, which in turn help in controlling and analyzing the data. The growing size of the data increases the difficulty of processing it, even with modern technology (Kolisetty & Rajput, 2020).
Methods for processing BD can be summarized as follows. Some algorithms were evaluated on the UCI ML repository and aim at improvements such as a highly flexible algorithm for fast unsupervised learning. One such algorithm was proposed by Xiang et al. (2018) using a two-stage unsupervised multiple kernel learning machine. However, this approach faced difficulties such as high computational overhead (Xiang et al., 2018). By contrast, using the UCI and biomedical repositories, algorithms can build accurate and highly efficient computational models, especially with 'Predictive modelling, Decision tree, Bayesian and Instance-based' approaches (Liu et al., 2017). Their limitations, however, include large variability in data representation and wide variation in performance and reliability.
Other algorithms address hard categorical data with coupling relationships between classification and frequency, such as the metric learning with coupling classification proposed by Zhu et al. (2018), which was evaluated on 30 datasets from various fields. On the other hand, this kind of algorithm also has limitations, such as its inability to handle some data properties and to control knowledge.
Finally, other algorithms work in this area, such as the deep learning strategy of Read et al. (2015). This algorithm was applied to real-world datasets to improve the accuracy of well-known existing methods. Unfortunately, it gives no clear account of higher-dimensional datasets with respect to feature reduction and label partitioning.

Types of Machine Learning
ML is a subfield of artificial intelligence that concentrates on how computers learn. Learning is classified into three main classes: supervised, unsupervised and reinforcement. These three classes are explained below, together with some well-known techniques in each:

Supervised Learning
As the name suggests, this type of learning requires a supervisor who tells the learning algorithm whether its decisions or actions are good or bad. The training data are fully labelled, so the learner can tell, with a given accuracy, whether a specific decision or action is right. The common algorithms of this type are discussed below.
• First, the support vector machine: this tool finds a hyperplane in an N-dimensional space that separates the data points described by N features (Al-Zoubi et al., 2018).
• Second, the random forest: this tool consists of multiple decision trees that are built and combined using a merit factor in order to provide an accurate classification with the expected accuracy (Alian et al., 2014).
• Third, the neural network: this tool is made of simple neurons organised in layers and connected to each other through a set of weights. It imitates the way biological neurons work (Azzini & Tettamanzi, 2011).
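As a rough illustration of supervised learning with weighted neurons, the sketch below trains a single artificial neuron (a perceptron) on labelled examples. It is a deliberately small stand-in for the multi-layer networks described above; the toy dataset (the logical AND function), the learning rate and the function names are assumptions made for the example.

```python
def neuron_output(weights, bias, x):
    """Step activation: fire (1) if the weighted sum is positive, else 0."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, labels, epochs=20, lr=1):
    """Adjust the weights whenever the supervisor's label disagrees with the output."""
    weights = [0] * len(samples[0])
    bias = 0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            error = target - neuron_output(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Labelled training data: the label plays the role of the supervisor.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]

weights, bias = train_perceptron(samples, labels)
predictions = [neuron_output(weights, bias, x) for x in samples]
print(predictions)
```

Because every training example carries its correct label, the error term tells the neuron exactly how to correct its weights, which is the defining feature of the supervised setting.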

Unsupervised Learning
In this kind of learning the data are not labelled, which means that the algorithm has to work out the categories by itself. The algorithms of this type must discover the structure of the data, the relationships within them and their properties.
• K-means clustering (Barabasi & Albert, 1999): the algorithm divides the data into many groups (clusters) whose members have related properties in common.
• Self-organizing neural networks (Chawla, 2009): this is a type of NN that arranges its neurons so as to reduce the error of their function for each problem.
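The clustering idea above can be sketched with a plain-Python version of k-means. The points are invented, and starting from the first k points is an assumption made to keep the example deterministic; practical implementations usually initialize the centroids randomly.

```python
import math


def kmeans(points, k, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-centering."""
    centroids = list(points[:k])  # deterministic start (an assumption for this sketch)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters


# Unlabelled points that visibly form two groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

No labels are supplied at any point: the two groups emerge only from the distances between the points.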

Reinforcement Learning
This type of learning uses an algorithm that distinguishes between correct and wrong outcomes: it is rewarded when it is correct and punished when it is wrong. It mimics the way creatures, and humans in particular, acquire knowledge through two processes: reward and punishment. Some examples follow. First, Q-learning (Chen & Chen, 2014), which relies on the Bellman equation to update its Q-values. Second, the Deep Q-Network (DQN) (Chen et al., 2005), which follows the same process but adds the ability to generalize. Third, Deep Deterministic Policy Gradient (DDPG) (Chowdhury, 2010), which is similar to DQN except that it can solve problems with a continuous action space.
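A minimal sketch of tabular Q-learning may help to make the reward mechanism concrete. The four-state corridor, the reward of 1 at the goal and the values of the learning rate and discount factor are all assumptions for illustration. To keep the example deterministic, every state-action pair is visited in turn instead of being explored at random, while the update rule itself is the standard one derived from the Bellman equation.

```python
N_STATES, GOAL = 4, 3      # states 0..3 in a corridor; state 3 is the rewarded goal
ALPHA, GAMMA = 0.5, 0.9    # learning rate and discount factor (assumed values)
LEFT, RIGHT = 0, 1


def step(state, action):
    """Move one cell left or right; reaching the goal yields a reward of 1."""
    nxt = max(state - 1, 0) if action == LEFT else state + 1
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward


# Q-table: one row per state, one column per action. The goal row stays at zero.
q = [[0.0, 0.0] for _ in range(N_STATES)]

for _ in range(500):
    for s in range(GOAL):              # the goal state is terminal
        for a in (LEFT, RIGHT):
            nxt, r = step(s, a)
            target = r + GAMMA * max(q[nxt])        # Bellman target
            q[s][a] += ALPHA * (target - q[s][a])   # move the estimate towards it

policy = [max((LEFT, RIGHT), key=lambda a: q[s][a]) for s in range(GOAL)]
print(policy)
```

After training, the greedy policy moves right in every state, because the discounted reward makes actions that approach the goal more valuable than those that move away from it.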

The Restrictions of Big Data Analytics
Applying BD carries many hopes. Unfortunately, its capabilities are not unlimited, and carrying out analysis requires that the limits of the data be known and taken into consideration (Wang et al., 2015). Some limitations that users meet when exploring data for the first time are as follows.
First, data misinterpretation. Data can reveal users' behaviour, but may not reveal the reasons behind it. Representing the data wrongly can lead users in the wrong direction, especially in how they run their businesses and use the information. Also, relying on current data to estimate new probabilities can lead firms to act wrongly on the basis of a mistaken correlation. Thus, clarifying the expected engagement and trying to solve the real problems, with support from the data, may call for very different ways of collecting and interpreting the data (Kolisetty & Rajput, 2020).
Second, security limitations. BD raises security challenges. Firms that gather data bear an important responsibility to secure and protect it. The consequences of a data breach can include lawsuits, fines and reputational damage. Security and protection issues can also affect the ability to process data. For instance, analysis by another organization may be very complicated because the data may be hidden behind a firewall or held on a private cloud server. This creates many problems in sharing and moving data so that they can be processed reliably (Kolisetty & Rajput, 2020).
Third, the outlier effect, which is common in data processing, especially if the user's search fails or if he/she searches for something new in the search engine, which yields partial results. In fact, technology is unable to gather data fully and accurately. Moreover, Google's algorithms and their limited predictions of searches and their results contributed to the failure of this project (Kolisetty & Rajput, 2020).
Fourth, organizations that possess huge amounts of data face a big challenge: the extent to which they are able to control diverse and unorganized BD, since storing, managing and utilizing these data optimally is a real problem (Sharma et al., 2020).
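One simple way to see how outliers can distort an analysis is to compare the mean and the median of a small series. The numbers below are invented, and this generic statistical illustration is an assumption about what an outlier effect looks like in practice; it does not reproduce any analysis from the cited works.

```python
import statistics

# Hypothetical daily counts; the last value is a single extreme observation.
daily_counts = [12, 15, 14, 13, 16, 15, 400]

mean_value = statistics.mean(daily_counts)      # pulled far upwards by the outlier
median_value = statistics.median(daily_counts)  # barely affected by it

print(f"mean = {mean_value:.1f}, median = {median_value}")
```

A summary based on the mean alone would suggest a typical day of roughly 69, whereas six of the seven observations lie between 12 and 16.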

The Importance of Machine Learning in Big Data
ML employs algorithms to reveal undiscovered knowledge without explicit programming. ML is iterative: as models are exposed to new BD, they are able to adapt independently. With the help of technology, especially computers, ML has developed into a more modern form than in the past. Nowadays, ML algorithms can perform many complicated computations in order to handle BD analysis.
To illustrate, the use of ML on BD can be seen in Google's self-driving car. In addition, ML applications that use BD power recommendations and other online business systems, such as those of Netflix and Amazon (Kolisetty & Rajput, 2020). Moreover, ML is used to process text data, especially from social media such as Facebook and Twitter. Finally, ML can process BD to detect fraud in specific fields such as finance and in privacy or security systems (Kolisetty & Rajput, 2020).

The Importance of Big Data and Machine Learning in the Face of COVID-19
The whole world now faces a huge threat from the COVID-19 virus, which endangers all of humanity. To stop it, all countries must cooperate. Also, to conquer this epidemic, it was very important to transform many overpopulated cities into 'smart' ones, for example in China and Korea, by using smart applications in most fields to keep life going during this period (Allam & Jones, 2020).
The use of these applications reduces overcrowding, which helps to limit the spread of the virus. It must also be mentioned that beating this epidemic requires not only eliminating it but also maintaining continuity in countries' different sectors, such as the economy, education and trade (Pandey et al., 2020).
Thus, the significance of BD and its analysis, and of artificial intelligence and ML applications, has become apparent: ML-based analysis of BD can be used in many cities around the world to stop the spread of the virus. For instance, one application that was deployed is GPS-based tracking, which helps people to avoid places where there are infected persons (Green, 2020).
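The core computation behind such a GPS-based alert can be sketched as a distance check between a user's position and a list of flagged locations. The coordinates, the 500 m radius and the function names are hypothetical, and real tracking applications involve far more than this (consent, privacy protection, data freshness); the haversine formula itself is a standard way to compute the distance between two points on a sphere.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def near_hotspot(user, hotspots, radius_m=500):
    """Return True if the user is within radius_m of any flagged location."""
    return any(haversine_m(*user, *spot) <= radius_m for spot in hotspots)


# Hypothetical coordinates, for illustration only.
user_position = (43.6532, -79.3832)
flagged_places = [(43.6540, -79.3830), (44.0000, -79.0000)]

print(near_hotspot(user_position, flagged_places))
```

The same distance check, applied to a fixed home location instead of a list of hotspots, gives a simple geofence of the kind that could be used to monitor compliance with home quarantine.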
In addition, these applications can monitor suspected cases placed in compulsory home quarantine to ensure their compliance (Engle et al., 2020; Wang et al., 2020). Moreover, ML is used to predict the places where the epidemic spreads (McCall, 2020). Therefore, because of the importance of investing in data and data analysis, the White House Office of Science and Technology Policy (OSTP) launched a huge open-source data centre (CORD19).
Many academic and governmental institutions have participated in it, along with organizations specialized in artificial intelligence and national health and dozens of other institutions. One company specialized in analyzing and studying medical data using artificial intelligence, and ML in particular, is BlueDot, located in Toronto, Canada (Tuite et al., 2020).
Finally, we note the significance of the BD that the world produces every day. ML is clearly important for analyzing these data and benefiting from them, especially in providing better medical services and overcoming this catastrophe. Moreover, analyzing the data correctly will help presidents and leaders to make sound decisions in time to slow the spread of the virus.

Conclusion
BD analysis was defined as the process of examining and interpreting huge datasets. Unstructured data offer an important opportunity in most fields. Nevertheless, much of this abundance cannot be computed efficiently in a scalable or practical way. The findings of this study showed, first, that analyzing data with ML is of great importance and reflects future progress, since it contributes greatly to decision-making; second, they clarified the role of the algorithms used in each type of ML in data analysis. Third, they showed that one of the most common constraints on ML-based data analysis is wrong analysis or erroneous prediction, which causes many major problems. Fourth, the results illustrate the most important ML applications that use BD. Last but not least, BD and ML can contribute greatly to combating COVID-19, for example by creating interactive dashboards, analyzing epidemiological models and suggesting the best vehicles to help access virus treatments. The most important recommendations for future work are to find good solutions to the challenges and obstacles that ML faces in analyzing BD, such as making the data to be analyzed easier to access, reducing data retrieval time, finding ways to maintain data privacy, and paying attention to data quality, governance and management. The role of ML in confronting the COVID-19 epidemic should also be strengthened, because of its great effectiveness.