Deep learning to find pneumonia virus host: a “small step” in AI “evolution”

On January 24, Zhu Huaiqiu, a professor at the School of Engineering of Peking University, posted a paper entitled “Deep Learning Algorithms Predicting the Host and Infectivity of New Coronavirus” on the bioRxiv preprint platform, stating that bats and minks may be two potential hosts of the new coronavirus, and that minks may be an intermediate host.

According to the research by Zhu Huaiqiu’s team, the genome of the new coronavirus is up to 96% identical to that of the RaTG13 coronavirus found in horseshoe bats in Yunnan. In addition, the results predicted by VHP (virus host prediction), a method the team developed based on deep learning, show that the infectivity pattern of mink viruses is closer to that of the new coronavirus.
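A figure such as 96% describes how similar two genome sequences are. As a minimal sketch, assuming the two sequences have already been aligned to the same length, percent identity can be computed as the share of positions that match. The function name and the toy sequences are illustrative only and are not taken from the paper.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned positions at which two sequences carry the same base.

    Assumes both sequences were aligned beforehand, so they have equal length.
    A gap character ("-") never counts as a match.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


# Toy example: two made-up 10-base fragments that differ at one position.
identity = percent_identity("ACGTACGTAC", "ACGTACGTAT")
print(f"{identity:.1f}% identical")
```

Real genome comparisons run over tens of thousands of bases and depend on how the alignment itself was produced, so this only illustrates the final counting step.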

It is reported that in this research, the team used AI technology based on deep learning models to find virus hosts. This may be the first time in China that deep learning has produced results in the study of the 2019 new coronavirus.

AI joins the front line to fight the epidemic, deep learning to find virus hosts

After a previously unknown virus emerges, it is important to identify its host. Because viruses are complex and diverse, the number of viruses known to humans, and our understanding of them, are far from adequate. Usually it is only viruses that infect humans and threaten life and safety that attract much attention.

Viruses that do not infect humans may mutate suddenly, or may reach humans through intermediate hosts. Quickly identifying the host of an unknown virus is therefore of great significance: it helps humans understand the interaction between virus and host and respond to potential threats such as sudden mutations, so that the virus can be prevented and controlled in a targeted manner.

To detect the potential host and pathogenicity of a new virus, the traditional method builds a library of known viral genomes, compares the new virus’s nucleotide sequence against it, and uses local sequence similarity to make a rough prediction of the new virus’s host.
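The traditional approach can be sketched as a best-hit search: score the new sequence against every entry in a library of known viruses and borrow the host of the most similar entry. The sketch below uses a Smith-Waterman local alignment score as the similarity measure. The scoring parameters, the library contents, and all names are hypothetical; production pipelines typically rely on dedicated alignment tools rather than code like this.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def predict_host_by_best_hit(query, library):
    """Return the host of the library entry with the highest local similarity.

    library maps a virus name to a (sequence, host) pair.
    """
    best_name = max(library, key=lambda name: local_alignment_score(query, library[name][0]))
    return library[best_name][1]


# Hypothetical toy library with made-up sequences and hosts.
toy_library = {
    "virus_A": ("ACGTACGTGG", "bat"),
    "virus_B": ("TTTTCCCCAA", "plant"),
}
print(predict_host_by_best_hit("CGTACGTG", toy_library))
```

The prediction is only as good as the closest library entry, which is why this kind of method gives rough answers for viruses that have no near relative in the library.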

In predicting the host of the 2019 new coronavirus, Zhu Huaiqiu’s team built the VHP model and used it to compare the genome of the new coronavirus against existing viral genome data. Given enough computing power, a deep learning model can search viral gene data broadly to find and predict the natural host of the new coronavirus.

VHP model calculates infectivity of new coronavirus

In the paper posted on the bioRxiv preprint platform, Zhu Huaiqiu’s team writes: “To construct the VHP model, we used a bi-path convolutional neural network to predict the hosts of viral sequences. We divided virus hosts into five types: plants, bacteria, invertebrates, vertebrates, and humans. Given a viral nucleotide sequence as input, the deep-learning-based VHP model outputs five scores, one for each host type, reflecting the infectivity of the new coronavirus within each type.”
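To make the input and output of such a model concrete, here is a minimal sketch of the data flow only: a nucleotide sequence is one-hot encoded and mapped to five scores, one per host type. It is not the team’s network. A single linear layer over base composition stands in for the trained convolutional network, the weights are arbitrary placeholders, and normalizing the scores with a softmax is an assumption made for illustration.

```python
import math

HOST_TYPES = ["plant", "bacteria", "invertebrate", "vertebrate", "human"]
BASES = "ACGT"


def one_hot(seq):
    """Encode a sequence as rows of 4 values (A, C, G, T); unknown bases become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def softmax(logits):
    """Turn raw scores into positive values that sum to 1."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_host_scores(seq, weights):
    """Map a sequence to one score per host type.

    weights holds 5 rows of 4 placeholder values (hypothetical, untrained).
    The stand-in model averages the one-hot rows into a base composition
    and applies one linear layer followed by a softmax.
    """
    encoded = one_hot(seq)
    if not encoded:
        raise ValueError("sequence must be non-empty")
    composition = [sum(row[k] for row in encoded) / len(encoded) for k in range(4)]
    logits = [sum(w * c for w, c in zip(row, composition)) for row in weights]
    return dict(zip(HOST_TYPES, softmax(logits)))


# Arbitrary placeholder weights, one row per host type.
placeholder_weights = [
    [0.2, -0.1, 0.0, 0.3],
    [-0.3, 0.4, 0.1, 0.0],
    [0.1, 0.1, -0.2, 0.2],
    [0.0, 0.3, 0.3, -0.1],
    [0.5, -0.2, 0.2, 0.1],
]
for host, score in toy_host_scores("ACGTTGCA", placeholder_weights).items():
    print(f"{host}: {score:.3f}")
```

A trained convolutional network would replace the composition step with learned filters that scan the one-hot matrix for sequence patterns, but the shape of the interface stays the same: a sequence goes in and five host-type scores come out.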

Based on the analysis of the VHP model’s results, the candidate hosts screened include dogs, pigs, minks, turtles and cats. After analysis and comparison, the researchers believe that the infectivity pattern of mink viruses is closer to that of the new coronavirus.

In fact, compared with traditional machine learning methods, models trained with deep learning can be applied to many different types of data, and can also combine data from multiple sources to complete a single task.

Not all genetic data comes with accurate, high-quality labels. Deep generative models can make full use of even data without high-quality labels, so that model performance keeps improving.

Therefore, among the types of deep learning, beyond the common supervised and unsupervised learning, semi-supervised learning and reinforcement learning are better suited to such data, and they deserve more attention from the medical and biological communities.

Deep Learning AI + Medical: Broad Application Prospects but Limitations

Among the application scenarios of AI, the medical industry has some of the broadest prospects. Bioinformatics, drug research and development at pharmaceutical companies, health data collected by medical equipment, patient diagnosis, and the choice of treatment options all create demand for deep learning.

Deep learning is, in essence, a complex machine learning algorithm. At present, it is most widely applied in computer vision and in speech and language recognition. Computer vision also has applications in the medical field, such as medical image recognition.

However, deep learning in the medical field also faces practical limitations. One of them is the lack of interpretability of the analysis process. Deep learning is essentially a type of statistical learning: it draws on known data and, by optimizing the algorithm, arrives at a prediction of a certain result.

That is to say, the results of deep learning algorithms are probabilistic predictions under the existing data conditions; they give only the result, not a “solution process”. This also makes some deviation from the actual outcome inevitable.

Taking this study of the new coronavirus host as an example: after the VHP model produced its results, the candidate hosts screened included dogs, pigs, minks, turtles and cats. The researchers still needed further comparative analysis to reach their conclusion that the infectivity pattern of mink viruses is closer to that of the new coronavirus.

The power of technology also needs to “cross over prejudice”

In addition, if the input data sample itself has a “big data bias”, the model calculation will amplify this “bias” and affect the accuracy of the results in real scenarios.

Medical AI based on deep learning cannot avoid this situation entirely. Especially with complex, massive medical data, the consequences of such “bias” are hard to accept.
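A deliberately extreme sketch shows how a model can amplify a skew in its training data. The “model” here is a hypothetical stand-in that has learned nothing except which label is most common. The labels and counts are invented for illustration and do not come from the study.

```python
from collections import Counter


def majority_predictor(train_labels):
    """Return a predictor that always answers with the most common training label."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    return lambda _sample: most_common_label


# Invented training set: 90% of the samples carry the "vertebrate" label.
train_labels = ["vertebrate"] * 90 + ["plant"] * 10
predict = majority_predictor(train_labels)

# Invented real-world set in which the two labels are equally common.
real_labels = ["vertebrate"] * 50 + ["plant"] * 50
predictions = [predict(None) for _ in real_labels]

train_share = train_labels.count("vertebrate") / len(train_labels)
predicted_share = predictions.count("vertebrate") / len(predictions)
accuracy = sum(p == t for p, t in zip(predictions, real_labels)) / len(real_labels)

print(f"vertebrate share in training data: {train_share:.2f}")   # 0.90
print(f"vertebrate share in predictions:   {predicted_share:.2f}")  # 1.00
print(f"accuracy on balanced data:         {accuracy:.2f}")  # 0.50
```

A 90% skew in the training labels becomes a 100% skew in the predictions, and accuracy falls to chance level once the real data is balanced. Real models are less crude than this stand-in, but the same pull toward over-represented classes is what the “bias” above refers to.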

Therefore, for the implementation of deep learning AI in the medical field, in addition to the problems to be solved by the technology itself, the butterfly effect caused by technology should also receive more attention.

On the bright side, the adoption of deep learning in the medical field is not only a “recipe” for supplementing scarce high-quality medical resources. The application of new technologies such as deep learning and big data will also lend the power of technology to people facing sudden infectious diseases such as the “new coronavirus”.

We will live in an age of analyzing all data

Viktor Mayer-Schönberger, author of “Big Data”, predicts: “In the era of big data, we can analyze more data, and sometimes even process all the data related to a particular phenomenon, instead of relying on random sampling.”

In the data age, advances in deep learning, algorithms, and big data will usher humanity into a brand-new era. In the face of a raging virus, humanity will not stand by. In the current difficult period of the new coronavirus outbreak, people need all the more confidence, tenacity, courage, and wisdom to face the challenge of new viruses!
