On Wednesday (28), Google announced AlphaGenome, a new artificial intelligence tool for analyzing the human genome, with a focus on understanding how long stretches of DNA influence gene regulation within cells. The initiative seeks to go beyond simply reading the genetic code by investigating the mechanisms that control when, where, and how genes are activated or silenced in the organism, Folha de São Paulo reports.
At the presentation of AlphaGenome, which was published in the journal Nature, Pushmeet Kohli, vice president of research at Google DeepMind, emphasized that the sequencing of the human genome, completed in 2003, was only the first step. “Decoding the full human genome in 2003 gave us the book of life, but reading it remains a challenge,” he stated. According to him, although the genetic text is available (about 3 billion nucleotide pairs), understanding its “grammar” remains one of the great frontiers of science. “We have the text, but understanding the grammar and how it governs life is the next great frontier of research,” Kohli said.
Most human DNA does not directly code for proteins. Only about 2% of the sequence performs this function, which is essential to the operation of living organisms. The remaining 98% plays a complex regulatory role, acting as a conductor that coordinates, protects, and adjusts gene expression in each cell. It is precisely in these regions that numerous disease-associated variants are concentrated, and it is this territory that AlphaGenome aims to explore.
The new model complements other tools developed by Google’s artificial intelligence lab, such as AlphaMissense, focused on analyzing coding DNA sequences; AlphaProteo, dedicated to protein design; and AlphaFold, which predicts protein structures and earned its creators a share of the 2024 Nobel Prize in Chemistry. In the case of AlphaGenome, the innovation lies in its ability to analyze long DNA sequences and predict how each pair of nucleotides influences different biological processes within the cell.
Based on deep learning techniques, the system was trained on data from large public consortia that conducted experimental measurements on hundreds of types of human and mouse cells and tissues. This foundation allowed the model to learn complex patterns of genetic regulation and apply that knowledge in an integrated way.
Before AlphaGenome, there were already models capable of studying regulatory DNA regions, but they faced technical limitations. Researchers had to choose between analyzing long sequences with less precision or focusing on shorter stretches at finer resolution. According to Žiga Avsec, one of the project’s co-authors, fully understanding a gene’s regulatory environment requires analyzing sequences that can span up to one million nucleotide pairs. The new tool seeks to overcome this dilemma by combining length and precision.
Another distinguishing feature of AlphaGenome is its ability to model the influence of DNA on 11 distinct biological processes simultaneously. Until now, researchers had to resort to different models to obtain this type of integrated analysis. For Natasha Latysheva, also a co-author of the study published in Nature, the tool represents a significant advance. “It can accelerate our understanding of the genome by helping to map the location of functional elements and determine their roles at the molecular level,” she stated.
Kohli also highlighted the collaborative nature of the project. “We hope researchers will enrich it with more data and modalities,” he said. According to Google, AlphaGenome has already been tested by about 3,000 scientists from 160 countries and is now available as open source for non-commercial research use.
Experts outside the project evaluate the model positively but cautiously. Ben Lehner, head of generative and synthetic genomics at the Wellcome Sanger Institute in Cambridge, stated that the tool is very effective, though it still has limitations. “Accurately identifying the differences in our genomes that make us more or less susceptible to developing thousands of diseases is a crucial step toward better treatments,” he noted. At the same time, he cautioned that “AI models are only as good as the data used to train them,” emphasizing that many available datasets are still small and poorly standardized.
A similar assessment was made by Robert Goldstone, head of genomics at the Francis Crick Institute. For him, AlphaGenome should not be seen as a definitive answer to all biology questions, since gene expression also depends on complex environmental factors. Still, he considered the tool essential for advancing the field. According to Goldstone, it will allow scientists to “programmatically study and simulate the genetic bases of complex diseases,” expanding research possibilities and understanding of human genome function.
Source: brasil247.com


