bxgenetics

ExPecto predicts effects of genetic mutations in genetic dark matter

By Camille Perez

Genes code for proteins, right? That holds true for only about 1% of the human genome. The other 99% is known as genetic dark matter: long, noncoding sequences of DNA that are thought to carry out regulatory and housekeeping functions, many of which are still unknown. Every cell in our body contains the whole human genome, but noncoding regions help determine which genes are switched on and off, allowing cells to differentiate and carry out specialized functions. Mutations in the coding region have relatively predictable outcomes, since a substituted amino acid or a premature stop codon can be read directly from the sequence. Mutations in the noncoding region, which are associated with many diseases, are much harder to interpret: they can cause too little or too much gene expression, in the wrong part of the body or at the wrong time.

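Finding these mutations is laborious in a data set as large as the human genome. Previously, scientists compared the genomes of many individuals with a given disease, but this approach struggles with rarer mutations. It is also hampered by the fact that neighboring stretches of DNA are often inherited together in clusters, which makes it hard to tell which mutation in a cluster is actually responsible.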

A new artificial intelligence program named ExPecto (after a Harry Potter spell) has been developed to read a DNA sequence containing a mutation and predict how that mutation changes gene expression and, in turn, the phenotype. Its developers at Princeton and the Simons Foundation in New York trained the system to predict gene expression directly from DNA sequence. They then applied it to regions flagged by genome-wide association studies, which find statistical associations between genetic markers and diseases, in order to pick out the variants most likely to cause disease. Because it works from sequence alone, the system can also simulate mutations in noncoding, regulatory regions and predict their effects.
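For readers who like to see the idea in code, here is a minimal Python sketch of what "simulating mutations" can look like: take a short DNA sequence, try every possible single-letter change, and ask a scoring function how much each change shifts the predicted expression. Everything here is an assumption made for illustration. In particular, `toy_expression_score` is a hypothetical stand-in that just counts "TATA" motifs; it is not ExPecto's model, and none of these function names come from the ExPecto software.

```python
# Toy illustration only: this is NOT ExPecto's model or API.
BASES = "ACGT"


def toy_expression_score(seq: str) -> float:
    """Hypothetical stand-in for a trained model.

    Counts (possibly overlapping) occurrences of the motif "TATA".
    A real model would be a neural network trained on expression data.
    """
    return float(sum(seq[i:i + 4] == "TATA" for i in range(len(seq) - 3)))


def single_base_mutants(seq: str):
    """Yield (position, ref, alt, mutated_sequence) for every substitution."""
    for i, ref in enumerate(seq):
        for alt in BASES:
            if alt != ref:
                yield i, ref, alt, seq[:i] + alt + seq[i + 1:]


def mutation_effects(seq: str, score=toy_expression_score):
    """Score each single-base mutant relative to the unmutated sequence."""
    baseline = score(seq)
    return [
        (i, ref, alt, score(mutant) - baseline)
        for i, ref, alt, mutant in single_base_mutants(seq)
    ]


if __name__ == "__main__":
    for pos, ref, alt, delta in mutation_effects("GGTATAGG"):
        if delta != 0:
            print(f"position {pos}: {ref}>{alt} changes the toy score by {delta:+.1f}")
```

In this toy example, only changes inside the "TATA" stretch move the score, which mirrors the intuition that a mutation matters most when it disrupts a sequence the cell actually reads. A real tool replaces the toy score with a learned model and runs the same kind of scan at genome scale.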

ExPecto was used to predict the effects of over 140 million mutations across 200 different tissues and cell types, and the predictions are publicly available through HumanBase. ExPecto has significant potential for detecting mutations that increase the risk of immune-related diseases. The scientists used the program to prioritize mutations associated with Crohn’s disease, chronic HBV infection and Behçet’s disease. They confirmed experimentally that the mutations ExPecto singled out were more promising candidates than those identified by previous studies. Jian Zhou, a Flatiron research fellow, explained the implications of ExPecto for medicine, saying, “Once you know which protein is affected and what the protein does, then you can design drugs that can fix the problem,” and “if you can’t produce a certain protein, then you could design a therapy that makes up for the missing protein.”

ExPecto also shows promise for studying evolution. A central idea of evolutionary theory is that favorable traits are more likely to be passed on to the next generation, while harmful ones tend to be weeded out. ExPecto found that mutations in genes expressed throughout the whole body are rarer than mutations in genes expressed mainly in one tissue. A genetic variation that affects a gene used by more cells is more likely to be fatal, so the rarity of such variations is consistent with our understanding of evolution.


AI accurately predicts effects of genetic mutations in biological dark matter. Simons Foundation. (2019, September 10).

Deirdre. (2018, July 24). Expecto: New AI predicts biological roles of genetic variations. Evolving Science.

Expecto patronum! Magical Machine Learning Tool summons DNA dark matter data. Genomics Research from Technology Networks. (n.d.).

Zhou, J., Theesfeld, C. L., Yao, K., Chen, K. M., Wong, A. K., & Troyanskaya, O. G. (2018). Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk. Nature Genetics, 50(8), 1171–1179.
