The rapid advance of gene editing technology, together with growing public access to these techniques, poses potentially harmful biosecurity threats. To assess national security risks accurately, edits introduced by bioengineering must be clearly distinguishable from natural genome variation.

At Sandia, we’re focusing our research on three specific methods judged to have the highest potential for detecting edits. Because the three methods take distinctly different analytical approaches, we will derive final edit-likelihood scores from an ensemble of their three outcomes, thus amplifying the unique discriminative power of each selected method.
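As an illustration of the ensemble idea, the sketch below combines three per-method scores into a single edit-likelihood score. The combination rule is not specified here, so the weighted average, the function name `ensemble_edit_likelihood`, the method labels, and the example scores are all assumptions made for illustration only.

```python
def ensemble_edit_likelihood(scores, weights=None):
    """Combine per-method edit-likelihood scores into one score.

    scores  : dict mapping a method name to a score in [0, 1]
    weights : optional dict of non-negative weights (hypothetical);
              defaults to equal weighting of all methods
    """
    if not scores:
        raise ValueError("at least one method score is required")
    if weights is None:
        weights = {name: 1.0 for name in scores}
    for name, score in scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {name!r} must lie in [0, 1]")
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted average: an assumed rule, not the documented one.
    return sum(weights[name] * scores[name] for name in scores) / total_weight


# Made-up scores for a single candidate sequence.
example_scores = {
    "iterative_random_forest": 0.82,
    "neural_network": 0.74,
    "artificial_immune_system": 0.90,
}
combined = ensemble_edit_likelihood(example_scores)
print(f"combined edit likelihood: {combined:.2f}")
```

Equal weighting is the simplest choice; in practice the weights might instead be tuned on validation data so that a more reliable method contributes more to the final score.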

Technology Advancement

The three methods being used to detect genome editing are:

  • Iterative Random Forests – a classification method developed to identify subtle patterns and signatures that indicate an edit.
  • Deep Learning Neural Networks – trained and tested to verify whether patterns found by the first method persist when analyzed with an orthogonal technique. Neural networks will also be deployed to identify and classify sequence anomalies, determining whether they correspond to natural variation or intentional modification.
  • Artificial Immune Systems – a biologically inspired anomaly detection method that will find insertions and recombinations based on structural differences from unaltered sequences.
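To make the self/non-self intuition behind the third method concrete, here is a heavily simplified sketch. It is not a full artificial immune system: it only records the k-mers of an unaltered reference as "self" and flags windows of a query sequence that contain unseen k-mers. The function names, the value of `k`, and the toy sequences are assumptions chosen for illustration.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def flag_nonself_positions(reference, query, k=6):
    """Return start positions in query whose k-mer never occurs in reference.

    The reference k-mers play the role of "self"; any query window
    outside that set is treated as a possible structural anomaly.
    """
    self_set = kmers(reference, k)
    return [
        i for i in range(len(query) - k + 1)
        if query[i:i + k] not in self_set
    ]


# Toy data: a made-up reference and a copy with an 8-base insertion at position 12.
reference = "ACGTACGGTCAGTTCAGGCTAACGTTAGC"
query = reference[:12] + "GGGGCCCC" + reference[12:]

flagged = flag_nonself_positions(reference, query, k=6)
print("anomalous window starts:", flagged)
```

Flagged windows cluster around the inserted segment and its junctions, while a query identical to the reference yields no flags. A realistic detector would also need to tolerate natural variation, which this toy example ignores.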


Applying data science to the detection of gene editing would help clarify the consequences of use cases that involve national security threats, which in turn would support mitigation efforts, forensic investigation, or both.


Data Science for Detection of Genome Editing

Sandia National Laboratories
Publication Date
Sep 1, 2019
Agreement Type