In silico modeling and prediction of CRISPR-Cas9 off- and on-targets
The CRISPR-Cas9 system is a popular and widely used system for genome editing. Although the consequences of unintended edits (off-targets) are debated, there is a range of applications, e.g. therapeutics, in which such edits are undesirable. At the same time, the design of guide RNAs (gRNAs) for binding and cleaving the genomic DNA requires high on-target efficiency. However, most computational design methods do not explicitly take off-targets into account; at best, they supplement their designs with separate off-target predictions. Here, an energy model for off-target assessment is constructed. Compared with its machine learning counterparts, it not only performs better but also correlates better with the data. By extending the energy model to on-target use, it becomes possible to explain the different behavior of efficient and inefficient gRNAs. Integrating these models into a machine learning framework yields an on-target efficiency prediction method that explicitly takes off-targets into account, so that on-target efficiency is predicted for gRNAs that potentially have only low off-target activity. Upon dissecting the machine learning model, we find that features derived from the energy model are by far among the most important ones for the on-target prediction.
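To illustrate how binding energies can place on- and off-targets in a single score, here is a minimal sketch. It is not the energy model described above. It assumes that binding free energies (in kcal/mol, more negative meaning stronger binding) are already available for the on-target site and for each candidate off-target site. It then computes the Boltzmann probability that the gRNA is bound at its on-target site rather than at any off-target. The function name `on_target_probability`, the 37 °C temperature and the example energies are hypothetical choices made for illustration.

```python
import math

# Gas constant in kcal/(mol*K) and an assumed temperature of 37 degrees C (in K).
R = 1.987e-3
T = 310.15


def on_target_probability(dg_on: float, dg_off: list[float]) -> float:
    """Hypothetical Boltzmann probability of binding at the on-target site.

    dg_on:  assumed binding free energy of the on-target site (kcal/mol).
    dg_off: assumed binding free energies of candidate off-target sites.
    """
    rt = R * T
    energies = [dg_on] + list(dg_off)
    # Shift by the lowest energy before exponentiating to avoid overflow.
    ref = min(energies)
    weights = [math.exp(-(g - ref) / rt) for g in energies]
    # Fraction of the total Boltzmann weight carried by the on-target site.
    return weights[0] / sum(weights)


if __name__ == "__main__":
    # Illustrative energies only: one strong on-target, two weaker off-targets.
    print(on_target_probability(-20.0, [-15.0, -12.0]))
```

In this sketch, an off-target that binds much more weakly than the on-target adds almost nothing to the sum. An off-target with the same energy as the on-target halves the probability, which is why a gRNA with a high predicted on-target efficiency can still score poorly once its off-targets are taken into account.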