Discovery of CRISPR-Cas12a clades using a large language model

Nature Communications. 2025-08
Yuanyuan Feng, Junchao Shi, Zhanwei Li, Yongqian Li, Jiaxi Yang, Shisheng Huang, Jinfang Zheng, Wei Han, Yunbo Qiao, Jun Zhang, Qi Liu, Yao Yang, Chunyi Hu, Lina Wu, Xiaokang Zhang, Jin Tang, Xingxu Huang, Peixiang Ma
Research Center for Life Sciences computing, Zhejiang Lab
Products/Services Used: Synthetic Guide RNA
Details: The crRNAs were synthesized by GenScript (Nanjing, China), and sequences are listed in Supplementary Table 7.

Abstract

CRISPR-Cas systems revolutionize life science. Metagenomes contain millions of unknown Cas proteins. Traditional mining relies on protein sequence alignments. In this work, we employ an evolutionary scale language model (ESM) to learn the information beyond sequences. Trained with CRISPR-Cas data, ESM accurately identifies Cas proteins without alignment. Limited experimental data restricts feature prediction, but integrating with machine learning enables trans-cleavage activity prediction of uncharacterized Cas12a. We discover 7 undocumented Cas12a subtypes with unique CRISPR loci. Structural analyses reveal 8 subtypes of Cas1, Cas2, and Cas4. Cas12a subtypes display distinct 3D-folds. CryoEM analyses unveil un...

Keywords