🗨 About Me

I am a first-year PhD student in the Computational Biology and Bioinformatics (CBB) program at the University of Southern California. My research lies at the intersection of AI and biology, where I design computational approaches to accelerate discoveries in synthetic biology, drug discovery, and molecular interactions. I focus on three areas: (1) structure-based, AI-driven drug discovery [SMARTBind, Apo2Mol]; (2) biological foundation models for proteins and genomics [Tabula, ProTrek, SaProtHub, Nullsettes]; (3) machine-learning-enabled protein evolution [Sequence Display].

Before starting my PhD, I was very fortunate to work with and learn from inspiring mentors and collaborators across these fields; you can find them in the Experience panel.

📖 Education

  • 2025 - present, PhD student, Computational Biology and Bioinformatics. University of Southern California. Los Angeles, CA
  • 2022 - 2024, Master of Science in Engineering, Computer Science. Johns Hopkins University. Baltimore, MD
  • 2018 - 2022, Bachelor of Science, Computer Science. Wenzhou-Kean University. Wenzhou, China

📰 News

  • 2025.11: “Apo2Mol: 3D Molecule Generation via Dynamic Pocket-Aware Diffusion Models” has been accepted to AAAI 2026!
  • 2025.10: “Engineering Unnatural Cells with a 21st Amino Acid as a Living Epigenetic Sensor” is published in Nature Communications!
  • 2025.09: “Small Molecule Approach to RNA Targeting Binder Discovery (SMARTBind) Using Deep Learning Without Structural Input” is released on bioRxiv.
  • 2025.09: “Tabula: A Tabular Self-Supervised Foundation Model for Single-Cell Transcriptomics” has been accepted to NeurIPS 2025!
  • 2025.09: “Biosynthesis of Unnatural Cyclodipeptides through Genetic Code Expansion and Cyclodipeptide Synthase Evolution” is published in the Journal of the American Chemical Society!
  • 2025.08: “Evaluating DNA function understanding in genomic language models using evolutionarily implausible sequences” (follow-up work from the GenBio workshop) is released on arXiv.
  • 2025.08: “SaprotHub: Democratizing Protein Language Model Training, Sharing and Collaboration for the Biology Community” has been accepted by Nature Biotechnology!
  • 2025.07: “A tri-modal protein language model enables advanced protein searches” has been accepted by Nature Biotechnology!
  • 2025.07: “Predicting function of evolutionarily implausible DNA sequences” was presented at the Q-BIO 2025 Conference: Emergent Orders in Living Systems Across Scales; see our poster.
  • 2025.06: “Sequence Display-Enabled Machine Learning for Protein Evolution” was presented at 2025 Synthetic Biology: Engineering, Evolution, & Design; see our poster.
  • 2025.06: “Predicting function of evolutionarily implausible DNA sequences” has been accepted to the ICML 2025 Generative AI and Biology Workshop!
  • 2025.04: I will be joining the PhD program in Computational Biology and Bioinformatics at USC QCB. Looking forward to the journey!
  • 2025.01: “Toward a privacy-preserving predictive foundation model of single-cell transcriptomics with federated learning and tabular modeling” is released on bioRxiv; see our post.

📝 Selected Publications

bioRxiv

Small Molecule Approach to RNA Targeting Binder Discovery (SMARTBind) Using Deep Learning Without Structural Input

Shiyu Jiang †, Amirhossein Taghavi †, Tenghui Wang, Samantha M. Meyer, Jessica L. Childs-Disney, Chenglong Li, Matthew D. Disney, Yanjun Li. bioRxiv, 2025. (Under review)

GitHub

arXiv
bioRxiv

Sequence Display: Generating Large-Scale Sequence–Activity Datasets to Advance Universal Protein Evolution

Linqi Cheng †, Xinzhe Zheng †, Shiyu Jiang †, Hu Y, Liu Y, Yang K, Rui J, Ding H, Zhang M, Yuan T, Ye H, Li C, Kevin K. Yang, Xiongyi Huang, Han Xiao. bioRxiv, 2025. (Under review)

GitHub

AAAI 2026

Apo2Mol: 3D Molecule Generation via Dynamic Pocket-Aware Diffusion Models

Xinzhe Zheng, Shiyu Jiang, Gustavo Seabra, Chenglong Li, Yanjun Li. AAAI (poster), 2026.

GitHub

NeurIPS 2025

Tabula: A Tabular Self-Supervised Foundation Model for Single-Cell Transcriptomics

Jiayuan Ding †, Jianhui Lin †, Shiyu Jiang †, Yixin Wang, Ziyang Mao, Zhaoyu Fang, Jiliang Tang, Min Li, Xiaojie Qiu. NeurIPS (poster), 2025.

GitHub

JACS

Biosynthesis of Unnatural Cyclodipeptides through Genetic Code Expansion and Cyclodipeptide Synthase Evolution

Hu Y †, Cheng L †, Liu Y, Liu R, Jiang S, Yuan T, Wang Y, Ye H, Xiao H. Journal of the American Chemical Society, 2025.

GitHub

Nature Biotechnology

A tri-modal protein language model enables advanced protein searches

Jin Su †, Yan He †, Shiyang You †, Shiyu Jiang, Xibin Zhou, Xuting Zhang, Yuxuan Wang, Xining Su, Igor Tolstoy, Xing Chang, Hongyuan Lu, Fajie Yuan. Nature Biotechnology, 2025.

Online Server

Nature Biotechnology

SaprotHub: Democratizing Protein Language Model Training, Sharing and Collaboration for the Biology Community

Jin Su, Zhikai Li, Tianli Tao, Chenchen Han, Yan He, Fengyuan Dai, Qingyan Yuan, Yuan Gao, Tong Si, Xuting Zhang, Yuyang Zhou, Junjie Shan, Xibin Zhou, Xing Chang, Shiyu Jiang, Dacheng Ma, The OPMC, Martin Steinegger, Sergey Ovchinnikov, Fajie Yuan. Nature Biotechnology, 2025.

GitHub | OPMC

ICML 2025 GenBio Workshop

Predicting function of evolutionarily implausible DNA sequences

Shiyu Jiang, Xuyin Liu, Jerry Zitong Wang. ICML 2025 Generative AI and Biology Workshop, 2025.

GitHub

ACS Nano

Integrating Metal–Phenolic Networks-Mediated Separation and Machine Learning-Aided Surface-Enhanced Raman Spectroscopy for Accurate Nanoplastics Quantification and Classification

Haoxin Ye, Shiyu Jiang, Yan Yan, Bin Zhao, Edward R Grant, David D Kitts, Rickey Y Yada, Anubhav Pratap-Singh, Alberto Baldelli, Tianxi Yang. ACS Nano, 2024.

Featured on Cover

ALIFE 2023

Simulating Disease Spread During Disaster Scenarios

Shiyu Jiang, Heejoong Kim, Fabio Henrique Tanaka, Claus Aranha, Anna Bogdanova, Kimia Ghobadi, Anton Dahbura. The International Conference on Artificial Life, 2023.

GitHub

Bioinformatics

HNOXPred: a web tool for the prediction of gas-sensing H-NOX proteins from amino acid sequence

Shiyu Jiang, Hemn Barzan Abdalla, Chuyun Bi, Yi Zhu, Xuechen Tian, Yixin Yang, Aloysius Wong. Bioinformatics, 2022.

Online Server | GitHub

🧑‍💻 Experience

🔨 Models and Tools

Genomics

  • Tabula: a privacy-preserving predictive foundation model for single-cell transcriptomics, leveraging federated learning and tabular learning.

  • Nullsettes: a synthetic biology benchmark simulating loss-of-function mutations via control element translocations, enabling zero-shot evaluation of genomic language models.

  • SICER 2.0 & Clipper dev Version (Spatial-clustering Identification of ChIP-Enriched Regions): a redesigned ChIP-Seq broad peak calling data analysis method.

Protein

  • Sequence Display: a platform that integrates large‑scale sequence–activity datasets with protein language models to map activity landscapes and identify high‑performance protein variants.

  • ProTrek: a tri-modal protein language model that jointly models protein sequence, structure and function (SSF).

  • Evolla: a protein-language generative model (Protein ChatGPT) designed to decode the molecular language of proteins.

  • SaProtHub: a platform that makes protein modeling accessible to all biologists.

  • HNOXPred (Prediction of Heme-Nitric oxide/OXygen domains): a web server to predict gas-sensing H-NOX proteins from amino acid sequences.

Drug Discovery

  • Apo2Mol: a diffusion-based molecule generation model leveraging apo-holo pocket dynamics.

  • SMARTBind: a structure-agnostic RNA-ligand interaction prediction method that can be used for RNA-ligand virtual screening and binding site prediction.

Other

  • gmx_mmpbsa_py: an easy-to-use Python script that integrates GROMACS molecular dynamics trajectories with APBS to compute protein–ligand binding free energies using the MM/PBSA method.

  • Koudou: an agent-based model that simulates infectious disease spread in a college-town scenario.

🌎 Service

  • Journal reviewer: IEEE Transactions on Computational Biology and Bioinformatics
  • Conference reviewer: AAAI 2026

🌎 Miscellaneous

Outside of work, you’ll often find me at the gym, playing soccer, road cycling, or hiking. I also enjoy playing table tennis and the piano occasionally.