WP4-MUDI4LS: Work Package 4 of the MUDIS4LS project

Use case AlphaFold

Background

AlphaFold is a neural-network-based (AI) tool developed by DeepMind (https://www.deepmind.com/research/highlighted-research/alphafold) for protein structure prediction (Jumper et al., 2021). It predicts the 3D coordinates of protein structures with high accuracy from amino acid sequences alone. It first runs sequence alignments for the submitted sequence(s) with state-of-the-art tools, and these alignments are then fed to a deep learning model that produces the structure predictions.

Version 2 of the software achieved impressive results for single proteins at CASP14 (Critical Assessment of Techniques for Protein Structure Prediction) in 2020 (Kryshtafovych et al., 2021). CASP15, running at the time of writing jointly with CAPRI (Critical Assessment of PRedicted Interactions), should provide insights into the tool's accuracy for quaternary structure prediction (Evans et al., 2021). The code of AlphaFold 2.x was released on GitHub at the same time as the article (https://github.com/deepmind/alphafold).

The quality of AlphaFold's predictions has given a major boost to the structural bioinformatics community, and other communities have also started to consider integrating these predictions into ongoing or new projects.

Many predictions are already available in the ever-growing AlphaFold database, which covers proteins from more and more species (Varadi et al., 2022). These precomputed predictions already meet many needs, but demand for running AlphaFold remains very high. For example, one may need to model a sequence that is not (or not yet) in the AlphaFold database, one may want all the models produced by AlphaFold (the AlphaFold database only provides the best one), or one may want to model quaternary structures, which the database does not provide.

DeepMind also provides AlphaFold as a Colab notebook (https://colab.research.google.com/github/deepmind/alphafold/blob/main/notebooks/AlphaFold.ipynb). It lets users run predictions through a web interface, but with limitations, for example on the computing resources available in Colab.

In addition, ColabFold was recently published (Mirdita et al., 2022). It is a modified version of AlphaFold2 that relies on a different, faster tool (MMseqs2) for the sequence alignment step. It is also available as a dedicated Colab notebook (https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb).

Need

The growing demand for structure predictions calls for wider community access to the software. This access needs to:

  1. be available from the command line, backed by substantial computing resources, for users who need to run many predictions
  2. be available through a simple web interface, for users who only need to run a few predictions and are not familiar with the command line
  3. cover both DeepMind's AlphaFold and ColabFold
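For the command-line access, a shared service would typically wrap both tools behind a small launcher. The sketch below only builds the argument lists. It assumes AlphaFold 2.x is run through its docker/run_docker.py wrapper and ColabFold through its colabfold_batch entry point; the flag names follow the public READMEs of both projects at the time of writing and may change between versions, and the helper names, paths and template date are hypothetical placeholders.

```python
import shlex


def alphafold_command(fasta: str, output_dir: str, data_dir: str,
                      max_template_date: str,
                      model_preset: str = "monomer",
                      db_preset: str = "full_dbs") -> list[str]:
    """Build an argv list for AlphaFold 2.x's docker/run_docker.py (hypothetical helper)."""
    return [
        "python3", "docker/run_docker.py",
        f"--fasta_paths={fasta}",
        f"--max_template_date={max_template_date}",
        f"--model_preset={model_preset}",  # e.g. "monomer" or "multimer"
        f"--db_preset={db_preset}",        # "full_dbs" or "reduced_dbs"
        f"--data_dir={data_dir}",          # where the ~2.5 TB of databases live
        f"--output_dir={output_dir}",
    ]


def colabfold_command(fasta: str, output_dir: str, relax: bool = False) -> list[str]:
    """Build an argv list for ColabFold's colabfold_batch (hypothetical helper)."""
    cmd = ["colabfold_batch", fasta, output_dir]
    if relax:
        cmd.append("--amber")  # enables the optional relaxation step
    return cmd


if __name__ == "__main__":
    # Placeholder paths: adapt them to the local installation.
    print(shlex.join(alphafold_command("query.fasta", "/scratch/af_out",
                                       "/data/alphafold_dbs", "2022-01-01")))
    print(shlex.join(colabfold_command("query.fasta", "/scratch/cf_out", relax=True)))
```

Returning argument lists rather than shell strings lets a job scheduler or subprocess.run consume them without quoting issues; shlex.join is only used for display.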

Resources

The tool relies on a deep learning approach that requires heavy computation, part of which can run on GPUs (Graphics Processing Units). The process is divided into 3 steps:

  1. sequence alignments (if not directly provided by the user)
  2. structure modeling
  3. relaxation (optional)
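The three steps can be summarised as a small planning function that decides which steps a given run needs and where each one executes. This is a minimal sketch: the Step type and plan_prediction helper are hypothetical, the CPU/GPU placement follows the description in this section, and keeping relaxation on CPU is a simplifying assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    device: str  # "cpu" or "gpu"


def plan_prediction(msa_provided: bool, relax: bool, gpu_available: bool) -> list[Step]:
    """Return the ordered steps of one prediction run (hypothetical helper)."""
    steps = []
    # Step 1: sequence alignment, skipped when the user supplies alignments.
    if not msa_provided:
        steps.append(Step("sequence_alignment", "cpu"))
    # Step 2: structure modeling, on GPU when available (CPU works but is much slower).
    steps.append(Step("structure_modeling", "gpu" if gpu_available else "cpu"))
    # Step 3: optional relaxation, kept on CPU in this sketch for simplicity.
    if relax:
        steps.append(Step("relaxation", "cpu"))
    return steps


if __name__ == "__main__":
    for step in plan_prediction(msa_provided=False, relax=True, gpu_available=True):
        print(step.name, step.device)
```

Separating the CPU-bound alignment from the GPU-bound modeling makes it possible to schedule them on different nodes, so that GPUs are not left idle while alignments run.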

The sequence alignment step runs on CPU. DeepMind's AlphaFold uses JackHMMER and HHsearch/HHblits, which require ~2.5 TB of databases. ColabFold uses MMseqs2, which is faster and requires ~1 TB of databases.
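Because the databases dominate the storage footprint, a deployment can check up front which pipelines fit on a given volume. The sketch below uses only the approximate sizes quoted above (real sizes vary with database versions) and the standard library's shutil.disk_usage; the helper name and the checked path are assumptions for illustration.

```python
import shutil

TB = 10**12  # decimal terabyte, in bytes

# Approximate database footprints quoted in this document.
DATABASE_SIZE_TB = {
    "alphafold": 2.5,  # JackHMMER + HHsearch/HHblits databases
    "colabfold": 1.0,  # MMseqs2 databases
}


def pipelines_that_fit(free_bytes: int) -> list[str]:
    """Return the pipelines whose databases fit in the given free space (hypothetical helper)."""
    return [name for name, size_tb in DATABASE_SIZE_TB.items()
            if size_tb * TB <= free_bytes]


if __name__ == "__main__":
    free = shutil.disk_usage("/").free  # free bytes on the volume holding "/"
    print(pipelines_that_fit(free))
```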

Structure modeling can run on either GPU or CPU, but the CPU option is much slower. The drawback of the GPU option is that long sequences may require a lot of GPU memory; the amount required also appears to depend on the GPU model.
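One reason memory grows quickly with sequence length is that AlphaFold 2 maintains a pair representation of shape N_res x N_res x c_z, with c_z = 128 channels (Jumper et al., 2021). The back-of-the-envelope sketch below computes the size of a single such tensor in float32. It is only a lower bound, not a peak-memory prediction: actual usage also includes the MSA representation, attention intermediates and framework overhead, which depend on the implementation and the GPU.

```python
def pair_representation_bytes(n_residues: int, c_z: int = 128,
                              bytes_per_value: int = 4) -> int:
    """Size of one N_res x N_res x c_z tensor (float32 by default)."""
    return n_residues * n_residues * c_z * bytes_per_value


if __name__ == "__main__":
    for n in (250, 500, 1000, 2000):
        mb = pair_representation_bytes(n) / 1e6
        print(f"{n:>5} residues -> {mb:,.0f} MB per pair tensor")
```

For 1,000 residues one such tensor already takes 512 MB, and each doubling of the sequence length quadruples it, which is consistent with large sequences exhausting GPU memory.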

Objectives

The objectives of the use case are to:

The sequences for the benchmark will have to be defined.

Perspectives

Since AlphaFold, many laboratories have been developing new deep-learning-based software to predict structures. Like AlphaFold and ColabFold, these tools could be included in this use case, both to make them accessible to the community and to benchmark them (e.g. OmegaFold: https://github.com/HeliXonProtein/OmegaFold, OpenFold: https://github.com/aqlaboratory/openfold).
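Including several tools in the same benchmark is easier if each one is wrapped behind a common interface. The sketch below assumes a hypothetical convention where each predictor is a callable taking a sequence and returning a structure as text; since the benchmark sequences are still to be defined, it uses stand-in predictors and records only wall-clock time, not prediction accuracy.

```python
import time
from typing import Callable

# Hypothetical convention: a predictor maps a sequence to a structure (e.g. PDB text).
Predictor = Callable[[str], str]


def benchmark(predictors: dict[str, Predictor],
              sequences: dict[str, str]) -> list[dict]:
    """Run every predictor on every sequence and record wall-clock time."""
    results = []
    for tool, predict in predictors.items():
        for seq_id, seq in sequences.items():
            start = time.perf_counter()
            structure = predict(seq)
            elapsed = time.perf_counter() - start
            results.append({"tool": tool, "sequence": seq_id, "length": len(seq),
                            "seconds": elapsed, "output_size": len(structure)})
    return results


if __name__ == "__main__":
    # Stand-in predictors; real wrappers would call AlphaFold, ColabFold, OmegaFold, ...
    fake = {"dummy_a": lambda s: "MODEL\n" * len(s), "dummy_b": lambda s: ""}
    for row in benchmark(fake, {"seq1": "MKTAYIAKQR"}):
        print(row)
```

Adding a new tool then only requires writing one wrapper function; accuracy metrics against experimental structures would be added as extra columns once the benchmark set is chosen.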

References