Awesome Open-Source Bio/Cheminformatics
A (growing) list of open-source Bio/Cheminformatics tools that I have found useful in my work. If you know of other tools in this area that I should check out, please reach out.
AutoDock Vina
#molecular-docking
- Open-source program for molecular docking.
Publication: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3041641/
Forks:
- smina - a fork of AutoDock Vina that focuses on improving scoring and minimization
- QuickVina - fast and accurate molecular docking tool, aimed at accelerating AutoDock Vina without losing accuracy
- Gnina - molecular docking program with integrated support for scoring and optimizing ligands using convolutional neural networks. It is a fork of smina, which is itself a fork of AutoDock Vina
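As a rough illustration of what a docking run needs, the sketch below assembles the argument list for the Vina command-line program: a receptor, a ligand, a search box (center and size), and an output file. It only builds the command and does not run it. The file names and box values are placeholders, and the `vina` executable name is an assumption about the local install.

```python
from typing import List, Sequence


def build_vina_command(
    receptor: str,
    ligand: str,
    center: Sequence[float],
    size: Sequence[float],
    out: str,
    exhaustiveness: int = 8,
    executable: str = "vina",  # assumption: the binary is on PATH under this name
) -> List[str]:
    """Assemble the argument list for one AutoDock Vina docking run."""
    if len(center) != 3 or len(size) != 3:
        raise ValueError("center and size must each have three components")
    cmd = [executable, "--receptor", receptor, "--ligand", ligand]
    # The search box is given per axis: --center_x/y/z and --size_x/y/z.
    for axis, value in zip("xyz", center):
        cmd += [f"--center_{axis}", str(value)]
    for axis, value in zip("xyz", size):
        cmd += [f"--size_{axis}", str(value)]
    cmd += ["--exhaustiveness", str(exhaustiveness), "--out", out]
    return cmd


if __name__ == "__main__":
    # Placeholder inputs; replace with real PDBQT files and a box around the
    # binding site, then run with subprocess.run(cmd, check=True).
    cmd = build_vina_command(
        "receptor.pdbqt", "ligand.pdbqt",
        center=(1.0, 2.0, 3.0), size=(20, 20, 20), out="docked.pdbqt",
    )
    print(" ".join(cmd))
```

Returning a list rather than a shell string keeps the command safe to pass to `subprocess.run` without quoting issues.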
AutoDock-GPU
#molecular-docking
- OpenCL- and CUDA-accelerated version of AutoDock 4.2.6. It exploits the embarrassingly parallel Lamarckian Genetic Algorithm (LGA) by processing ligand-receptor poses in parallel over multiple compute units.
Github: https://github.com/ccsb-scripps/AutoDock-GPU
Publication: Accelerating AutoDock4 with GPUs and Gradient-Based Local Search. J. Chem. Theory Comput. 2021. https://doi.org/10.1021/acs.jctc.0c01006
VirtualFlow
#virtual-screening
- VirtualFlow is a versatile, parallel workflow platform for carrying out virtual-screening-related tasks on Linux-based computer clusters of any type and size that are managed by a batch system (such as SLURM).
Github: https://github.com/VirtualFlow/VFVS
Publication: An open-source drug discovery platform enables ultra-large virtual screens. Nature 580, 663–668 (2020). https://doi.org/10.1038/s41586-020-2117-z
Gypsum-DL
#ligand-preparation
- Gypsum-DL is a free, open-source program for preparing 3D small-molecule models. Beyond simply assigning atomic coordinates, Gypsum-DL accounts for alternate ionization, tautomeric, chiral, cis/trans isomeric, and ring-conformational forms.
Gitlab: https://git.durrantlab.pitt.edu/jdurrant/gypsum_dl
Publication: "Gypsum-DL: An Open-source Program for Preparing Small-molecule Libraries for Structure-based Virtual Screening." Journal of Cheminformatics 11:1. doi:10.1186/s13321-019-0358-3
LIT-PCBA
#dataset
- 15 target sets, 9780 actives and 407839 unique inactives selected from high-confidence PubChem BioAssay data
Data: http://drugdesign.unistra.fr/LIT-PCBA/
Publication: LIT-PCBA: An Unbiased Data Set for Machine Learning and Virtual Screening. https://doi.org/10.1021/acs.jcim.0c00155
Apricot
#submodular-optimization
- apricot implements submodular optimization for selecting subsets of massive data sets, so that machine learning models can be trained quickly on the reduced data. Documentation: https://apricot-select.readthedocs.io/en/latest/index.html
Github: https://github.com/jmschrei/apricot
Publication: https://jmlr.org/papers/volume21/19-467/19-467.pdf
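To make the idea of submodular subset selection concrete, here is a minimal pure-Python sketch of greedy selection under a facility-location objective. This is not apricot's API or implementation; the function name and the toy similarity matrix are hypothetical, and the sketch assumes non-negative pairwise similarities.

```python
from typing import List, Sequence


def greedy_facility_location(similarity: Sequence[Sequence[float]], k: int) -> List[int]:
    """Greedily pick k items maximizing sum_i max_{j in S} similarity[i][j].

    Assumes a square matrix of non-negative similarities.
    """
    n = len(similarity)
    # coverage[i] = best similarity between item i and anything selected so far
    coverage = [0.0] * n
    selected: List[int] = []
    for _ in range(min(k, n)):
        best_gain, best_item = -1.0, -1
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain: how much adding j improves coverage of every item.
            gain = sum(max(similarity[i][j] - coverage[i], 0.0) for i in range(n))
            if gain > best_gain:
                best_gain, best_item = gain, j
        selected.append(best_item)
        coverage = [max(coverage[i], similarity[i][best_item]) for i in range(n)]
    return selected


if __name__ == "__main__":
    # Toy data: items 0-2 form one cluster, items 3-4 another.
    sim = [
        [1.0, 0.9, 0.8, 0.1, 0.1],
        [0.9, 1.0, 0.9, 0.1, 0.1],
        [0.8, 0.9, 1.0, 0.1, 0.1],
        [0.1, 0.1, 0.1, 1.0, 0.9],
        [0.1, 0.1, 0.1, 0.9, 1.0],
    ]
    print(greedy_facility_location(sim, 2))
```

On the toy matrix the first pick is the most central item of the larger cluster and the second pick comes from the other cluster, which is the "minimally redundant" behavior that makes this kind of selection useful for trimming training sets. This naive version costs O(k·n²); a library such as apricot is the better choice for real data.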
MolPal
#active-learning
- Accelerating high-throughput virtual screening through molecular pool-based active learning.
Github: https://github.com/coleygroup/molpal
Publication: https://arxiv.org/abs/2012.07127
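The general shape of pool-based active learning for screening can be sketched in a few lines: score a small random batch with the expensive oracle (for example, docking), fit a cheap surrogate, use it to rank the remaining pool, and score only the top-ranked batch. The code below is a toy illustration under stated assumptions, not MolPal's implementation: the names are hypothetical, the surrogate is a 1-nearest-neighbor lookup on a single numeric feature, acquisition is purely greedy, and lower scores are treated as better.

```python
import random
from typing import Callable, Dict, Hashable, Iterable


def active_learning_screen(
    pool: Iterable[Hashable],
    oracle: Callable[[Hashable], float],      # expensive scorer, e.g. a docking run
    featurize: Callable[[Hashable], float],   # toy assumption: one numeric feature
    init_size: int,
    batch_size: int,
    rounds: int,
    seed: int = 0,
) -> Dict[Hashable, float]:
    """Return the scores of every pool member the oracle was asked about."""
    rng = random.Random(seed)
    unlabeled = list(pool)
    rng.shuffle(unlabeled)

    # Round 0: score a random initial batch.
    labeled = {item: oracle(item) for item in unlabeled[:init_size]}
    unlabeled = unlabeled[init_size:]

    for _ in range(rounds):
        if not unlabeled:
            break

        def predict(item: Hashable) -> float:
            # 1-nearest-neighbor surrogate: reuse the score of the closest labeled item.
            x = featurize(item)
            nearest = min(labeled, key=lambda known: abs(featurize(known) - x))
            return labeled[nearest]

        # Greedy acquisition: take the candidates predicted to score best (lowest).
        unlabeled.sort(key=predict)
        batch, unlabeled = unlabeled[:batch_size], unlabeled[batch_size:]
        for item in batch:
            labeled[item] = oracle(item)
    return labeled


if __name__ == "__main__":
    # Toy pool of integers whose "docking score" is the distance from 70.
    scores = active_learning_screen(
        range(100), oracle=lambda x: abs(x - 70), featurize=float,
        init_size=10, batch_size=10, rounds=3,
    )
    print(len(scores), "of 100 candidates scored; best score:", min(scores.values()))
```

The point of the loop is that the oracle is called on only a fraction of the pool (40 of 100 in the toy run), while the surrogate steers those calls toward promising candidates.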
PyScreener
#virtual-screening
- A pythonic interface to high-throughput virtual screening software.
Github: https://github.com/coleygroup/pyscreener
Other Resources
- Building a virtual ligand screening pipeline using free software: a survey. https://doi.org/10.1093/bib/bbv037