Awesome Open-Source Bio/Cheminformatics

A (growing) list of open-source Bio/Cheminformatics tools that I have found useful in my work. If you know of other tools in this area that I should check out, please reach out.

AutoDock Vina

#molecular-docking

An open-source program for molecular docking.

Publication: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3041641/

Forks:

smina - a fork of AutoDock Vina focused on improving scoring and minimization.
QuickVina - a fast and accurate molecular docking tool aimed at accelerating AutoDock Vina without sacrificing its accuracy.
Gnina - a molecular docking program with integrated support for scoring and optimizing ligands using convolutional neural networks. It is a fork of smina, which is itself a fork of AutoDock Vina.
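Running AutoDock Vina from the command line mostly comes down to defining a search box (a center and a size along x, y and z) around the binding site. The sketch below assumes the box is placed around a reference ligand, for example a co-crystallized one, with a fixed padding. The helper names, the midpoint-of-extent centering and the 4 Å padding are illustrative choices, and the file names are placeholders. The flags themselves (`--receptor`, `--ligand`, `--center_x/y/z`, `--size_x/y/z`, `--exhaustiveness`, `--out`) are standard Vina command-line options.

```python
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def box_around_ligand(coords: Sequence[Coord], padding: float = 4.0) -> Dict[str, float]:
    """Axis-aligned search box enclosing the given atoms plus `padding` Å on each side."""
    if not coords:
        raise ValueError("need at least one atom coordinate")
    box: Dict[str, float] = {}
    for axis, name in enumerate("xyz"):
        values = [c[axis] for c in coords]
        lo, hi = min(values), max(values)
        box[f"center_{name}"] = (lo + hi) / 2.0            # midpoint of the ligand's extent
        box[f"size_{name}"] = (hi - lo) + 2.0 * padding    # extent plus padding on both sides
    return box


def vina_command(receptor: str, ligand: str, box: Dict[str, float],
                 out: str, exhaustiveness: int = 8) -> List[str]:
    """Build (but do not run) a Vina command line for the given box."""
    cmd = ["vina", "--receptor", receptor, "--ligand", ligand]
    for key in ("center_x", "center_y", "center_z", "size_x", "size_y", "size_z"):
        cmd += [f"--{key}", f"{box[key]:.3f}"]
    cmd += ["--exhaustiveness", str(exhaustiveness), "--out", out]
    return cmd


if __name__ == "__main__":
    # Hypothetical reference-ligand coordinates (Å)
    ref_ligand = [(10.0, 12.0, -3.0), (14.0, 15.0, 1.0), (12.0, 11.0, -1.0)]
    box = box_around_ligand(ref_ligand)
    print(" ".join(vina_command("receptor.pdbqt", "ligand.pdbqt", box, "docked.pdbqt")))
```

Once `vina` is on your `PATH`, the returned list can be passed directly to `subprocess.run(cmd, check=True)`.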

AutoDock-GPU

#molecular-docking

An OpenCL- and CUDA-accelerated version of AutoDock 4.2.6. It exploits the embarrassingly parallel Lamarckian genetic algorithm (LGA) by processing ligand-receptor poses in parallel over multiple compute units.

Github: https://github.com/ccsb-scripps/AutoDock-GPU
Publication: Accelerating AutoDock4 with GPUs and Gradient-Based Local Search. J. Chem. Theory Comput. (2021). https://doi.org/10.1021/acs.jctc.0c01006
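A Lamarckian GA is a genetic algorithm in which each offspring is refined by local search, and the refined coordinates are written back into its genotype. The toy sketch below shows only that structure. A 3-D quadratic stands in for a docking score, and the simple hill climber is a placeholder for the Solis-Wets and gradient-based local searches that AutoDock-GPU offers. The selection, crossover and mutation choices are illustrative assumptions, and none of this is AutoDock-GPU's code.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]
Objective = Callable[[Vector], float]


def toy_energy(x: Vector) -> float:
    """Toy stand-in for a docking score: squared distance to (1, 1, 1); lower is better."""
    return sum((xi - 1.0) ** 2 for xi in x)


def local_search(x: Vector, f: Objective, rng: random.Random,
                 steps: int = 30, sigma: float = 0.3) -> Vector:
    """Simplified random-perturbation hill climber (placeholder for the real local searches)."""
    best, best_f = list(x), f(x)
    for _ in range(steps):
        trial = [xi + rng.gauss(0.0, sigma) for xi in best]
        trial_f = f(trial)
        if trial_f < best_f:  # only improvements are accepted, so the score never worsens
            best, best_f = trial, trial_f
    return best


def tournament(pop: List[Vector], f: Objective, rng: random.Random) -> Vector:
    """Binary tournament selection: the better of two random individuals."""
    a, b = rng.sample(pop, 2)
    return a if f(a) < f(b) else b


def lamarckian_ga(f: Objective, dim: int = 3, pop_size: int = 20,
                  generations: int = 30, seed: int = 0) -> Tuple[Vector, float]:
    rng = random.Random(seed)
    pop = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        children = [min(pop, key=f)]  # elitism: the current best always survives
        while len(children) < pop_size:
            p1, p2 = tournament(pop, f, rng), tournament(pop, f, rng)
            w = rng.random()
            child = [w * a + (1.0 - w) * b for a, b in zip(p1, p2)]  # arithmetic crossover
            child = [c + rng.gauss(0.0, 0.5) if rng.random() < 0.2 else c
                     for c in child]  # sparse Gaussian mutation
            # Lamarckian step: the locally optimized coordinates replace the genotype.
            # Each child's local search is independent of the others, which is the
            # kind of work a GPU can spread over many compute units.
            children.append(local_search(child, f, rng))
        pop = children
    best = min(pop, key=f)
    return best, f(best)


if __name__ == "__main__":
    best, score = lamarckian_ga(toy_energy)
    print([round(v, 2) for v in best], round(score, 4))
```

Because of elitism, the best score never gets worse from one generation to the next.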

VirtualFlow

#virtual-screening

VirtualFlow is a versatile, parallel workflow platform for carrying out virtual screening-related tasks on Linux-based computer clusters of any type and size that are managed by a batch system (such as SLURM).

Github: https://github.com/VirtualFlow/VFVS
Publication: An open-source drug discovery platform enables ultra-large virtual screens. Nature 580, 663–668 (2020). https://doi.org/10.1038/s41586-020-2117-z
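The core idea behind running a screen on a batch-managed cluster is to split a large ligand library into independent batches, one per job. The sketch below shows one way to do this with a SLURM array job, in which SLURM sets `SLURM_ARRAY_TASK_ID` for each task. It illustrates the general pattern only. VirtualFlow has its own, more elaborate job and workflow management, and the helper names here are hypothetical.

```python
import os
from typing import List, Sequence


def split_into_batches(ligands: Sequence[str], n_jobs: int) -> List[List[str]]:
    """Split a ligand list into n_jobs contiguous batches whose sizes differ by at most one."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    q, r = divmod(len(ligands), n_jobs)
    batches: List[List[str]] = []
    start = 0
    for i in range(n_jobs):
        end = start + q + (1 if i < r else 0)  # the first r batches take one extra ligand
        batches.append(list(ligands[start:end]))
        start = end
    return batches


def batch_for_this_task(ligands: Sequence[str], n_jobs: int) -> List[str]:
    """Pick this task's batch inside a SLURM array job (e.g. sbatch --array=0-3)."""
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))  # set by SLURM per array task
    return split_into_batches(ligands, n_jobs)[task_id]


if __name__ == "__main__":
    library = [f"ligand_{i:03d}" for i in range(10)]
    for i, batch in enumerate(split_into_batches(library, 4)):
        print(i, batch)
```

Contiguous batches keep each job's input a simple slice of the library. Because the batches share no state, throughput scales with the number of nodes the scheduler grants.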

Gypsum-DL

#ligand-preparation

Gypsum-DL is a free, open-source program for preparing 3D small-molecule models. Beyond simply assigning atomic coordinates, Gypsum-DL accounts for alternate ionization, tautomeric, chiral, cis/trans isomeric, and ring-conformational forms.

Gitlab: https://git.durrantlab.pitt.edu/jdurrant/gypsum_dl
Publication: "Gypsum-DL: An Open-source Program for Preparing Small-molecule Libraries for Structure-based Virtual Screening." Journal of Cheminformatics 11:1. doi:10.1186/s13321-019-0358-3
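When ionization, tautomeric, chiral and cis/trans forms are treated as independent choices, the number of candidate 3D models per compound grows multiplicatively. This is why a per-compound cap on variants matters for library preparation. The toy sketch below uses plain string labels in place of real chemistry. The feature names and the `max_variants` argument are hypothetical, and keeping the first N combinations is a simplification, since a real tool would more sensibly rank or sample variants.

```python
from itertools import islice, product
from math import prod
from typing import Dict, List, Sequence, Tuple


def count_variants(options: Dict[str, Sequence[str]]) -> int:
    """Independent choices multiply: the size of the full product, before any cap."""
    return prod(len(choices) for choices in options.values())


def enumerate_variants(options: Dict[str, Sequence[str]],
                       max_variants: int) -> List[Tuple[str, ...]]:
    """First max_variants combinations of the per-feature choices."""
    combos = product(*options.values())  # combinations are generated lazily
    return list(islice(combos, max_variants))  # only the first max_variants are built


if __name__ == "__main__":
    # Hypothetical compound: one ionizable amine, a keto/enol pair,
    # one unspecified stereocenter and one unspecified double bond.
    options = {
        "ionization": ["neutral", "protonated"],
        "tautomer": ["keto", "enol"],
        "chirality": ["R", "S"],
        "cis_trans": ["E", "Z"],
    }
    print(count_variants(options))  # 16
    for variant in enumerate_variants(options, 5):
        print(variant)
```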

LIT-PCBA

#dataset

15 target sets comprising 9,780 actives and 407,839 unique inactives, selected from high-confidence PubChem BioAssay data.

Data: http://drugdesign.unistra.fr/LIT-PCBA/
Publication: LIT-PCBA: An Unbiased Data Set for Machine Learning and Virtual Screening. https://doi.org/10.1021/acs.jcim.0c00155
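Virtual screening benchmarks on data sets like this one are commonly summarized by the enrichment factor at the top x% of the ranked list: EF_x = (actives in top x% / compounds in top x%) / (total actives / total compounds). The sketch below is one straightforward implementation. It assumes higher scores rank first, so docking scores (where lower is better) should be negated, and it rounds the top-x% size to the nearest integer. Other implementations may handle ties and rounding differently.

```python
from typing import Sequence


def enrichment_factor(scores: Sequence[float], labels: Sequence[int],
                      fraction: float = 0.01) -> float:
    """EF at the top `fraction` of the ranking (labels: 1 = active, 0 = inactive).

    Higher score = ranked earlier; pass negated scores for docking energies.
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and of equal length")
    n_total = len(labels)
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("at least one active is required")
    n_top = max(1, round(fraction * n_total))  # size of the top-x% subset
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _, label in ranked[:n_top])  # actives recovered in the top x%
    return (hits / n_top) / (n_actives / n_total)


if __name__ == "__main__":
    scores = [float(i) for i in range(200)]
    labels = [1 if i in (0, 1, 198, 199) else 0 for i in range(200)]
    print(enrichment_factor(scores, labels, 0.01))
```

In this example there are 4 actives among 200 compounds, and both compounds in the top 1% are active. That gives EF1% = (2/2) / (4/200) = 50, whereas a random ranking would give an expected EF of about 1.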

Apricot

#submodular-optimization

apricot implements submodular optimization for selecting subsets of massive data sets, so that machine learning models can be trained quickly on the subset. Documentation: https://apricot-select.readthedocs.io/en/latest/index.html

Github: https://github.com/jmschrei/apricot
Publication: https://jmlr.org/papers/volume21/19-467/19-467.pdf
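Facility location is a typical submodular objective for subset selection: f(S) = sum over all items i of max over selected j of sim(i, j). For monotone submodular functions like this one, the simple greedy rule (repeatedly add the item with the largest marginal gain) is guaranteed to reach at least (1 - 1/e) of the optimal value. The naive sketch below implements that greedy rule. It uses Tanimoto similarity on fingerprints represented as sets of on-bits, which is an illustrative cheminformatics choice and not an apricot default. apricot packages this objective, together with faster optimizers such as lazy greedy, in selector classes like `FacilityLocationSelection`.

```python
from typing import FrozenSet, List, Sequence

Fingerprint = FrozenSet[int]  # set of "on" bit indices, e.g. from a hashed fingerprint


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def greedy_facility_location(fps: Sequence[Fingerprint], k: int) -> List[int]:
    """Greedily pick k items maximizing f(S) = sum_i max_{j in S} sim(i, j)."""
    n = len(fps)
    sim = [[tanimoto(fps[i], fps[j]) for j in range(n)] for i in range(n)]
    coverage = [0.0] * n  # best similarity of each item to anything selected so far
    selected: List[int] = []
    for _ in range(min(k, n)):
        best_gain, best_j = -1.0, -1
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain: how much adding j would raise each item's coverage
            gain = sum(max(0.0, sim[i][j] - coverage[i]) for i in range(n))
            if gain > best_gain:
                best_gain, best_j = gain, j
        selected.append(best_j)
        coverage = [max(c, sim[i][best_j]) for i, c in enumerate(coverage)]
    return selected


if __name__ == "__main__":
    # Two toy "clusters" of fingerprints
    library = [frozenset(s) for s in ({1, 2, 3}, {1, 2, 3, 4}, {1, 2, 3, 5},
                                       {10, 11, 12}, {10, 11, 12, 13})]
    print(greedy_facility_location(library, 2))  # [0, 3]: one pick per cluster
```

This naive version costs O(k·n²) similarity lookups. That cost is why lazy and approximate optimizers matter at the data set sizes apricot targets.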

MolPal

#active-learning

Accelerating high-throughput virtual screening through molecular pool-based active learning.

Github: https://github.com/coleygroup/molpal
Publication: https://arxiv.org/abs/2012.07127
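Pool-based active learning follows a simple loop. First, score a small random batch of the library with the expensive method (docking). Next, fit a cheap surrogate model on those scores and use it to pick the next batch from the remaining pool. Then repeat. The toy sketch below shows only that loop. MolPal itself uses learned surrogate models and several acquisition functions, whereas here a k-nearest-neighbour average over a hypothetical 1-D descriptor and greedy acquisition stand in for them. The oracle is a made-up function in place of a real docking run.

```python
import random
from typing import Callable, Dict, Sequence


def knn_predict(x: float, scored: Dict[float, float], k: int = 3) -> float:
    """Surrogate model: mean score of the k nearest already-scored pool members."""
    nearest = sorted(scored, key=lambda s: abs(s - x))[:k]
    return sum(scored[s] for s in nearest) / len(nearest)


def pool_active_learning(pool: Sequence[float], oracle: Callable[[float], float],
                         init_size: int = 10, batch_size: int = 10,
                         iterations: int = 3, seed: int = 0) -> Dict[float, float]:
    """Score a random initial batch, then repeatedly score the best-predicted unscored members."""
    rng = random.Random(seed)
    scored = {x: oracle(x) for x in rng.sample(list(pool), init_size)}
    for _ in range(iterations):
        candidates = [x for x in pool if x not in scored]
        # Greedy acquisition: lowest (best) predicted docking score first
        candidates.sort(key=lambda x: knn_predict(x, scored))
        for x in candidates[:batch_size]:
            scored[x] = oracle(x)  # the expensive call, e.g. one docking run
    return scored


if __name__ == "__main__":
    pool = [i / 10 for i in range(1000)]  # toy 1-D "descriptor" per molecule

    def toy_dock(x: float) -> float:
        return (x - 42.0) ** 2 - 10.0  # hypothetical docking score, best at x = 42

    result = pool_active_learning(pool, toy_dock)
    best = min(result, key=result.get)
    print(len(result), best, result[best])
```

Only 40 of the 1,000 pool members are ever passed to the oracle. The savings in docking calls come from that restriction, and how well the approach works depends on how good the surrogate model is.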

PyScreener

#virtual-screening

A pythonic interface to high-throughput virtual screening software.

Github: https://github.com/coleygroup/pyscreener

Other Resources

Building a virtual ligand screening pipeline using free software: a survey. https://doi.org/10.1093/bib/bbv037