
Setting up VirtualFlow on AWS using ParallelCluster and Slurm

This is a short tutorial on how to set up AWS ParallelCluster with Slurm to run VirtualFlow.

VirtualFlow is a versatile, parallel workflow platform for carrying out virtual screening-related tasks on Linux-based computer clusters of any type and size that are managed by a batch system (such as Slurm).

AWS ParallelCluster with Slurm

Creating our working environment

First, we'll create our working directory and set up a virtual environment using Poetry. We need to add the awscli package as well as the aws-parallelcluster package.

mkdir parallel_cluster
cd parallel_cluster
poetry init
poetry add awscli aws-parallelcluster
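
Before moving on, it's worth checking that the CLI is available inside the virtual environment (the reported version will depend on the aws-parallelcluster release Poetry resolved):

$ poetry run pcluster version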

Setting up the cluster config

To set up AWS ParallelCluster I mainly followed this post. We start by creating the config for our cluster. Make sure to create an EC2 key pair beforehand.

$ poetry run pcluster configure
Allowed values for AWS Region ID:
1. ap-northeast-1
2. ap-northeast-2
3. ap-south-1
4. ap-southeast-1
5. ap-southeast-2
6. ca-central-1
7. eu-central-1
8. eu-north-1
9. eu-west-1
10. eu-west-2
11. eu-west-3
12. sa-east-1
13. us-east-1
14. us-east-2
15. us-west-1
16. us-west-2
AWS Region ID [us-west-2]: 16
Allowed values for EC2 Key Pair Name:
1. parallelcluster
EC2 Key Pair Name [parallelcluster]: 1
Allowed values for Scheduler:
1. sge
2. torque
3. slurm
4. awsbatch
Scheduler [slurm]: 3
Allowed values for Operating System:
1. alinux
2. alinux2
3. centos7
4. centos8
5. ubuntu1604
6. ubuntu1804
Operating System [alinux2]: 2
Minimum cluster size (instances) [0]: 1
Maximum cluster size (instances) [10]: 
Head node instance type [t2.micro]: c4.large
Compute instance type [t2.micro]: c4.xlarge
Automate VPC creation? (y/n) [n]: y

We should now have a config file similar to this:

$ cat ~/.parallelcluster/config 
[aws]
aws_region_name = us-west-2

[aliases]
ssh = ssh {CFN_USER}@{MASTER_IP} {ARGS}

[global]
cluster_template = default
update_check = true
sanity_check = true

[vpc default]
vpc_id = vpc-*****************
master_subnet_id = subnet-*****************

[cluster default]
key_name = parallelcluster
scheduler = slurm
master_instance_type = c4.large
base_os = alinux2
vpc_settings = default
queue_settings = compute

[queue compute]
enable_efa = false
enable_efa_gdr = false
compute_resource_settings = default

[compute_resource default]
instance_type = c4.xlarge
min_count = 1

Creating the cluster

After the config file is set, we can create our cluster using the following command. AWS will then spin up our CloudFormation stack, which will take a couple of minutes.

$ poetry run pcluster create test-cluster
Beginning cluster creation for cluster: test-cluster
Creating stack named: parallelcluster-test-cluster
...
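
Cluster creation takes a few minutes. We can poll the state of the CloudFormation stack from the same shell with pcluster's status subcommand; once it reports CREATE_COMPLETE, the cluster is ready:

$ poetry run pcluster status test-cluster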

To access our head node, we can run the following:

poetry run pcluster ssh test-cluster -i ~/.ssh/<key_name>

VirtualFlow

To get started with VirtualFlow, I recommend running through the first tutorial to make sure the cluster has been set up correctly. I'll only go through the changes that need to be made and list the other steps for completeness; the tutorial does a good job of explaining each individual step.

Setting up VirtualFlow

First, we download the tutorial files and extract them.

$ wget https://virtual-flow.org/sites/virtual-flow.org/files/tutorials/VFVS_GK.tar
$ tar -xvf VFVS_GK.tar
$ cd VFVS_GK/tools

Preparing the config files

There are two files in which we need to make changes. We want to make sure our batch system is set to 'SLURM' and change the partition to 'compute', which is the default partition name when we use AWS ParallelCluster.

# tools/templates/all.ctrl
...
batchsystem=SLURM
# Possible values: SLURM, TORQUE, PBS, LSF, SGE
# Settable via range control files: No
...
partition=compute
# Partitions are also called queues in some batchsystems
# Settable via range control files: Yes
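
If you prefer making these two edits non-interactively, here is a minimal sketch using sed (assuming the file layout from the tutorial tarball, with all.ctrl under tools/templates/ and the commands run from tools/):

$ sed -i 's/^batchsystem=.*/batchsystem=SLURM/' templates/all.ctrl
$ sed -i 's/^partition=.*/partition=compute/' templates/all.ctrl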

If 'compute' doesn't work, try running the following command to retrieve the correct partition name:

$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST 
compute*     up   infinite      5  idle~ compute-dy-c4xlarge-[5-9] 
compute*     up   infinite      5  alloc compute-dy-c4xlarge-[1-4],compute-st-c4xlarge-1 

The second config file we need to adjust is the Slurm job template script. Usually we should be able to leave all the default values, but I ran into this error:

srun: error: Unable to create step for job 874794: Memory required by task is not available

To solve it, we simply comment out the line with the --mem-per-cpu parameter:

# Slurm Settings
###############################################################################

#SBATCH --job-name=h-1.1
##SBATCH --mail-user=To be completed if uncommented
#SBATCH --mail-type=fail
#SBATCH --time=00-12:00:00
##SBATCH --mem-per-cpu=1024M
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --partition=main
#SBATCH --output=../workflow/output-files/jobs/job-1.1_%j.out           # File to which standard out will be written
#SBATCH --error=../workflow/output-files/jobs/job-1.1_%j.out            # File to which standard err will be written
#SBATCH --signal=10@300
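
Alternatively, instead of dropping the memory limit entirely, we could check how much memory Slurm actually sees on a compute node and set --mem-per-cpu below that. A quick way to look this up (the node name here is taken from the sinfo output above and will differ with your instance type):

$ scontrol show node compute-dy-c4xlarge-1 | grep -o 'RealMemory=[0-9]*'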

As a last preparation step, we go back to the tools/ subfolder and run this command:

./vf_prepare_folders.sh

More details here: https://docs.virtual-flow.org/tutorials/-LdE94b2AVfBFT72zK-v/vfvs-tutorial-1/setting-up-the-workflow.

Starting the jobs

To spin up our nodes and start the workflow, we run this command:

./vf_start_jobline.sh 1 12 templates/template1.slurm.sh submit 1
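
For reference, my reading of the arguments (the VirtualFlow docs linked below have the authoritative description):

# ./vf_start_jobline.sh <first jobline> <last jobline> <job template> <mode> <delay>
#   1 12                          start joblines 1 through 12
#   templates/template1.slurm.sh  the job template prepared above
#   submit                        submit the jobs to the batch system right away
#   1                             wait one second between submissions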

More details can be found here: https://docs.virtual-flow.org/tutorials/-LdE94b2AVfBFT72zK-v/vfvs-tutorial-1/starting-the-workflow.

Monitoring and Wrapping Up

To monitor the jobs and view the output files after completion, I recommend the respective sections of the tutorial (a quick command-line alternative is sketched below the links):

Monitoring

Completed Workflow
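
For a quick look from the head node without leaving the shell, two commands are handy: plain squeue, and the reporting script that ships in the tools/ folder of the tutorial files (per the tutorial, its -c workflow option prints a summary of the workflow status):

$ squeue
$ ./vf_report.sh -c workflow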

Using our own files

Running the same workflow with our own files is pretty straightforward. After downloading the tutorial files in the 'Setting up VirtualFlow' step, we need to replace the ligand library as well as the target protein.

Replacing the ligand library

The second tutorial in the VirtualFlow documentation has a section dedicated to this.

Using a different protein

Here, I downloaded AutoDock Vina together with MGLTools and followed the tutorial on http://vina.scripps.edu, which looks outdated but still works fine. We can use MGLTools to convert our protein from .pdb to .pdbqt and use the 'Grid Box' tool to get the necessary parameters for the respective receptor config file:

# ../input-files/smina_rigid_receptor1/config.txt
receptor = ../input-files/receptor/<protein>.pdbqt
center_x = 28.614
center_y = 15.838
center_z = -2.045
size_x = 36.0
size_y = 32.0
size_z = 36.0
exhaustiveness = 4
scoring = vinardo
cpu = 1
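
For the .pdb to .pdbqt conversion itself, a sketch using MGLTools' prepare_receptor4.py; the install path and the -A hydrogens repair option are assumptions based on a default MGLTools 1.5.7 setup and may need adjusting:

$ MGLTOOLS=~/mgltools_x86_64Linux2_1.5.7
$ $MGLTOOLS/bin/pythonsh \
    $MGLTOOLS/MGLToolsPckgs/AutoDockTools/Utilities24/prepare_receptor4.py \
    -r protein.pdb -o protein.pdbqt -A hydrogens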

We add our protein to the folder and change both the smina (input-files/smina_rigid_receptor1) and qvina (input-files/qvina02_rigid_receptor1) receptor config files.

That's it. Now we can follow the rest of the steps outlined in the 'VirtualFlow' section above.