Run on HPC
ENEO was developed and tested on High Performance Computing (HPC) clusters with the SLURM workload manager. It is strongly recommended to use a recent version of Snakemake (>8.0.0), which works smoothly with the SLURM executor plugin.
Install the SLURM executor plugin with:
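For example, installing from PyPI, where the plugin is published as `snakemake-executor-plugin-slurm` (use the same environment that provides Snakemake):

```bash
pip install snakemake-executor-plugin-slurm
```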
Then, inside the folder `workflow/profile/slurm`, you'll find a configuration file named `config.yaml`, where you should add the details about your SLURM account and desired partition.
Singularity args
Two rules of the workflow (variant annotation and pMHC binding affinity estimation) depend on Singularity containers. It is essential that all the relevant folders are readable and writable from within each container. For this reason, several folders must be mounted explicitly, as Snakemake is lazy in assigning mountpoints.
Populate the last entry of the config file, `singularity-args`, adding the absolute path for each of the following (see the sketch after this list):
- the resources directory
- the temporary directory
- the output directory
- the workflow directory
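As an illustration only, the resulting entry could look like the sketch below; the paths are placeholders to replace with the absolute paths on your system, and `-B` is the standard Singularity/Apptainer bind flag:

```yaml
# Illustrative example: replace each path with the absolute path on your cluster
singularity-args: "-B /abs/path/to/resources -B /abs/path/to/tmp -B /abs/path/to/output -B /abs/path/to/workflow"
```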
Additionally, you need to set the `TMPDIR` environment variable to the temporary directory, to avoid write-permission issues in the last step.
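For example, in the shell (or job script) from which the pipeline is launched, assuming `/abs/path/to/tmp` is your temporary directory:

```bash
export TMPDIR=/abs/path/to/tmp
```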
SLURM
Insert the account and partition inside `workflow/profile/slurm_profile/config.yaml`, together with any other additional flags required for submitting jobs on the HPC platform in use.
```yaml
cluster:
  mkdir -p slurm-logs/{rule} &&
  sbatch
    --cpus-per-task={resources.ncpus}
    --mem={resources.mem}
    --time={resources.time}
    --job-name=smk-{rule}-{wildcards}
    --output=slurm-logs/{rule}/{rule}-{wildcards}-%j.out
    --partition=<partitionhere>
    --account=<accounthere>
```
This will create a folder called `slurm-logs` with a subfolder for each rule, where each patient will have a different log file.
Then execute the pipeline with:
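A minimal sketch of the launch command, assuming the profile directory mentioned above; the number of jobs is an arbitrary example, and further Snakemake flags may be needed depending on your setup:

```bash
snakemake --profile workflow/profile/slurm_profile --jobs 20
```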
SGE
Warning
The support for SGE is still experimental. If you spot any issue, please report it in the GitHub Issues section.
A config file for SGE is available under `workflow/profile/sge_profile/config.yaml`. The overall scheme is the following:
```yaml
cluster:
  mkdir -p sge-logs/{rule} &&
  qsub
    -pe smp {resources.ncpus}
    -l mem_free={resources.mem}
    -l h_rt={resources.time}
    -N smk-{rule}-{wildcards}
    -o sge-logs/{rule}/{rule}-{wildcards}-$JOB_ID.out
    -e sge-logs/{rule}/{rule}-{wildcards}-$JOB_ID.err
    -q all.q
```
The behavior is analogous to the SLURM one.
To execute the pipeline, run:
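As for SLURM, this is a minimal sketch assuming the SGE profile directory above; adjust the job count and add any extra flags your cluster requires:

```bash
snakemake --profile workflow/profile/sge_profile --jobs 20
```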