Intro

PBS Pro (Portable Batch System Professional) is used to schedule and manage jobs on the Supek computer cluster. Its primary task is to distribute computational tasks, i.e. batch jobs, among the available computing resources.

This document describes the use of PBS Pro version 2022.1.1.


Job running

User applications (hereinafter jobs) that are started through the PBS system must be described by a startup shell script (sh, bash, zsh...). The PBS parameters are listed in the script header, above the regular commands. These parameters can also be specified when submitting a job.

...

Code Block
qcat jobID
qcat -e jobID
qtail jobID
qtail -e jobID

Job submitting

There are several ways jobs can be submitted:

...

Tip: Job Array

This method is preferred over multiple submissions (e.g. with a for loop) because it:

  • reduces job queue load - each job competes for resources simultaneously with everyone else in the queue, instead of one after the other
  • makes management easier - all jobs can be modified through the main (e.g. 14575[]) or an individual (e.g. 14575[3]) job identifier

The environment variables defined by PBS during execution are:

  • PBS_ARRAY_INDEX - index of the subjob within the job array (e.g. one to nine in the example above)
  • PBS_ARRAY_ID - identifier of the main job array
  • PBS_JOBID - identifier of the subjob within the job array
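
As a rough illustration of how these variables might be used, the sketch below picks a different input file for each subjob. The `-J 1-9` range mirrors the nine subjobs mentioned above; the job name, the `input_<N>.dat` naming scheme and the `input_for_index` helper are hypothetical, and the fallback values exist only so the script can be tried outside PBS.

```bash
#!/bin/bash

#PBS -N array_example
#PBS -J 1-9

# Hypothetical helper: map a subjob index to an input file name.
input_for_index() {
    printf 'input_%s.dat\n' "$1"
}

# PBS sets PBS_ARRAY_INDEX inside each subjob; fall back to 1 so the
# script can also be tried outside PBS.
index="${PBS_ARRAY_INDEX:-1}"
input="$(input_for_index "$index")"

echo "Subjob ${index} of array ${PBS_ARRAY_ID:-unknown} (job ${PBS_JOBID:-unknown}) reads ${input}"
```

Each subjob runs the same script, so the index is the only thing that distinguishes one subjob's work from another's.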


Job Description

The PBS system language is used to describe jobs, while the job description file is a standard shell script. In the header of each script, PBS parameters are listed that describe the job in detail, followed by commands to execute the desired application.

...

Code Block: my_job.pbs
#!/bin/bash
 
#PBS -<parameter1> <value>
#PBS -<parameter2> <value>
 
<command>

Basic PBS parameters

Option   Argument                  Meaning
-N       name                      Naming the job
-q       destination               Specifying job queue or node
-l       list_of_resources         Amount of resources required for the job
-M       list_of_users             List of users to receive e-mail
-m       email_options             Types of mail notifications
-o       path/to/directory         Path to directory for output file
-e       path/to/directory         Path to directory for error file
-j       oe                        Combining output and error file
-W       group_list=project_code   Project code for job
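
To show how several of these options might sit together, here is a minimal sketch of a job script header. The job name, the e-mail address and the choice of `abe` (mail on abort, begin and end) are placeholder assumptions; the `cpu` queue is the one used elsewhere in this document.

```bash
#!/bin/bash

#PBS -N my_job
#PBS -q cpu
#PBS -l ncpus=4
#PBS -M user@example.org
#PBS -m abe
#PBS -j oe

# Roughly equivalent submission with the options given on the command
# line instead of in the header:
#   qsub -N my_job -q cpu -l ncpus=4 -M user@example.org -m abe -j oe my_job.pbs

# PBS_JOBNAME is set by PBS; the fallback is only for trying this outside PBS.
message="Job ${PBS_JOBNAME:-my_job} started on $(uname -n)"
echo "$message"
```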

...

-l select=3:ncpus=2             Option for 3 chunks with 2 cores each (6 cores in total)
-l select=1:ncpus=10:mem=20GB   Option for 1 chunk with 10 cores and 20 GB of RAM
-l ngpus=2                      Option for 2 GPUs
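
The core totals above follow from multiplying the number of chunks by the cores per chunk. The sketch below spells out that arithmetic; `total_cores` is a hypothetical helper that handles only a single chunk type and assumes a missing `select` or `ncpus` counts as 1.

```bash
#!/bin/bash

# Total cores requested by a resource string with one chunk type,
# e.g. "select=3:ncpus=2". Missing select or ncpus is treated as 1
# (an assumption of this sketch).
total_cores() {
    chunks=$(printf '%s\n' "$1" | sed -n 's/.*select=\([0-9][0-9]*\).*/\1/p')
    ncpus=$(printf '%s\n' "$1" | sed -n 's/.*ncpus=\([0-9][0-9]*\).*/\1/p')
    echo $(( ${chunks:-1} * ${ncpus:-1} ))
}

total_cores "select=3:ncpus=2"            # prints 6
total_cores "select=1:ncpus=10:mem=20GB"  # prints 10
```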


PBS environment variables

NCPUS             Number of cores requested. Matches the value of the ncpus option in the PBS script header.
OMP_NUM_THREADS   OpenMP variable exported by PBS to the environment, equal to the value of the ncpus option in the PBS script header.
PBS_JOBID         Job identifier provided by PBS when the job is submitted. Created after executing the qsub command.
PBS_JOBNAME       Name of the job, as set by the user (e.g. with the -N option).
PBS_NODEFILE      Path to the file listing the worker nodes, or processor cores, on which the job is executed.
PBS_O_WORKDIR     The working directory from which the job was submitted, i.e. in which the qsub command was invoked.
TMPDIR            The path to the scratch directory.
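
A quick way to see what these variables contain is to print them at the start of a job. The following is a minimal sketch; `count_nodes` is a hypothetical helper, and the `not set` fallbacks appear only when the script is run outside PBS.

```bash
#!/bin/bash

#PBS -q cpu
#PBS -l ncpus=4

# Hypothetical helper: count the distinct hosts listed in a node file.
count_nodes() {
    sort -u "$1" | wc -l | tr -d ' '
}

echo "Job ID:         ${PBS_JOBID:-not set}"
echo "Job name:       ${PBS_JOBNAME:-not set}"
echo "Cores (NCPUS):  ${NCPUS:-not set}"
echo "OpenMP threads: ${OMP_NUM_THREADS:-not set}"
echo "Submitted from: ${PBS_O_WORKDIR:-not set}"
echo "Scratch dir:    ${TMPDIR:-not set}"

# PBS_NODEFILE points to a file that exists only while the job runs.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "Distinct nodes: $(count_nodes "$PBS_NODEFILE")"
fi
```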

...

Tip: Setting up working directory

In PBS Pro, the job's output and error files are written to the directory from which the job was submitted, but the input and output files of the program itself are by default read from and written to the $HOME directory. PBS Pro has no option to run the job in the directory we are currently in, so the directory has to be changed manually.

After the header it is necessary to write:

cd $PBS_O_WORKDIR

This moves the job execution to the directory from which the job was submitted.
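
A slightly more defensive variant of the same step is sketched below: the variable is quoted in case the path contains spaces, and the job stops if the directory cannot be entered. The `$PWD` fallback is an assumption added only so the script can be tried outside PBS.

```bash
#!/bin/bash

#PBS -q cpu
#PBS -l ncpus=1

# Quote the path and abort if the directory change fails, so the
# application never runs in an unexpected location.
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

echo "Running in: $(pwd)"
```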


Parallel jobs

OpenMP parallelization

If your application is parallelized exclusively with OpenMP threads and cannot scale beyond a single worker node (that is, it works with shared memory), you can run the job as shown in the xTB application example below.

...

Code Block
#!/bin/bash
 
#PBS -q cpu
#PBS -l ncpus=8
 
cd ${PBS_O_WORKDIR}
 
xtb C2H4BrCl.xyz --chrg 0 --uhf 0 --opt vtight

MPI parallelization

If your application supports hybrid parallelization, i.e. it can divide its MPI processes into OpenMP threads, you can run the job as shown in the GROMACS application example below:

...

Note

Scientific applications on Supek and cray-pals

Scientific applications that are available on Supek via the modulefiles tool already load the cray-pals module, so it is not necessary to load it again.


Monitoring and management of job execution

Job monitoring

The PBS command qstat is used to display the status of jobs. The basic command syntax is:

...

Some of the more commonly used options are:

-E   Groups jobs by server and displays them sorted by ascending ID. This option also improves the performance of qstat.
-t   Displays status information for jobs, job arrays, and subjobs.
-p   Replaces the Time Used column with the percentage of work done. For a job array, this is the percentage of subjobs completed. For a regular job, this is the percentage of the allocated CPU time used.
-x   Displays status information for finished and moved jobs in addition to queued and running jobs.
-Q   Displays queue status in the standard format.
-q   Displays queue status in an alternate format.
-f   Displays job status in full (long) format.
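
These options can be combined in small helper scripts. The sketch below extracts the state of a single job from the output of `qstat -f`, adding `-x` so that finished jobs are included as well. Both function names are hypothetical, and the parser assumes the state appears on a line of the form `job_state = <letter>`.

```bash
#!/bin/bash

# Read `qstat -f` output on stdin and print the value of job_state
# (for example Q for queued or R for running).
parse_job_state() {
    sed -n 's/^[[:space:]]*job_state = //p'
}

# Hypothetical wrapper: state of one job, including finished jobs (-x).
job_state() {
    qstat -x -f "$1" | parse_job_state
}

# Usage on the cluster (the job ID is a placeholder):
#   job_state <job_ID>
```

Keeping the parsing separate from the `qstat` call makes it possible to check the parser without access to the cluster.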

Examples of use:

Detailed job description:

...

Code Block
tracejob <job_ID>

Example:

Code Block
$ tracejob 2670
 
Job: 2670.x3000c0s25b0n0.hsn.hpc.srce.hr
 
03/30/2023 11:23:24  L    Considering job to run
03/30/2023 11:23:24  S    Job Queued at request of mhrzenja@x3000c0s25b0n0.hsn.hpc.srce.hr, owner =
                          mhrzenja@x3000c0s25b0n0.hsn.hpc.srce.hr, job name = mapping, queue = cpu
03/30/2023 11:23:24  S    Job Run at request of Scheduler@x3000c0s25b0n0.hsn.hpc.srce.hr on exec_vnode
                          (x8000c0s0b0n0:ncpus=40:mem=104857600kb)
03/30/2023 11:23:24  L    Job run
03/30/2023 11:23:24  S    enqueuing into cpu, state Q hop 1
03/30/2023 11:23:56  S    Holds u set at request of mhrzenja@x3000c0s25b0n0.hsn.hpc.srce.hr
03/30/2023 11:24:22  S    Holds u released at request of mhrzenja@x3000c0s25b0n0.hsn.hpc.srce.hr

Job management

A job can also be managed after it has been submitted.

...

Code Block
qdel -W force -x <job_ID>


Postponement of execution

PBS can make the execution of a job depend on other jobs, which is useful in cases such as:

...