Intro
Jobs on the Supek computer cluster are scheduled and managed with PBS Pro (Portable Batch System Professional). Its primary task is to distribute computing tasks, i.e. batch jobs, among the available computing resources.
This document describes the use of PBS Pro version 2022.1.1.
Running jobs
User applications (hereinafter jobs) that are run through the PBS system must be described by a startup shell script (sh, bash, zsh...). The PBS parameters are listed at the top of the startup script, above the regular commands. These parameters can also be specified on the command line when submitting a job.
Basic job run:
qsub my_job.pbs
Job run with parameters:
qsub -q cpu -l ncpus=4:mem=10GB my_job.pbs
More info for qsub parameters:
qsub --help
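As a minimal sketch of the two equivalent ways of passing parameters, the script below writes the options from the command above into the header of my_job.pbs, after which the plain qsub my_job.pbs form is enough. The echo line in the job body is only a placeholder, and the qsub call is guarded so that the sketch can also be tried on a machine without PBS.

```shell
#!/bin/bash
# Write a job script whose header carries the parameters that would
# otherwise be given as: qsub -q cpu -l ncpus=4:mem=10GB my_job.pbs
cat > my_job.pbs <<'EOF'
#!/bin/bash
#PBS -q cpu
#PBS -l ncpus=4:mem=10GB
# Placeholder job body (an assumption, not part of the original example)
echo "Hello from $(hostname)"
EOF

# Submit only where PBS is installed; otherwise just show the command
if command -v qsub >/dev/null 2>&1; then
    qsub my_job.pbs
else
    echo "qsub not found, would run: qsub my_job.pbs"
fi
```

Options given on the command line and options in the header can be mixed, so the header can hold the defaults for a job.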
After submitting a job, the standard output and standard error of a job that is currently running can be viewed with the commands:
qcat jobID
qcat -e jobID
qtail jobID
qtail -e jobID
Submitting jobs
There are several ways jobs can be submitted:
- by interactive submission
- using a script
- in an interactive session
- as a job array
In the case of interactive submission, calling the qsub command without a script makes it read the job script from standard input in the terminal. The commands to execute are typed in and submitted with CTRL+D:
# run qsub
[korisnik@x3000c0s25b0n0:~] $ qsub
Job script will be read from standard input. Submit with CTRL+D.
echo "Hello world"
14571.x3000c0s25b0n0.hsn.hpc.srce.hr
# print directory content
[korisnik@x3000c0s25b0n0:~] $ ls -l
total 5140716
-rw------- 1 korisnik hpc 0 Jun 1 07:44 STDIN.e14571
-rw------- 1 korisnik hpc 12 Jun 1 07:44 STDIN.o14571
# print output file content
[korisnik@x3000c0s25b0n0:~] $ cat STDIN.o14571
Hello world
In the case of script submission, we can specify the commands to be executed in the input file that we submit:
# print file hello.sh
[korisnik@x3000c0s25b0n0:~] $ cat hello.sh
#!/bin/bash
#PBS -N hello
echo "Hello world"
# submit job script
[korisnik@x3000c0s25b0n0:~] $ qsub hello.sh
14572.x3000c0s25b0n0.hsn.hpc.srce.hr
# print directory content
[korisnik@x3000c0s25b0n0:~] $ ls -l
total 5140721
-rw------- 1 korisnik hpc 0 Jun 1 07:44 STDIN.e14571
-rw------- 1 korisnik hpc 12 Jun 1 07:44 STDIN.o14571
-rw------- 1 korisnik hpc 0 Jun 1 08:02 hello.e14572
-rw------- 1 korisnik hpc 12 Jun 1 08:02 hello.o14572
-rw-r--r-- 1 korisnik hpc 46 Jun 1 07:55 hello.sh
# print output file content
[korisnik@x3000c0s25b0n0:~] $ cat hello.o14572
Hello world
In the case of an interactive session, using the qsub -I option without an input script will open a terminal on the main working node within which we can run commands:
# hostname on access node
[korisnik@x3000c0s25b0n0:~] $ hostname
x3000c0s25b0n0
# interactive session
[korisnik@x3000c0s25b0n0:~] $ qsub -I -N hello-interactive
qsub: waiting for job 14574.x3000c0s25b0n0.hsn.hpc.srce.hr to start
qsub: job 14574.x3000c0s25b0n0.hsn.hpc.srce.hr ready
# hostname on working node
[korisnik@x8000c0s3b0n0:~] $ hostname
x8000c0s3b0n0
In the case of an array of jobs, using the qsub -J X-Y[:Z] option we can submit a given number of identical jobs in the range X to Y with step Z:
# submit job array
[korisnik@x3000c0s25b0n0:~] $ qsub -J 1-10:2 hello.sh
14575[].x3000c0s25b0n0.hsn.hpc.srce.hr
# print directory content
[korisnik@x3000c0s25b0n0:~] $ ls -l
total 5140744
-rw------- 1 korisnik hpc 0 Jun 1 07:44 STDIN.e14571
-rw------- 1 korisnik hpc 12 Jun 1 07:44 STDIN.o14571
-rw------- 1 korisnik hpc 0 Jun 1 08:02 hello.e14572
-rw------- 1 korisnik hpc 0 Jun 1 08:21 hello.e14575.1
-rw------- 1 korisnik hpc 0 Jun 1 08:21 hello.e14575.3
-rw------- 1 korisnik hpc 0 Jun 1 08:21 hello.e14575.5
-rw------- 1 korisnik hpc 0 Jun 1 08:21 hello.e14575.7
-rw------- 1 korisnik hpc 0 Jun 1 08:21 hello.e14575.9
-rw------- 1 korisnik hpc 12 Jun 1 08:02 hello.o14572
-rw------- 1 korisnik hpc 12 Jun 1 08:21 hello.o14575.1
-rw------- 1 korisnik hpc 12 Jun 1 08:21 hello.o14575.3
-rw------- 1 korisnik hpc 12 Jun 1 08:21 hello.o14575.5
-rw------- 1 korisnik hpc 12 Jun 1 08:21 hello.o14575.7
-rw------- 1 korisnik hpc 12 Jun 1 08:21 hello.o14575.9
-rw-r--r-- 1 korisnik hpc 46 Jun 1 07:55 hello.sh
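To see which subjobs a given -J X-Y[:Z] range produces, the range can be expanded by hand. The array_indices helper below is hypothetical (it is not a PBS command) and only mirrors the range arithmetic in plain bash.

```shell
#!/bin/bash
# Hypothetical helper: list the subjob indices for a range X-Y with step Z.
# The step defaults to 1 when it is not given.
array_indices() {
    local start=$1 end=$2 step=${3:-1} i
    local -a indices=()
    for ((i = start; i <= end; i += step)); do
        indices+=("$i")
    done
    echo "${indices[*]}"
}

# Range used in the example above: qsub -J 1-10:2 hello.sh
array_indices 1 10 2   # prints: 1 3 5 7 9
```

The result matches the suffixes of the hello.o14575.* files in the directory listing above.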
Job Array
This method is preferred over multiple separate submissions (e.g. with a for loop) because it:
- reduces the load on the job queue - all subjobs compete for resources at the same time as the other jobs in the queue, instead of one after the other
- simplifies management - all jobs can be modified through the identifier of the whole array (e.g. 14575[]) or of an individual subjob (e.g. 14575[3])
The environment variables that PBS defines while the subjobs are running are:
- PBS_ARRAY_INDEX - index of the subjob within the job array (e.g. 1, 3, 5, 7 or 9 in the example above)
- PBS_ARRAY_ID - identifier of the job array as a whole
- PBS_JOBID - identifier of the subjob within the job array
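A common use of these variables is to let each subjob pick its own input. The following is a minimal sketch, assuming input files named input_<index>.dat; that naming scheme and the input_for_index helper are hypothetical. Outside PBS the variables are unset, so the script falls back to defaults and can be dry-run in a normal shell.

```shell
#!/bin/bash
#PBS -N hello
#PBS -J 1-10:2

# Map a subjob index to an input file name.
# The input_<index>.dat naming scheme is an assumption for illustration.
input_for_index() {
    printf 'input_%s.dat\n' "$1"
}

# PBS sets PBS_ARRAY_INDEX for each subjob; fall back to 1 so the
# script can also be dry-run outside PBS.
index="${PBS_ARRAY_INDEX:-1}"
echo "Subjob ${PBS_JOBID:-unknown} (index ${index}) would process $(input_for_index "$index")"
```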
Job Description
The PBS system language is used to describe jobs, while the job description file is a standard shell script. In the header of each script, PBS parameters are listed that describe the job in detail, followed by commands to execute the desired application.
Structure of the startup script:
#!/bin/bash
#PBS -<parameter1> <value>
#PBS -<parameter2> <value>
<command>
Example of a startup script:
#!/bin/bash
#PBS -P test_example
#PBS -q cpu
#PBS -e /home/my_directory
#PBS -l select=2:ncpus=10
module load gcc/12.1.0
gcc --version
Basic PBS parameters
Option | Argument | Meaning |
-N | name | Naming the job |
-q | destination | Specifying job queue or node |
-l | list_of_resources | Amount of resources required for the job |
-M | list_of_users | List of users to receive e-mail |
-m | email_options | Types of mail notifications |
-o | path/to/directory | Path to directory for output file |
-e | path/to/directory | Path to directory for error file |
-j | oe | Combining output and error file |
-Wgroup_list | project_code | Project code for job |
Options for sending notifications by mail with the -m option:
a | Mail is sent when the batch system terminates the job |
b | Mail is sent when the job starts executing |
e | Mail is sent when the job is done |
j | Mail is sent for subjobs. Must be combined with one or more of the sub-options a, b or e |
Example of a job script that sends mail when the job starts and when it is done:
#!/bin/bash
#PBS -q cpu
#PBS -l select=1:ncpus=2
#PBS -M <name>@srce.hr,<name2>@srce.hr
#PBS -m be
echo $PBS_JOBNAME > out
echo $PBS_O_HOST
Two emails were received:
PBS Job Id: 2686.x3000c0s25b0n0.hsn.hpc.srce.hr
Job Name: pbs.pbs
Begun execution

PBS Job Id: 2686.x3000c0s25b0n0.hsn.hpc.srce.hr
Job Name: pbs.pbs
Execution terminated
Exit_status=0
resources_used.cpupercent=0
resources_used.cput=00:00:00
resources_used.mem=0kb
resources_used.ncpus=2
resources_used.vmem=0kb
resources_used.walltime=00:00:01
Options for resources with the -l option:
-l select=3:ncpus=2 | Request for 3 chunks with 2 cores each (6 cores in total) |
-l select=1:ncpus=10:mem=20GB | Request for 1 chunk with 10 cores and 20 GB of RAM |
-l ngpus=2 | Request for 2 GPUs |
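The total number of cores in a request is the number of chunks multiplied by the cores per chunk. The sketch below works this out for the select statements in the table. The total_cores helper is hypothetical, handles only a single chunk type, and assumes a value of 1 when select or ncpus is missing.

```shell
#!/bin/bash
# Hypothetical helper: total cores requested by a simple select statement.
# Assumes one chunk type and defaults of 1 for missing select/ncpus.
total_cores() {
    local spec=$1 chunks=1 ncpus=1 part
    local IFS=':'
    for part in $spec; do
        case $part in
            select=*) chunks=${part#select=} ;;
            ncpus=*)  ncpus=${part#ncpus=} ;;
        esac
    done
    echo $((chunks * ncpus))
}

total_cores "select=3:ncpus=2"            # prints: 6
total_cores "select=1:ncpus=10:mem=20GB"  # prints: 10
```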
PBS environment variables
NCPUS | Number of cores requested. Matches the value from the ncpus option from the PBS script header. |
OMP_NUM_THREADS | An OpenMP variable exported by PBS to the environment that is equal to the value of the ncpus option from the PBS script header |
PBS_JOBID | Job identifier provided by PBS when the job is submitted. Created after the qsub command is executed. |
PBS_JOBNAME | Name of the job, as set with the -N option. Defaults to the name of the submitted script. |
PBS_NODEFILE | Path to a file that lists the working nodes, or processor cores, on which the job is executed |
PBS_O_WORKDIR | The working directory in which the job was submitted, or in which the qsub command was invoked. |
TMPDIR | The path to the scratch directory. |
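A quick way to check these variables is to print them from inside a job. The sketch below does so for every variable in the table. The show_var helper is hypothetical, and variables that are not set (for example when the script is run outside PBS) are shown as <unset>.

```shell
#!/bin/bash
#PBS -q cpu
#PBS -l select=1:ncpus=2

# Print a variable by name, or "<unset>" when it is empty or not defined
show_var() {
    local name=$1
    printf '%-16s %s\n' "$name" "${!name:-<unset>}"
}

for name in NCPUS OMP_NUM_THREADS PBS_JOBID PBS_JOBNAME \
            PBS_NODEFILE PBS_O_WORKDIR TMPDIR; do
    show_var "$name"
done

# PBS_NODEFILE is a path to a file, so count the distinct nodes in it
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "Distinct nodes: $(sort -u "$PBS_NODEFILE" | wc -l)"
fi
```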
Setting up working directory
In PBS Pro, the output and error files are by default written to the directory from which the job was submitted, whereas the input and output files of the program itself are by default read from and saved to the $HOME directory. PBS Pro has no option for running the job in the directory we are currently in, so the directory must be changed manually.
After the header it is necessary to write:
cd $PBS_O_WORKDIR
This moves the job execution to the directory from which the script was submitted.
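The same step can be sketched as a complete job script. The enter_submit_dir helper is hypothetical. It quotes the variable so that paths with spaces work, stops the job if the directory cannot be entered, and falls back to the current directory when PBS_O_WORKDIR is not set (for example when testing outside PBS).

```shell
#!/bin/bash
#PBS -q cpu

# Change to the directory from which the job was submitted.
# Outside PBS the variable is unset, so stay in the current directory.
enter_submit_dir() {
    cd "${PBS_O_WORKDIR:-$PWD}" || return 1
}

# Stop instead of running the program in the wrong directory
enter_submit_dir || exit 1
echo "Working directory: $(pwd)"
```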