...

Code Block
languagebash
# submission of job array 
[korisnik@padobran:~] $ qsub -J 1-10:2 hello.sh 
107[].admin

# list the generated output and error files
[korisnik@padobran:~] $ ls -l
total 5140744
-rw-------  1 korisnik hpc          0 Jun  1 07:44 STDIN.e14571
-rw-------  1 korisnik hpc         12 Jun  1 07:44 STDIN.o14571
-rw-------  1 korisnik hpc          0 Jun  1 08:02 hello.e14572
-rw-------  1 korisnik hpc          0 Jun  1 08:21 hello.e14575.1
-rw-------  1 korisnik hpc          0 Jun  1 08:21 hello.e14575.3
-rw-------  1 korisnik hpc          0 Jun  1 08:21 hello.e14575.5
-rw-------  1 korisnik hpc          0 Jun  1 08:21 hello.e14575.7
-rw-------  1 korisnik hpc          0 Jun  1 08:21 hello.e14575.9
-rw-------  1 korisnik hpc         12 Jun  1 08:02 hello.o14572
-rw-------  1 korisnik hpc         12 Jun  1 08:21 hello.o14575.1
-rw-------  1 korisnik hpc         12 Jun  1 08:21 hello.o14575.3
-rw-------  1 korisnik hpc         12 Jun  1 08:21 hello.o14575.5
-rw-------  1 korisnik hpc         12 Jun  1 08:21 hello.o14575.7
-rw-------  1 korisnik hpc         12 Jun  1 08:21 hello.o14575.9
-rw-r--r--  1 korisnik hpc         46 Jun  1 07:55 hello.sh


Tip
titleJob array

This method is preferred over multiple separate submissions (e.g. with a for loop) because:

  • it reduces the load on the job queue - all subjobs compete for resources at the same time as everyone else in the queue, instead of one after the other
  • it makes management easier - all subjobs can be modified at once through the main identifier (e.g. 14575[]), or a single subjob through its individual identifier (e.g. 14575[3])

PBS defines the following environment variables while the subjobs run:

  • PBS_ARRAY_INDEX - index of the subjob within the job array (e.g. 1, 3, 5, 7 or 9 in the example above)
  • PBS_ARRAY_ID - identifier of the main job array
  • PBS_JOBID - identifier of the subjob within the job array
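
As an illustration of how these variables might be used, the sketch below lets each subjob pick its own input file from PBS_ARRAY_INDEX. The input_for_index helper and the input_N.txt naming scheme are hypothetical, and the fallback values exist only so that the script can also be run outside of PBS.

```bash
#!/bin/bash
#PBS -q cpu
#PBS -J 1-10:2
#PBS -l select=1:ncpus=1

# PBS sets these variables for every subjob. The fallbacks are assumptions
# that only keep the sketch runnable when it is started outside of PBS.
index="${PBS_ARRAY_INDEX:-1}"
array_id="${PBS_ARRAY_ID:-unknown}"
job_id="${PBS_JOBID:-unknown}"

# Hypothetical naming scheme: one input file per subjob index.
input_for_index() {
    printf 'input_%s.txt\n' "$1"
}

input_file="$(input_for_index "$index")"
echo "subjob ${index} (job ${job_id}, array ${array_id}) would process ${input_file}"
```

With the range 1-10:2 used above, the subjobs would receive the indices 1, 3, 5, 7 and 9, so five different input files would be selected.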

...

Code Block
languagebash
titlemy_job.pbs
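#!/bin/bash

# Project name and location of the standard error file
#PBS -P test_example
#PBS -e /home/my_directory
# Queue, maximum run time, and one chunk with 10 cores
#PBS -q cpu
#PBS -l walltime=00:01:00
#PBS -l select=1:ncpus=10

# Load the OpenMPI environment and print the compiler wrapper version
module load mpi/openmpi-x86_64

mpicc --version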


Basic PBS parameters

Option  Option argument            Meaning
-N      name                       Sets the job name
-q      destination                Specifies the job queue and/or server
-l      resource_list              Specifies the resources required to run the job
-M      user_list                  Sets the list of mail recipients
-m      mail_options               Sets the type of email notification
-o      path/to/desired/directory  Sets the name/path where standard output is saved
-e      path/to/desired/directory  Sets the name/path where standard error is saved
-j      oe                         Merges standard output and standard error into the same file
-W      group_list=project_code    Selects the project under which the job will run


Options for email notifications with the -m option:

a  Mail is sent when the batch system terminates the job
b  Mail is sent when the job starts executing
e  Mail is sent when the job finishes
j  Mail is sent for subjobs. Must be combined with one or more of the sub-options a, b or e


Code Block
languagebash
titleEmail example
#!/bin/bash

#PBS -q cpu
#PBS -l walltime=00:01:00
#PBS -l select=1:ncpus=2
#PBS -M <name>@srce.hr,<name2>@srce.hr
#PBS -m be

echo $PBS_JOBNAME > out
echo $PBS_O_HOST

...

Options for requesting resources with the -l option

-l select=3:ncpus=2             Requests 3 chunks with 2 cores each (6 cores in total)
-l select=1:ncpus=10:mem=20GB   Requests 1 chunk with 10 cores and 20 GB of working memory
-l ngpus=2                      Requests 2 GPUs
-l walltime=00:10:00            Maximum job execution time
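
The core totals in the examples above follow from multiplying the number of chunks by the ncpus value. The sketch below only illustrates that arithmetic; total_cores is a hypothetical helper, not a PBS command, and it assumes a select specification of the form chunks:ncpus=N.

```bash
#!/bin/bash
# Hypothetical helper (not part of PBS): multiply the chunk count by the
# ncpus value found in a select specification such as "3:ncpus=2".
total_cores() {
    local spec="$1"
    local chunks="${spec%%:*}"   # text before the first colon
    local ncpus=1                # assumed to be 1 when ncpus is omitted
    if [[ "$spec" =~ ncpus=([0-9]+) ]]; then
        ncpus="${BASH_REMATCH[1]}"
    fi
    echo $(( chunks * ncpus ))
}

total_cores "3:ncpus=2"
total_cores "1:ncpus=10:mem=20GB"
```

Other resources in the specification, such as mem, are ignored by this helper because they do not change the number of cores.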

PBS environment variables

Name             Description
PBS_JOBID        Job identifier assigned by PBS when the job is submitted. Created after the qsub command is executed
PBS_JOBNAME      Job name provided by the user. The default is the name of the submitted script
PBS_NODEFILE     Path to the file listing the worker nodes, or processor cores, on which the job runs
PBS_O_WORKDIR    Working directory from which the job was submitted, i.e. where the qsub command was called
OMP_NUM_THREADS  OpenMP variable that PBS exports to the environment, equal to the value of the ncpus option in the PBS script header
NCPUS            Number of cores requested. Matches the value of the ncpus option in the PBS script header
TMPDIR           Path to the temporary directory
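
One way to check what a job actually sees is to print these variables from inside the job script. The sketch below is only an illustration: describe_var is a hypothetical helper, and the placeholder text is printed for any variable that is not set, for example when the script is run outside of PBS.

```bash
#!/bin/bash
#PBS -q cpu
#PBS -l select=1:ncpus=2

# Hypothetical helper: print NAME=value, or a placeholder when the
# variable is not set in the current environment.
describe_var() {
    local name="$1"
    echo "${name}=${!name:-<not set>}"
}

for name in PBS_JOBID PBS_JOBNAME PBS_O_WORKDIR OMP_NUM_THREADS NCPUS TMPDIR; do
    describe_var "$name"
done

# PBS_NODEFILE names a file, so print the distinct nodes it lists
# only when that file is available.
if [[ -n "${PBS_NODEFILE:-}" && -r "$PBS_NODEFILE" ]]; then
    sort -u "$PBS_NODEFILE"
fi
```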


Tip
titleSpecifying the working directory

PBS writes the job's output and error files to the directory from which the job was submitted, but the program's own input and output files are read from and saved to the $HOME directory by default. PBS has no option for running the job in the current directory, so the directory has to be changed manually.

To switch to the directory from which the script was submitted, add the following after the header:

cd $PBS_O_WORKDIR

For jobs with a high storage load (I/O intensive), running from $PBS_O_WORKDIR is not recommended. Run them from the $TMPDIR location instead, which uses fast storage. Read more about using fast storage and temporary results below.
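
A minimal sketch of this pattern is shown here: the input is copied to $TMPDIR, the work is done there, and the result is copied back to the submission directory. The file names, the stage_and_run helper and the fallback values are assumptions made for illustration, and the wc command stands in for the real program.

```bash
#!/bin/bash
#PBS -q cpu
#PBS -l select=1:ncpus=1

# Outside of PBS these variables may be unset; the fallbacks are assumptions
# that only keep the sketch runnable for testing.
submit_dir="${PBS_O_WORKDIR:-$PWD}"
scratch_dir="${TMPDIR:-/tmp}"

# Hypothetical file names, used only for illustration.
input_name="input.dat"
output_name="output.dat"

# stage_and_run SUBMIT_DIR SCRATCH_DIR: copy the input to fast storage,
# work there, then copy the result back to the submission directory.
stage_and_run() {
    local submit="$1" scratch="$2"
    cp "$submit/$input_name" "$scratch/" || return 1
    cd "$scratch" || return 1
    # Placeholder for the real program: count the lines of the input.
    wc -l < "$input_name" > "$output_name"
    cp "$output_name" "$submit/" || return 1
    cd "$submit" || return 1
}

if [[ -f "$submit_dir/$input_name" ]]; then
    stage_and_run "$submit_dir" "$scratch_dir"
fi
```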

Allocating resources to jobs

PBS allows the required resources to be defined in several ways. The main unit of resource allocation is the so-called "chunk", a piece of a node. The number of chunks is defined with the select option. The number of processor cores per chunk can be defined with ncpus, the number of MPI processes with mpiprocs, and the amount of working memory with mem. It is also possible to define walltime (maximum job execution time) and place (how chunks are distributed across nodes).

If a parameter is not defined, its default value is used:

Parameter  Default value
select     1
ncpus      1
mpiprocs   1
mem        3500 MB
walltime   48:00:00
place      pack
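
For illustration, the header below spells out the default values listed above explicitly. Under the assumption that the defaults apply as documented, leaving these lines out should request the same resources. The fallback in the last line is only there so that the sketch also runs outside of PBS.

```bash
#!/bin/bash
# Explicit form of the documented defaults: one chunk with one core,
# one MPI process and 3500 MB of memory, packed on a single node.
#PBS -l select=1:ncpus=1:mpiprocs=1:mem=3500mb
#PBS -l walltime=48:00:00
#PBS -l place=pack

# NCPUS is set by PBS; the fallback value 1 is an assumption for local runs.
cores="${NCPUS:-1}"
echo "Running with ${cores} core(s)"
```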

Memory control using cgroups

In addition to controlling processor usage, cgroups are also configured to control memory consumption. This means that the jobs a user runs are limited to the requested amount of memory. If a job tries to use more memory than was requested in the job description, the system terminates that job and writes the following to the error output file:

Code Block
languagebash
titleMessage to the user when cgroups kill a job due to lack of memory
-bash: line 1: PID Killed                  /var/spool/pbs/mom_priv/jobs/JOB_ID.SC
Cgroup mem limit exceeded: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=_JOB_ID,mems_allowed=0,oom_memcg=/pbs_jobs.service/jobid/JOB_ID,task_memcg=/pbs_jobs.service/jobid/JOB_ID,task=JOB_ID,pid=PID,uid=UID

This message differs slightly for each job because it contains values such as the UID (the user's unique numeric identifier), the PID (the numeric identifier of the process that was killed) and the JOB_ID (the job identifier assigned by PBS).
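
Because the text "Cgroup mem limit exceeded" is the same for every job, it can be used to check error files for this condition. The sketch below assumes the message appears as quoted above; killed_by_cgroup is a hypothetical helper, and the error file paths are passed as arguments.

```bash
#!/bin/bash
# Hypothetical helper: succeed when the given error file contains the
# cgroup out-of-memory message.
killed_by_cgroup() {
    grep -q "Cgroup mem limit exceeded" "$1"
}

# Check every error file given on the command line.
for error_file in "$@"; do
    if killed_by_cgroup "$error_file"; then
        echo "$error_file: memory limit exceeded, consider requesting more mem"
    fi
done
```

If a file is reported, a likely remedy is to raise the mem value in the select statement of the job script and resubmit.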

Allocation by requested chunk

...

Some of the more commonly used options are:

-E Groups jobs by server and displays them sorted by ascending ID. When qstat is given a list of jobs, the jobs are grouped by server and each group is displayed in ascending ID order. This option also improves qstat performance.
-t Displays status information for jobs, job arrays and subjobs.
-p Replaces the Time Used column with the percentage of the job completed. For a job array, this is the percentage of subjobs completed. For a regular job, it is the percentage of the allocated CPU time used.
-x Displays status information for finished and moved jobs in addition to queued and running jobs.
-Q Displays queue status in the standard format.
-q Displays queue status in the alternate format.
-f Displays job status in the full (long) format.


Usage examples:

Detailed job display:

...