Introduction
The Padobran computer cluster uses PBS (Portable Batch System) to schedule and manage jobs. Its primary task is to distribute computational tasks (batch jobs) across the available computing resources.
...
Tip

This method is preferred over multiple submissions (e.g. with a for loop) because:

The environment variables defined by PBS during their execution are:
...
```bash
#!/bin/bash
#PBS -P test example
#PBS -e /home/my_directory
#PBS -q cpu
#PBS -l walltime=00:01:00
#PBS -l select=1:ncpus=10

# Load the OpenMPI module and print the version of its C compiler wrapper
module load mpi/openmpi-x86_64
mpicc --version
```
Basic PBS parameters
Option | Option argument | Meaning |
---|---|---|
-N | name | Sets the job name |
-q | destination | Specifies the job queue and/or server |
-l | resource_list | Specifies the resources required to run the job |
-M | user_list | Sets the list of mail recipients |
-m | mail_options | Sets the type of email notification |
-o | path/to/desired/directory | Sets the name/path where standard output is saved |
-e | path/to/desired/directory | Sets the name/path where standard error is saved |
-j | oe | Merges standard output and standard error into the same file |
-W group_list | project_code | Selects the project under which the job runs |
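The options in the table above can be combined in a single job header. The following is a minimal sketch, not a required Padobran template. The job name my_test_job, the output path and the project code my_project are hypothetical placeholders, so replace them with your own values.

```bash
#!/bin/bash
#PBS -N my_test_job
#PBS -q cpu
#PBS -l select=1:ncpus=2
#PBS -l walltime=00:05:00
#PBS -o /home/my_directory/my_test_job.out
#PBS -j oe
#PBS -W group_list=my_project

# Hypothetical body: because of -j oe, both lines below end up in the
# single file given with -o
echo "Job ${PBS_JOBNAME} started"
echo "This message goes to standard error" >&2
```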
Notification types for the -m option:
Option | Meaning |
---|---|
a | Mail is sent when the job is aborted by the batch system |
b | Mail is sent when the job starts executing |
e | Mail is sent when the job finishes |
j | Mail is sent for subjobs. Must be combined with one or more of the sub-options a, b or e |
```bash
#!/bin/bash
#PBS -q cpu
#PBS -l walltime=00:01:00
#PBS -l select=1:ncpus=2
#PBS -M <name>@srce.hr,<name2>@srce.hr
#PBS -m be

# Write the job name to the file "out" and print the host the job was submitted from
echo $PBS_JOBNAME > out
echo $PBS_O_HOST
```
...
Options for requesting resources with the -l option
Example | Meaning |
---|---|
-l select=3:ncpus=2 | Requests 3 chunks with 2 cores each (6 cores in total) |
-l select=1:ncpus=10:mem=20GB | Requests 1 chunk with 10 cores and 20 GB of memory |
-l ngpus=2 | Requests 2 GPUs |
-l walltime=00:10:00 | Sets the maximum job execution time (here 10 minutes) |
PBS environment variables
Name | Description |
---|---|
PBS_JOBID | Job identifier assigned by PBS when the job is submitted, i.e. when the qsub command is executed |
PBS_JOBNAME | Job name given by the user. The default name is the name of the submitted script |
PBS_NODEFILE | Path to the file listing the worker nodes (or processor cores) on which the job runs |
PBS_O_WORKDIR | Working directory from which the job was submitted, i.e. in which the qsub command was called |
OMP_NUM_THREADS | OpenMP variable that PBS exports to the environment; equal to the value of the ncpus option in the PBS script header |
NCPUS | Number of requested cores; matches the value of the ncpus option in the PBS script header |
TMPDIR | Path to the job's temporary directory |
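A job script can print these variables at start-up, which helps when debugging. The sketch below relies only on the variables listed in the table. The helper names print_job_info and slots_per_node are hypothetical. The node summary assumes that $PBS_NODEFILE contains one host name per line.

```bash
#!/bin/bash
# Hypothetical helper: log the PBS environment of the running job
print_job_info() {
    echo "Job ID:      ${PBS_JOBID:-<not set>}"
    echo "Job name:    ${PBS_JOBNAME:-<not set>}"
    echo "Submit dir:  ${PBS_O_WORKDIR:-<not set>}"
    echo "Cores:       ${NCPUS:-<not set>}"
    echo "OMP threads: ${OMP_NUM_THREADS:-<not set>}"
    echo "Temp dir:    ${TMPDIR:-<not set>}"
}

# Hypothetical helper: count how many lines each host has in a node file.
# Prints "<count> <host>" per host, sorted by host name.
slots_per_node() {
    sort "$1" | uniq -c | awk '{ print $1, $2 }'
}

print_job_info
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    slots_per_node "$PBS_NODEFILE"
fi
```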
Tip

In PBS, the output and error files are saved by default in the directory from which the job was submitted. The program's own input and output files, however, are read from and saved to the $HOME directory by default. PBS has no option to run the job in the current directory, so you have to change to it manually. To switch to the directory from which the script was submitted, add the following line after the header: `cd ${PBS_O_WORKDIR}`

For I/O-intensive jobs (jobs that put a heavy load on storage), it is recommended to run from $TMPDIR, which uses fast storage, rather than from $PBS_O_WORKDIR. Read more about using fast storage and saving temporary results below.
...
PBS allows the required resources to be defined in several ways. The basic unit of resource allocation is the so-called chunk, a portion of a node. Chunks are defined with the select option. The number of processor cores per chunk is set with ncpus, the number of MPI processes with mpiprocs, and the amount of memory with mem. It is also possible to set walltime (maximum job execution time) and place (how chunks are distributed across nodes).
If a parameter is not specified, its default value is used:
Parameter | Default value |
---|---|
select | 1 |
ncpus | 1 |
mpiprocs | 1 |
mem | 3500 MB |
walltime | 48:00:00 |
place | pack |
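The following hypothetical helper shows how the defaults combine with an explicit select statement. It expands a select specification using the default values from the table and prints the totals a job would get. It is only an illustration. It assumes mem is given as a whole number of MB or GB, counting 1 GB as 1024 MB, and it ignores all other resources.

```bash
#!/bin/bash
# Hypothetical helper: apply the default values from the table
# (ncpus=1, mpiprocs=1, mem=3500 MB per chunk) to a select specification
# and print the totals. Only whole-number MB/GB memory values are handled.
summarize_select() {
    local spec=${1#select=}   # accept both "select=2:ncpus=10" and "2:ncpus=10"
    local chunks=1 ncpus=1 mpiprocs=1 mem_mb=3500
    local IFS=':' part
    for part in $spec; do
        case $part in
            ncpus=*)          ncpus=${part#ncpus=} ;;
            mpiprocs=*)       mpiprocs=${part#mpiprocs=} ;;
            mem=*GB|mem=*gb)  mem_mb=$(( ${part//[!0-9]/} * 1024 )) ;;
            mem=*MB|mem=*mb)  mem_mb=${part//[!0-9]/} ;;
            *=*)              ;;   # other chunk resources are ignored here
            *)                chunks=$part ;;
        esac
    done
    echo "chunks=$chunks ncpus=$(( chunks * ncpus ))" \
         "mpiprocs=$(( chunks * mpiprocs )) mem=$(( chunks * mem_mb ))MB"
}

summarize_select "select=2:ncpus=10:mem=10GB"
summarize_select "16"
```

For select=2:ncpus=10:mem=10GB, this prints chunks=2 ncpus=20 mpiprocs=2 mem=20480MB, i.e. 20 cores and 20 GB. For a bare select=16, every chunk falls back to the defaults, giving 16 cores and 16 × 3500 MB.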
Memory control using cgroups
...
This message differs slightly from job to job because it contains information such as the UID (the user's unique numeric identifier), the PID (the numeric identifier of the killed process) and the JOB_ID (the job ID assigned by PBS).
Allocation per requested chunk
Examples:
The user requests two chunks, each consisting of 10 processor cores and 10 GB of RAM, without specifying on how many nodes they should be placed, so the system optimizes the allocation. In this case, the user gets 20 processor cores and 20 GB of memory.
```bash
#PBS -l select=2:ncpus=10:mem=10GB
```
The user requests 10 chunks, each consisting of one processor core and 1 GB of RAM, with the condition that all of them are on one node, so the user gets a total of 10 processor cores and 10 GB of RAM.
```bash
# place is a job-wide setting, so it is requested with its own -l option
# rather than inside the select chunk specification
#PBS -l select=10:ncpus=1:mem=1GB
#PBS -l place=pack
```
In the examples above, jobs are defined by the number of chunks, cores and the amount of memory. The system also assigns resources that a job does not request explicitly (default resources):
```bash
#PBS -l ncpus=4
#PBS -l mem=14GB
```
In this case, the user gets 4 processor cores and a total of 14 GB of memory in one chunk. When jobs are described without the select option, resources cannot be "chained" (listed on one line separated by colons). Each resource must be requested with its own -l option on a new line.
Tip

If you define jobs using ncpus without the select option, you should also specify the amount of memory, because otherwise the available memory will be 3500 MB.
Saving temporary results
The $TMPDIR directory can be used instead of the $HOME directory to store temporary results generated at runtime. Using $TMPDIR takes advantage of the fast storage (BeeGFS-fast) reserved for temporary files.
PBS creates a temporary directory for each job at the path stored in the $TMPDIR variable (/beegfs-fast/scratch/<jobID>).
Warning

The temporary directory is deleted automatically when the job finishes!
Usage examples
- A simple example of using the $TMPDIR variable:
```bash
#!/bin/bash
#PBS -q cpu
#PBS -l walltime=00:00:05

# Work in the job's temporary directory, write its path to a file
# and copy that file to the directory the job was submitted from
cd $TMPDIR
pwd > test
cp test $PBS_O_WORKDIR
```
- An example of copying the input data to $TMPDIR, running the application, and copying the results to the working directory:
```bash
#!/bin/bash
#PBS -q cpu
#PBS -l walltime=00:00:05

# Create a directory for the input data in the temporary directory
mkdir -p $TMPDIR/data

# Copy all required inputs to the temporary directory
cp -r $HOME/data/* $TMPDIR/data

# Run the application and redirect its outputs to the "current" (temporary) directory
cd $TMPDIR
<application executable command> 1>output.log 2>error.log

# Copy the desired output to the working directory
cp -r $TMPDIR/output $PBS_O_WORKDIR
```
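PBS deletes $TMPDIR when the job finishes. Any results still there are lost if the script stops before the final copy. One possible variation, sketched below, stops at the first failing command (set -e) and copies the results back from a bash EXIT trap, so they are saved even when the script exits early.

The branch that sets TMPDIR and PBS_O_WORKDIR when $PBS_JOBID is empty exists only so the script can be tried outside PBS. The echo line stands in for a hypothetical application. Neither is part of the Padobran setup.

```bash
#!/bin/bash
#PBS -q cpu
#PBS -l walltime=00:10:00

# Stop at the first failing command; the EXIT trap below still runs.
set -e

# Outside PBS (no $PBS_JOBID), use a fresh scratch directory and the current
# directory so the script can be tried locally (assumption: inside a PBS job
# both variables are already set by the system).
if [ -z "${PBS_JOBID:-}" ]; then
    TMPDIR=$(mktemp -d)
    PBS_O_WORKDIR=$PWD
fi

# $TMPDIR is deleted when the job ends, so copy whatever was produced.
stage_out() {
    if [ -d "$TMPDIR/output" ]; then
        cp -r "$TMPDIR/output" "$PBS_O_WORKDIR/"
    fi
}
trap stage_out EXIT

mkdir -p "$TMPDIR/output"
cd "$TMPDIR"

# Hypothetical placeholder for the real application command.
echo "42" > output/result.txt
```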
...
Parallel jobs
OpenMP
...
parallelization
If your application parallelizes exclusively with OpenMP threads and cannot extend beyond a single worker node (i.e. it works with shared memory), you can submit the job as shown in the xTB example below.
Tip

OpenMP applications require the OMP_NUM_THREADS variable to be defined. PBS takes care of this for you and sets it to the value of ncpus defined in the header of the PBS script. If you define jobs using ncpus without the select option, you should also specify the amount of memory, because otherwise the available memory will be 3500 MB (select × mem → 1 × 3500 MB).
...
```bash
#!/bin/bash
#PBS -q cpu
#PBS -l walltime=10:00:00
# Without select, each resource needs its own -l line (no colon chaining)
#PBS -l ncpus=8
#PBS -l mem=28GB

cd ${PBS_O_WORKDIR}
xtb C2H4BrCl.xyz --chrg 0 --uhf 0 --opt vtight
```
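PBS sets OMP_NUM_THREADS from ncpus, so a job can log the value before starting the application. This makes misconfigured jobs easier to spot in the output file. The helper check_omp_threads below is a hypothetical addition, not part of xTB or PBS. It could be called in the script above just before the xtb line.

```bash
#!/bin/bash
# Hypothetical helper: report the OpenMP thread count set by PBS and
# return a non-zero status if it differs from the number of requested cores.
check_omp_threads() {
    local threads=${OMP_NUM_THREADS:-unset}
    local cores=${NCPUS:-unset}
    echo "OMP_NUM_THREADS=$threads NCPUS=$cores"
    if [ "$threads" != "$cores" ]; then
        echo "warning: OMP_NUM_THREADS does not match NCPUS" >&2
        return 1
    fi
}
```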
MPI parallelization
If your application parallelizes exclusively at the MPI process level and can extend beyond a single worker node (i.e. it works with distributed memory), you can submit the job as shown in the Quantum ESPRESSO example below. To run applications that use MPI (or hybrid MPI+OpenMP) parallelization, an MPI module must be loaded before calling mpiexec or mpirun.
Tip

The value of the variable
```bash
#!/bin/bash
#PBS -q cpu
#PBS -l walltime=10:00:00
# 16 chunks with the default 1 core and 1 MPI process each
#PBS -l select=16

module load mpi/openmpi-x86_64
cd ${PBS_O_WORKDIR}
mpiexec pw.x -i calcite.in
```
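In PBS, $PBS_NODEFILE contains one line per MPI process reserved for the job. For select=16 with the default mpiprocs=1, that is 16 lines. Many MPI installations integrated with PBS read the allocation on their own. If yours does not, you can pass the process count explicitly. Here is a minimal sketch with a hypothetical helper, mpi_rank_count:

```bash
#!/bin/bash
# Hypothetical helper: number of MPI processes = number of lines in the
# node file (defaults to $PBS_NODEFILE inside a PBS job).
mpi_rank_count() {
    local nodefile=${1:-$PBS_NODEFILE}
    wc -l < "$nodefile" | tr -d ' '
}

# In the job script above, the last line could then become:
#   mpiexec -n "$(mpi_rank_count)" pw.x -i calcite.in
```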
MPI + OpenMP (
...
hybrid)
...
parallelization
If your application can be parallelized in a hybrid way, i.e. its MPI processes can be split into OpenMP threads, you can submit the job as shown in the GROMACS example below:
Tip

OpenMP applications require the OMP_NUM_THREADS variable to be defined. The value of the variable
```bash
#!/bin/bash
#PBS -q cpu
#PBS -l walltime=10:00:00
# 8 chunks x 4 cores: 8 MPI processes with 4 OpenMP threads each
#PBS -l select=8:ncpus=4:mem=14GB

module load mpi/openmpi-x86_64
cd ${PBS_O_WORKDIR}
mpiexec -d ${OMP_NUM_THREADS} --cpu-bind depth gmx mdrun -v -deffnm md
```
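The resource request above works out as follows:

- 8 chunks × 4 cores give 32 cores in total.
- Because mpiprocs defaults to 1, one MPI process is launched per chunk, and each runs OMP_NUM_THREADS = 4 threads.
- Memory totals 8 × 14 GB = 112 GB.

The hypothetical helper below repeats this arithmetic for other values. It assumes the default mpiprocs=1 and memory given in whole GB.

```bash
#!/bin/bash
# Hypothetical helper: layout of a hybrid job requested as
# select=<chunks>:ncpus=<cores>:mem=<mem_gb>GB with the default mpiprocs=1.
hybrid_layout() {
    local chunks=$1 cores_per_chunk=$2 mem_gb_per_chunk=$3
    echo "mpi_processes=$chunks threads_per_process=$cores_per_chunk" \
         "total_cores=$(( chunks * cores_per_chunk ))" \
         "total_mem=$(( chunks * mem_gb_per_chunk ))GB"
}

hybrid_layout 8 4 14
```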
Monitoring and managing job execution
Job monitoring
The PBS command qstat is used to display the status of jobs. The basic syntax of the command is:
```
qstat <options> <job_ID>
```
Running qstat without any additional options lists all current jobs of all users:
```
Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
111.admin        mpi+omp_s        kmrkalj          00:36:09 R cpu
```
Some of the more frequently used options are:
...
Option | Description |
---|---|
-E | Groups jobs by server and displays them sorted by ascending ID. When qstat is given a list of jobs, the jobs are grouped by server and each group is displayed in ascending ID order. This option also improves the performance of qstat. |
-t | Displays status information for jobs, job arrays and subjobs. |
-p | Replaces the Time Used column with the percentage of work done. For a job array, this is the percentage of subjobs completed; for a normal job, it is the percentage of the allocated CPU time used. |
-x | Displays status information for finished and moved jobs in addition to queued and running jobs. |
-Q | Displays queue status in the standard format. |
-q | Displays queue status in an alternative format. |
-f | Displays job status in full format. |
Usage examples:
Detailed job information (-f full format, -x also includes finished jobs, -w wide output):
```
qstat -fxw 2648
```
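The default qstat output is plain text, so standard tools can post-process it. As an illustration, the hypothetical function count_job_states below counts jobs per state in output piped from qstat. It uses the S column, e.g. R for running and H for held. It assumes the default layout with two header lines, as in the listing shown earlier, and would need adjusting for other formats such as qstat -f.

```bash
#!/bin/bash
# Hypothetical helper: count jobs per state in default qstat output read
# from standard input. Skips the two header lines; prints "<state> <count>".
count_job_states() {
    awk 'NR > 2 && NF >= 6 { count[$5]++ } END { for (s in count) print s, count[s] }' | sort
}

# Usage on the cluster:
#   qstat | count_job_states
```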
The tracejob command extracts and displays the log messages for a PBS job in chronological order.
```
tracejob <job_ID>
```
Example:
```
$ tracejob 111

Job: 111.admin

03/30/2023 11:23:24  L    Considering job to run
03/30/2023 11:23:24  S    Job Queued at request of mhrzenja@node034, owner = mhrzenja@node034, job name = mapping, queue = cpu
03/30/2023 11:23:24  S    Job Run at request of Scheduler@node034 on exec_vnode (node034:ncpus=40:mem=104857600kb)
03/30/2023 11:23:24  L    Job run
03/30/2023 11:23:24  S    enqueuing into cpu, state Q hop 1
03/30/2023 11:23:56  S    Holds u set at request of mhrzenja@node034
03/30/2023 11:24:22  S    Holds u released at request of mhrzenja@node034
```
Job management
The job can be managed even after it has started.
While the job is in the queue, you can temporarily prevent it from running (put it on hold) with the command:
```
qhold <job_ID>
```
To release the hold and return the job to the queue:
```
qrls <job_ID>
```
A job is stopped completely, or removed from the queue, with the command:
```
qdel <job_ID>
```
For stuck jobs, use forced deletion:
```
qdel -W force -x <job_ID>
```
Delaying execution
PBS can run jobs depending on other jobs, which is useful in cases such as:
- the execution of a job depends on the output or state of a previously executed job
- the application requires its components to be executed sequentially
- data written by one job may interfere with the execution of another
The option that enables this functionality when submitting the current job is:
```
qsub -W depend=<type>:<job_ID>[:<job_ID>] ...
```
Where <type> can be:
- after*: start the current job with respect to the specified ones
  - after: run the current job after the specified jobs have started
  - afterok: run the current job after the specified jobs have completed successfully
  - afternotok: run the current job after the specified jobs have completed with an error
  - afterany: run the current job after the specified jobs have finished, regardless of exit status
- before*: start the specified jobs with respect to the current one
  - before: run the specified jobs after the current job has started
  - beforeok: run the specified jobs after the current job has completed successfully
  - beforenotok: run the specified jobs after the current job has completed with an error
  - beforeany: run the specified jobs after the current job has finished
- on:<number>: the job depends on <number> jobs that are specified later with one of the before* types

Note

A job with the -W depend=... directive will not be submitted if the specified job IDs do not exist (or are not in the queue).
Usage examples:
If we want posao1 to start after posao0 completes successfully:
```
[korisnik@padobran] $ qsub posao0
1000.admin
[korisnik@padobran] $ qsub -W depend=afterok:1000 posao1
1001.admin
[korisnik@padobran] $ qstat 1000 1001
Job id                Name             User             Time Use S Queue
--------------------- ---------------- ---------------- -------- - -----
1000.admin            posao0           korisnik         00:00:00 R cpu
1001.admin            posao1           korisnik                0 H cpu
```
If we want posao0 to start only after posao1 completes successfully:
```
[korisnik@padobran] $ qsub -W depend=on:1 posao0
1002.admin
[korisnik@padobran] $ qsub -W depend=beforeok:1002 posao1
1003.admin
[korisnik@padobran] $ qstat 1002 1003
Job id                Name             User             Time Use S Queue
--------------------- ---------------- ---------------- -------- - -----
1002.admin            posao0           korisnik                0 H cpu
1003.admin            posao1           korisnik         00:00:00 R cpu
```
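For longer pipelines, a script can capture the job IDs printed by qsub instead of having you type them by hand. The sketch below uses a hypothetical function, submit_chain. It submits the given job scripts so that each one depends on the successful completion (afterok) of the previous one. It assumes qsub prints only the new job ID (e.g. 1000.admin), as in the examples above.

```bash
#!/bin/bash
# Hypothetical helper: submit job scripts as a chain in which each job
# starts only after the previous one has completed successfully (afterok).
submit_chain() {
    local prev="" jid script
    for script in "$@"; do
        if [ -z "$prev" ]; then
            jid=$(qsub "$script") || return 1
        else
            jid=$(qsub -W depend=afterok:"$prev" "$script") || return 1
        fi
        echo "$script -> $jid"
        prev=$jid
    done
}

# Usage on the cluster:
#   submit_chain posao0 posao1 posao2
```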
...