Info

More information about the parallel environments available on Isabella can be found on the page Redovi poslova i paralelne okoline.


Info

The user does not need to know in advance how many processor cores the job will be allocated: SGE sets the $NSLOTS variable to the number of allocated cores.


Running parallel jobs is specific in that the tools that launch sub-jobs (e.g. mpirun) schedule those sub-jobs across the nodes themselves. When SGE assigns nodes to a parallel job, it saves the list of nodes in the file $TMPDIR/machines, which is passed as a parameter to the parallel launcher (e.g. mpirun, mpiexec, pvm...) inside the job description script.


An example job script for one type of parallel job:

Code Block
languagebash
#!/bin/bash

#$ -N parallel-job
#$ -cwd
#$ -pe *mpi 14

mpirun_rsh -np $NSLOTS -hostfile $TMPDIR/machines ...


Job arrays

SGE can run multiple instances of the same job, so-called job arrays. The sub-jobs within an array are called tasks, and each task gets its own identifier. When submitting the job, the user specifies the range of task identifiers using the -t parameter:

Code Block
#$ -t <start>-<end>:<step>


The value <start> is the identifier of the first task, <end> the identifier of the last task, and <step> the increment between successive identifiers from <start> up to <end>. SGE stores the identifier of each task in the variable $SGE_TASK_ID, which users can use to pass different parameters to each task. Tasks can be serial or parallel jobs.
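
For example, a hypothetical range such as the following creates the tasks 2, 4, 6, ..., 20:

Code Block
#$ -t 2-20:2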

Examples of use

  1. Example script to run a job array of 10 serial jobs:

    Code Block
    languagebash
    #!/bin/bash
    
    #$ -N job_array_serial
    #$ -cwd
    #$ -o output/
    #$ -j y
    #$ -t 1-10
    
    ./starSeeker starCluster.$SGE_TASK_ID


  2. Example script to run a job array of 10 parallel jobs:

    Code Block
    languagebash
    #!/bin/bash
    
    #$ -N job_array_parallel
    #$ -cwd
    #$ -o output/
    #$ -j y
    #$ -t 1-10
    
    mpiexec -machinefile $TMPDIR/machines ./starSeeker starCluster.$SGE_TASK_ID


Interactive jobs

SGE also supports interactive jobs, which are run with the qrsh command.

Using this form of job is recommended when applications need to be compiled or debugged on the nodes.

Unlike logging in with ssh, running via qrsh lets SGE know that the nodes are busy, so it will not schedule other jobs on them. When executing a command interactively, the full path to the command must be specified. If SGE currently has no free resources, it will immediately abort the job with the message below; to leave the job waiting in the queue instead, specify the "-now n" parameter (see example 4 below):

Code Block
languagetext
Your "qrsh" request could not be scheduled, try again later.

Examples of use:

  1. Direct access to the command line of the test node:

    Code Block
    languagetext
    qrsh


  2. Interactive command execution:


    Code Block
    languagetext
    qrsh /home/user/moja_skripta



  3. Interactive application execution with graphical interface:

    Code Block
    languagetext
    qrsh -v DISPLAY=10.1.1.1:0.0 <moja_skripta>
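
  4. Leaving the job waiting in the queue when no resources are currently free (using the "-now n" parameter described above):

    Code Block
    languagetext
    qrsh -now n /home/user/moja_skripta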


Advanced job descriptions

Saving temporary results

It is not recommended to use the $HOME directory for saving temporary results generated during job execution: doing so reduces application efficiency and burdens the front end and the cluster network.

For each job, SGE creates a directory on the local disk of the worker nodes (/scratch), of the form /scratch/<jobID>.<taskID>.<queue>. SGE stores the path to this directory in the variable $TMPDIR.

Info

For higher execution speed, using a temporary directory on the /scratch disk is also recommended for jobs that often require random access to data on disk, such as TensorFlow and PyTorch jobs.


Warning

If there are indications that the created temporary files will exceed 500 GB, the temporary data should be saved to the /shared disk (described below).


Warning

The temporary directory on the /scratch disk is deleted automatically at the end of the job execution.


Examples of use:

  1. An example of simple use of the $TMPDIR variable:

    Code Block
    languagebash
    #!/bin/bash 
    
    #$ -N scratch_1 
    #$ -cwd 
    #$ -o output/scratch.out 
    #$ -j y
    #$ -l scratch=50 
    
    cd $TMPDIR 
    pwd > test 
    cp test $SGE_O_WORKDIR


  2. An example of copying data to a scratch disk:

    Code Block
    languagebash
    #!/bin/bash 
    
    #$ -N scratch_2 
    #$ -cwd 
    #$ -o output/scratch.out 
    #$ -j y
    #$ -l scratch=50 
    
    mkdir -p $TMPDIR/data 
    cp -r $HOME/data/* $TMPDIR/data 
    
    python3.5 main.py $TMPDIR/data


If the temporary data exceeds 500 GB, the /shared disk must be used. Unlike /scratch, the directory on /shared must be created manually, and it is not removed automatically.

Example of use:

Code Block
languagebash
#!/bin/bash 

#$ -N shared
#$ -cwd 
#$ -o output/shared.out 
#$ -j y 

# Create a job-specific directory manually ($JOB_ID is set by SGE)
mkdir -p /shared/$USER/$JOB_ID
cd /shared/$USER/$JOB_ID

pwd > test
cp test $SGE_O_WORKDIR
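
Since directories on /shared are not removed automatically, it is good practice to clean up at the end of the script once the results have been copied back (a minimal sketch, assuming the job-specific directory from the example above):

Code Block
languagebash
# Leave the directory before removing it
cd $SGE_O_WORKDIR
rm -rf /shared/$USER/$JOB_ID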


Resources

When starting jobs, the user can describe in more detail which conditions must be met for the job. For example, it is possible to request a specific worker-node architecture, an amount of memory, or a maximum execution time. Specifying the required resources allows for better job scheduling and gives jobs a higher priority (more on the Priorities page on Isabella).

The required resources are specified using the -l parameter:

Code Block
#$ -l <resource>=<value>

Resources which define job requirements:

Code Block
arch : node architecture (eg. lx26-x86, lx25-amd64)
hostname : node address (eg. c4140-01.isabella)
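
For example, a job that should run only on a specific node (a hypothetical requirement, using the node address from the list above):

Code Block
#$ -l hostname=c4140-01.isabella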

Resources that place real limits on jobs:

Code Block
vmem : amount of virtual memory (format: <num>K|M|G)
rss : amount of real memory
stack : stack size
data : total amount of memory (without stack)
fsize : total file size
cput : processor time (format: [<hours>:<min>:]<sec>)
rt : real (wall-clock) time
scratch : space on the scratch disk expressed in GB
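
For example, a job that hypothetically limits itself to 4 GB of virtual memory and 12 hours of processor time per process:

Code Block
#$ -l vmem=4G
#$ -l cput=12:0:0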


Note

The values of these resources should be defined carefully (e.g. request values about 50% higher than expected). If a job exceeds a limit, it will be stopped (typically with a segmentation fault).


Note

Values cannot be changed for active jobs.


Info

The resource values defined in the job start script apply per process. For example, if a user requests 3 processor cores on one node, the values of all requested resources are multiplied by 3.

Examples of use:

  1. Example of a job that requests 20 CPU cores and 10 GB of RAM per process (the job is allocated a total of 200 GB of RAM):

    Code Block
    #$ -pe *mpi 20
    #$ -l memory=10


  2. Example of a job that requires 100 GB of scratch space in total (4 processes × 25 GB per process):

    Code Block
    #$ -pe *mpisingle 4
    #$ -l scratch=25