Difference between revisions of "Debrecen2 GPU klaszter en"

From KIFÜ Wiki

Revision as of 09:25, 14 July 2021

Cluster Debrecen2 (Leo)
Type HP SL250s
Core / node 8 × 2 Xeon E5-2650v2 2.60GHz
GPU / node 68 * 3 Nvidia K20x + 16 * 3 Nvidia K40x
# of compute nodes 84
Max Walltime 7-00:00:00
Max core / project 336
Max mem / core 7000 MB

Requesting CPU time


Login

ssh USER@login.debrecen2.hpc.niif.hu

If a non-default key is used, it must be specified with the -i KEY option (SSH and SCP commands).

Copying files with SCP

Download from the HOME directory and upload to the HOME directory:

Up: scp FILE USER@login.debrecen2.hpc.niif.hu:FILE
Down: scp USER@login.debrecen2.hpc.niif.hu:FILE FILE

Data synchronization

Larger files / directory structures should be synchronized using the following commands:

Up: rsync -a -e ssh DIRECTORY USER@login.debrecen2.hpc.niif.hu:/home/USER
Down: rsync -a -e ssh USER@login.debrecen2.hpc.niif.hu:/home/USER/DIRECTORY .

The --delete option must be specified to synchronize deleted files.

User interface

               short form of CWD
                     |
    DEBRECEN2[login] ~ (0)$
        |       |       |
   HPC station  |       |
    short machine name  |
               exit code of the previous command

Module environment

The list of available modules is obtained with the following command:

module avail

The list of already loaded modules is obtained with:

module list

You can load an application with the following command:

module load APP

The environment variables set by KIFÜ are listed by the nce command.

Data sharing for project members

To share files / directories ACLs must be set. To make the HOME directory readable by another user (OTHER):

setfacl -m u:OTHER:rx $HOME

To make a specific directory (DIRECTORY) writable:

setfacl -m u:OTHER:rxw $HOME/DIRECTORY

You can list extended rights with the following command:

getfacl $HOME/DIRECTORY

Using a shared home directory

The common file system that is available for the login nodes of the supercomputers is accessible under the following path:

/mnt/fhgfs/home/$USER

Backups can be made into the shared directory with the following command:

rsync -avuP --delete $HOME/DIRECTORY /mnt/fhgfs/home/$USER

Compiling applications

Users are encouraged to try compiling needed applications in their own home directory first. If it fails for some reason, then the next step is to ask the Hungarian supercomputer users because there is a good chance that others have already run into the same problem. They can be reached at: hpc-forum at listserv.niif.hu. You can subscribe to this mailing list [1]. You should also check the archive when looking into the issue. KIFÜ HPC support has extremely limited capacity to handle individual compiling requests but still you may contact hpc-support at niif.hu with your problem. In the latter case please be patient for a few days while waiting for responses.

Using the SLURM scheduler

Accounting on the supercomputer is based on CPU hours (machine time). The following command provides information about the status of the user's Slurm projects (Account):

sbalance

The second column (Usage) shows the machine time spent by each user, and the fourth column shows the total machine time of the account. The last two columns provide information about the maximum (Account Limit) and available machine time.

Scheduler Account Balance
---------- ----------- + ---------------- ----------- + ------------- -----------
User             Usage |          Account       Usage | Account Limit   Available (CPU hrs)
---------- ----------- + ---------------- ----------- + ------------- -----------
bob *                7 |           foobar           7 |         1,000         993
alice                0 |           foobar           7 |         1,000         993

Estimating CPU time

It is advisable to estimate the wall clock time before large-scale (production) runs. To do this, use the following command:

sestimate -N NODES -t WALLTIME

where NODES is the number of nodes to be reserved and WALLTIME is the maximum run time.

It is important to specify the wall clock time you want to reserve as accurately as possible, because the scheduler also uses it to rank the jobs waiting to run. As a rule, shorter jobs start sooner. It is advisable to check the actual run time with the sacct command afterwards.
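
As a rough cross-check of what sestimate reports, the machine time of a run can also be approximated by hand. The sketch below assumes that a job is charged for all cores of every reserved node (16 cores per node, taken from the 8 × 2 figure in the table at the top) for the whole wall clock time. The function name core_hours is illustrative only, and sestimate remains the authoritative source.

```shell
#!/bin/bash
# Rough machine-time estimate: nodes x cores per node x hours.
# Assumption: all 16 cores of each reserved node are charged.
core_hours() {
    local nodes=$1 hours=$2 cores_per_node=${3:-16}
    echo $(( nodes * cores_per_node * hours ))
}

core_hours 2 12   # 2 nodes for 12 hours, prints 384
```

Comparing this figure with the Available column of sbalance gives a quick idea of whether the account can cover the run.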

Status information

The squeue and the sinfo command provide information about the general state of the cluster. Each job submitted is assigned a unique identification number (JOBID). Knowing this, we can ask for more information. Characteristics of the submitted or already running job:

scontrol show job JOBID

Each job is also put into a so-called accounting database. From this you can retrieve the characteristics of the jobs you have run and the statistics of resource usage. You can view detailed statistics with the following command:

sacct -l -j JOBID

The following command provides information about the memory used:

smemory JOBID

The next one shows disk usage:

sdisk JOBID

SLURM warnings

Resources / AssociationResourceLimit - Waiting for a resource
AssociationJobLimit / QOSJobLimit - Not enough CPU time, or the maximum number of CPUs is already allocated
Priority - Waiting due to low priority

In the latter case, the time to be reserved by the job must be reduced. Jobs for a given project can run on up to 512 CPUs at a given time.
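
The list above can be turned into a small lookup helper. This is only a sketch: the function name explain_reason is hypothetical, and the commented squeue call assumes the standard -h (no header) and -o %r (reason) options.

```shell
#!/bin/bash
# Map a SLURM pending reason to the explanation given in the list above.
explain_reason() {
    case $1 in
        Resources|AssociationResourceLimit)
            echo "Waiting for a resource" ;;
        AssociationJobLimit|QOSJobLimit)
            echo "Not enough CPU time or the maximum number of CPUs is in use" ;;
        Priority)
            echo "Waiting due to low priority" ;;
        *)
            echo "No explanation listed for: $1" ;;
    esac
}

# On the cluster the reason could be taken from squeue, for example:
#   explain_reason "$(squeue -h -j JOBID -o %r)"
explain_reason Priority
```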

Checking licenses

The following command provides information about the available and currently used licenses:

slicenses

Checking maintenance

During a maintenance window the scheduler does not start new jobs, but jobs can still be submitted. The following command provides information on maintenance dates:

sreservations

Aggregate consumption

You can retrieve the CPU minutes consumed over the past month with the following command:

susage

Total consumption

If you want to know how much CPU time you have used since a given date, you can query it with this command:

sreport -t Hours Cluster AccountUtilizationByUser Accounts=ACCOUNT Start=2015-01-01

Submitting jobs

It is possible to run applications on supercomputers in batch mode. This means that for each run, a job script must be created that includes a description of the resources required and the commands required to run. Scheduler parameters (resource requirements) must be specified with the #SBATCH directive.

Mandatory parameters

The following parameters must be specified in each case:

#!/bin/bash
#SBATCH -A ACCOUNT
#SBATCH --job-name=NAME
#SBATCH --time=TIME

where ACCOUNT is the name of the account to be charged (your available accounts are indicated by the sbalance command), NAME is the short name of the job, and TIME is the maximum wall clock time (DD-HH:MM:SS). The following time formats can be used:

"minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds".
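
To make the accepted formats concrete, the following sketch converts any of them to seconds. It is an illustration only: slurm_time_to_seconds is a hypothetical helper, not a SLURM or KIFÜ command.

```shell
#!/bin/bash
# Convert a SLURM time specification to seconds.
slurm_time_to_seconds() {
    local spec=$1 days=0 hours=0 minutes=0 seconds=0 rest a b c
    if [[ $spec == *-* ]]; then
        # days-hours[:minutes[:seconds]]
        days=${spec%%-*}
        rest=${spec#*-}
        IFS=: read -r hours minutes seconds <<< "$rest"
    else
        # minutes | minutes:seconds | hours:minutes:seconds
        IFS=: read -r a b c <<< "$spec"
        if [[ -n $c ]]; then
            hours=$a; minutes=$b; seconds=$c
        else
            minutes=$a; seconds=${b:-0}
        fi
    fi
    # 10# forces base 10 so that fields such as "08" are not read as octal.
    echo $(( 10#$days * 86400 + 10#${hours:-0} * 3600 \
           + 10#${minutes:-0} * 60 + 10#${seconds:-0} ))
}

slurm_time_to_seconds 7-00:00:00   # the Max Walltime from the table at the top
```

Note that a bare number means minutes, so --time=90 requests one and a half hours, not 90 seconds.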

Reservation of GPUs

GPUs are reserved using the following directive:

#SBATCH --gres=gpu:N

N specifies the number of GPUs per node; it can be 1, 2 or at most 3.
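
Inside a job it can be useful to confirm how many GPUs were actually granted. The sketch below assumes that SLURM exports the allocated device indices in CUDA_VISIBLE_DEVICES, which is the usual behaviour of its GPU GRES support but is not confirmed for this cluster. ACCOUNT is a placeholder and gpu_count is a hypothetical helper.

```shell
#!/bin/bash
#SBATCH -A ACCOUNT
#SBATCH --job-name=gpucheck
#SBATCH --time=00:10:00
#SBATCH --gres=gpu:2

# Count the comma-separated device indices in CUDA_VISIBLE_DEVICES.
gpu_count() {
    local devices=${CUDA_VISIBLE_DEVICES:-}
    if [[ -z $devices ]]; then
        echo 0
        return
    fi
    local IFS=,
    local -a ids=($devices)
    echo "${#ids[@]}"
}

echo "GPUs visible to this job: $(gpu_count)"
```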

Interactive use

You can submit short interactive jobs with the 'srun' command, e.g.

srun -l -n 1 -t TIME --gres=gpu:1 -A ACCOUNT APP

Submitting batch jobs

To submit jobs use the following command:

sbatch slurm.sh

On successful submission you get the following output:

Submitted batch job JOBID

where JOBID is the unique identification number of the job.

The following command stops the job:

scancel JOBID

Non-restarting jobs

For non-restarting jobs, the following directive should be used:

#SBATCH --no-requeue

Partitions

There are two non-overlapping queues (partitions) on the supercomputer: the prod-gpu-k40 queue and the prod-gpu-k20 queue. Both are intended for production runs. The first contains compute nodes (CN) with Nvidia K40x GPUs, the second compute nodes with Nvidia K20x GPUs. The default queue is prod-gpu-k20. The prod-gpu-k40 partition can be selected with the following directive:

#SBATCH --partition=prod-gpu-k40

Quality of Service (QoS)

The default quality of service is normal, i.e. a running job cannot be interrupted.

High priority

High-priority jobs can run for up to 24 hours and are charged twice the machine time they use; in return, the scheduler moves them ahead in the queue.

#SBATCH --qos=fast

Low priority

It is also possible to submit low-priority jobs. Such jobs can be interrupted at any time by any normal-priority job; in exchange, only half of the machine time used is charged. Interrupted jobs are automatically rescheduled. Only submit low-priority jobs that can withstand random interruptions and that save their state regularly (checkpoint), so that they can be restarted quickly.

#SBATCH --qos=lowpri
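
A minimal sketch of the checkpoint-and-resume pattern described above: progress is recorded after every step, so a rescheduled job continues where the interrupted one stopped. The function run_steps, the step counter and the checkpoint file are illustrative assumptions; a real application would save its own state instead of a counter.

```shell
#!/bin/bash
#SBATCH --qos=lowpri

# Run steps up to TOTAL, resuming from the step stored in CHECKPOINT.
run_steps() {
    local total=$1 checkpoint=$2 step=0
    # Resume from the last completed step if a checkpoint exists.
    if [[ -s $checkpoint ]]; then
        step=$(<"$checkpoint")
    fi
    while (( step < total )); do
        # ... one unit of real work would go here ...
        step=$(( step + 1 ))
        # Write to a temporary file first so that an interruption never
        # leaves a half-written checkpoint behind.
        echo "$step" > "$checkpoint.tmp" && mv "$checkpoint.tmp" "$checkpoint"
    done
    echo "finished at step $step"
}

# Example call inside the job script (path is a placeholder):
#   run_steps 1000 "$HOME/myjob.chk"
```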

Memory allocation

By default, 1 CPU core is assigned 1000 MB of memory but more can be requested with the following directive:

#SBATCH --mem-per-cpu=MEMORY

where MEMORY is specified in MB. The maximum memory per core is 7800 MB.
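
When the total memory requirement of a job is known, the per-core value can be derived as in the sketch below. The helper mem_per_cpu is hypothetical; its default limit of 7800 MB is the figure quoted in this section and can be overridden with a third argument.

```shell
#!/bin/bash
# Memory per core (MB) needed to provide TOTAL_MB across CORES cores,
# rounded up. Returns non-zero if the result exceeds the per-core limit.
mem_per_cpu() {
    local total_mb=$1 cores=$2 limit_mb=${3:-7800}
    local per_core=$(( (total_mb + cores - 1) / cores ))
    echo "$per_core"
    (( per_core <= limit_mb ))
}

mem_per_cpu 32000 8    # prints 4000, i.e. --mem-per-cpu=4000
```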

Email notification

Send mail when job status changes (start, stop, error):

#SBATCH --mail-type=ALL
#SBATCH --mail-user=EMAIL

where EMAIL is the email address to be notified.

Arrayjobs

Array jobs are needed when a single-threaded (serial) application is to be run in many instances (with different parameters) at once. For each instance, the scheduler stores a unique identifier in the SLURM_ARRAY_TASK_ID environment variable. By querying this variable, the instances of the array job can be told apart. The output of each instance is written to a slurm-SLURM_ARRAY_JOB_ID-SLURM_ARRAY_TASK_ID.out file. The scheduler packs the instances tightly onto the nodes, so here too it is advisable to choose the number of instances as a multiple of the number of processors. More information: http://slurm.schedmd.com/job_array.html

#!/bin/bash
#SBATCH -A ACCOUNT
#SBATCH --job-name=array
#SBATCH --time=24:00:00
#SBATCH --array=1-96
srun envtest.sh
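
The contents of envtest.sh are not shown here. One possible shape, given purely as an assumption, is a script that derives its input file from SLURM_ARRAY_TASK_ID. The input_NNN.dat naming scheme and the input_for_task helper are hypothetical.

```shell
#!/bin/bash
# Hypothetical body of envtest.sh: pick the input file for this array task.
# Falls back to task 1 when run outside an array job.
input_for_task() {
    local id=${1:-${SLURM_ARRAY_TASK_ID:-1}}
    printf 'input_%03d.dat\n' "$id"
}

echo "Task ${SLURM_ARRAY_TASK_ID:-1} would process $(input_for_task)"
```

With --array=1-96 as above, the 96 instances would then work on input_001.dat through input_096.dat.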

OpenMPI jobs

For MPI jobs, you must also specify the number of MPI processes starting on each node (#SBATCH --ntasks-per-node=). In the most common case this is the number of CPU cores of a single node. The parallel program must be started with the mpirun command.

#!/bin/bash
#SBATCH -A ACCOUNT
#SBATCH --job-name=mpi
#SBATCH -N 2
#SBATCH --ntasks-per-node=8
#SBATCH --time=12:00:00
mpirun --report-pid ${TMPDIR}/mpirun.pid PROGRAM

OpenMPI FAQ: http://www.open-mpi.org/faq

OpenMP (OMP) jobs

A maximum of 1 node can be reserved for OpenMP parallel applications. The number of OMP threads must be specified with the OMP_NUM_THREADS environment variable. The variable must either be set before the application (see example) or exported before the start command:

export OMP_NUM_THREADS=8

In the following example, 8 CPU cores are assigned to a single task; all 8 cores must be on the same node. The SLURM_CPUS_PER_TASK variable contains the number of CPU cores, and it is also used to set the number of OMP threads.

User Alice launches an 8-thread OMP application at the expense of the foobar account for a maximum of 6 hours.

#!/bin/bash
#SBATCH -A foobar
#SBATCH --job-name=omp
#SBATCH --time=06:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK ./a.out
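
SLURM_CPUS_PER_TASK is only set when --cpus-per-task is requested, so a script that is also tested outside the scheduler may want a fallback. A small sketch, where the helper name omp_threads and the default of 1 thread are assumptions:

```shell
#!/bin/bash
# Use the core count granted by SLURM, or 1 thread outside a job.
omp_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

export OMP_NUM_THREADS=$(omp_threads)
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```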

Hybrid MPI-OMP jobs

We speak of a hybrid MPI-OMP mode when the parallel application uses both MPI and OMP. It is worth noting that MKL calls of programs linked with Intel MKL are OpenMP-capable. In general, the following distribution is recommended: the number of MPI processes from 1 to the number of CPU sockets in one node, the OMP threads to be the total number of CPU core numbers in one node, or half, or quarter (as appropriate). For the job script the parameters of the above two modes must be combined.

In the following example, we reserve 2 nodes and start one task per node with 8 threads per task. User Alice submits a hybrid job to 2 nodes for 8 hours, charged to the foobar account. Only 1 MPI process runs on each node, and it uses 8 OMP threads. In total, the 2 machines run 2 MPI processes and 2 x 8 OMP threads.

#!/bin/bash
#SBATCH -A foobar
#SBATCH --job-name=mpiomp
#SBATCH --time=08:00:00
#SBATCH -N 2
#SBATCH --ntasks=2
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8
#SBATCH -o slurm.out
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
mpirun ./a.out
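
The recommended distribution can be worked out with simple arithmetic. The sketch below assumes 16 cores and 2 CPU sockets per node (from the 8 × 2 figure in the table at the top) and prints the thread count that fills a node for each allowed number of MPI processes per node. threads_per_task is a hypothetical helper.

```shell
#!/bin/bash
# OMP threads per MPI process so that one node is fully used.
# Assumption: 16 cores per node unless a second argument says otherwise.
threads_per_task() {
    local tasks_per_node=$1 cores_per_node=${2:-16}
    if (( tasks_per_node < 1 || cores_per_node % tasks_per_node != 0 )); then
        echo "tasks per node must divide $cores_per_node" >&2
        return 1
    fi
    echo $(( cores_per_node / tasks_per_node ))
}

# One MPI process per node up to one per socket (2 sockets assumed).
for tasks in 1 2; do
    echo "--ntasks-per-node=$tasks --cpus-per-task=$(threads_per_task "$tasks")"
done
```

The example job above uses one process per node with 8 threads, which corresponds to half of the cores on a node.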

Maple Grid jobs

Like OMP jobs, Maple can be run on 1 node only. You must also load the maple module to use it. Maple works in client-server mode, so you must start the grid server (${MAPLE}/toolbox/Grid/bin/startserver) before running the Maple job. This application requires a license, which must be specified in the job script (#SBATCH --licenses=maplegrid:1). The Maple job must be started with the ${MAPLE}/toolbox/Grid/bin/joblauncher command.

User Alice starts Maple Grid for 6 hours, charged to the foobar account:

#!/bin/bash
#SBATCH -A foobar
#SBATCH --job-name=maple
#SBATCH -N 1
#SBATCH --ntasks-per-node=16
#SBATCH --time=06:00:00
#SBATCH -o slurm.out
#SBATCH --licenses=maplegrid:1

module load maple

${MAPLE}/toolbox/Grid/bin/startserver
${MAPLE}/toolbox/Grid/bin/joblauncher ${MAPLE}/toolbox/Grid/samples/Simple.mpl