
PRACE User Support

1,119 bytes added, 29 October 2019, 15:56


== Usage of the SLURM scheduler ==

Website: http://slurm.schedmd.com

Scheduling on the HPC systems is based on CPU hours: the available core hours are divided among users on a monthly basis. Every UNIX user is connected to one or more scheduler accounts, and each scheduler account is connected to an HPC project and a UNIX group. HPC jobs can only be submitted through one of these accounts. Core hours are calculated by multiplying the wall time (the time spent running the job) by the number of CPU cores requested.

For example, reserving 2 nodes (48 CPU cores) at the NIIFI SC for 30 minutes gives 48 * 30 = 1440 core minutes = 24 core hours. Core hours are measured between the start and the end of the job.
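The arithmetic above can be sketched as a small shell helper. The function name <code>core_hours</code> is made up for this illustration and is not a site-provided command; it uses integer arithmetic, so fractions of a core hour are truncated.

```shell
#!/bin/bash
# core_hours NODES CORES_PER_NODE MINUTES
# Hypothetical helper: wall time multiplied by the number of reserved cores,
# converted from core minutes to core hours (integer division).
core_hours() {
    local nodes=$1 cores_per_node=$2 minutes=$3
    echo $(( nodes * cores_per_node * minutes / 60 ))
}

# The example from the text: 2 nodes with 24 cores each, for 30 minutes.
core_hours 2 24 30   # prints 24
```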

'''It is very important to make sure the application makes full use of the allocated resources. An empty or non-optimal job will consume the allocated core time very fast. If the account runs out of allocated time, no new jobs can be submitted until the beginning of the next accounting period. Account limits are reset at the beginning of each month.'''

Information about an account can be listed with the following command:

==== Example ====

After executing the command, the following table shows up for Bob. He can access and run jobs using two different accounts (foobar, barfoo). His own name is marked with * in the table. He shares both accounts with alice (Account column). The core hours consumed by each user are displayed in the second column (Usage), and the consumption of all jobs run under the account is displayed in the 4th column. The last two columns show the allocated maximum time (Account limit) and the time still available on the machine (Available).

<pre>

</code>

where <code>NODES</code> is the number of nodes to be reserved and <code>WALLTIME</code> is the maximum time the job may run.

'''It is important to specify the core time to be reserved as precisely as possible, because the scheduler queues jobs based on this value. Generally, a job with a shorter core time will run sooner. It is advised to check the time actually used after the job has completed, with the <code>sacct</code> command.'''

</code>

All jobs are recorded in an accounting database. The properties of completed jobs can be retrieved from this database. Detailed statistics can be viewed with this command:

<code>

sacct -l -j JOBID

==== Example ====

There are 3 jobs in the queue. The first is an array job waiting for resources (PENDING). The second is an MPI job that has been running on 4 nodes for 25 minutes. The third is an OMP run on one node that has just started. The NAME of a job can be chosen freely; short, informative names are advised.

<pre>

==== Checking licenses ====

The used and available licenses can be retrieved with this command:

<code>

</pre>

where <code>ACCOUNT</code> is the name of the account to use (available accounts can be retrieved with the <code>sbalance</code> command), <code>NAME</code> is the short name of the job, <code>TIME</code> is the maximum walltime using <code>DD-HH:MM:SS</code> syntax. Acceptable time formats include "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds".
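To make the accepted time formats concrete, the following sketch converts each of them to seconds. The function <code>slurm_time_to_seconds</code> is a hypothetical helper written for this page, not a Slurm command, and it assumes well-formed input in one of the formats listed above.

```shell
#!/bin/bash
# slurm_time_to_seconds TIME
# Hypothetical helper: converts "minutes", "minutes:seconds",
# "hours:minutes:seconds", "days-hours", "days-hours:minutes" or
# "days-hours:minutes:seconds" to a number of seconds.
slurm_time_to_seconds() {
    local spec=$1 days=0 hours=0 minutes=0 seconds=0
    if [[ $spec == *-* ]]; then
        # Formats with a day part: the fields after "-" start with hours.
        days=${spec%%-*}
        IFS=: read -r hours minutes seconds <<< "${spec#*-}"
    else
        # Formats without a day part: the meaning depends on the field count.
        local a b c
        IFS=: read -r a b c <<< "$spec"
        if [[ -n $c ]]; then
            hours=$a; minutes=$b; seconds=$c
        elif [[ -n $b ]]; then
            minutes=$a; seconds=$b
        else
            minutes=$a
        fi
    fi
    # 10# forces base 10 so that values such as "08" are not read as octal.
    echo $(( 10#${days:-0} * 86400 + 10#${hours:-0} * 3600 \
           + 10#${minutes:-0} * 60 + 10#${seconds:-0} ))
}

slurm_time_to_seconds 06:00:00   # prints 21600
slurm_time_to_seconds 1-12       # prints 129600
```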

The following command submits jobs:

where <code>EMAIL</code> is the e-mail address to notify.

==== Array jobs ====

Array jobs are useful when many instances of a single-threaded (serial) application are to be run at once, each with different data. Slurm stores the unique ID of each instance in the <code>SLURM_ARRAY_TASK_ID</code> environment variable; the tasks of an array job can be told apart by reading this variable. The output of each task is written to a file named <code>slurm-SLURM_ARRAY_JOB_ID-SLURM_ARRAY_TASK_ID.out</code>. The scheduler packs the tasks tightly onto the nodes, so it is worth choosing the number of tasks to be a multiple of the number of CPU cores. [http://slurm.schedmd.com/job_array.html More on this topic]

===== Example =====

Alice submits 96 serial jobs for a maximum run time of 24 hours, billed to the 'foobar' account. The <code>#SBATCH --array=1-96</code> directive indicates that it is an array job. The application has to be started with the <code>srun</code> command. In this example it is a shell script.

<pre>

#!/bin/bash

</pre>
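As an illustration of how <code>SLURM_ARRAY_TASK_ID</code> can separate the tasks of an array job, the sketch below maps each task ID to its own input file. The file naming scheme (<code>input_NNN.dat</code>), the helper <code>input_for_task</code> and the application name <code>./serial_app</code> are assumptions made for this example and are not part of the site setup. Outside Slurm the task ID falls back to 1, so the script can be tried in a plain shell.

```shell
#!/bin/bash
#SBATCH -A foobar
#SBATCH --job-name=array
#SBATCH --time=24:00:00
#SBATCH --array=1-96

# Hypothetical naming scheme: task 7 reads input_007.dat.
input_for_task() {
    printf 'input_%03d.dat\n' "$1"
}

# Slurm sets SLURM_ARRAY_TASK_ID for each array task; default to 1 elsewhere.
task_id=${SLURM_ARRAY_TASK_ID:-1}
input_file=$(input_for_task "$task_id")

echo "task ${task_id}: would run ./serial_app ${input_file}"
# Inside a real job the (hypothetical) application would be started with srun:
# srun ./serial_app "$input_file"
```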

==== MPI jobs ====

For MPI jobs, the number of MPI processes started on each node has to be given (<code>#SBATCH --ntasks-per-node=</code>). In the most frequent case this is the number of CPU cores in a node. Parallel programs have to be started with the <code>mpirun</code> command.

===== Example =====

Bob allocates 2 nodes for 12 hours for an MPI job, billed to the 'barfoo' account. 24 MPI processes will be started on each node. The stdout output of the program is redirected to the <code>slurm.out</code> file (<code>#SBATCH -o</code>).

<pre>

#!/bin/bash

==== CPU binding ====

Generally, the performance of MPI applications can be improved by binding processes to CPU cores. In this case the operating system does not move the threads of the parallel program between CPU cores, so memory locality can improve (fewer cache misses). Using binding is recommended. Tests should be run to determine which binding strategy gives the best performance for a given application. The following settings apply to the OpenMPI environment. Detailed information on the bindings can be obtained with the <code>--report-bindings</code> MPI option. In the examples, a few lines of the detailed binding information are shown along with the launch commands. Important: the task binding of the scheduler must not be used!

===== Binding per CPU core =====

In this case, the MPI threads (ranks) fill the CPU cores in order.

<pre>

Command to run: mpirun --bind-to-core --bycore

[cn05:05493] MCW rank 0 bound to socket 0[core 0]: [B . . . . . . . . . . .][. . . . . . . . . . . .]

</pre>

===== Binding based on CPU socket =====

In this case, the MPI threads fill the CPUs (sockets) alternately.

<pre>

Command to run: mpirun --bind-to-core --bysocket

[cn05:05659] MCW rank 0 bound to socket 0[core 0]: [B . . . . . . . . . . .][. . . . . . . . . . . .]

</pre>

===== Binding by nodes =====

In this case, the MPI threads fill the nodes alternately. At least 2 nodes need to be allocated.

<pre>

Command to run: mpirun --bind-to-core --bynode

[cn05:05904] MCW rank 0 bound to socket 0[core 0]: [B . . . . . . . . . . .][. . . . . . . . . . . .]

</pre>
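Since the best binding strategy has to be found by testing, a small loop over the three launch commands shown above can help. The sketch below only prints the commands; the helper <code>binding_command</code> and the application name <code>./my_mpi_app</code> are hypothetical, and the <code>mpirun</code> options are the ones quoted in this section, which may differ in other OpenMPI versions.

```shell
#!/bin/bash
# Launch commands taken from the three binding examples in this section,
# with --report-bindings added so that the resulting placement is printed.
binding_command() {
    case "$1" in
        core)   echo "mpirun --bind-to-core --bycore --report-bindings" ;;
        socket) echo "mpirun --bind-to-core --bysocket --report-bindings" ;;
        node)   echo "mpirun --bind-to-core --bynode --report-bindings" ;;
        *)      echo "unknown strategy: $1" >&2; return 1 ;;
    esac
}

# ./my_mpi_app is a hypothetical application name.
for strategy in core socket node; do
    echo "[$strategy] $(binding_command "$strategy") ./my_mpi_app"
done
```

Inside a job script, the printed command could be executed and timed instead of echoed, so that the run times of the three strategies can be compared.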

==== OpenMP (OMP) jobs ====

For OpenMP parallel applications, 1 node needs to be allocated, and the number of OMP threads needs to be set with the <code>OMP_NUM_THREADS</code> environment variable. The variable either has to be written before the application on the command line (see example), or has to be exported before the command that starts the application:

<code>

export OMP_NUM_THREADS=24

</code>
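The difference between the two ways of setting the variable can be tried in any shell. In this sketch, <code>sh -c</code> stands in for the OpenMP application, and the thread counts 12 and 24 are arbitrary illustration values.

```shell
#!/bin/bash
unset OMP_NUM_THREADS

# Variant 1: written before the command, the variable applies to that
# single command only.
prefixed=$(OMP_NUM_THREADS=12 sh -c 'echo "$OMP_NUM_THREADS"')
after_prefix=$(sh -c 'echo "${OMP_NUM_THREADS:-unset}"')

# Variant 2: exported, the variable applies to every command started later.
export OMP_NUM_THREADS=24
exported=$(sh -c 'echo "$OMP_NUM_THREADS"')

echo "prefixed: $prefixed, afterwards: $after_prefix, exported: $exported"
```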

===== Example =====

Alice starts a 24-thread OMP application for a maximum of 6 hours, billed to the 'foobar' account.

<pre>

#!/bin/bash

</pre>

==== Hybrid MPI-OMP jobs ====

An application runs in hybrid MPI-OMP mode when it uses both MPI and OMP. It is worth knowing that the MKL calls of applications linked with Intel MKL are OpenMP capable. Generally, the following distribution is suggested: the number of MPI processes ranges from 1 to the number of CPU sockets in a node, and the number of OMP threads is accordingly the total number of CPU cores in a node, or half or a quarter of that (depending on the code). For the job script, the parameters of the two modes above need to be combined.

===== Example =====

Alice submitted a hybrid job for 8 hours on 2 nodes, billed to the 'foobar' account. Only 1 MPI process runs on each node, using 24 OMP threads per node. On the 2 nodes, a total of 2 MPI processes and 2 x 24 OMP threads are running.

<pre>

#!/bin/bash

</pre>
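The suggested distribution can be worked out with a few lines of shell arithmetic. This is only a sketch: the helper <code>threads_per_task</code> is hypothetical, and the figure of 2 CPU sockets per node is an assumption for the illustration.

```shell
#!/bin/bash
# threads_per_task CORES_PER_NODE TASKS_PER_NODE
# Hypothetical helper: splits the cores of one node evenly among its
# MPI processes.
threads_per_task() {
    echo $(( $1 / $2 ))
}

nodes=2            # as in the example above
cores_per_node=24  # as in the example above

# MPI processes per node: from 1 up to the number of sockets (2 assumed).
for tasks_per_node in 1 2; do
    omp=$(threads_per_task "$cores_per_node" "$tasks_per_node")
    echo "${tasks_per_node} MPI process(es) per node x ${omp} OMP threads," \
         "$(( nodes * tasks_per_node )) MPI processes in total"
done
```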

==== Maple Grid jobs ====

Maple can be run on one node, similarly to OMP jobs. The maple module needs to be loaded to use it. Because Maple works in client-server mode, the grid server also needs to be started before running a Maple job (<code>${MAPLE}/toolbox/Grid/bin/startserver</code>). This application requires a license, which has to be specified in the job script (<code>#SBATCH --licenses=maplegrid:1</code>). A Maple job is started with the <code>${MAPLE}/toolbox/Grid/bin/joblauncher</code> command.

===== Example =====

Alice runs a Maple Grid application for 6 hours, billed to the 'foobar' account:

<pre>
#!/bin/bash
#SBATCH -o slurm.out
#SBATCH --licenses=maplegrid:1

module load maple

${MAPLE}/toolbox/Grid/bin/startserver
${MAPLE}/toolbox/Grid/bin/joblauncher ${MAPLE}/toolbox/Grid/samples/Simple.mpl
</pre>

==== GPU compute nodes ====

The Szeged site accommodates 2 GPU-enabled compute nodes. Each GPU node has 6 Nvidia Tesla M2070 cards. The GPU nodes reside in a separate job queue (<code>--partition gpu</code>). To specify the number of GPUs, set the <code>--gres gpu:#</code> directive.

===== Example =====

Alice submits a 6-hour job using 4 GPUs, billed to the 'foobar' account.

<pre>
#!/bin/bash
#SBATCH -A foobar
#SBATCH --job-name=GPU
#SBATCH --partition gpu
#SBATCH --gres gpu:4
#SBATCH --time=06:00:00

$PWD/gpu_burnout 3600
</pre>

== Extensions ==

Extensions should be requested from the Execution site (NIIF) at prace-support@niif.hu. All requests will be carefully reviewed to decide whether they are eligible.

== Reporting after finishing project ==

A report must be created after using PRACE resources. Please contact prace-support@niif.hu for further details.

== Acknowledgement in publications ==

PRACE

'''We acknowledge [PRACE/KIFÜ] for awarding us access to resource based in Hungary at [Budapest/Debrecen/Pécs/Szeged].'''

KIFÜ

'''We acknowledge KIFÜ for awarding us access to resource based in Hungary at [Budapest/Debrecen/Pécs/Szeged].'''

Where technical support has been received the following additional text should also be used:

'''The support of [name of person/people] from KIFÜ, Hungary to the technical work is gratefully acknowledged.'''

[[Category: HPC]]

Kzoli(AT)niif.hu
