PRACE User Support

== Usage of the SLURM scheduler ==
Website: http://slurm.schedmd.com
Scheduling on the HPCs is CPU hour based. This means that the available core hours are divided between users on a monthly basis. All UNIX users are connected to one or more accounts. A scheduler account is connected to an HPC project and a UNIX group. HPC jobs can only be submitted by using one of the accounts. The core hours are calculated by multiplying the wall time (time spent running the job) by the number of CPU cores requested.
For example, reserving 2 nodes (48 CPU cores) at the NIIFI SC for 30 minutes gives 48 * 30 = 1440 core minutes = 24 core hours. Core hours are measured between the start and the end of the jobs.
'''It is very important to be sure the application uses the allocated resources maximally. An empty or non-optimal job will consume the allocated core time very fast. If the account runs out of the allocated time, no new jobs can be submitted until the beginning of the next accounting period. Account limits are regenerated at the beginning of each month.'''
Information about an account can be listed with the following command:
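(Presumably the <code>sbalance</code> utility, which is also referenced below for listing the available accounts.)
<code>
sbalance
</code>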
==== Example ====
After executing the command, the following table shows up for Bob. The user can access and run jobs by using two different accounts (foobar, barfoo). He can see his name marked with * in the table. He shares both accounts with alice (Account column). The core hours consumed by the users are displayed in the second column (Usage), and the consumption of the jobs run under the account is displayed in the 4th column. The last two columns give the allocated maximum (Account limit) and the core time still available for the machine (Available).
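==== Estimating the core time ====
Before submitting a job it is worth estimating how many core hours the reservation will consume. A minimal sketch, assuming the site-provided <code>sestimate</code> utility:
<code>
sestimate -N NODES -t WALLTIME
</code>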
where <code>NODES</code> is the number of nodes to be reserved, and <code>WALLTIME</code> is the maximal time spent running the job.
 
'''It is important to estimate the core time to be reserved as precisely as possible, because the scheduler queues the jobs based on this value. Generally, a job with a shorter requested time will run sooner. It is advised to check the time actually used by the job after completion with the <code>sacct</code> command.'''
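==== Status information ====
The status of the submitted jobs and of the compute nodes can be checked with the standard SLURM commands:
<code>
squeue
sinfo
</code>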
All jobs are inserted into an accounting database. The properties of the completed jobs can be retrieved from this database. Detailed statistics can be viewed by using this command:
<code>
sacct -l -j JOBID
</code>
==== Example ====
There are 3 jobs in the queue. The first is an array job which is waiting for resources (PENDING). The second is an MPI job that has been running on 4 nodes for 25 minutes. The third is an OMP run on one node that has just started. The NAME of the jobs can be freely chosen; it is advised to use short, informative names.
==== Checking licenses ====
The used and available licenses can be retrieved with the following command:
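(The exact utility is site-specific; the sketch below assumes the NIIF-provided <code>slicenses</code> command.)
<code>
slicenses
</code>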
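==== Submitting jobs ====
Batch jobs are described in a submission script. A minimal sketch of such a script, using standard <code>sbatch</code> directives with placeholder values:
<pre>
#!/bin/bash
#SBATCH -A ACCOUNT
#SBATCH --job-name=NAME
#SBATCH --time=TIME

# the commands to run come here
</pre>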
where <code>ACCOUNT</code> is the name of the account to use (available accounts can be retrieved with the <code>sbalance</code> command), <code>NAME</code> is the short name of the job, <code>TIME</code> is the maximum walltime using <code>DD-HH:MM:SS</code> syntax. Acceptable time formats include "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds".
The job script is submitted with the following command:
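(Here <code>slurm.sh</code> is a placeholder name for the submission script.)
<code>
sbatch slurm.sh
</code>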
==== Array jobs ====
Array jobs are needed when multiple single-threaded (serial) jobs are to be submitted (with different data). Slurm stores the unique id of each instance in the <code>SLURM_ARRAY_TASK_ID</code> environment variable, so the tasks of the array job can be distinguished by reading this id. The output of the tasks is written into the <code>slurm-SLURM_ARRAY_JOB_ID-SLURM_ARRAY_TASK_ID.out</code> files. The scheduler packs the tasks tightly, so it is useful to run multiple tasks per CPU core. [http://slurm.schedmd.com/job_array.html More on this topic]
 
===== Example =====
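A minimal sketch of an array job script; the account name, the array size and the program name are placeholders, and each instance picks up its own index from <code>SLURM_ARRAY_TASK_ID</code>:
<pre>
#!/bin/bash
#SBATCH -A ACCOUNT
#SBATCH --job-name=array
#SBATCH --time=24:00:00
#SBATCH --array=1-96
./my_serial_program $SLURM_ARRAY_TASK_ID
</pre>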
==== MPI jobs ====
For MPI jobs, the number of MPI processes running on a node has to be given (<code>#SBATCH --ntasks-per-node=</code>). The most frequent case is to set this to the number of CPU cores. Parallel programs should be started with the <code>mpirun</code> command.
===== Example =====
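A minimal sketch of an MPI job script; the account name and the program name are placeholders, and 2 nodes with 24 cores each are assumed (as in the core-hour example above):
<pre>
#!/bin/bash
#SBATCH -A ACCOUNT
#SBATCH --job-name=mpi
#SBATCH -N 2
#SBATCH --ntasks-per-node=24
#SBATCH --time=12:00:00
mpirun ./a.out
</pre>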
==== CPU binding ====
Generally, the performance of an MPI application can be optimized with CPU core binding. In this case, the processes of the parallel program are not migrated between the CPU cores by the OS, and memory locality improves (fewer cache misses). It is also advised to use memory binding. Tests can be run to determine which binding strategy gives the best performance for our application. The following settings are valid for the OpenMPI environment. Further information on binding can be retrieved with the <code>--report-bindings</code> MPI option; along with the running commands, a few lines of detailed binding information are shown. It is important not to use the task binding of the scheduler!
===== Binding per CPU core =====
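A minimal sketch of per-core binding with OpenMPI; the option names depend on the OpenMPI version (newer releases use <code>--bind-to core --map-by core</code>, older ones <code>--bind-to-core --bycore</code>), and the program name is a placeholder:
<pre>
mpirun --bind-to core --map-by core --report-bindings ./a.out
</pre>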
==== Hybrid MPI-OMP jobs ====
When a parallel application uses both MPI and OMP, it is running in hybrid MPI-OMP mode. It is good to know that the MKL calls of Intel MKL-linked applications are OpenMP capable. Generally, the following distribution is suggested: the number of MPI processes should be between 1 and the number of CPU sockets in a node, and the number of OMP threads should be the total number of CPU cores in a node, or half or a quarter of that (it depends on the code). For the job script, the parameters of these two modes need to be combined.
===== Example =====
Alice user sent a hybrid job on the expenses of the 'foobar' account, for 8 hours and 2 nodes. Only 1 MPI process runs on each node, using 24 OMP threads per node. On the 2 nodes, 2 MPI processes run with 2 x 24 OMP threads in total.
<pre>
#!/bin/bash
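# Sketch of the hybrid job described above; the directive values follow the
# example text (foobar account, 8 hours, 2 nodes, 1 MPI process and 24 OMP
# threads per node), while the program name a.out is a placeholder.
#SBATCH -A foobar
#SBATCH --job-name=mpiomp
#SBATCH -N 2
#SBATCH --ntasks-per-node=1
#SBATCH --time=08:00:00

export OMP_NUM_THREADS=24
mpirun ./a.out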
</pre>
==== Maple Grid jobs ====
Maple can be run on one node, similarly to OMP jobs. The maple module needs to be loaded to use it. Because Maple works in client-server mode, a grid server needs to be started before running the Maple job (<code>${MAPLE}/toolbox/Grid/bin/startserver</code>). This application requires a license, which has to be given in the job script (<code>#SBATCH --licenses=maplegrid:1</code>). A Maple job is started with the <code>${MAPLE}/toolbox/Grid/bin/joblauncher</code> command.
===== Example =====
Alice user is running a Maple Grid application for 6 hours on the expenses of the 'foobar' account:
<pre>
#!/bin/bash
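# Directives matching the example above (foobar account, 6 hours, one node);
# the job name is a placeholder.
#SBATCH -A foobar
#SBATCH --job-name=maple
#SBATCH -N 1
#SBATCH --time=06:00:00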
#SBATCH -o slurm.out
#SBATCH --licenses=maplegrid:1
 
module load maple
 
${MAPLE}/toolbox/Grid/bin/startserver
${MAPLE}/toolbox/Grid/bin/joblauncher ${MAPLE}/toolbox/Grid/samples/Simple.mpl
</pre>
 
==== GPU compute nodes ====
The Szeged site accommodates 2 GPU enabled compute nodes. Each GPU node has 6 Nvidia Tesla M2070 cards. The GPU nodes reside in a separate job queue (<code>--partition gpu</code>). To specify the number of GPUs, set the <code>--gres gpu:#</code> directive.
 
===== Example =====
Alice user submits a 4 GPU, 6 hour job on the expenses of the foobar account.
<pre>
#!/bin/bash
#SBATCH -A foobar
#SBATCH --job-name=GPU
#SBATCH --partition gpu
#SBATCH --gres gpu:4
#SBATCH --time=06:00:00
 
$PWD/gpu_burnout 3600
</pre>
 
 
== Extensions ==
Extensions should be requested at the execution site (NIIF) at prace-support@niif.hu. All requests will be carefully reviewed and a decision made on whether they are eligible.
 
== Reporting after finishing project ==
A report must be created after using PRACE resources. Please contact prace-support@niif.hu for further details.
 
== Acknowledgement in publications ==
 
PRACE
 
'''We acknowledge [PRACE/KIFÜ] for awarding us access to resource based in Hungary at [Budapest/Debrecen/Pécs/Szeged].'''
 
KIFÜ
 
'''We acknowledge KIFÜ for awarding us access to resource based in Hungary at [Budapest/Debrecen/Pécs/Szeged].'''
 
Where technical support has been received the following additional text should also be used:
'''The support of [name of person/people] from KIFÜ, Hungary to the technical work is gratefully acknowledged.'''
 
[[Category: HPC]]
