
PRACE User Support

21,540 bytes added, 29 October 2019, 15:56
== User Guide to obtain a digital certificate ==
This section gives a short overview of how users can request a digital certificate from the NIIF CA after the pre-registration form has been filled in. This guide applies only to Hungarian users. Users from other countries can find the certification authority of their country [http://www.eugridpma.org/members/worldmap/ here].

=== Installing the NIIF CA root certificate ===
: The first step is to download the "[https://www.ca.niif.hu/en/node/4 root certificate]" (the "NIIF CA Root Certificate" part) in the format supported by your browser or other SSL-enabled program. When the browser asks whether to install/accept the certificate, accept or install it in any case. In addition, enable the option that allows the browser to use the certificate to authenticate websites. Without it, the CA's web interface cannot be reached over the secure protocol (https). The downloaded/installed certificate can be found in the certificate manager of the browser.

=== Request a certificate ===
==== Request a certificate with openssl ====
*: Sign in to the certificate registration website of the NIIF CA with your e-mail address and the password stored in the directory.
*: This site uses a secure protocol (https); the browser often shows a warning window about it, which should be acknowledged.
*: On the opening page, which is the public web interface of the CMS certificate management software, choose the "OpenSSL kliens kérelem benyújtása (PKCS#10)" (submit an OpenSSL client request) option. This leads to a form that must be filled in consistently with the printed pre-registration datasheet. First, select the option matching the purpose of the request: CSIRT, GRID, NIIF felhasználó (NIIF user), Független kutató (independent researcher) or HBONE.
*: Paste the public part of your certificate, the PKCS#10 request, into the "PKCS#10" field. A user guide on ''how to create a PKCS#10 certificate request with openssl that meets the NIIF CA requirements'' can be found below.
*: Set a Challenge password and a Request password; both must be at least 8 characters long. Write them down, because they are needed for revoking the certificate and for the personal authentication.
*: Fill in the other fields (name, e-mail address, phone, organisation), and use the last field for anything the CA operator should know. When everything is done, check the data once more and click the Elküld ("send") button at the bottom of the page.
*: If the PKCS#10 request has been uploaded successfully, a page confirming the certificate request is shown.
==== User Guide to create a PKCS#10 digital certification request with openssl ====
This section gives a short overview of how to create a PKCS#10 certificate request for the NIIF CA with openssl. The latest version of openssl can be downloaded for [http://www.slproweb.com/products/Win32OpenSSL.html Windows] and [http://www.openssl.org/source/ Linux].
:1. Download the openssl configuration file
|= GRID # ''For example: GRID, HBONE, General Purpose''
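The key and request generation step can be sketched with standard openssl commands. This is a minimal, hedged example: it does not use the NIIF-specific configuration file from step 1, the 2048-bit RSA key size is an assumption, and the subject passed to `-subj` is a hypothetical placeholder; the real distinguished name must follow the NIIF CA requirements. The helper name `make_csr` and the `KEY_PASS` variable are likewise illustrative only.

```shell
#!/bin/sh
# make_csr KEYFILE REQFILE SUBJECT
# Creates a new 2048-bit RSA private key and a PKCS#10 certificate request.
# The key is encrypted with the passphrase taken from the exported
# environment variable KEY_PASS (a name chosen only for this sketch).
make_csr() {
  keyfile=$1 reqfile=$2 subject=$3
  if [ -z "${KEY_PASS:-}" ]; then
    echo "export KEY_PASS with the private key passphrase first" >&2
    return 1
  fi
  openssl req -new -newkey rsa:2048 \
    -keyout "$keyfile" -passout env:KEY_PASS \
    -subj "$subject" -out "$reqfile" || return 1
  chmod 600 "$keyfile"          # the private key must stay private
  # Verify the request's self-signature and show its subject before uploading.
  openssl req -in "$reqfile" -noout -verify -subject
}
```

For example, after exporting `KEY_PASS` with a strong passphrase, `make_csr userkey.pem userreq.pem "/C=HU/O=Hypothetical Org/CN=Example User"` writes both files; only the contents of `userreq.pem` go into the "PKCS#10" field, while `userkey.pem` never leaves your machine.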
=== Personal Authentication ===
After successful registration on the website, please visit the NIIF CA Registration Office in person with a copy of the pre-registration datasheet, the Request password and an identity document (ID card or passport).<br />
Address:
: NIIF Iroda (RA Administrator)
: Victor Hugo Str. 18-22.
: H-1132 Budapest, HUNGARY
: email: ca (at) niif (dot) hu
: RA opening hours: Monday, 14:00-16:30 (CET)
During the authentication, the staff of the Registration Office verify the data of the certificate and of the user, and after successful identification they take the steps needed to issue the certificate (you do not need to wait for this).

=== Downloading the certificate ===
An e-mail will be sent to the address given in the request once the valid certificate has been issued; the certificate can be downloaded by clicking the URL in the e-mail. The saved certificate does not contain the private key. If the certificate is installed into the browser, it is advisable to export it together with the private key in PKCS#12 format, which gives a combined backup of the private key and the certificate. Handle this backup carefully! If the private key is lost or gets into unauthorized hands, immediately request revocation of the certificate on the registration interface ("Tanúsítvány visszavonása", certificate revocation) or at the Registration Office, and inform the people concerned!

== Access with GSI-SSH ==
Users can access the supercomputers using the GSI-SSH protocol.
Both of the previous commands set the validity of the proxies to 24 hours.
When using arcproxy, the validity period must be given in seconds.
gsissh -p 2222 prace-login.budapest.hpcsc.niif.hu
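Since arcproxy expects the proxy validity in seconds, a 24-hour proxy corresponds to 24 × 3600 = 86400 seconds. The helper below is a hypothetical convenience function, not part of any grid toolkit; check arcproxy's own help for the exact option that takes the resulting value.

```shell
#!/bin/sh
# proxy_seconds HOURS [MINUTES]
# Converts a proxy lifetime given in hours (and optional minutes) into
# seconds, the unit arcproxy expects for the validity period.
proxy_seconds() {
  hours=$1 minutes=${2:-0}
  echo $(( hours * 3600 + minutes * 60 ))
}

# 24 hours, the proxy lifetime mentioned above:
proxy_seconds 24    # prints 86400
```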
<br />
==== GridFTP file transfer ====
globus-url-copy file://task/myfile.c gsiftp://prace-login.budapestsc.hpcniif.hu/home/taskprace/pr1hrocz/myfile.c
* -p <number of parallel streams> Specifies the number of parallel streams to be used in the GridFTP transfer.
* -stripe Use this parameter to initiate a “striped” GridFTP transfer that uses more than one node at the source and destination. As multiple nodes contribute to the transfer, each using its own network interface, a larger amount of the network bandwidth can be consumed than with a single system. Thus, at least for “big” (> 100 MB) files, striping can considerably improve performance.
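The two options can be combined in a small wrapper. `gridftp_copy` below is a hypothetical helper, not part of the Globus tools: it always passes `-p` (4 streams unless told otherwise, an arbitrary default) and adds `-stripe` only for local source files above the 100 MB threshold mentioned above, interpreted here as 100 MiB. With `DRY_RUN=1` it only prints the command, which is handy for checking before a long transfer. Whether `-stripe` actually helps may also depend on the endpoints supporting striped transfers.

```shell
#!/bin/sh
# gridftp_copy SRC_URL DST_URL [STREAMS]
# Hypothetical wrapper around globus-url-copy; set DRY_RUN=1 to print only.
gridftp_copy() {
  src=$1 dst=$2 streams=${3:-4}
  stripe=""
  case $src in
    file://*)
      path=${src#file://}          # file:///home/x -> /home/x
      if [ ! -f "$path" ]; then
        echo "no such file: $path" >&2
        return 1
      fi
      size=$(wc -c < "$path" | tr -d ' ')
      if [ "$size" -gt $((100 * 1024 * 1024)) ]; then
        stripe="-stripe"           # "big" file: use striping
      fi ;;
  esac
  # $stripe is deliberately unquoted so that it disappears when empty.
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo globus-url-copy -p "$streams" $stripe "$src" "$dst"
  else
    globus-url-copy -p "$streams" $stripe "$src" "$dst"
  fi
}
```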
== Usage of the SLURM scheduler ==
Website: http://slurm.schedmd.com
Scheduling on the HPC systems is based on CPU (core) hours: the available core hours are divided between users on a monthly basis. Every UNIX user belongs to one or more scheduler accounts, each of which is linked to an HPC project and a UNIX group. Jobs can only be submitted under one of these accounts. Core hours are calculated by multiplying the wall time (the time the job spends running) by the number of CPU cores requested.
For example, reserving 2 nodes (48 CPU cores) at the NIIFI SC for 30 minutes gives 48 * 30 = 1440 core minutes = 24 core hours. Core hours are measured from the start to the end of the job.
'''It is very important to make sure that the application makes full use of the allocated resources. An idle or non-optimal job consumes the allocated core time very quickly. If an account runs out of its allocated time, no new jobs can be submitted until the beginning of the next accounting period. Account limits are reset at the beginning of each month.'''
Information about the accounts can be listed with the following command:
sbalance
==== Example ====
After running the command, Bob sees the following table. He can run jobs under two different accounts (foobar and barfoo); his own rows are marked with *. He shares both accounts with alice (Account column). The core hours consumed by each user are shown in the second column (Usage), and the total consumption of all jobs run under the account in the fourth column. The last two columns show the allocated maximum (Account Limit) and the core hours still available (Available).
Scheduler Account Balance
---------- ----------- + ---------------- ----------- + ------------- -----------
User             Usage |          Account       Usage | Account Limit   Available (CPU hrs)
---------- ----------- + ---------------- ----------- + ------------- -----------
alice                0 |           foobar           0 |             0           0
bob *                0 |           foobar           0 |             0           0
bob *                7 |           barfoo           7 |         1,000         993
alice                0 |           barfoo           7 |         1,000         993
=== Estimating core time ===
Before production runs, it is advisable to estimate the core time needed. The following command gives an estimate:
sestimate -N NODES -t WALLTIME
where <code>NODES</code> is the number of nodes to be reserved and <code>WALLTIME</code> is the maximum run time of the job.
'''It is important to specify the core time to be reserved as precisely as possible, because the scheduler queues jobs based on this value: generally, a job with a shorter core time will start sooner. After completion, it is advisable to check the time actually used by the job with the <code>sacct</code> command.'''
==== Example ====
Alice wants to reserve 2 nodes for 2 days and 10 hours, so she checks whether she has enough time on her account:
sestimate -N 2 -t 2-10:00:00
Estimated CPU hours: 2784
Unfortunately, she cannot afford to run this job.
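The estimate above can be reproduced by hand: the examples in this guide imply 24 cores per node (2 nodes = 48 cores), and 2-10:00:00 is 58 hours, so 2 × 24 × 58 = 2784 core hours. The sketch below uses hypothetical helpers, not the `sestimate` tool itself; they repeat that arithmetic for `D-HH:MM:SS` wall times and compare the result with an Available value from `sbalance`, such as the 993 core hours left on the barfoo account in the earlier table. The 24-cores-per-node default is an assumption.

```shell
#!/bin/sh
# estimate_core_hours NODES D-HH:MM:SS [CORES_PER_NODE]
# Default of 24 cores per node is inferred from "2 nodes (48 cpu cores)".
estimate_core_hours() {
  printf '%s\n' "$2" | awk -F'[-:]' -v nodes="$1" -v cores="${3:-24}" '
    NF != 4 { print "walltime must look like D-HH:MM:SS" > "/dev/stderr"; bad = 1; exit }
    { secs = $1 * 86400 + $2 * 3600 + $3 * 60 + $4
      printf "%.0f\n", nodes * cores * secs / 3600 }   # nearest whole core hour
    END { exit bad }'
}

# check_budget NODES D-HH:MM:SS AVAILABLE  (AVAILABLE as shown by sbalance)
check_budget() {
  need=$(estimate_core_hours "$1" "$2") || return 1
  avail=$(printf '%s' "$3" | tr -d ,)     # "1,000" -> "1000"
  if [ "$need" -le "$avail" ]; then
    echo "OK: $need core hours needed, $avail available"
  else
    echo "NOT ENOUGH: $need core hours needed, only $avail available"
    return 1
  fi
}
```

For Alice's request, `check_budget 2 2-10:00:00 993` reports that 2784 core hours are needed while only 993 are available, and returns a non-zero status.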
=== Status information ===
Jobs in the queue can be listed with the <code>squeue</code> command, and the status of the cluster can be retrieved with the <code>sinfo</code> command. Every submitted job gets a JOBID, which can be used to query the properties of the job. Status of a running or waiting job:
scontrol show job JOBID
All jobs are recorded in an accounting database, from which the properties of completed jobs can be retrieved. Detailed statistics can be viewed with this command:
sacct -l -j JOBID
Memory usage can be retrieved with:
smemory JOBID
Disk usage can be retrieved with this command:
sdisk JOBID
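These commands can be combined in a small polling loop. The sketch below is a hypothetical helper (`wait_for_job` is not a site tool): it uses squeue's `-h` (no header), `-j` (job id) and `-o %T` (job state) options to poll a job until it leaves the queue, then points to `sacct` for the final statistics. The 60-second default interval is an arbitrary choice; polling should stay infrequent to avoid loading the scheduler.

```shell
#!/bin/sh
# wait_for_job JOBID [INTERVAL_SECONDS]
wait_for_job() {
  jobid=$1 interval=${2:-60}
  while :; do
    # Empty output (or an error for an unknown id) means the job left the queue.
    state=$(squeue -h -j "$jobid" -o %T 2>/dev/null)
    [ -z "$state" ] && break
    echo "job $jobid: $state"
    sleep "$interval"
  done
  echo "job $jobid has left the queue; details: sacct -l -j $jobid"
}
```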
==== Example ====
There are 3 jobs in the queue. The first is an array job waiting for resources (PENDING). The second is an MPI job that has been running on 4 nodes for 25 minutes. The third is an OMP job on one node that has just started. Job names (NAME) can be chosen freely; short, informative names are recommended.
squeue -l
Wed Oct 16 08:30:07 2013
JOBID      PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)
591_[1-96]    normal    array    alice  PENDING       0:00     30:00      1 (None)
589           normal      mpi      bob  RUNNING      25:55   2:00:00      4 cn[05-08]
590           normal      omp    alice  RUNNING       0:25   1:00:00      1 cn09
This two-node batch job used about 10 GB of virtual memory and 6.5 GB of resident (RSS) memory per node:
smemory 430
MaxVMSize MaxVMSizeNode AveVMSize MaxRSS MaxRSSNode AveRSS
---------- -------------- ---------- ---------- ---------- ----------
10271792K cn06 10271792K 6544524K cn06 6544524K
10085152K cn07 10085152K 6538492K cn07 6534876K
==== Checking jobs ====
It is important to make sure that the application fully uses the reserved core time. A running job can be monitored with the following command:
sjobcheck JOBID
===== Example =====
This job runs on 4 nodes. The LOAD group shows the general load of each machine, which should be roughly equal to the number of cores. The CPU group shows the actual CPU usage in percent. Ideally, the values in the <code>User</code> column are above 90. If they are lower, there is a problem with the application or it is not running optimally, and the run should be stopped. This example job fully uses ("maxes out") the available resources.
Hostname LOAD CPU Gexec
CPUs (Procs/Total) [ 1, 5, 15min] [ User, Nice, System, Idle, Wio]
cn08 24 ( 25/ 529) [ 24.83, 24.84, 20.98] [ 99.8, 0.0, 0.2, 0.0, 0.0] OFF
cn07 24 ( 25/ 529) [ 24.93, 24.88, 20.98] [ 99.8, 0.0, 0.2, 0.0, 0.0] OFF
cn06 24 ( 25/ 529) [ 25.00, 24.90, 20.97] [ 99.9, 0.0, 0.1, 0.0, 0.0] OFF
cn05 24 ( 25/ 544) [ 25.11, 24.96, 20.97] [ 99.8, 0.0, 0.2, 0.0, 0.0] OFF
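Scanning the User column by eye is error-prone on many nodes. The awk filter below is a hypothetical helper (not part of the site tools) that reads `sjobcheck` output in the layout shown above and flags nodes whose User percentage is below a threshold, 90 by default as suggested above. It assumes the CPU group is the second bracketed list on each node line; the exit status is 1 if any node is flagged.

```shell
#!/bin/sh
# Usage: sjobcheck JOBID | check_user_cpu [THRESHOLD]
check_user_cpu() {
  awk -v t="${1:-90}" '
    {
      # Node lines contain two bracketed groups: [LOAD ...] and [CPU ...].
      if (split($0, part, /\[/) < 3) next
      split(part[3], cpu, ",")
      if (cpu[1] !~ /^ *[0-9]/) next        # skip the column header line
      user = cpu[1] + 0
      status = (user >= t) ? "OK" : "LOW"
      if (status == "LOW") low++
      printf "%s User=%.1f%% %s\n", $1, user, status
    }
    END { exit (low > 0) }'
}
```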
==== Checking licenses ====
The used and available licenses can be retrieved with this command:
==== Checking downtime ====
During downtime periods the scheduler does not start new jobs, but jobs can still be submitted. The downtime periods can be listed with the following command:
=== Running jobs ===
Applications on the HPC are run in batch mode: every run needs a job script that contains the required resources and the commands to execute. Scheduler parameters (resource definitions) are given with the <code>#SBATCH</code> directive. A comparison of schedulers, and the directives available in Slurm, can be found in this [http://slurm.schedmd.com/rosetta.pdf table].
==== Obligatory parameters ====
The following parameters are mandatory:
#SBATCH -A ACCOUNT
#SBATCH --job-name=NAME
#SBATCH --time=TIME
where <code>ACCOUNT</code> is the name of the account to use (available accounts can be retrieved with the <code>sbalance</code> command), <code>NAME</code> is the short name of the job, <code>TIME</code> is the maximum walltime using <code>DD-HH:MM:SS</code> syntax.
Acceptable time formats include "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds".
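The accepted formats can be converted to seconds with a small parser, for example to check a planned `--time` value before submission. This is a hypothetical POSIX-shell sketch written from the format list above, not Slurm's own parser; it relies on awk so that fields with leading zeros (such as `08`) are read as decimal rather than octal.

```shell
#!/bin/sh
# slurm_time_to_seconds TIME   (formats as listed above)
slurm_time_to_seconds() {
  printf '%s\n' "$1" | awk '
    {
      days = 0; rest = $0
      if (index(rest, "-")) {                  # days-hours[:minutes[:seconds]]
        split(rest, d, "-"); days = d[1]; rest = d[2]
        n = split(rest, f, ":")
        h = f[1]; m = (n >= 2) ? f[2] : 0; s = (n >= 3) ? f[3] : 0
      } else {
        n = split(rest, f, ":")
        if (n == 1)      { h = 0; m = f[1]; s = 0 }        # minutes
        else if (n == 2) { h = 0; m = f[1]; s = f[2] }     # minutes:seconds
        else             { h = f[1]; m = f[2]; s = f[3] }  # hours:minutes:seconds
      }
      print days * 86400 + h * 3600 + m * 60 + s
    }'
}
```

For example, `slurm_time_to_seconds 2-10:00:00` prints 208800, and `slurm_time_to_seconds 30` prints 1800, since a bare number means minutes.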
Jobs are submitted with the following command:
sbatch jobscript.sh
If the submission is successful, the following is printed:
Submitted batch job JOBID
where <code>JOBID</code> is the unique id of the job.
A job can be cancelled with the following command:
scancel JOBID
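In scripts it is often handy to capture the JOBID for later `squeue`, `sacct` or `scancel` calls. The sketch below (`submit_job` is a hypothetical helper) takes the last word of the "Submitted batch job JOBID" line shown above. Alternatively, `sbatch --parsable` prints only the job ID (followed by the cluster name on multi-cluster setups), which avoids parsing altogether.

```shell
#!/bin/sh
# submit_job [SBATCH_OPTIONS...] JOBSCRIPT   -> prints the JOBID
submit_job() {
  out=$(sbatch "$@") || return 1
  case $out in
    "Submitted batch job "*) echo "${out##* }" ;;   # last word is the JOBID
    *) echo "unexpected sbatch output: $out" >&2; return 1 ;;
  esac
}
```

Usage: `jobid=$(submit_job jobscript.sh) && echo "cancel with: scancel $jobid"`.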
==== Job queues ====
There are two separate queues (partitions) on the HPC: the <code>test</code> queue and the <code>prod</code> queue. The latter is for production runs, the former for testing. In the test queue, 1 node can be allocated for a maximum of half an hour. The default queue is <code>prod</code>. The test partition can be selected with the following directive:
#SBATCH --partition=test
==== Quality of Service (QoS) ====
Low-priority jobs can also be submitted. These jobs can be interrupted by any normal-priority job at any time, but only half of their run time is billed to the account. Interrupted jobs are automatically requeued. Therefore, only run jobs in this mode that can be interrupted at any time, periodically save their state (checkpoint) and can restart quickly.
The default QoS is <code>normal</code>, which is not interruptible.
Low priority is selected with the following directive:
#SBATCH --qos=lowpri
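Low-priority jobs should therefore be restartable. The sketch below only illustrates the checkpoint/restart pattern in POSIX shell: `run_with_checkpoint`, the checkpoint file and the step counter are hypothetical stand-ins for an application's own state-saving mechanism. Each finished step is recorded atomically (written to a temporary file, then moved into place), so if the job is interrupted and requeued, the next run resumes from the last completed step instead of starting over.

```shell
#!/bin/sh
# run_with_checkpoint CHECKPOINT_FILE TOTAL_STEPS
run_with_checkpoint() {
  ckpt=$1 total=$2
  step=0
  if [ -f "$ckpt" ]; then
    step=$(cat "$ckpt")                    # resume after an interruption
    echo "resuming from step $step"
  fi
  while [ "$step" -lt "$total" ]; do
    step=$((step + 1))
    # ... one unit of real work would be done here ...
    echo "$step" > "$ckpt.tmp" && mv "$ckpt.tmp" "$ckpt"   # atomic update
  done
  echo "finished $step steps"
}
```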
==== Memory settings ====
By default, 1000 MB of memory is allocated per CPU core; more can be requested with the following directive:
#SBATCH --mem-per-cpu=MEMORY
where <code>MEMORY</code> is given in MB. The maximum memory/core at NIIFI SC is 2600 MB.
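When an application's memory need is known per node, the value for `--mem-per-cpu` is that amount divided by the number of cores used, rounded up. The hypothetical helper below does this arithmetic and refuses values above the 2600 MB/core limit quoted above; its default of 24 cores per node is an assumption based on the "2 nodes (48 cpu cores)" example earlier.

```shell
#!/bin/sh
# mem_per_cpu MB_PER_NODE [CORES_PER_NODE]  -> value for --mem-per-cpu (MB)
mem_per_cpu() {
  need=$1 cores=${2:-24} limit=2600        # limit: max MB/core at NIIFI SC
  per_core=$(( (need + cores - 1) / cores ))   # integer division, rounded up
  if [ "$per_core" -gt "$limit" ]; then
    echo "need $per_core MB/core, above the $limit MB/core limit" >&2
    return 1
  fi
  echo "$per_core"
}
```

For example, an application needing 40000 MB per node on 24 cores gets `mem_per_cpu 40000` = 1667, i.e. `#SBATCH --mem-per-cpu=1667`.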
==== Email notification ====
To get an e-mail notification when the status of the job changes (start, end, error):
#SBATCH --mail-type=ALL
#SBATCH --mail-user=EMAIL
where <code>EMAIL</code> is the e-mail to notify.
==== Array jobs ====
Array jobs are useful when multiple single-threaded (serial) jobs are to be submitted with different data. Slurm stores the unique id of each instance in the <code>SLURM_ARRAY_TASK_ID</code> environment variable, so the individual tasks of the array job can be told apart by reading this id. The output of each task is written to a <code>slurm-SLURM_ARRAY_JOB_ID_SLURM_ARRAY_TASK_ID.out</code> file. The scheduler packs the tasks tightly onto the nodes, so it is useful to choose the number of tasks as a multiple of the number of CPU cores. [http://slurm.schedmd.com/job_array.html More on this topic]
===== Example =====
Alice submits 96 serial jobs, each running for a maximum of 24 hours, charged to the 'foobar' account. The <code>#SBATCH --array=1-96</code> directive marks it as an array job. The application is started with the <code>srun</code> command; in this example it is a shell script.
#!/bin/bash
#SBATCH -A foobar
#SBATCH --time=24:00:00
#SBATCH --job-name=array
#SBATCH --array=1-96
srun envtest.sh
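The example does not show `envtest.sh` itself. A hypothetical version could use `SLURM_ARRAY_TASK_ID` to pick its own input, so that each of the 96 instances processes a different file. The `input_NNN.dat` / `result_NNN.txt` naming scheme and the line count standing in for the real computation are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical envtest.sh: one array instance, selected by SLURM_ARRAY_TASK_ID.
process_task() {
  task=$1
  input=$(printf 'input_%03d.dat' "$task")      # e.g. input_007.dat for task 7
  output=$(printf 'result_%03d.txt' "$task")
  echo "task $task on $(uname -n): $input -> $output"
  if [ -f "$input" ]; then
    wc -l < "$input" > "$output"   # stand-in for the real serial computation
  else
    echo "missing $input" >&2
    return 1
  fi
}

# Run only when started by Slurm as part of an array job.
if [ -n "${SLURM_ARRAY_TASK_ID:-}" ]; then
  process_task "$SLURM_ARRAY_TASK_ID"
fi
```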
==== MPI jobs ====
For MPI jobs, the number of MPI processes per node must be given (<code>#SBATCH --ntasks-per-node=</code>). Most commonly this equals the number of CPU cores in a node. Parallel programs should be started with the <code>mpirun</code> command.
===== Example =====
Bob allocates 2 nodes for 12 hours for an MPI job, charged to the 'barfoo' account. 24 MPI processes are started on each node. Standard output is redirected to the <code>slurm.out</code> file (<code>#SBATCH -o</code>).
#!/bin/bash
#SBATCH -A barfoo
#SBATCH -N 2
#SBATCH --job-name=mpi
#SBATCH --ntasks-per-node=24
#SBATCH --time=12:00:00
#SBATCH -o slurm.out
mpirun ./a.out
==== CPU binding ====
In general, the performance of MPI applications can be improved with CPU core binding. With binding, the OS does not move the processes of the parallel program between CPU cores, and memory locality improves (fewer cache misses). Memory binding is also recommended. Run tests to determine which binding strategy gives the best performance for your application. The following settings apply to the OpenMPI environment. Further information on the binding can be printed with the <code>--report-bindings</code> MPI option; below each command, a few lines of this detailed binding output are shown. Important: do not use the task binding (task_binding) of the scheduler!
===== Binding per CPU core =====
In this case, MPI fills the CPU cores in rank order.
Command to run: mpirun --bind-to-core --bycore
[cn05:05493] MCW rank 0 bound to socket 0[core 0]: [B . . . . . . . . . . .][. . . . . . . . . . . .]
[cn05:05493] MCW rank 1 bound to socket 0[core 1]: [. B . . . . . . . . . .][. . . . . . . . . . . .]
[cn05:05493] MCW rank 2 bound to socket 0[core 2]: [. . B . . . . . . . . .][. . . . . . . . . . . .]
[cn05:05493] MCW rank 3 bound to socket 0[core 3]: [. . . B . . . . . . . .][. . . . . . . . . . . .]
===== Binding based on CPU socket =====
In this case, MPI processes are assigned to the CPU sockets alternately.
Command to run: mpirun --bind-to-core --bysocket
[cn05:05659] MCW rank 0 bound to socket 0[core 0]: [B . . . . . . . . . . .][. . . . . . . . . . . .]
[cn05:05659] MCW rank 1 bound to socket 1[core 0]: [. . . . . . . . . . . .][B . . . . . . . . . . .]
[cn05:05659] MCW rank 2 bound to socket 0[core 1]: [. B . . . . . . . . . .][. . . . . . . . . . . .]
[cn05:05659] MCW rank 3 bound to socket 1[core 1]: [. . . . . . . . . . . .][. B . . . . . . . . . .]
===== Binding by nodes =====
In this case, MPI processes are assigned to the nodes alternately. At least 2 nodes must be allocated.
Command to run: mpirun --bind-to-core --bynode
[cn05:05904] MCW rank 0 bound to socket 0[core 0]: [B . . . . . . . . . . .][. . . . . . . . . . . .]
[cn05:05904] MCW rank 2 bound to socket 0[core 1]: [. B . . . . . . . . . .][. . . . . . . . . . . .]
[cn06:05969] MCW rank 1 bound to socket 0[core 0]: [B . . . . . . . . . . .][. . . . . . . . . . . .]
[cn06:05969] MCW rank 3 bound to socket 0[core 1]: [. B . . . . . . . . . .][. . . . . . . . . . . .]
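The three placement policies can be summarised as simple rank arithmetic. The sketch below is only a model of the binding maps printed above, not OpenMPI's implementation: it assumes 2 sockets with 12 cores each per node (as in the maps) and, for `--bynode`, that consecutive ranks alternate over the allocated nodes (node 0 = cn05, node 1 = cn06 in the example).

```shell
#!/bin/sh
# placement POLICY RANK [NODES]   with POLICY = bycore | bysocket | bynode
placement() {
  policy=$1 rank=$2 nodes=${3:-2}
  sockets=2 cores=12                 # per node, as in the maps above
  per_node=$((sockets * cores))
  case $policy in
    bycore)                          # fill socket 0, then socket 1, then next node
      node=$((rank / per_node)); r=$((rank % per_node))
      socket=$((r / cores)); core=$((r % cores)) ;;
    bysocket)                        # alternate sockets within a node
      node=$((rank / per_node)); r=$((rank % per_node))
      socket=$((r % sockets)); core=$((r / sockets)) ;;
    bynode)                          # alternate nodes, then fill cores in order
      node=$((rank % nodes)); r=$((rank / nodes))
      socket=$((r / cores)); core=$((r % cores)) ;;
    *) echo "unknown policy: $policy" >&2; return 1 ;;
  esac
  echo "rank $rank -> node $node socket $socket core $core"
}
```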
==== OpenMP (OMP) jobs ====
For OpenMP parallel applications, 1 node must be allocated, and the number of OMP threads must be set with the <code>OMP_NUM_THREADS</code> environment variable. The variable can either be written in front of the application command (see the example) or exported before running the command:
export OMP_NUM_THREADS=24
===== Example =====
Alice starts a 24-thread OMP application for a maximum of 6 hours, charged to the foobar account.
#!/bin/bash
#SBATCH -A foobar
#SBATCH --job-name=omp
#SBATCH --time=06:00:00
OMP_NUM_THREADS=24 ./a.out
==== Hybrid MPI-OMP jobs ====
An application that uses both MPI and OMP runs in hybrid MPI-OMP mode. Note that in applications linked against Intel MKL, the MKL calls are OpenMP-capable. Generally, the following distribution is suggested: the number of MPI processes per node is between 1 and the number of CPU sockets, and the number of OMP threads is the number of CPU cores in a node, or half or a quarter of that (depending on the code). In the job script, the parameters of both modes need to be combined.
===== Example =====
Alice submits a hybrid job for 8 hours on 2 nodes, charged to the 'foobar' account. One MPI process runs on each node, using 24 OMP threads per node, so on the 2 nodes there are 2 MPI processes with 2x24 OMP threads.
#!/bin/bash
#SBATCH -A foobar
#SBATCH -N 2
#SBATCH --job-name=mpiomp
#SBATCH --time=08:00:00
#SBATCH --ntasks-per-node=1
#SBATCH -o slurm.out
export OMP_NUM_THREADS=24
mpirun ./a.out
==== Maple Grid jobs ====
Maple can be run on one node, similarly to OMP jobs. The maple module needs to be loaded to use it. Because Maple works in client-server mode, a grid server needs to be started (<code>${MAPLE}/toolbox/Grid/bin/startserver</code>). The application requires a license, which has to be requested in the job script (<code>#SBATCH --licenses=maplegrid:1</code>). A Maple job is started with the <code>${MAPLE}/toolbox/Grid/bin/joblauncher</code> command.
===== Example =====
Alice runs a Maple Grid application for 6 hours, charged to the 'foobar' account:
#!/bin/bash
#SBATCH -A foobar
#SBATCH --job-name=maple
#SBATCH --ntasks-per-node=24
#SBATCH --time=06:00:00
#SBATCH -o slurm.out
#SBATCH --licenses=maplegrid:1
module load maple
${MAPLE}/toolbox/Grid/bin/joblauncher ${MAPLE}/toolbox/Grid/samples/Simple.mpl
==== GPU compute nodes ====
The Szeged site accommodates 2 GPU-enabled compute nodes, each with 6 Nvidia Tesla M2070 cards. The GPU nodes are in a separate job queue (<code>--partition gpu</code>). The number of GPUs is set with the <code>--gres gpu:#</code> directive.
===== Example =====
Alice submits a 6-hour job using 4 GPUs, charged to the foobar account.
#!/bin/bash
#SBATCH -A foobar
#SBATCH --job-name=GPU
#SBATCH --partition gpu
#SBATCH --gres gpu:4
#SBATCH --time=06:00:00
$PWD/gpu_burnout 3600
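Inside the job, the application should use only the GPUs it was given. On typical Slurm gres/gpu configurations the allocated devices are exported in the `CUDA_VISIBLE_DEVICES` environment variable (for example `0,1,2,3` for `--gres gpu:4`); whether this site sets it exactly this way is an assumption. The hypothetical helper below counts the listed devices, so a job script can check the allocation before starting the work.

```shell
#!/bin/sh
# gpu_count: number of devices listed in CUDA_VISIBLE_DEVICES (0 if unset)
gpu_count() {
  devs=${CUDA_VISIBLE_DEVICES:-}
  if [ -z "$devs" ]; then
    echo 0
    return
  fi
  printf '%s\n' "$devs" | tr ',' '\n' | grep -c .
}
```

In a job script, `[ "$(gpu_count)" -ge 1 ] || exit 1` placed before the application stops the job early if no GPU was assigned.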
== Extensions ==
Extensions should be requested from the execution site (NIIF) at prace-support@niif.hu. All requests are carefully reviewed, and their eligibility is decided individually.
== Reporting after finishing project ==
A report must be created after using PRACE resources. Please contact prace-support@niif.hu for further details.
== Acknowledgement in publications ==
'''We acknowledge [PRACE/KIFÜ] for awarding us access to resource based in Hungary at [Budapest/Debrecen/Pécs/Szeged].'''
'''We acknowledge KIFÜ for awarding us access to resource based in Hungary at [Budapest/Debrecen/Pécs/Szeged].'''
Where technical support has been received the following additional text should also be used:
'''The support of [name of person/people] from KIFÜ, Hungary to the technical work is gratefully acknowledged.'''
[[Category: HPC]]
