Debrecen2 GPU cluster
[Access Policy] [Privacy Policy]
Cluster | Debrecen2 (Leo) |
Type | HP SL250s |
Cores / node | 8 × 2 Xeon E5-2650v2 2.60GHz |
GPUs (nodes × GPUs / node) | 68 × 3 Nvidia K20x + 16 × 3 Nvidia K40x |
# of compute nodes | 84 |
Max Walltime | 7-00:00:00 |
Max cores / project | 336 |
Max mem / core | 7000 MB |
Contents
- 1 Requesting CPU time
- 2 Login
- 3 Copying files with SCP
- 4 Data synchronization
- 5 User interface
- 6 Using a shared home directory
- 7 Compiling applications
- 8 Using the SLURM scheduler
- 8.1 Estimating CPU time
- 8.2 Status information
- 8.3 Submitting jobs
- 8.3.1 Mandatory parameters
- 8.3.2 Reservation of GPUs
- 8.3.3 Interactive use
- 8.3.4 Submitting batch jobs
- 8.3.5 Non-restarting jobs
- 8.3.6 Partitions
- 8.3.7 Quality of Service (QoS)
- 8.3.8 Memory allocation
- 8.3.9 Email notification
- 8.3.10 Arrayjobs
- 8.3.11 OpenMPI jobs
- 8.3.12 OpenMP (OMP) jobs
- 8.3.13 Hybrid MPI-OMP jobs
- 8.3.14 Maple Grid jobs
Requesting CPU time
ATTENTION!
When applying for CPU time, we expect HPC project managers to briefly justify that the application to be run can use GPUs (unless the purpose is to use licensed software available on the machine that cannot use GPUs, e.g. Gaussian or Maple). This is necessary because most of the machine's performance comes from GPU acceleration: a program without GPU acceleration that allocates CPUs keeps the GPUs of those nodes idle, leading to underutilization. NVIDIA publishes a list of applications that officially support NVIDIA GPUs, but other GPU-capable programs are also likely to perform well on the machine.
Login
ssh USER@login.debrecen2.hpc.niif.hu
If you use a non-default key, specify it with the -i KEY option (for both ssh and scp).
Copying files with SCP
Uploading to and downloading from the HOME directory:
Up: scp FILE USER@login.debrecen2.hpc.niif.hu:FILE
Down: scp USER@login.debrecen2.hpc.niif.hu:FILE FILE
Data synchronization
Synchronize larger files or directory trees with the following commands:
Up: rsync -a -e ssh DIRECTORY USER@login.debrecen2.hpc.niif.hu:/home/USER
Down: rsync -a -e ssh USER@login.debrecen2.hpc.niif.hu:/home/USER/DIRECTORY .
To also propagate deletions (remove files at the destination that no longer exist at the source), add the --delete option.
User interface
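The command prompt looks like this:
DEBRECEN2[login] ~ (0)$
where DEBRECEN2 is the HPC station, login is the short machine name, ~ is the short form of the current working directory (CWD), and (0) is the exit code of the previous command.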
Module environment
The list of available modules is obtained with the following command:
module avail
To list the modules that are already loaded:
module list
You can load an application with the following command:
module load APP
The environment variables set by KIFÜ are listed by the nce command.
Data sharing for project members
To share files or directories, set access control lists (ACLs). To make the HOME directory readable by another user (OTHER):
setfacl -m u:OTHER:rx $HOME
To make a specific directory (DIRECTORY) writable as well:
setfacl -m u:OTHER:rwx $HOME/DIRECTORY
You can list extended rights with the following command:
getfacl $HOME/DIRECTORY
The shared file system, which is available on the login nodes of the supercomputers, can be accessed under the following path:
/mnt/fhgfs/home/$USER
You can back up a directory to the shared file system with the following command:
rsync -avuP --delete $HOME/DIRECTORY /mnt/fhgfs/home/$USER
Compiling applications
Users are encouraged to first try compiling the applications they need in their own home directory. If that fails, the next step is to ask the Hungarian supercomputer users, since there is a good chance that others have already run into the same problem. They can be reached at hpc-forum at listserv.niif.hu. You can subscribe to this mailing list [1]; it is also worth checking its archive when looking into the issue. KIFÜ HPC support has very limited capacity to handle individual compilation requests, but you may still contact hpc-support at niif.hu with your problem. In that case, please allow a few days for a response.
Using the SLURM scheduler
Scheduling on the supercomputer is based on CPU hours (machine time). The following command shows the status of the user's Slurm projects (accounts):
sbalance
The second column (Usage) shows the machine time spent by each user, and the fourth column shows the total machine time of the account. The last two columns provide information about the maximum (Account Limit) and available machine time.
Scheduler Account Balance
---------- ----------- + ---------------- ----------- + ------------- -----------
User             Usage |          Account       Usage | Account Limit   Available (CPU hrs)
---------- ----------- + ---------------- ----------- + ------------- -----------
bob *                7 |           foobar           7 |         1,000         993
alice                0 |           foobar           7 |         1,000         993
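For scripts that need the remaining balance, the sbalance output can be parsed. This is only a sketch: it assumes the '|'-separated column layout of the sample above (the real output may differ), and available_hours is a hypothetical helper name, not a command installed on the cluster.

```shell
# available_hours ACCOUNT
# Reads sbalance-style output on stdin and prints the "Available" CPU hours
# of ACCOUNT with thousands separators removed.
# Assumption: three '|'-separated column groups, as in the sample output.
available_hours() {
  awk -F'|' -v acct="$1" '
    NF == 3 {
      split($2, a, " ")          # a[1] = account name, a[2] = account usage
      n = split($3, b, " ")      # b[n] = last number = available CPU hours
      if (a[1] == acct) {
        gsub(/,/, "", b[n])      # "1,995" -> "1995"
        print b[n]
        exit
      }
    }'
}
```

Usage on the login node: sbalance | available_hours foobar, which for the sample above would print 993.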
Estimating CPU time
Before large-scale (production) runs, it is advisable to estimate the CPU time the job will use. To do this, use the following command:
sestimate -N NODES -t WALLTIME
where NODES is the number of nodes to be reserved and WALLTIME is the maximum run time.
Specify the wall clock time to be reserved as accurately as possible, because the scheduler also uses it to rank the jobs waiting to run: in general, shorter jobs start sooner. It is advisable to check the actual run time afterwards with the sacct command.
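As a rough cross-check of sestimate, the CPU time of a job that occupies whole nodes is nodes × cores per node × wall clock hours. A minimal sketch, assuming 16 cores per node (2 × 8, from the table at the top) and that whole nodes are charged; cpu_hours is a hypothetical helper, and sestimate remains the authoritative tool:

```shell
# cpu_hours NODES HOURS [CORES_PER_NODE]
# Rough CPU-hour estimate: nodes x cores per node x wall clock hours.
# Assumption: CORES_PER_NODE defaults to 16 and whole nodes are charged.
# awk is used so that fractional hours (e.g. 1.5) also work.
cpu_hours() {
  awk -v n="$1" -v h="$2" -v c="${3:-16}" 'BEGIN { print n * c * h }'
}
```

For example, cpu_hours 2 24 prints 768 (2 nodes × 16 cores × 24 hours).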
Status information
The squeue and sinfo commands provide information about the general state of the cluster. Each submitted job is assigned a unique identification number (JOBID), which you can use to query further details. To show the properties of a submitted or running job:
scontrol show job JOBID
Each job is also recorded in an accounting database, from which you can retrieve the properties of the jobs you have run and their resource usage statistics. To view detailed statistics:
sacct -l -j JOBID
The following command provides information about the memory used:
smemory JOBID
The following command shows disk usage:
sdisk JOBID
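The output of scontrol show job consists of space-separated Key=Value pairs, so a single property can be extracted with standard tools. A minimal sketch; job_field is a hypothetical helper, and values that contain spaces are not handled:

```shell
# job_field KEY
# Reads `scontrol show job JOBID` output on stdin and prints the value of KEY.
# Each space-separated token is put on its own line, then "KEY=" is stripped.
job_field() {
  tr ' ' '\n' | sed -n "s/^$1=//p" | head -n 1
}
```

Usage: scontrol show job JOBID | job_field JobState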
SLURM warnings
Resources / AssociationResourceLimit - waiting for resources
AssociationJobLimit / QOSJobLimit - not enough CPU time, or the maximum number of CPUs is already reserved
Priority - waiting due to low priority
In the latter case, reduce the time reserved by the job. Jobs of a given project can run on at most 512 CPUs at a time.
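When a job is pending, squeue shows the reason code in its %r output field (for example squeue -u $USER -o "%i %r"). The sketch below maps the codes listed above to their meaning; explain_reason is a hypothetical helper that only knows these codes:

```shell
# explain_reason REASON
# Prints a short explanation for the pending-reason codes listed above.
# Returns 1 for any other code.
explain_reason() {
  case $1 in
    Resources|AssociationResourceLimit)
      echo "waiting for resources" ;;
    AssociationJobLimit|QOSJobLimit)
      echo "not enough CPU time, or the maximum number of CPUs is already reserved" ;;
    Priority)
      echo "waiting due to low priority" ;;
    *)
      echo "unknown reason: $1"
      return 1 ;;
  esac
}
```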
Checking licenses
The following command provides information about the available and currently used licenses:
slicenses
Checking maintenance
During a maintenance window the scheduler does not start new jobs, but jobs can still be submitted. The following command shows the maintenance dates:
sreservations
Aggregate consumption
You can retrieve the CPU minutes consumed during the past month with the following command:
susage
Total consumption
To find out how much CPU time you have used over a given period, run:
sreport -t Hours Cluster AccountUtilizationByUser Accounts=ACCOUNT Start=2015-01-01
Submitting jobs
Applications can be run on the supercomputers in batch mode. This means that for each run you must create a job script that describes the required resources and contains the commands to be run. Scheduler parameters (resource requirements) are given with the #SBATCH directive.
Mandatory parameters
The following parameters must be specified in each case:
#!/bin/bash
#SBATCH -A ACCOUNT
#SBATCH --job-name=NAME
#SBATCH --time=TIME
where ACCOUNT is the name of the account to be charged (the sbalance command lists your available accounts), NAME is the short name of the job, and TIME is the maximum wall clock time (DD-HH:MM:SS).
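The --time value can be converted to seconds, for example to compare it with the 7-day maximum walltime. A minimal POSIX shell sketch, assuming the six formats accepted by SLURM (minutes, minutes:seconds, hours:minutes:seconds, days-hours, days-hours:minutes and days-hours:minutes:seconds); to_seconds and dec are hypothetical helpers:

```shell
# dec N: strip leading zeros so that e.g. "08" is not read as an octal number.
dec() {
  n=${1#"${1%%[!0]*}"}
  echo "${n:-0}"
}

# to_seconds TIME: convert a SLURM --time value to seconds.
to_seconds() {
  days=0 h=0 m=0 s=0
  case $1 in
    *-*) days=$(dec "${1%%-*}"); rest=${1#*-}; dashed=1 ;;
    *)   rest=$1; dashed=0 ;;
  esac
  old_ifs=$IFS
  IFS=:
  set -- $rest              # split the remaining part on ':' into $1, $2, ...
  IFS=$old_ifs
  if [ "$dashed" -eq 1 ]; then
    case $# in              # days-hours[:minutes[:seconds]]
      1) h=$1 ;;
      2) h=$1 m=$2 ;;
      3) h=$1 m=$2 s=$3 ;;
      *) return 1 ;;
    esac
  else
    case $# in              # minutes[:seconds] or hours:minutes:seconds
      1) m=$1 ;;
      2) m=$1 s=$2 ;;
      3) h=$1 m=$2 s=$3 ;;
      *) return 1 ;;
    esac
  fi
  echo $(( $days * 86400 + $(dec "$h") * 3600 + $(dec "$m") * 60 + $(dec "$s") ))
}
```

For example, to_seconds 7-00:00:00 prints 604800, the Max Walltime from the table at the top.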
The following time formats can be used:
"minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds".
Reservation of GPUs
GPUs are reserved using the following directive:
#SBATCH --gres=gpu:N
where N is the number of GPUs per node: 1, 2 or at most 3.
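A minimal GPU job script, as a sketch: ACCOUNT is a placeholder as in the mandatory parameters, and my_gpu_app is a hypothetical program name. The gpu_count helper assumes that SLURM exports the GPUs granted by --gres in CUDA_VISIBLE_DEVICES, which is the usual behavior but depends on the cluster configuration:

```shell
#!/bin/bash
#SBATCH -A ACCOUNT
#SBATCH --job-name=gpu
#SBATCH --time=01:00:00
#SBATCH --gres=gpu:2

# gpu_count: number of GPUs visible to the job, counted from the
# comma-separated CUDA_VISIBLE_DEVICES list (e.g. "0,1" -> 2).
# Outside a job the variable is normally unset and 0 is printed.
gpu_count() {
  printf '%s\n' "${CUDA_VISIBLE_DEVICES:-}" | awk -F, '{ print NF }'
}

echo "GPUs allocated: $(gpu_count)"
# Start the GPU application here (hypothetical binary name):
# ./my_gpu_app
```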
Interactive use
You can submit short interactive jobs with the 'srun' command, e.g.
srun -l -n 1 -t TIME --gres=gpu:1 -A ACCOUNT APP
Submitting batch jobs
To submit jobs use the following command:
sbatch slurm.sh
On successful submission you get the following output:
Submitted batch job JOBID
where JOBID is the unique identifier of the job.
The following command stops the job:
scancel JOBID
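To use the job ID in a script (for example to cancel the job later), it can be extracted from the sbatch output. A minimal sketch; parse_jobid is a hypothetical helper:

```shell
# parse_jobid: read sbatch output on stdin and print the JOBID from the
# "Submitted batch job JOBID" line; prints nothing if submission failed.
parse_jobid() {
  sed -n 's/^Submitted batch job \([0-9][0-9]*\).*$/\1/p'
}
```

Usage: jobid=$(sbatch slurm.sh | parse_jobid), then scancel "$jobid". Newer SLURM versions also provide sbatch --parsable, which prints only the job ID.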
Non-restarting jobs
To prevent a job from being restarted (requeued) automatically, use the following directive:
#SBATCH --no-requeue
Partitions
There are two non-overlapping queues (partitions) on the supercomputer: prod-gpu-k40 and prod-gpu-k20. Both are for production runs; the first contains compute nodes (CN) with Nvidia K40x GPUs, the second compute nodes with Nvidia K20x GPUs. The default queue is prod-gpu-k20. The prod-gpu-k40 partition can be selected with the following directive:
Quality of Service (QoS)
The default quality of service is normal, i.e. jobs cannot be interrupted.
High priority
High-priority jobs can run for at most 24 hours and, in return for being prioritized, are charged double the machine time they use.
#SBATCH --qos=fast
Low priority
Low-priority jobs can also be submitted. Such jobs can be interrupted at any time by any normal-priority job; in exchange, only half of the machine time they use is charged. Interrupted jobs are automatically rescheduled. Only submit low-priority jobs that can tolerate random interruptions and regularly save their state (checkpoint) so that they can be restarted quickly.
#SBATCH --qos=lowpri
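The charging rules of the three QoS levels can be summarized as multipliers: normal 1×, fast 2×, lowpri 0.5×. A minimal sketch; charged_hours is a hypothetical helper, and it assumes the charge is simply the CPU hours used times the multiplier:

```shell
# charged_hours QOS CPU_HOURS
# Machine time charged to the account for CPU_HOURS of use under QOS.
charged_hours() {
  case $1 in
    normal) f=1 ;;      # default QoS
    fast)   f=2 ;;      # high priority: charged double
    lowpri) f=0.5 ;;    # low priority: charged half
    *) echo "unknown QoS: $1" >&2; return 1 ;;
  esac
  awk -v h="$2" -v f="$f" 'BEGIN { print h * f }'
}
```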
Memory allocation
By default, each CPU core is assigned 1000 MB of memory; more can be requested with the following directive:
#SBATCH --mem-per-cpu=MEMORY
where MEMORY is given in MB. The maximum memory per core is 7800 MB.
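When a job's total memory need is known, the --mem-per-cpu value follows by dividing it by the number of cores and rounding up. A minimal sketch; mem_per_cpu is a hypothetical helper, and the 7800 MB limit (from this section) can be overridden through the MAX_MEM_PER_CPU variable:

```shell
# mem_per_cpu TOTAL_MB CORES
# Prints ceil(TOTAL_MB / CORES), the value for --mem-per-cpu.
# Fails if it exceeds the per-core maximum (MAX_MEM_PER_CPU, default 7800 MB).
mem_per_cpu() {
  max=${MAX_MEM_PER_CPU:-7800}
  per=$(( ($1 + $2 - 1) / $2 ))      # integer division, rounded up
  if [ "$per" -gt "$max" ]; then
    echo "need $per MB/core, maximum is $max MB" >&2
    return 1
  fi
  echo "$per"
}
```

Options given on the sbatch command line override the #SBATCH directives, so the result can be passed as sbatch --mem-per-cpu=$(mem_per_cpu 20000 8) slurm.sh.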
Email notification
To receive an email when the job status changes (start, end, failure):
#SBATCH --mail-type=ALL
#SBATCH --mail-user=EMAIL
where EMAIL is the email address to be notified.
Arrayjobs
Array jobs are useful when a single-threaded (serial) application has to be run in many instances (with different parameters) at once. The scheduler stores the unique index of each instance in the SLURM_ARRAY_TASK_ID environment variable; by querying it, the instances of the array job can be told apart. The output of each instance is written to a slurm-SLURM_ARRAY_JOB_ID_SLURM_ARRAY_TASK_ID.out file. The scheduler packs the instances tightly onto the nodes, so here too it is worth choosing the number of instances as a multiple of the number of CPU cores per node.
#!/bin/bash
#SBATCH -A ACCOUNT
#SBATCH --job-name=array
#SBATCH --time=24:00:00
#SBATCH --array=1-96
srun envtest.sh
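Inside the script started by srun (envtest.sh in the example), each task can pick its own parameters using SLURM_ARRAY_TASK_ID. A minimal sketch, assuming a hypothetical parameter file with one line per task (line 1 for task 1, and so on); param_for_task is a hypothetical helper:

```shell
# param_for_task FILE
# Prints line number $SLURM_ARRAY_TASK_ID of FILE, so that each array task
# works on its own parameter set.
param_for_task() {
  if [ -z "${SLURM_ARRAY_TASK_ID:-}" ]; then
    echo "SLURM_ARRAY_TASK_ID is not set (not running as an array task?)" >&2
    return 1
  fi
  sed -n "${SLURM_ARRAY_TASK_ID}p" "$1"
}
```

For example, a task script could run ./my_app $(param_for_task params.txt), where my_app and params.txt are hypothetical.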
OpenMPI jobs
For MPI jobs, you must also specify the number of MPI processes started on each node (#SBATCH --ntasks-per-node=). In the most common case this is the number of CPU cores of a single node. The parallel program must be started with the mpirun command.
#!/bin/bash
#SBATCH -A ACCOUNT
#SBATCH --job-name=mpi
#SBATCH -N 2
#SBATCH --ntasks-per-node=8
#SBATCH --time=12:00:00
mpirun --report-pid ${TMPDIR}/mpirun.pid PROGRAM
OpenMPI FAQ: http://www.open-mpi.org/faq
OpenMP (OMP) jobs
OpenMP-parallel applications can use at most 1 node. The number of OMP threads must be set with the OMP_NUM_THREADS environment variable, either on the same command line as the application (see the example) or exported before the start command:
export OMP_NUM_THREADS=8
In the following example, 8 CPU cores are assigned to a single task; these 8 cores must be on one node. The number of CPU cores is stored in the SLURM_CPUS_PER_TASK variable, which also sets the number of OMP threads.
User Alice launches an 8-thread OMP application at the expense of the foobar account for a maximum of 6 hours.
#!/bin/bash
#SBATCH -A foobar
#SBATCH --job-name=omp
#SBATCH --time=06:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK ./a.out
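SLURM only sets SLURM_CPUS_PER_TASK when --cpus-per-task is specified, so a script that is also run outside the scheduler may want a fallback. A minimal sketch; set_omp_threads is a hypothetical helper, and the fallback of 1 thread is an assumption:

```shell
# set_omp_threads: export OMP_NUM_THREADS from the cores SLURM assigned to the
# task (--cpus-per-task); fall back to 1 thread if the variable is unset.
set_omp_threads() {
  OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
  export OMP_NUM_THREADS
}
```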
Hybrid MPI-OMP jobs
We speak of hybrid MPI-OMP mode when the parallel application uses both MPI and OMP. Note that MKL calls in programs linked with Intel MKL are OpenMP-capable. In general, the following distribution is recommended: between 1 and the number of CPU sockets MPI processes per node, and a number of OMP threads equal to the total number of CPU cores in a node, or half or a quarter of it (as appropriate). In the job script, the parameters of the two modes above must be combined.
In the following example, User Alice submits a hybrid job to 2 nodes for 8 hours at the expense of the foobar account, with one task per node and 8 threads per task. Only 1 MPI process runs on each node, and it uses 8 OMP threads. The 2 machines run a total of 2 MPI processes and 2 × 8 OMP threads.
#!/bin/bash
#SBATCH -A foobar
#SBATCH --job-name=mpiomp
#SBATCH --time=08:00:00
#SBATCH -N 2
#SBATCH --ntasks=2
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8
#SBATCH -o slurm.out
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
mpirun ./a.out
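With 2 sockets × 8 cores per node (see the table at the top), the recommendation above leads to layouts such as 1 MPI process × 16 threads or 2 processes × 8 threads per node; the example uses 1 × 8, i.e. half of each node's cores. A minimal sketch that checks whether a layout fits on a node; hybrid_layout is a hypothetical helper, and the default of 16 cores per node is an assumption taken from the table:

```shell
# hybrid_layout RANKS_PER_NODE THREADS_PER_RANK [CORES_PER_NODE]
# Checks that MPI ranks x OMP threads fit on one node and prints
# "used/total" cores per node; returns 1 if the layout oversubscribes.
hybrid_layout() {
  cores=${3:-16}
  used=$(( $1 * $2 ))
  if [ "$used" -gt "$cores" ]; then
    echo "layout uses $used cores, node has only $cores" >&2
    return 1
  fi
  echo "$used/$cores"
}
```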
Maple Grid jobs
Maple can be run on 1 node, like OMP jobs. You must also load the maple module to use it. Maple works in client-server mode, so you must start the grid server (${MAPLE}/toolbox/Grid/bin/startserver) before running the Maple job. This application requires a license, which must be requested in the job script (#SBATCH --licenses=maplegrid:1). The Maple job must be started with the ${MAPLE}/toolbox/Grid/bin/joblauncher command.
User Alice starts Maple Grid for 6 hours from the foobar account:
#!/bin/bash
#SBATCH -A foobar
#SBATCH --job-name=maple
#SBATCH -N 1
#SBATCH --ntasks-per-node=16
#SBATCH --time=06:00:00
#SBATCH -o slurm.out
#SBATCH --licenses=maplegrid:1
module load maple
${MAPLE}/toolbox/Grid/bin/startserver
${MAPLE}/toolbox/Grid/bin/joblauncher ${MAPLE}/toolbox/Grid/samples/Simple.mpl