== Privacy Policy ==
[https://wiki.niif.hu/index.php?title=Debrecen2_GPU_klaszter_en Back to the Leo Wiki page]

=== Privacy policy of HPC services ===
This privacy policy explains how the LEO HPC cluster operational team (hereafter referred to as "we") collects, uses, and manages users' data. Before going into details, we stress that we are committed to securing and respecting your privacy, in other words, to protecting your data in compliance with all relevant regulations as well as cultural, moral, and ethical norms.

=== Official information ===

* Name of the data controller: Governmental Agency for IT Development
* Short name: KIFÜ
* Address: Váci út 35, 1134 Budapest, Hungary
* Tax number: 15598316-2-41
* Registration number: 598316
* Statistical number: 15598316 8411 312 01
* Website: https://kifu.gov.hu
* Representative: President Zoltán Szijártó
* E-mail: '''info@kifu.gov.hu'''
* Phone number: +36 1 795 2861
* Data protection officer: Dr. Mária Kunhegyi
* E-mail of the data protection officer: '''adatvedelem@kifu.gov.hu'''

=== Legal framework of data privacy ===
* Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation), hereinafter referred to as the GDPR
* Act CXII of 2011 on the right to informational self-determination and on freedom of information (Hungary)
* Government Decree 5/2011. (II. 3.) on the National Information Infrastructure Development Program
* Government Decree 268/2010. (XII. 3.) on the Governmental Agency for IT Development
The purpose of this data privacy policy is to ensure that the supercomputer capacity provided by KIFÜ and the related services handle data in accordance with Article 12 (1) of the GDPR.

=== Legal basis for data management ===
The processing of personal data provided by the user is necessary for the performance of a task carried out in the public interest or in the exercise of official authority vested in the controller, as defined in Article 6 (1) (e) of the GDPR.
KIFÜ provides the service to the group of beneficiaries specified in Government Decree 5/2011. (II. 3.) on the National Information Infrastructure Development Program.

=== To whom the policy applies ===
This privacy policy applies, as appropriate and in the relevant sections, to all users of the LEO HPC cluster.

=== How and when we collect your data, and for what purpose ===
We collect the following personal data:
# Personal information received during federated authentication:
#* eduPersonPrincipalName: a unique identifier in the federation
#* mail: the email address used in the Service
#* sn: the User's last name
#* givenName: the User's first name
#: We use this data to identify you and to send possible reports via e-mail.
# Personal data processed in connection with a project application: name, e-mail address, institution, department, telephone number. We use this data to contact the project submitter after project approval.
# Profile data provided by the user on the portal: title, name, affiliation, organization, department, address, phone number, email. We use this data to contact the user in case of a problem with the project.
# Data processed in connection with error reporting: telephone number, e-mail. We use this data to contact the user who reported a problem.
# Log entries: KIFÜ retains the log entries generated during the use of the Service (e.g. IP addresses, web server logs) for 30 days.
Whenever you interact with the LEO HPC cluster, we also automatically receive and record information in our server log files, which may include the network addresses you use to connect to the system, device identification, the type of client and/or device you use to access our system, the time and date you accessed the cluster, and the time spent logged into the service. We may store details of any aspect of your use of the cluster, including, for example, the amount of processor time and storage space used by your account. This may include some personally identifiable data, such as the network addresses or the type of device you use to connect to the system. We use this information to help us manage and administer the cluster, to review, analyze, and improve its performance, security, and patterns of use, and to plan for future upgrades.

We reserve the right to monitor the use of the cluster by your account, including anything transmitted over the Internet and any data or software stored on our systems, in order to ensure that you and all other users are complying with the Terms of use and Conditions of access and are not breaking the law.

=== Where and how your data is stored ===
All LEO HPC cluster equipment is hosted on a single site at KIFÜ, and personal information is stored and processed only on this equipment. We have also implemented appropriate technical and organizational security measures designed to protect the personal information we collect and process.

We aim to protect the privacy of your account and the other personal information we hold in our records; however, we cannot guarantee complete security. Unauthorized entry or use, hardware or software failure, and other factors may compromise the security of user information at any time.

=== Who do we share your data with? ===
Collected personal information is available to the LEO HPC operational team only. Project resource usage and statistical information might be shared with third parties and reported in publicly available documents.

=== How long does the cluster keep your data? ===
After you stop using the LEO HPC cluster, we have no need to process your personal information, and it will be deleted; if this is not possible (for example, because your personal information has been stored in backup archives), we will securely store your personal information and isolate it from any further processing until deletion is possible. In any case, data that could be used to identify you directly will not be retained for longer than five years.

=== Your rights ===
As a person whose data we hold, your rights are in general outlined by Europe's General Data Protection Regulation (GDPR), whether you are a resident of the European Economic Area (EEA) or not, and in certain circumstances include the following:

=== Right of access ===

You have the right to submit a request to the LEO HPC operational team to obtain information on whether we are processing your data. If your data is being processed, you may request a copy of the personal data we hold about you at any time.

=== Right to correction ===

All data that we process is reviewed and verified as accurate whenever possible, and is updated regularly. If you believe that the data is incorrect or has changed, you have the right to request that we correct it.

=== Right to be forgotten (right to erasure) ===

You have the right to request that the LEO HPC operational team delete your data, unless there is a legal obligation to keep that data or a legal basis for refusing the deletion request.

=== Right to restriction of processing ===

You have the right to obtain from the controller the restriction of processing where one of the conditions of GDPR Article 18 (https://gdpr-info.eu/art-18-gdpr/) is met.

=== Right to data portability ===

You have the right to request the transfer of your data to another operator if the processing is based on your consent or is carried out by automated means. We make sure that all personal information is readily available in a structured, commonly used, and machine-readable format.

=== Right to object ===

You have the right to object to the way we handle your data if the processing is based on the data controller's legitimate interest or on carrying out a task in the public interest or for an official authority. In addition, you may object to the processing of your data for direct marketing purposes, including profiling to the extent that it is related to such direct marketing, as well as for scientific or historical research and statistics.

=== Right to withdraw consent ===

If the data processing is based on your consent, you may withdraw that consent at any time.

Please note that we will ask you to verify your identity through several means before responding to such requests. Requests will be addressed as soon as possible.

=== Contact information ===
If you have any questions about this privacy policy, please contact us by e-mail: '''adatvedelem@kifu.gov.hu'''

=== Changes to the privacy policy ===
We regularly revise and update this personal data processing information following changes in internal operational procedures or obligations arising from relevant regulations. We therefore reserve the right to modify and update the present policy whenever deemed necessary; any changes become effective after publication on this [https://wiki.niif.hu/index.php?title=Privacy_Policy Privacy Policy] page.
== Access Policy ==
[https://wiki.niif.hu/index.php?title=Debrecen2_GPU_klaszter_en Back to the Leo Wiki page]

=== Summary of acceptable use policy ===
The LEO HPC cluster (hereafter referred to as LEO, "cluster", "resources", or "system") is shared among multiple users, so your actions can have a serious impact on the system and can affect other users. The following policies and rules are in place to ensure proper and fair use of the computing resources, as well as to prevent unauthorized or malicious use.

=== User accountability ===
LEO users are accountable for their actions. Violations of policy, procedure, and security rules may result in administrative sanctions or legal action. Users are requested to report any computer security issues and incidents of possible misuse or violation of the account policies to
<pre>ugyfelszolgalat@kifu.hu</pre>

=== Resource use ===
The use of LEO is restricted to academic and research purposes only. Only users affiliated with organisations in partnership with KIFÜ may get accounts. There may be exceptions to this rule in special cases, e.g. urgent research or development of high impact; such approval is at the sole discretion of KIFÜ.

Only users with a valid account may use LEO (within the limits of the received quotas), and only for tasks related to the scientific project with which they applied for resources. Users are responsible for using resources in an efficient, effective, ethical, and lawful manner, and should take care not to use more capacity than needed. In particular:
* the use of (or the attempt to use) resources beyond the limits of the received quotas is prohibited;
* the use of resources for personal or private benefit is prohibited;
* the use of resources to support illegal, fraudulent, or malicious activities is prohibited.
Users are required to use resources in accordance with these rules, efficiently, and in a way that does not interfere with or impair the work of other users.

Users are solely responsible for all activities carried out by their account and for the resources made available to them.

In addition to complying with the provisions of this policy, users must obey the instructions of the LEO operational team at all times.

=== Account usage ===
All accounts on LEO are personal. Users of LEO are not permitted to share their accounts with each other or with persons who do not have an account. To request special data-sharing arrangements among a group of users, send a message to
<pre>hpc-support@kifu.hu</pre>
Access to LEO is permitted only via a secure communication channel (e.g. SSH) to the respective master or gateway login node. Compute nodes are intended for heavy computational work and must be accessed via the resource management system (Slurm) only; direct access to compute nodes is not permitted.
It is forbidden to use LEO for activities that are not closely related to the research project (e.g. e-mail, web browsing, etc.). Deployment or operation of network services is also prohibited.
Users must not make copies of system configuration files (e.g. the password file) for unauthorized personal use, nor provide such information to other users or outside personnel.

=== Software and data ===
Users must not attempt to access any data or programs on the cluster for which they have neither authorization nor the explicit consent of the owner of the data or program. Users shall not download, install, or run security-related programs or utilities that reveal weaknesses in the security of the LEO systems. Installation of software on the cluster must include a valid license (if applicable); no software will be installed on the cluster without prior proof of license eligibility. All such information has to be communicated to the LEO operational team by sending a message to
<pre>hpc-support@kifu.hu</pre>

LEO resources should be used to store only data directly related to the research being undertaken by the user on the cluster.

Resources must never be used for storing any data not directly related to the current research, unless explicitly approved by the LEO operational team.

Data stored on LEO are not backed up.

Users are responsible for securing and backing up any data copied to or generated on LEO.

=== Contact information ===
If you have any questions about this acceptable use policy, please contact us by e-mail:
<pre>hpc-support@kifu.hu</pre>

=== Changes to the acceptable use policy ===
The LEO cluster acceptable use policy will be reviewed and amended periodically, and KIFÜ reserves the right to modify and update the present terms whenever deemed necessary; any changes become effective after publication on this [https://wiki.niif.hu/index.php?title=Access_Policy Access Policy] page.
== Debrecen2 GPU cluster (Leo) ==
[https://wiki.niif.hu/index.php?title=Access_Policy Access Policy]
[https://wiki.niif.hu/index.php?title=Privacy_Policy Privacy Policy]

{| class="wikitable" border="1"
|-
| Cluster
| Debrecen2 (Leo)
|-
| Type
| HP SL250s
|-
| Core / node
| 16 (2 × 8-core Xeon E5-2650v2, 2.60 GHz)
|-
| GPU / node
| 3 × Nvidia K20x (68 nodes) or 3 × Nvidia K40x (16 nodes)
|-
| # of compute nodes
| 84
|-
| Max walltime
| 7-00:00:00
|-
| Max core / project
| 336
|-
| Max mem / core
| 7000 MB
|}

=== Requesting CPU time ===

{{ATTENTION| When applying for CPU time, we expect a brief justification from the HPC project managers stating that the application to be run is capable of using a GPU (except when the purpose is to use licensed software available on the machine that cannot use a GPU, e.g. Gaussian or Maple). This is necessary because most of the performance of this HPC resource comes from GPU acceleration, so a non-accelerated program that allocates CPUs would block the use of the GPUs and lead to underutilization. NVIDIA has published [http://www.nvidia.com/object/gpu-applications.html '''a list'''] of applications officially supported on NVIDIA GPUs, but other programs that use GPUs are of course also likely to perform well on the machine.

* For those interested in GPU programming, we held a workshop whose video materials are available here: [http://videotorium.hu/hu/events/details/1864,GPU_programozas_workshop GPU programming workshop (videotorium)]
}}

=== Login ===
<pre>
ssh USER@login.debrecen2.hpc.niif.hu
</pre>
If a non-default key is used, it must be specified with the -i KEY option (for both the SSH and SCP commands).
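For example, assuming your key is stored at <code>~/.ssh/leo_key</code> (an illustrative path):
<pre>
ssh -i ~/.ssh/leo_key USER@login.debrecen2.hpc.niif.hu
</pre>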

=== Copying files with SCP ===
Download from and upload to the HOME directory (note that there must be no space after the colon):
<pre>
Up:   scp FILE USER@login.debrecen2.hpc.niif.hu:FILE
Down: scp USER@login.debrecen2.hpc.niif.hu:FILE FILE
</pre>

=== Data synchronization ===
Larger files and directory structures should be synchronized using the following commands:
<pre>
Up:   rsync -a -e ssh DIRECTORY USER@login.debrecen2.hpc.niif.hu:/home/USER
Down: rsync -a -e ssh USER@login.debrecen2.hpc.niif.hu:/home/USER/DIRECTORY .
</pre>
The --delete option must be specified to synchronize deleted files as well, as shown below.
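For example, to mirror a local DIRECTORY to the cluster and remove remote files that no longer exist locally:
<pre>
rsync -a --delete -e ssh DIRECTORY USER@login.debrecen2.hpc.niif.hu:/home/USER
</pre>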

== User interface ==
<pre>
                 short form of CWD
                 |
DEBRECEN2[login] ~ (0)$
    |      |        |
    |      |        exit code of the previous command
    |      short machine name
    HPC station
</pre>

=== Module environment ===
The list of available modules is obtained with the following command:
<pre>
module avail
</pre>
The list of already loaded modules:
<pre>
module list
</pre>
You can load an application with the following command:
<pre>
module load APP
</pre>
The environment variables set by KIFÜ are listed by the <code>nce</code> command.
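A typical session might look like this (the module name <code>gaussian</code> is only an example; check the output of <code>module avail</code> for the names actually installed):
<pre>
module avail
module load gaussian
module list
</pre>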

=== Data sharing for project members ===
To share files or directories, ACLs must be set. To make the HOME directory readable by another user (OTHER):
<pre>
setfacl -m u:OTHER:rx $HOME
</pre>
To make a specific directory (DIRECTORY) writable as well:
<pre>
setfacl -m u:OTHER:rwx $HOME/DIRECTORY
</pre>
You can list the extended permissions with the following command:
<pre>
getfacl $HOME/DIRECTORY
</pre>

== Using a shared home directory ==
The common file system that is available on the login nodes of the supercomputers is accessible under the following path:
<pre>
/mnt/fhgfs/home/$USER
</pre>
Backups can be made into the shared directory with the following command:
<pre>
rsync -avuP --delete $HOME/DIRECTORY /mnt/fhgfs/home/$USER
</pre>

== Compiling applications ==

Users are encouraged to first try compiling the applications they need in their own home directory. If this fails for some reason, the next step is to ask the Hungarian supercomputer user community, because there is a good chance that others have already run into the same problem. They can be reached at <code>hpc-forum at listserv.niif.hu</code>. You can subscribe to this mailing list [https://listserv.niif.hu/mailman/listinfo/hpc-forum here]; you should also check its archive when looking into the issue. KIFÜ HPC support has extremely limited capacity for handling individual compiling requests, but you may still contact <code>hpc-support at niif.hu</code> with your problem. In the latter case, please be patient for a few days while waiting for a response.
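A minimal sketch of building a typical autotools-based package into your home directory (the package, the <code>gcc</code> module name, and the install prefix are illustrative assumptions):
<pre>
module load gcc
./configure --prefix=$HOME/opt/myapp
make
make install
</pre>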

== Using the SLURM scheduler ==
Scheduling on the supercomputer is based on CPU hours (machine time). The following command provides information about the status of the user's Slurm projects (accounts):
<pre>
sbalance
</pre>
The second column (Usage) shows the machine time spent by each user, and the fourth column shows the total machine time of the account. The last two columns show the maximum (Account Limit) and the still available machine time.
<pre>
Scheduler Account Balance
---------- ----------- + ---------------- ----------- + ------------- -----------
User             Usage | Account                Usage | Account Limit   Available (CPU hrs)
---------- ----------- + ---------------- ----------- + ------------- -----------
bob *                7 | foobar                     7 |         1,000         993
alice                0 | foobar                     7 |         1,000         993
</pre>
<br />
=== Estimating CPU time ===<br />
It is advisable to estimate the wall clock time before large-scale (production) runs. To do this, use the following command:<br />
<pre><br />
sestimate -N NODES -t WALLTIME<br />
</pre><br />
<br />
where <code>NODES</code> is the number of nodes to be reserved and <code>WALLTIME</code> is the maximum run time.<br />
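For example, estimating a 2-node run with a 12-hour wall time (illustrative values):<br />
<pre><br />
sestimate -N 2 -t 12:00:00<br />
</pre><br />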
<br />
'''It is important to specify the wall clock time you want to reserve as accurately as possible, because the scheduler also ranks waiting jobs based on it: shorter jobs are generally scheduled sooner. It is advisable to check the actual run time afterwards with the <code>sacct</code> command.'''<br />
<br />
=== Status information ===<br />
The <code>squeue</code> and <code>sinfo</code> commands provide information about the general state of the cluster. Each submitted job is assigned a unique identification number (JOBID), with which further information can be requested. Characteristics of a submitted or already running job:<br />
<pre><br />
scontrol show job JOBID<br />
</pre><br />
<br />
Each job is also recorded in an accounting database, from which the characteristics and resource-usage statistics of completed jobs can be retrieved. Detailed statistics can be viewed with the following command:<br />
<pre><br />
sacct -l -j JOBID<br />
</pre><br />
<br />
The following command provides information about the memory used:<br />
<pre><br />
smemory JOBID<br />
</pre><br />
<br />
The next one shows disk usage:<br />
<pre><br />
sdisk JOBID<br />
</pre><br />
<br />
==== SLURM warnings ====<br />
<pre><br />
Resources / AssociationResourceLimit - waiting for a free resource<br />
AssociationJobLimit / QOSJobLimit    - the account is out of CPU time, or the maximum number of CPUs is already reserved<br />
Priority                             - waiting due to low priority<br />
</pre><br />
In the latter case, the wall time requested by the job must be reduced. Jobs of a given project can run on a maximum of 512 CPUs at any one time.<br />
<br />
==== Checking licenses ====<br />
The following command provides information about the available and currently used licenses:<br />
<pre><br />
slicenses<br />
</pre><br />
<br />
==== Checking maintenance ====<br />
During the maintenance time window the scheduler does not start new jobs, but jobs can still be submitted. The following command provides information on maintenance dates:<br />
<pre><br />
sreservations<br />
</pre><br />
<br />
==== Aggregate consumption ====<br />
You can retrieve the CPU minutes consumed, going back up to one month, with the following command:<br />
<pre><br />
susage<br />
</pre><br />
<br />
==== Total consumption ====<br />
If you want to know how much CPU time you have used over a certain period, you can query it with this command:<br />
<pre><br />
sreport -t Hours Cluster AccountUtilizationByUser Accounts=ACCOUNT Start=2015-01-01<br />
</pre><br />
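An end date can also be given; for example, the consumption of the foobar account for the first half of 2015 (illustrative values):<br />
<pre><br />
sreport -t Hours Cluster AccountUtilizationByUser Accounts=foobar Start=2015-01-01 End=2015-07-01<br />
</pre><br />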
<br />
=== Submitting jobs ===<br />
It is possible to run applications on the supercomputer in batch mode. This means that a job script must be created for each run, containing a description of the required resources and the commands needed for execution. Scheduler parameters (resource requirements) must be specified with <code>#SBATCH</code> directives.<br />
<br />
==== Mandatory parameters ====<br />
The following parameters must be specified in each case:<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A ACCOUNT<br />
#SBATCH --job-name=NAME<br />
#SBATCH --time=TIME<br />
</pre><br />
where <code>ACCOUNT</code> is the name of the account to be charged (your available accounts are listed by the <code>sbalance</code> command), <code>NAME</code> is the short name of the job, and <code>TIME</code> is the maximum wall clock time (<code>DD-HH:MM:SS</code>).<br />
The following time formats can be used:<br />
<br />
"minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds".<br />
<br />
==== Reservation of GPUs ====<br />
GPUs are reserved using the following directive:<br />
<pre><br />
#SBATCH --gres=gpu:N<br />
</pre><br />
<code>N</code> specifies the number of GPUs per node, which can be 1, 2, or at most 3.<br />
<br />
==== Interactive use ====<br />
You can submit short interactive jobs with the <code>srun</code> command, e.g.:<br />
<pre><br />
srun -l -n 1 -t TIME --gres=gpu:1 -A ACCOUNT APP<br />
</pre><br />
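For example, a 10-minute interactive run on 1 GPU, charged to the foobar account (the values and the <code>nvidia-smi</code> command are illustrative):<br />
<pre><br />
srun -l -n 1 -t 00:10:00 --gres=gpu:1 -A foobar nvidia-smi<br />
</pre><br />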
<br />
==== Submitting batch jobs ====<br />
To submit jobs use the following command:<br />
<pre><br />
sbatch slurm.sh<br />
</pre><br />
<br />
On successful submission you get the following output:<br />
<pre><br />
Submitted batch job JOBID<br />
</pre><br />
where <code>JOBID</code> is the unique identification number of the job.<br />
<br />
The following command stops the job:<br />
<pre><br />
scancel JOBID<br />
</pre><br />
<br />
==== Non-restarting jobs ====<br />
For jobs that should not be automatically requeued (restarted), use the following directive:<br />
<pre><br />
#SBATCH --no-requeue<br />
</pre><br />
<br />
==== Partitions ====<br />
There are two non-overlapping queues (partitions) on the supercomputer: <code>prod-gpu-k40</code> and <code>prod-gpu-k20</code>. Both are for production use; the first contains CN machines with Nvidia K40x GPUs, the second CN machines with Nvidia K20x GPUs. The default queue is <code>prod-gpu-k20</code>. The <code>prod-gpu-k40</code> partition can be selected with the following directive:<br />
<pre><br />
#SBATCH --partition=prod-gpu-k40<br />
</pre><br />
<br />
==== Quality of Service (QoS) ====<br />
The default quality of service is <code>normal</code>, i.e. the job cannot be interrupted.<br />
<br />
===== High priority =====<br />
High-priority jobs can run for up to 24 hours and are charged at twice the machine time used; in return, the scheduler moves these jobs forward in the queue.<br />
<pre><br />
#SBATCH --qos=fast<br />
</pre><br />
<br />
===== Low priority =====<br />
It is also possible to submit low-priority jobs. Such jobs can be interrupted at any time by any normal-priority job; in exchange, only half of the machine time used is charged. Interrupted jobs are automatically rescheduled. Submit jobs with low priority only if they can withstand random interruptions and regularly save their state (checkpoint), so that they can be restarted quickly.<br />
<pre><br />
#SBATCH --qos=lowpri<br />
</pre><br />
<br />
==== Memory allocation ====<br />
By default, 1000 MB of memory is assigned to each CPU core; more can be requested with the following directive:<br />
<pre><br />
#SBATCH --mem-per-cpu=MEMORY<br />
</pre><br />
where <code>MEMORY</code> is specified in MB. The maximum memory per core is 7800 MB.<br />
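For example, to request 4000 MB per core (any value up to the 7800 MB limit):<br />
<pre><br />
#SBATCH --mem-per-cpu=4000<br />
</pre><br />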
<br />
==== Email notification ====<br />
Send mail when the job status changes (start, stop, error):<br />
<pre><br />
#SBATCH --mail-type=ALL<br />
#SBATCH --mail-user=EMAIL<br />
</pre><br />
where <code>EMAIL</code> is the email address to be notified.<br />
<br />
==== Array jobs ====<br />
<br />
Array jobs are needed when a single-threaded (serial) application is to be run in many instances (with different parameters) at once. For each instance, the scheduler stores a unique identifier in the <code>SLURM_ARRAY_TASK_ID</code> environment variable; by querying it, the tasks of the array job can be told apart. The outputs of the tasks are written to the <code>slurm-SLURM_ARRAY_JOB_ID-SLURM_ARRAY_TASK_ID.out</code> files. The scheduler places the tasks using tight packing, so in this case, too, it is advisable to choose the number of tasks as a multiple of the number of CPU cores. [http://slurm.schedmd.com/job_array.html More information]<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A ACCOUNT<br />
#SBATCH --job-name=array<br />
#SBATCH --time=24:00:00<br />
#SBATCH --array=1-96<br />
srun envtest.sh<br />
</pre><br />
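The <code>envtest.sh</code> script itself is not shown in this guide; as a sketch, such a per-task script would typically pick its input by the task index, e.g.:<br />
<pre><br />
#!/bin/bash<br />
# each array task processes its own input file, selected by SLURM_ARRAY_TASK_ID<br />
# my_app and the file naming are illustrative<br />
./my_app input.$SLURM_ARRAY_TASK_ID > output.$SLURM_ARRAY_TASK_ID<br />
</pre><br />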
<br />
==== OpenMPI jobs ====<br />
For MPI jobs, you must also specify the number of MPI processes starting on each node (<code>#SBATCH --ntasks-per-node=</code>). In the most common case this is the number of CPU cores of a single node. The parallel program must be started with the <code>mpirun</code> command.<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A ACCOUNT<br />
#SBATCH --job-name=mpi<br />
#SBATCH -N 2<br />
#SBATCH --ntasks-per-node=8<br />
#SBATCH --time=12:00:00<br />
mpirun --report-pid ${TMPDIR}/mpirun.pid PROGRAM<br />
</pre><br />
<br />
OpenMPI FAQ: http://www.open-mpi.org/faq<br />
<br />
==== OpenMP (OMP) jobs ====<br />
<br />
A maximum of 1 node can be reserved for OpenMP parallel applications. The number of OMP threads must be specified with the <code>OMP_NUM_THREADS</code> environment variable, which must either be set before the application command (see the example) or exported before the start command:<br />
<code><br />
export OMP_NUM_THREADS=8<br />
</code><br />
<br />
In the following example, we assign 8 CPU cores to a single task; the 8 CPU cores must be on the same node. The number of CPU cores is stored in the <code>SLURM_CPUS_PER_TASK</code> variable, which also sets the number of OMP threads.<br />
<br />
User Alice launches an 8-thread OMP application, charged to the foobar account, for a maximum of 6 hours.<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A foobar<br />
#SBATCH --job-name=omp<br />
#SBATCH --time=06:00:00<br />
#SBATCH --ntasks=1<br />
#SBATCH --cpus-per-task=8<br />
OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK ./a.out<br />
</pre><br />
<br />
==== Hybrid MPI-OMP jobs ====<br />
A hybrid MPI-OMP mode is when the parallel application uses both MPI and OMP. It is worth noting that the MKL calls of programs linked with Intel MKL are OpenMP-capable. In general, the following distribution is recommended: the number of MPI processes per node should be between 1 and the number of CPU sockets in a node, and the number of OMP threads should accordingly be the total number of CPU cores in a node, or half or a quarter of that (as appropriate). The job script combines the parameters of the two modes above.<br />
<br />
In the following example, we use 2 nodes with 1 task per node and 8 threads per task. User Alice submitted a hybrid job to 2 nodes for 8 hours, charged to the foobar account. Only 1 MPI process runs on a node at a time, using 8 OMP threads per node. The 2 machines run a total of 2 MPI processes and 2 × 8 OMP threads.<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A foobar<br />
#SBATCH --job-name=mpiomp<br />
#SBATCH --time=08:00:00<br />
#SBATCH -N 2<br />
#SBATCH --ntasks=2<br />
#SBATCH --ntasks-per-node=1<br />
#SBATCH --cpus-per-task=8<br />
#SBATCH -o slurm.out<br />
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK<br />
mpirun ./a.out<br />
</pre><br />
<br />
==== Maple Grid jobs ====<br />
<br />
Like OMP jobs, Maple can be run on 1 node. The <code>maple</code> module must be loaded to use it. Maple works in client-server mode, so the grid server (<code>${MAPLE}/toolbox/Grid/bin/startserver</code>) must also be started before running the Maple job. The application requires a license, which must be specified in the job script (<code>#SBATCH --licenses=maplegrid:1</code>). The Maple job must be started with the <code>${MAPLE}/toolbox/Grid/bin/joblauncher</code> command.<br />
<br />
User Alice starts Maple Grid for 6 hours, charged to the foobar account:<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A foobar<br />
#SBATCH --job-name=maple<br />
#SBATCH -N 1<br />
#SBATCH --ntasks-per-node=16<br />
#SBATCH --time=06:00:00<br />
#SBATCH -o slurm.out<br />
#SBATCH --licenses=maplegrid:1<br />
<br />
module load maple<br />
<br />
${MAPLE}/toolbox/Grid/bin/startserver<br />
${MAPLE}/toolbox/Grid/bin/joblauncher ${MAPLE}/toolbox/Grid/samples/Simple.mpl<br />
</pre><br />
<br />
<br />
[[Category: HPC]]</div>Itamas(AT)niif.huhttps://wiki.niif.hu/index.php?title=Debrecen2_GPU_klaszter_en&diff=4874Debrecen2 GPU klaszter en2021-07-14T08:25:35Z<p>Itamas(AT)niif.hu: </p>
<hr />
<div>{| class="wikitable" border="1" <br />
|- <br />
| Cluster<br />
| Debrecen2 (Leo)<br />
|- <br />
| Type<br />
| HP SL250s<br />
|- <br />
| Core / node<br />
| 8 × 2 Xeon E5-2650v2 2.60GHz <br />
|- <br />
| GPU / node<br />
| 68 * 3 Nvidia K20x + 16 * 3 Nvidia K40x<br />
|- <br />
| # of compute nodes<br />
| 84<br />
|-<br />
| Max Walltime<br />
| 7-00:00:00<br />
|-<br />
| Max core / project<br />
| 336<br />
|-<br />
| Max mem / core<br />
| 7000 MB<br />
|}<br />
<br />
=== Requesting CPU time ===<br />
<br />
{{ATTENTION| When applying for CPU time, we expect a brief justification from the HPC project managers stating that the application to be run is capable of using a GPU (except when the purpose is to use licensed software – available on the machine – that is unable to use a GPU, e.g. Gaussian, Maple). This is necessary because most of the HPC resource performance comes from GPU acceleration, so a program without acceleration that allocates CPUs, would be limiting the use of GPUs leading to underutilization. NVIDIA released [http://www.nvidia.com/object/gpu-applications.html '''a list'''] of applications officially supported by NVIDIA GPUs but of course other programs that use GPUs are also likely to perform well on the machine.<br />
<br />
* For those who are interested in GPU programming we held a workshop the video materials of which are available here: [http://videotorium.hu/hu/events/details/1864,GPU_programozas_workshop GPU programming workshop (videotorium)]<br />
}}<br />
<br />
=== Login ===<br />
<pre><br />
ssh USER@login.debrecen2.hpc.niif.hu<br />
</pre><br />
If a non-default key is used, it must be specified with the -i KEY option (SSH and SCP commands).<br />
<br />
=== Copying files with SCP ===<br />
Download from the HOME directory and upload to the HOME directory:<br />
<pre><br />
Up: scp FILE USER@login.debrecen2.hpc.niif.hu: FILE<br />
Down: scp USER@login.debrecen2.hpc.niif.hu: FILE FILE<br />
</pre><br />
<br />
=== Data synchronization ===<br />
Larger files / directory structures shall be synchronized using the following commands<br />
<pre><br />
Up: rsync -a -e ssh DIRECTORY USER@login.debrecen2.hpc.niif.hu:/home/USER<br />
Down: rsync -a -e ssh USER@login.debrecen2.hpc.niif.hu:/home/USER/DIRECTORY<br />
</pre><br />
The --delete option must be specified to synchronize deleted files.<br />
<br />
== User interface ==<br />
<pre><br />
short form of CWD<br />
|<br />
DEBRECEN2[login] ~ (0)$<br />
| | |<br />
HPC station | |<br />
short machine name |<br />
exit code of the previous command<br />
</pre><br />
<br />
=== Module environment ===<br />
The list of available modules is obtained with the following command:<br />
<pre><br />
module avail<br />
</pre><br />
the list of already loaded modules:<br />
<pre><br />
module list<br />
</pre><br />
You can load an application with the following command:<br />
<pre><br />
module load APP<br />
</pre><br />
The environment variables set by KIFÜ are listed by the nce command.<br />
<br />
=== Data sharing for project members ===<br />
To share files / directories ACLs must be set. To make the HOME directory readable by another user (OTHER):<br />
<pre><br />
setfacl -m u:OTHER:rx $HOME<br />
</pre><br />
To make a specific directory (DIRECTORY) writable:<br />
<pre><br />
setfacl -m u:OTHER:rxw $HOME/DIRECTORY<br />
</pre><br />
You can list extended rights with the following command:<br />
<pre><br />
getfacl $HOME/DIRECTORY<br />
</pre><br />
<br />
== Using a shared home directory ==<br />
The common file system that is available for the login nodes of the supercomputers is accessible under the following path:<br />
<pre><br />
/mnt/fhgfs/home/$USER<br />
</pre><br />
Backups could be made into the shared directory with the following command:<br />
<pre><br />
rsync -avuP --delete $HOME/DIRECTORY /mnt/fhgfs/home/$USER<br />
</pre><br />
<br />
== Compiling applications ==<br />
<br />
Users are encouraged to try compiling needed applications in their own home directory first. If it fails for some reason, then the next step is to ask the Hungarian supercomputer users because there is a good chance that others have already run into the same problem. They can be reached at: <code>hpc-forum at listserv.niif.hu</code>. You can subscribe to this mailing list [https://listserv.niif.hu/mailman/listinfo/hpc-forum|here]. You should also check the archive when looking into the issue. KIFÜ HPC support has extremely limited capacity to handle individual compiling requests but still you may contact <code>hpc-support at niif.hu</code> with your problem. In the latter case please be patient for a few days while waiting for responses.<br />
<br />
== Using the SLURM scheduler ==<br />
The supercomputer has a CPU hour (machine time) based schedule. The following command provides information about the status of the user's Slurm projects (Account):<br />
<pre><br />
sbalance<br />
</pre><br />
The second column (Usage) shows the machine time spent by each user, and the fourth column shows the total machine time of the account. The last two columns provide information about the maximum (Account Limit) and available machine time.<br />
<pre><br />
Scheduler Account Balance<br />
---------- ----------- + ---------------- ----------- + ------------- -----------<br />
User Usage | Account Usage | Account Limit Available (CPU hrs)<br />
---------- ----------- + ---------------- ----------- + ------------- -----------<br />
bob * 7 | foobar 7 | 1,000 993<br />
alice 0 | foobar 7 | 1,000 993<br />
</pre><br />
<br />
=== Estimating CPU time ===<br />
It is advisable to estimate the wall clock time before large-scale (production) runs. To do this, use the following command:<br />
<pre><br />
sestimate -N NODES -t WALLTIME<br />
</pre><br />
<br />
where <code>NODES</code> is the number of nodes to be reserved and <code>WALLTIME</code> is the maximum run time.<br />
<br />
'''<br />
It is important to specify the wall clock time you want to reserve as accurately as possible, as the scheduler also ranks the jobs waiting to be run based on this. It is generally true that the shorter job will take place sooner. It is advisable to check the actual run time with the <code>sacct</code> command afterwards.'''<br />
<br />
=== Status information ===<br />
The <code>squeue</code> and the <code>sinfo</code> command provide information about the general state of the cluster. Each job submitted is assigned a unique identification number (JOBID). Knowing this, we can ask for more information. Characteristics of the submitted or already running job:<br />
<pre><br />
scontrol show job JOBID<br />
</pre><br />
<br />
Each job is also put into a so-called accounting database. From this you can retrieve the characteristics of the jobs you have run and the statistics of resource usage. You can view detailed statistics with the following command:<br />
<pre><br />
sacct -l -j JOBID<br />
</pre><br />
<br />
The following command provides information about the memory used:<br />
<pre><br />
smemory JOBID<br />
</pre><br />
<br />
The next one shows disk usage:<br />
<pre><br />
sdisk JOBID<br />
</pre><br />
<br />
==== SLURM warnings ====<br />
<pre><br />
Resources / AssociationResourceLimit - Waiting for a resource<br />
AssociationJobLimit / QOSJobLimit - Not enough CPU time or maximum CPU number is reserved<br />
Priority - Waiting due to low priority<br />
<br />
</pre><br />
In the latter case, the time to be reserved by the job must be reduced. Jobs for a given project can run on up to 512 CPUs at a given time.<br />
<br />
==== Checking licenses ====<br />
Az elérhető és éppen használt licenszekről a következő parancs ad információt:<br />
<pre><br />
slicenses<br />
</pre><br />
<br />
==== Checking maintenance ====<br />
In the maintenance time window, the scheduler does not start new jobs, but jobs could still be submitted. The following command provides information on maintenance dates:<br />
<pre><br />
sreservations<br />
</pre><br />
<br />
==== Aggregate consumption ====<br />
You can retrieve the CPU minutes consumed up to one month ago with the following command:<br />
<pre><br />
susage<br />
</pre><br />
<br />
==== Total consumption ====<br />
If you want to know how much CPU time you have been using for a certain period, you can query it with this command:<br />
<pre><br />
sreport -t Hours Cluster AccountUtilizationByUser Accounts=ACCOUNT Start=2015-01-01<br />
</pre><br />
<br />
=== Submitting jobs ===<br />
It is possible to run applications on supercomputers in batch mode. This means that for each run, a job script must be created that includes a description of the resources required and the commands required to run. Scheduler parameters (resource requirements) must be specified with the <code>#SBATCH</code> directive.<br />
<br />
==== Mandatory parameters ====<br />
The following parameters must be specified in each case:<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A ACCOUNT<br />
#SBATCH --job-name=NAME<br />
#SBATCH --time=TIME<br />
</pre><br />
where <code>ACCOUNT</code> is the name of the account to be charged (your available accounts are indicated by the sbalance command), <code>NAME</code> is the short name of the job, and <code>TIME</code> is the maximum wall clock time (DD-HH:MM:SS). <br />
The following time formats can be used:<br />
<br />
"minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds".<br />
<br />
==== Reservation of GPUs ====<br />
GPUs are reserved using the following directive:<br />
<pre><br />
#SBATCH --gres=gpu:N<br />
</pre><br />
<code>N</code> specifies the number of GPUs / node, which can be 1, 2, and a maximum of 3.<br />
<br />
==== Interactive use ====<br />
You can submit short interactive jobs with the 'srun' command, e.g.<br />
<pre><br />
srun -l -n 1 -t TIME --gres=gpu:1 -A ACCOUNT APP<br />
</pre><br />
<br />
==== Submitting batch jobs ====<br />
To submit jobs use the following command:<br />
<pre><br />
sbatch slurm.sh<br />
</pre><br />
<br />
On successful submission you get the following output:<br />
<pre><br />
Submitted batch job JOBID<br />
</pre><br />
ahol a <code>JOBID</code> a feladat egyedi azonosítószáma.<br />
<br />
The following command stops the job:<br />
<pre><br />
scancel JOBID<br />
</pre><br />
<br />
==== Non-restarting jobs ====<br />
For non-restarting jobs, the following directive should be used:<br />
<pre><br />
#SBATCH --no-requeue<br />
</pre><br />
<br />
==== Partitions ====<br />
There are two non-overlapping queues (partitions) on the supercomputer: the <code>prod-gpu-k40</code> queue and the <code>prod-gpu-k20</code> queue. Both are for production purposes, the first featuring CN machines with Nvidia K40x GPUs and the second with Nvidia K20x GPUs. The default queue is <code> prod-gpu-k20</code>. The prod-gpu-k40 partition can be selected with the following directive:<br />
<pre><br />
#SBATCH --partition=prod-gpu-k40<br />
</pre><br />
<br />
==== Quality of Service (QoS) ====<br />
The default quality of the service is <code>normal</code>, i.e. it cannot be interrupted.<br />
<br />
===== High priority =====<br />
High-priority jobs can run for up to 24 hours and are charged for twice the time in return for prioritizing these jobs.<br />
<pre><br />
#SBATCH --qos=fast<br />
</pre><br />
<br />
===== Low priority =====<br />
It is also possible to post low-priority jobs. Such jobs can be interrupted at any time by any normal priority job, in exchange for being charged for only half the machine time spent. Interrupted jobs are automatically rescheduled. Only submit jobs with low priority that can withstand random interruptions and save their status regularly (checkpoint) so that they could be quickly restarted.<br />
<pre><br />
#SBATCH --qos=lowpri<br />
</pre><br />
<br />
==== Memory allocation ====<br />
By default, 1 CPU core is assigned 1000 MB of memory but more can be requested with the following directive:<br />
<pre><br />
#SBATCH --mem-per-cpu=MEMORY<br />
</pre><br />
where <code>MEMORY</code> is specified in MB. The maximum memory / core can be 7800 MB.<br />
<br />
==== Email notification ====<br />
Send mail when job status changes (start, stop, error):<br />
<pre><br />
#SBATCH --mail-type=ALL<br />
#SBATCH --mail-user=EMAIL<br />
</pre><br />
where <code>EMAIL</code> is the email address to be notified.<br />
<br />
==== Arrayjobs ====<br />
<br />
Arrayjobs are needed when a single threaded (serial) application is to be run in many instances (with different parameters) at once. For instances, the scheduler stores the unique identifier in the <code>SLURM_ARRAY_TASK_ID</code> environment variable. By querying this, the threads of the array job can be separated. The outputs of the threads are written to the <code>slurm-SLURM_ARRAY_JOB_ID-SLURM_ARRAY_TASK_ID.out</code> files. The scheduler performs the upload according to a tight pack. You may want to select the number of threads as a multiple of the number of processors in this case too. [http://slurm.schedmd.com/job_array.html|More information]<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A ACCOUNT<br />
#SBATCH --job-name=array<br />
#SBATCH --time=24:00:00<br />
#SBATCH --array=1-96<br />
srun envtest.sh<br />
</pre><br />
<br />
==== OpenMPI jobs ====<br />
For MPI jobs, you must also specify the number of MPI processes starting on each node (<code>#SBATCH --ntasks-per-node=</code>). In the most common case this is the number of CPU cores of a single node. The parallel program must be started with the <code>mpirun</code> command.<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A ACCOUNT<br />
#SBATCH --job-name=mpi<br />
#SBATCH -N 2<br />
#SBATCH --ntasks-per-node=8<br />
#SBATCH --time=12:00:00<br />
mpirun --report-pid ${TMPDIR}/mpirun.pid PROGRAM<br />
</pre><br />
<br />
OpenMPI FAQ: http://www.open-mpi.org/faq<br />
<br />
==== OpenMP (OMP) jobs ====<br />
<br />
A maximum of 1 node can be reserved for OpenMP parallel applications. The number of OMP threads must be specified with the <code>OMP_NUM_THREADS</code> environment variable. The variable must either be set before the application (see example) or exported before the start command:<br />
<code><br />
export OMP_NUM_THREADS=8<br />
</code><br />
<br />
In the following example, we have assigned 8 CPU cores to a task, the 8 CPU cores must be on one node. The number of CPU cores is included in the <code>SLURM_CPUS_PER_TASK</code> variable, and it also sets the number of OMP threads.<br />
<br />
User Alice launches an 8-thread OMP application at the expense of the foobar account for a maximum of 6 hours.<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A foobar<br />
#SBATCH --job-name=omp<br />
#SBATCH --time=06:00:00<br />
#SBATCH --ntasks=1<br />
#SBATCH --cpus-per-task=8<br />
OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK ./a.out<br />
</pre><br />
<br />
==== Hybrid MPI-OMP jobs ====<br />
We speak of a hybrid MPI-OMP mode when the parallel application uses both MPI and OMP. It is worth noting that MKL calls of programs linked with Intel MKL are OpenMP-capable. In general, the following distribution is recommended: the number of MPI processes from 1 to the number of CPU sockets in one node, the OMP threads to be the total number of CPU core numbers in one node, or half, or quarter (as appropriate). For the job script the parameters of the above two modes must be combined.<br />
<br />
In the following example, we start 2 nodes and 1-1 task per node with 10 threads per task. User Alice submitted a hybrid job to 2 nodes for 8 hours at the expense of the foobar account. Only 1 MPI process runs on one node at a time, which uses 8 OMP threads per node. The 2 machines run a total of 2 MPI processes and 2 x 8 OMP threads.<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A foobar<br />
#SBATCH --job-name=mpiomp<br />
#SBATCH --time=08:00:00<br />
#SBATCH -N 2<br />
#SBATCH --ntasks=2<br />
#SBATCH --ntasks-per-node=1<br />
#SBATCH --cpus-per-task=8<br />
#SBATCH -o slurm.out<br />
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK<br />
mpirun ./a.out<br />
</pre><br />
<br />
==== Maple Grid jobs ====<br />
<br />
Maple can be run on 1 node - like OMP tasks. You must also load the maple module to use it. Maple works in client-server mode so you must also start the grid server (<code>${MAPLE}/toolbox/Grid/bin/startserver</code>) before running the Maple job. This application requires a license, which must be specified in the job script (<code>#SBATCH --licenses=maplegrid:1</code>). The Maple job must be started with the <code>${MAPLE}/toolbox/Grid/bin/joblauncher</code> command.<br />
<br />
User Alice starts Maple Grid for 6 hours from the foobar account:<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A foobar<br />
#SBATCH --job-name=maple<br />
#SBATCH -N 1<br />
#SBATCH --ntasks-per-node=16<br />
#SBATCH --time=06:00:00<br />
#SBATCH -o slurm.out<br />
#SBATCH --licenses=maplegrid:1<br />
<br />
module load maple<br />
<br />
${MAPLE}/toolbox/Grid/bin/startserver<br />
${MAPLE}/toolbox/Grid/bin/joblauncher ${MAPLE}/toolbox/Grid/samples/Simple.mpl<br />
</pre><br />
<br />
<br />
[[Category: HPC]]</div>Itamas(AT)niif.huhttps://wiki.niif.hu/index.php?title=Debrecen2_GPU_klaszter_en&diff=4870Debrecen2 GPU klaszter en2021-06-24T10:38:17Z<p>Itamas(AT)niif.hu: </p>
<hr />
<div>{| class="wikitable" border="1" <br />
|- <br />
| Cluster<br />
| Debrecen2 (Leo)<br />
|- <br />
| Type<br />
| HP SL250s<br />
|- <br />
| Core / node<br />
| 8 × 2 Xeon E5-2650v2 2.60GHz <br />
|- <br />
| GPU / node<br />
| 68 * 3 Nvidia K20x + 16 * 3 Nvidia K40x<br />
|- <br />
| # of compute nodes<br />
| 84<br />
|-<br />
| Max Walltime<br />
| 7-00:00:00<br />
|-<br />
| Max core / project<br />
| 336<br />
|-<br />
| Max mem / core<br />
| 7000 MB<br />
|}<br />
<br />
=== Requesting CPU time ===<br />
<br />
{{ATTENTION| When applying for CPU time, we expect a brief justification from the HPC project managers stating that the application to be run is capable of using a GPU (except when the purpose is to use licensed software – available on the machine – that is unable to use a GPU, e.g. Gaussian, Maple). This is necessary because most of the HPC resource performance comes from GPU acceleration, so a program without acceleration that allocates CPUs, would be limiting the use of GPUs leading to underutilization. NVIDIA released [http://www.nvidia.com/object/gpu-applications.html '''a list'''] of applications officially supported by NVIDIA GPUs but of course other programs that use GPUs are also likely to perform well on the machine.<br />
<br />
* For those who are interested in GPU programming we held a workshop the video materials of which are available here: [http://videotorium.hu/hu/events/details/1864,GPU_programozas_workshop GPU programming workshop (videotorium)]<br />
}}<br />
<br />
=== Login ===<br />
<pre><br />
ssh USER@login.debrecen2.hpc.niif.hu<br />
</pre><br />
If a non-default key is used, it must be specified with the -i KEY option (SSH and SCP commands).<br />
<br />
=== Copying files with SCP ===<br />
Download from the HOME directory and upload to the HOME directory:<br />
<pre><br />
Up: scp FILE USER@login.debrecen2.hpc.niif.hu: FILE<br />
Down: scp USER@login.debrecen2.hpc.niif.hu: FILE FILE<br />
</pre><br />
<br />
=== Data synchronization ===<br />
Larger files / directory structures shall be synchronized using the following commands<br />
<pre><br />
Up: rsync -a -e ssh DIRECTORY USER@login.debrecen2.hpc.niif.hu:/home/USER<br />
Down: rsync -a -e ssh USER@login.debrecen2.hpc.niif.hu:/home/USER/DIRECTORY<br />
</pre><br />
The --delete option must be specified to synchronize deleted files.<br />
<br />
== User interface ==<br />
<pre><br />
short form of CWD<br />
|<br />
DEBRECEN2[login] ~ (0)$<br />
| | |<br />
HPC station | |<br />
short machine name |<br />
exit code of the previous command<br />
</pre><br />
<br />
=== Module environment ===<br />
The list of available modules is obtained with the following command:<br />
<pre><br />
module avail<br />
</pre><br />
the list of already loaded modules:<br />
<pre><br />
module list<br />
</pre><br />
You can load an application with the following command:<br />
<pre><br />
module load APP<br />
</pre><br />
The environment variables set by KIFÜ are listed by the nce command.<br />
<br />
=== Data sharing for project members ===<br />
To share files / directories ACLs must be set. To make the HOME directory readable by another user (OTHER):<br />
<pre><br />
setfacl -m u:OTHER:rx $HOME<br />
</pre><br />
To make a specific directory (DIRECTORY) writable:<br />
<pre><br />
setfacl -m u:OTHER:rxw $HOME/DIRECTORY<br />
</pre><br />
You can list extended rights with the following command:<br />
<pre><br />
getfacl $HOME/DIRECTORY<br />
</pre><br />
<br />
== Using a shared home directory ==<br />
The common file system that is available for the login nodes of the supercomputers is accessible under the following path:<br />
<pre><br />
/mnt/fhgfs/home/$USER<br />
</pre><br />
Backups could be made into the shared directory with the following command:<br />
<pre><br />
rsync -avuP --delete $HOME/DIRECTORY /mnt/fhgfs/home/$USER<br />
</pre><br />
<br />
== Compiling applications ==<br />
<br />
Users are encouraged to try compiling needed applications in their own home directory first. If it fails for some reason, then the next step is to ask the Hungarian supercomputer users because there is a good chance that others have already run into the same problem. They can be reached at: <code>hpc-forum at listserv.niif.hu</code>. You can subscribe to this mailing list [https://listserv.niif.hu/mailman/listinfo/hpc-forum|here]. You should also check the archive when looking into the issue. KIFÜ HPC support has extremely limited capacity to handle individual compiling requests but still you may contact <code>hpc-support at niif.hu</code> with your problem. In the latter case please be patient for a few days while waiting for responses.<br />
<br />
== Using the SLURM scheduler ==<br />
The supercomputer has a CPU hour (machine time) based schedule. The following command provides information about the status of the user's Slurm projects (Account):<br />
<pre><br />
sbalance<br />
</pre><br />
The second column (Usage) shows the machine time spent by each user, and the fourth column shows the total machine time of the account. The last two columns provide information about the maximum (Account Limit) and available machine time.<br />
<pre><br />
Scheduler Account Balance<br />
---------- ----------- + ---------------- ----------- + ------------- -----------<br />
User Usage | Account Usage | Account Limit Available (CPU hrs)<br />
---------- ----------- + ---------------- ----------- + ------------- -----------<br />
bob * 7 | foobar 7 | 1,000 993<br />
alice 0 | foobar 7 | 1,000 993<br />
</pre><br />
<br />
=== Estimating CPU time ===<br />
It is advisable to estimate the wall clock time before large-scale (production) runs. To do this, use the following command:<br />
<pre><br />
sestimate -N NODES -t WALLTIME<br />
</pre><br />
<br />
where <code>NODES</code> is the number of nodes to be reserved and <code>WALLTIME</code> is the maximum run time.<br />
<br />
'''<br />
It is important to specify the wall clock time you want to reserve as accurately as possible, as the scheduler also ranks the jobs waiting to be run based on this. It is generally true that the shorter job will take place sooner. It is advisable to check the actual run time with the <code>sacct</code> command afterwards.'''<br />
<br />
=== Status information ===<br />
The <code>squeue</code> and the <code>sinfo</code> command provide information about the general state of the cluster. Each job submitted is assigned a unique identification number (JOBID). Knowing this, we can ask for more information. Characteristics of the submitted or already running job:<br />
<pre><br />
scontrol show job JOBID<br />
</pre><br />
<br />
Each job is also put into a so-called accounting database. From this you can retrieve the characteristics of the jobs you have run and the statistics of resource usage. You can view detailed statistics with the following command:<br />
<pre><br />
sacct -l -j JOBID<br />
</pre><br />
<br />
The following command provides information about the memory used:<br />
<pre><br />
smemory JOBID<br />
</pre><br />
<br />
The next one shows disk usage:<br />
<pre><br />
sdisk JOBID<br />
</pre><br />
<br />
==== SLURM warnings ====<br />
<pre><br />
Resources / AssociationResourceLimit - Waiting for a resource<br />
AssociationJobLimit / QOSJobLimit - Not enough CPU time or maximum CPU number is reserved<br />
Priority - Waiting due to low priority<br />
<br />
</pre><br />
In the latter case, the time to be reserved by the job must be reduced. Jobs for a given project can run on up to 512 CPUs at a given time.<br />
<br />
==== Checking licenses ====<br />
Az elérhető és éppen használt licenszekről a következő parancs ad információt:<br />
<pre><br />
slicenses<br />
</pre><br />
<br />
==== Checking maintenance ====<br />
In the maintenance time window, the scheduler does not start new jobs, but jobs could still be submitted. The following command provides information on maintenance dates:<br />
<pre><br />
sreservations<br />
</pre><br />
<br />
==== Aggregate consumption ====<br />
You can retrieve the CPU minutes consumed up to one month ago with the following command:<br />
<pre><br />
susage<br />
</pre><br />
<br />
==== Total consumption ====<br />
If you want to know how much CPU time you have been using for a certain period, you can query it with this command:<br />
<pre><br />
sreport -t Hours Cluster AccountUtilizationByUser Accounts=ACCOUNT Start=2015-01-01<br />
</pre><br />
<br />
=== Submitting jobs ===<br />
It is possible to run applications on supercomputers in batch mode. This means that for each run, a job script must be created that includes a description of the resources required and the commands required to run. Scheduler parameters (resource requirements) must be specified with the <code>#SBATCH</code> directive.<br />
<br />
==== Mandatory parameters ====<br />
The following parameters must be specified in each case:<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A ACCOUNT<br />
#SBATCH --job-name=NAME<br />
#SBATCH --time=TIME<br />
</pre><br />
where <code>ACCOUNT</code> is the name of the account to be charged (your available accounts are indicated by the sbalance command), <code>NAME</code> is the short name of the job, and <code>TIME</code> is the maximum wall clock time (DD-HH:MM:SS). <br />
The following time formats can be used:<br />
<br />
"minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds".<br />
<br />
==== Reservation of GPUs ====<br />
GPUs are reserved using the following directive:<br />
<pre><br />
#SBATCH --gres=gpu:N<br />
</pre><br />
<code>N</code> specifies the number of GPUs / node, which can be 1, 2, and a maximum of 3.<br />
<br />
==== Interactive use ====<br />
You can submit short interactive jobs with the 'srun' command, e.g.<br />
<pre><br />
srun -l -n 1 -t TIME --gres=gpu:1 -A ACCOUNT APP<br />
</pre><br />
<br />
==== Submitting batch jobs ====<br />
To submit jobs use the following command:<br />
<pre><br />
sbatch slurm.sh<br />
</pre><br />
<br />
On successful submission you get the following output:<br />
<pre><br />
Submitted batch job JOBID<br />
</pre><br />
ahol a <code>JOBID</code> a feladat egyedi azonosítószáma.<br />
<br />
A feladat leállítását a következő parancs végzi:<br />
<pre><br />
scancel JOBID<br />
</pre><br />
<br />
==== Nem újrainduló jobok ====<br />
Nem újrainduló jobokhoz a következő direktívát kell használni:<br />
<pre><br />
#SBATCH --no-requeue<br />
</pre><br />
<br />
==== Feladat sorok ====<br />
A szupergépen két, egymást nem átfedő, sor (partíció) áll rendelkezésre, a <code>prod-gpu-k40</code> sor és a <code>prod-gpu-k20</code> sor. Mind a kettő éles számolásokra való, az első olyan CN gépeket tartalmaz amikben Nvidia K40x GPU-k, a másodikban pedig Nvidia K20x GPU-k vannak. Az alapértelmezett sor a <code> prod-gpu-k20</code>. A prod-gpu-k40 partíciót a következő direktívával lehet kiválasztani:<br />
<pre><br />
#SBATCH --partition=prod-gpu-k40<br />
</pre><br />
<br />
==== A szolgáltatás minősége (QOS) ====<br />
A szolgáltatást alapértelmezett minősége <code>normal</code>, azaz nem megszakítható a futás.<br />
<br />
===== Magas prioritás =====<br />
A magas prioritású jobok maximum 24 óráig futhatnak, és kétszer gyorsabb időelszámolással rendelkeznek, cserébe az ütemező előreveszi ezeket a feladatokat.<br />
<pre><br />
#SBATCH --qos=fast<br />
</pre><br />
<br />
===== Alacsony prioritás =====<br />
Lehetőség van alacsony prioritású jobok feladására is. Az ilyen feladatokat bármilyen normál prioritású job bármikor megszakíthatja, cserébe az elhasznált gépidő fele számlázódik csak. A megszakított jobok automatikusan újraütemeződnek. Fontos, hogy olyan feladatokat indítsunk alacsony prioritással, amelyek kibírják a véletlenszerű megszakításokat, rendszeresen elmentik az állapotukat (checkpoint) és ebből gyorsan újra tudnak indulni.<br />
<pre><br />
#SBATCH --qos=lowpri<br />
</pre><br />
<br />
==== Memória foglalás ====<br />
Alapértelmezetten 1 CPU core-hoz 1000 MB memória van rendelve, ennél többet a következő direktívával igényelhetünk:<br />
<pre><br />
#SBATCH --mem-per-cpu=MEMORY<br />
</pre><br />
ahol <code>MEMORY</code> MB egységben van megadva. A maximális memória/core 7800 MB lehet.<br />
<br />
==== Email értesítés ====<br />
Levél küldése job állapotának változásakor (elindulás,leállás,hiba):<br />
<pre><br />
#SBATCH --mail-type=ALL<br />
#SBATCH --mail-user=EMAIL<br />
</pre><br />
ahol az <code>EMAIL</code> az értesítendő emial cím.<br />
<br />
==== Tömbfeladatok (arrayjob) ====<br />
Tömbfeladatokra akkor van szükségünk, egy szálon futó (soros) alkalmazást szeretnénk egyszerre sok példányban (más-más adatokkal) futtatni. A példányok számára az ütemező a <code>SLURM_ARRAY_TASK_ID</code> környezeti változóban tárolja az egyedi azonosítót. Ennek lekérdezésével lehet az arrayjob szálait elkülöníteni. A szálak kimenetei a <code>slurm-SLURM_ARRAY_JOB_ID-SLURM_ARRAY_TASK_ID.out</code> fájlokba íródnak. Az ütemező a feltöltést szoros pakolás szerint végzi. Ebben az esetben is érdemes a processzorszám többszörösének választani a szálak számát. [http://slurm.schedmd.com/job_array.html Bővebb ismertető]<br />
<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A ACCOUNT<br />
#SBATCH --job-name=array<br />
#SBATCH --time=24:00:00<br />
#SBATCH --array=1-96<br />
srun envtest.sh<br />
</pre><br />
<br />
==== OpenMPI feladatok ====<br />
MPI feladatok esetén meg kell adnunk az egy node-on elinduló MPI processzek számát is (<code>#SBATCH --ntasks-per-node=</code>). A leggyakoribb esetben ez az egy node-ban található CPU core-ok száma. A párhuzamos programot az <code>mpirun</code> paranccsal kell indítani.<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A ACCOUNT<br />
#SBATCH --job-name=mpi<br />
#SBATCH -N 2<br />
#SBATCH --ntasks-per-node=8<br />
#SBATCH --time=12:00:00<br />
mpirun --report-pid ${TMPDIR}/mpirun.pid PROGRAM<br />
</pre><br />
<br />
OpenMPI FAQ: http://www.open-mpi.org/faq<br />
<br />
==== OpenMP (OMP) feladatok ====<br />
OpenMP párhuzamos alkalmazásokhoz maximum 1 node-ot lehet lefoglalni. Az OMP szálák számát az <code>OMP_NUM_THREADS</code> környezeti változóval kell megadni. A változót vagy az alkamazás elé kell írni (ld. példa), vagy exportálni kell az indító parancs előtt:<br />
<code><br />
export OMP_NUM_THREADS=8<br />
</code><br />
<br />
A következő példában egy taskhoz 8 CPU core-t rendeltunk, a 8 CPU core-nak egy node-on kell lennie. A CPU core-ok számát a <code><br />
SLURM_CPUS_PER_TASK</code> változó tartalmazza, és ez állítja be az OMP szálak számát is.<br />
<br />
Alice felhasználó a foobar számla terhére, maximum 6 órára indít el egy 8 szálas OMP alkalmazást.<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A foobar<br />
#SBATCH --job-name=omp<br />
#SBATCH --time=06:00:00<br />
#SBATCH --ntasks=1<br />
#SBATCH --cpus-per-task=8<br />
OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK ./a.out<br />
</pre><br />
<br />
==== Hibrid MPI-OMP feladatok ====<br />
Hibrid MPI-OMP módról akkor beszélünk, ha a párhuzamos alkalmazás MPI-t és OMP-t is használ. Érdemes tudni, hogy az Intel MKL-el linkelt programok MKL hívásai OpenMP képesek. Általában a következő elosztás javasolt: az MPI processzek száma 1-től az egy node-ban található CPU foglalatok száma, az OMP szálak ennek megfelelően az egy node-ban található összes CPU core szám vagy annak fele, negyede (értelem szerűen). A jobszkipthez a fenti két mód paramétereit kombinálni kell.<br />
<br />
A következő példában 2 node-ot, és node-onként 1-1 taskot indítunk taskonként 10 szállal. Alice felhasználó a foobar számla terhére, 8 órára, 2 node-ra küldött be egy hibrid jobot. Egy node-on egyszerre csak 1 db MPI processz fut ami node-onként 8 OMP szálat használ. A 2 gépen összesen 2 MPI proceszz és 2 x 8 OMP szál fut.<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A foobar<br />
#SBATCH --job-name=mpiomp<br />
#SBATCH --time=08:00:00<br />
#SBATCH -N 2<br />
#SBATCH --ntasks=2<br />
#SBATCH --ntasks-per-node=1<br />
#SBATCH --cpus-per-task=8<br />
#SBATCH -o slurm.out<br />
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK<br />
mpirun ./a.out<br />
</pre><br />
<br />
==== Maple Grid feladatok ====<br />
Maple-t az OMP feladatokhoz hasonlóan 1 node-on lehet futtatni. Használatához be kell tölteni a maple modult is. A Maple kliens-szerver üzemmódban működik ezért a Maple feladat futtatása előtt szükség van a grid szerver elindítására is (<code>${MAPLE}/toolbox/Grid/bin/startserver</code>). Ez az alkalmazás licensz köteles, amit a jobszkriptben meg kell adni (<code>#SBATCH --licenses=maplegrid:1</code>). A Maple feladat indátását a <code>${MAPLE}/toolbox/Grid/bin/joblauncher</code> paranccsal kell elvégezni.<br />
<br />
Alice felhasználó a foobar számla terhére, 6 órára indítja el a Maple Grid alkalmazást:<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A foobar<br />
#SBATCH --job-name=maple<br />
#SBATCH -N 1<br />
#SBATCH --ntasks-per-node=16<br />
#SBATCH --time=06:00:00<br />
#SBATCH -o slurm.out<br />
#SBATCH --licenses=maplegrid:1<br />
<br />
module load maple<br />
<br />
${MAPLE}/toolbox/Grid/bin/startserver<br />
${MAPLE}/toolbox/Grid/bin/joblauncher ${MAPLE}/toolbox/Grid/samples/Simple.mpl<br />
</pre><br />
<br />
<br />
[[Category: HPC]]</div>Itamas(AT)niif.huhttps://wiki.niif.hu/index.php?title=Debrecen2_GPU_klaszter_en&diff=4869Debrecen2 GPU klaszter en2021-06-24T10:16:48Z<p>Itamas(AT)niif.hu: </p>
<hr />
<div>{| class="wikitable" border="1" <br />
|- <br />
| Cluster<br />
| Debrecen2 (Leo)<br />
|- <br />
| Type<br />
| HP SL250s<br />
|- <br />
| Core / node<br />
| 8 × 2 Xeon E5-2650v2 2.60GHz <br />
|- <br />
| GPU / node<br />
| 68 * 3 Nvidia K20x + 16 * 3 Nvidia K40x<br />
|- <br />
| # of compute nodes<br />
| 84<br />
|-<br />
| Max Walltime<br />
| 7-00:00:00<br />
|-<br />
| Max core / project<br />
| 336<br />
|-<br />
| Max mem / core<br />
| 7000 MB<br />
|}<br />
<br />
=== Requesting CPU time ===<br />
<br />
{{ATTENTION| When applying for CPU time, we expect a brief justification from the HPC project managers stating that the application to be run is capable of using a GPU (except when the purpose is to use licensed software – available on the machine – that is unable to use a GPU, e.g. Gaussian, Maple). This is necessary because most of the HPC resource performance comes from GPU acceleration, so a program without acceleration that allocates CPUs, would be limiting the use of GPUs leading to underutilization. NVIDIA released [http://www.nvidia.com/object/gpu-applications.html '''a list'''] of applications officially supported by NVIDIA GPUs but of course other programs that use GPUs are also likely to perform well on the machine.<br />
<br />
* For those who are interested in GPU programming we held a workshop the video materials of which are available here: [http://videotorium.hu/hu/events/details/1864,GPU_programozas_workshop GPU programming workshop (videotorium)]<br />
}}<br />
<br />
=== Login ===<br />
<pre><br />
ssh USER@login.debrecen2.hpc.niif.hu<br />
</pre><br />
If a non-default key is used, it must be specified with the -i KEY option (SSH and SCP commands).<br />
<br />
=== Copying files with SCP ===<br />
Download from the HOME directory and upload to the HOME directory:<br />
<pre><br />
Up: scp FILE USER@login.debrecen2.hpc.niif.hu: FILE<br />
Down: scp USER@login.debrecen2.hpc.niif.hu: FILE FILE<br />
</pre><br />
<br />
=== Data synchronization ===<br />
Larger files / directory structures shall be synchronized using the following commands<br />
<pre><br />
Up: rsync -a -e ssh DIRECTORY USER@login.debrecen2.hpc.niif.hu:/home/USER<br />
Down: rsync -a -e ssh USER@login.debrecen2.hpc.niif.hu:/home/USER/DIRECTORY<br />
</pre><br />
The --delete option must be specified to synchronize deleted files.<br />
<br />
== User interface ==<br />
<pre><br />
short form of CWD<br />
|<br />
DEBRECEN2[login] ~ (0)$<br />
| | |<br />
HPC station | |<br />
short machine name |<br />
exit code of the previous command<br />
</pre><br />
<br />
=== Module environment ===<br />
The list of available modules is obtained with the following command:<br />
<pre><br />
module avail<br />
</pre><br />
the list of already loaded modules:<br />
<pre><br />
module list<br />
</pre><br />
You can load an application with the following command:<br />
<pre><br />
module load APP<br />
</pre><br />
The environment variables set by KIFÜ are listed by the nce command.<br />
<br />
=== Data sharing for project members ===<br />
To share files / directories ACLs must be set. To make the HOME directory readable by another user (OTHER):<br />
<pre><br />
setfacl -m u:OTHER:rx $HOME<br />
</pre><br />
To make a specific directory (DIRECTORY) writable:<br />
<pre><br />
setfacl -m u:OTHER:rxw $HOME/DIRECTORY<br />
</pre><br />
You can list extended rights with the following command:<br />
<pre><br />
getfacl $HOME/DIRECTORY<br />
</pre><br />
<br />
== Using a shared home directory ==<br />
The common file system that is available for the login nodes of the supercomputers is accessible under the following path:<br />
<pre><br />
/mnt/fhgfs/home/$USER<br />
</pre><br />
Backups could be made into the shared directory with the following command:<br />
<pre><br />
rsync -avuP --delete $HOME/DIRECTORY /mnt/fhgfs/home/$USER<br />
</pre><br />
<br />
== Compiling applications ==<br />
<br />
Users are encouraged to try compiling needed applications in their own home directory first. If it fails for some reason, then the next step is to ask the Hungarian supercomputer users because there is a good chance that others have already run into the same problem. They can be reached at: <code>hpc-forum at listserv.niif.hu</code>. You can subscribe to this mailing list [https://listserv.niif.hu/mailman/listinfo/hpc-forum|here]. You should also check the archive when looking into the issue. KIFÜ HPC support has extremely limited capacity to handle individual compiling requests but still you may contact <code>hpc-support at niif.hu</code> with your problem. In the latter case please be patient for a few days while waiting for responses.<br />
<br />
== Using the SLURM scheduler ==<br />
The supercomputer has a CPU hour (machine time) based schedule. The following command provides information about the status of the user's Slurm projects (Account):<br />
<pre><br />
sbalance<br />
</pre><br />
The second column (Usage) shows the machine time spent by each user, and the fourth column shows the total machine time of the account. The last two columns provide information about the maximum (Account Limit) and available machine time.<br />
<pre><br />
Scheduler Account Balance<br />
---------- ----------- + ---------------- ----------- + ------------- -----------<br />
User Usage | Account Usage | Account Limit Available (CPU hrs)<br />
---------- ----------- + ---------------- ----------- + ------------- -----------<br />
bob * 7 | foobar 7 | 1,000 993<br />
alice 0 | foobar 7 | 1,000 993<br />
</pre><br />
<br />
=== Estimating CPU time ===<br />
It is advisable to estimate the wall clock time before large-scale (production) runs. To do this, use the following command:<br />
<pre><br />
sestimate -N NODES -t WALLTIME<br />
</pre><br />
<br />
where <code>NODES</code> is the number of nodes to be reserved and <code>WALLTIME</code> is the maximum run time.<br />
<br />
'''<br />
It is important to specify the wall clock time you want to reserve as accurately as possible, as the scheduler also ranks the jobs waiting to be run based on this. It is generally true that the shorter job will take place sooner. It is advisable to check the actual run time with the <code>sacct</code> command afterwards.'''<br />
<br />
=== Status information ===<br />
The <code>squeue</code> and the <code>sinfo</code> command provide information about the general state of the cluster. Each job submitted is assigned a unique identification number (JOBID). Knowing this, we can ask for more information. Characteristics of the submitted or already running job:<br />
<pre><br />
scontrol show job JOBID<br />
</pre><br />
<br />
Each job is also put into a so-called accounting database. From this you can retrieve the characteristics of the jobs you have run and the statistics of resource usage. You can view detailed statistics with the following command:<br />
<pre><br />
sacct -l -j JOBID<br />
</pre><br />
<br />
The following command provides information about the memory used:<br />
<pre><br />
smemory JOBID<br />
</pre><br />
<br />
The next one shows disk usage:<br />
<pre><br />
sdisk JOBID<br />
</pre><br />
<br />
==== SLURM warnings ====<br />
<pre><br />
Resources / AssociationResourceLimit - Waiting for a resource<br />
AssociationJobLimit / QOSJobLimit - Not enough CPU time or maximum CPU number is reserved<br />
Priority - Waiting due to low priority<br />
<br />
</pre><br />
In the latter case, the time to be reserved by the job must be reduced. Jobs for a given project can run on up to 512 CPUs at a given time.<br />
<br />
==== Checking licenses ====<br />
Az elérhető és éppen használt licenszekről a következő parancs ad információt:<br />
<pre><br />
slicenses<br />
</pre><br />
<br />
==== Checking maintenance ====<br />
In the maintenance time window, the scheduler does not start new jobs, but jobs could still be submitted. The following command provides information on maintenance dates:<br />
<pre><br />
sreservations<br />
</pre><br />
<br />
==== Aggregate consumption ====<br />
You can retrieve the CPU minutes consumed up to one month ago with the following command:<br />
<pre><br />
susage<br />
</pre><br />
<br />
==== Total consumption ====<br />
If you want to know how much CPU time you have been using for a certain period, you can query it with this command:<br />
<pre><br />
sreport -t Hours Cluster AccountUtilizationByUser Accounts=ACCOUNT Start=2015-01-01<br />
</pre><br />
<br />
=== Submitting jobs ===<br />
It is possible to run applications on supercomputers in batch mode. This means that for each run, a job script must be created that includes a description of the resources required and the commands required to run. Scheduler parameters (resource requirements) must be specified with the <code>#SBATCH</code> directive.<br />
<br />
==== Kötelező paraméterek ====<br />
A következő paramétereket minden esetben meg kell adni:<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A ACCOUNT<br />
#SBATCH --job-name=NAME<br />
#SBATCH --time=TIME<br />
</pre><br />
ahol az <code>ACCOUNT</code> a terhelendő számla neve (elérhető számláinkről az <code>sbalance</code> parancs ad felvilágosítást), a <code>NAME</code> a job rövid neve, a <code>TIME</code> pedig a maximális walltime idő (<code>DD-HH:MM:SS</code>). A következő időformátumok használhatók: <br />
"minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" és "days-hours:minutes:seconds".<br />
<br />
==== GPU-k lefoglalása ====<br />
A GPU-k lefoglalása a következő direktívával törénik:<br />
<pre><br />
#SBATCH --gres=gpu:N<br />
</pre><br />
Az <code>N</code> a GPU-k/node számát adja meg, ami 1, 2 és 3 lehet maximum.<br />
<br />
==== Interaktív használat ====<br />
Rövid interaktív feladatokat az 'srun' paranccsal tudunk beküldeni, pl.:<br />
<pre><br />
srun -l -n 1 -t TIME --gres=gpu:1 -A ACCOUNT APP<br />
</pre><br />
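For example, a 10-minute interactive run on one GPU, charged to the hypothetical foobar account, could look like this:<br />
<pre><br />
srun -l -n 1 -t 00:10:00 --gres=gpu:1 -A foobar nvidia-smi<br />
</pre><br />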
<br />
==== Starting batch jobs ====<br />
Jobs are submitted with the following command:<br />
<pre><br />
sbatch slurm.sh<br />
</pre><br />
<br />
On successful submission we get the following output:<br />
<pre><br />
Submitted batch job JOBID<br />
</pre><br />
where <code>JOBID</code> is the unique identification number of the job.<br />
<br />
A job can be cancelled with the following command:<br />
<pre><br />
scancel JOBID<br />
</pre><br />
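When submitting from a script, it can be convenient to capture the job ID directly: <code>sbatch --parsable</code> prints only the ID, so the job can be cancelled later without parsing the confirmation message. A minimal sketch:<br />
<pre><br />
JOBID=$(sbatch --parsable slurm.sh)<br />
echo "submitted job $JOBID"<br />
scancel $JOBID<br />
</pre><br />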
<br />
==== Non-restarting jobs ====<br />
For jobs that must not be requeued, use the following directive:<br />
<pre><br />
#SBATCH --no-requeue<br />
</pre><br />
<br />
==== Job queues ====<br />
Two non-overlapping queues (partitions) are available on the supercomputer: the <code>prod-gpu-k40</code> queue and the <code>prod-gpu-k20</code> queue. Both are intended for production runs; the first contains compute nodes with Nvidia K40x GPUs, the second ones with Nvidia K20x GPUs. The default queue is <code>prod-gpu-k20</code>. The prod-gpu-k40 partition can be selected with the following directive:<br />
<pre><br />
#SBATCH --partition=prod-gpu-k40<br />
</pre><br />
<br />
==== Quality of service (QOS) ====<br />
The default quality of service is <code>normal</code>, which means the run cannot be preempted.<br />
<br />
===== High priority =====<br />
High-priority jobs can run for at most 24 hours and are accounted at twice the normal rate; in exchange, the scheduler moves these jobs ahead in the queue.<br />
<pre><br />
#SBATCH --qos=fast<br />
</pre><br />
<br />
===== Low priority =====<br />
It is also possible to submit low-priority jobs. Such jobs can be preempted at any time by any normal-priority job; in exchange, only half of the consumed machine time is billed. Preempted jobs are automatically rescheduled. It is important to submit with low priority only jobs that tolerate random interruptions, regularly save their state (checkpoint) and can restart quickly from it.<br />
<pre><br />
#SBATCH --qos=lowpri<br />
</pre><br />
<br />
==== Memory allocation ====<br />
By default 1000 MB of memory is assigned to each CPU core; more can be requested with the following directive:<br />
<pre><br />
#SBATCH --mem-per-cpu=MEMORY<br />
</pre><br />
where <code>MEMORY</code> is given in MB. The maximum memory per core is 7800 MB.<br />
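For example, to request 4000 MB per core instead of the default:<br />
<pre><br />
#SBATCH --mem-per-cpu=4000<br />
</pre><br />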
<br />
==== Email notification ====<br />
To send an email when the job's state changes (start, stop, error):<br />
<pre><br />
#SBATCH --mail-type=ALL<br />
#SBATCH --mail-user=EMAIL<br />
</pre><br />
where <code>EMAIL</code> is the address to be notified.<br />
<br />
==== Array jobs (arrayjob) ====<br />
Array jobs are needed when a single-threaded (serial) application is to be run in many instances at once, each with different data. The scheduler stores a unique identifier for each instance in the <code>SLURM_ARRAY_TASK_ID</code> environment variable; querying it is how the tasks of an array job are told apart. The output of each task is written to the file <code>slurm-SLURM_ARRAY_JOB_ID-SLURM_ARRAY_TASK_ID.out</code>. The scheduler fills the nodes using tight packing, so in this case, too, it is advisable to choose the number of tasks as a multiple of the number of processors. [http://slurm.schedmd.com/job_array.html More details]<br />
<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A ACCOUNT<br />
#SBATCH --job-name=array<br />
#SBATCH --time=24:00:00<br />
#SBATCH --array=1-96<br />
srun envtest.sh<br />
</pre><br />
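The <code>envtest.sh</code> script above is not part of SLURM; a hypothetical sketch of such a helper, where each array task picks its own input file based on its index, might look like this:<br />
<pre><br />
#!/bin/bash<br />
# each array task works on its own input file, selected by the array index<br />
echo "task ${SLURM_ARRAY_TASK_ID} running on $(hostname)"<br />
./a.out input.${SLURM_ARRAY_TASK_ID}.dat<br />
</pre><br />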
<br />
==== OpenMPI jobs ====<br />
For MPI jobs the number of MPI processes started per node must also be specified (<code>#SBATCH --ntasks-per-node=</code>). In the most common case this equals the number of CPU cores in a node. The parallel program must be started with the <code>mpirun</code> command.<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A ACCOUNT<br />
#SBATCH --job-name=mpi<br />
#SBATCH -N 2<br />
#SBATCH --ntasks-per-node=8<br />
#SBATCH --time=12:00:00<br />
mpirun --report-pid ${TMPDIR}/mpirun.pid PROGRAM<br />
</pre><br />
<br />
OpenMPI FAQ: http://www.open-mpi.org/faq<br />
<br />
==== OpenMP (OMP) jobs ====<br />
At most 1 node can be allocated for OpenMP parallel applications. The number of OMP threads must be given in the <code>OMP_NUM_THREADS</code> environment variable. The variable must either be written before the application (see the example) or exported before the launch command:<br />
<code><br />
export OMP_NUM_THREADS=8<br />
</code><br />
<br />
In the following example we assign 8 CPU cores to one task, and the 8 cores must be on one node. The <code>SLURM_CPUS_PER_TASK</code> variable holds the number of CPU cores, and it also sets the number of OMP threads.<br />
<br />
In this example user alice starts an 8-thread OMP application charged to the foobar account, for a maximum of 6 hours.<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A foobar<br />
#SBATCH --job-name=omp<br />
#SBATCH --time=06:00:00<br />
#SBATCH --ntasks=1<br />
#SBATCH --cpus-per-task=8<br />
OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK ./a.out<br />
</pre><br />
<br />
==== Hybrid MPI-OMP jobs ====<br />
We speak of hybrid MPI-OMP mode when the parallel application uses both MPI and OMP. It is worth knowing that the MKL calls of programs linked with Intel MKL are OpenMP capable. In general the following distribution is recommended: the number of MPI processes per node should be between 1 and the number of CPU sockets in a node, and the number of OMP threads accordingly the total number of CPU cores in a node, or half or a quarter of that (as appropriate). The parameters of the two modes above must be combined in the job script.<br />
<br />
In the following example we start 2 nodes with one task per node and 8 threads per task. User alice submitted a hybrid job to 2 nodes for 8 hours, charged to the foobar account. Only 1 MPI process runs on a node at a time, using 8 OMP threads; on the 2 machines a total of 2 MPI processes and 2 × 8 OMP threads run.<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A foobar<br />
#SBATCH --job-name=mpiomp<br />
#SBATCH --time=08:00:00<br />
#SBATCH -N 2<br />
#SBATCH --ntasks=2<br />
#SBATCH --ntasks-per-node=1<br />
#SBATCH --cpus-per-task=8<br />
#SBATCH -o slurm.out<br />
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK<br />
mpirun ./a.out<br />
</pre><br />
<br />
==== Maple Grid jobs ====<br />
Like OMP jobs, Maple can be run on 1 node. To use it, the maple module must also be loaded. Maple works in client-server mode, therefore the grid server must be started before running the Maple job (<code>${MAPLE}/toolbox/Grid/bin/startserver</code>). This application requires a license, which must be requested in the job script (<code>#SBATCH --licenses=maplegrid:1</code>). The Maple job must be launched with the <code>${MAPLE}/toolbox/Grid/bin/joblauncher</code> command.<br />
<br />
User alice starts the Maple Grid application for 6 hours, charged to the foobar account:<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A foobar<br />
#SBATCH --job-name=maple<br />
#SBATCH -N 1<br />
#SBATCH --ntasks-per-node=16<br />
#SBATCH --time=06:00:00<br />
#SBATCH -o slurm.out<br />
#SBATCH --licenses=maplegrid:1<br />
<br />
module load maple<br />
<br />
${MAPLE}/toolbox/Grid/bin/startserver<br />
${MAPLE}/toolbox/Grid/bin/joblauncher ${MAPLE}/toolbox/Grid/samples/Simple.mpl<br />
</pre><br />
<br />
<br />
[[Category: HPC]]</div>Itamas(AT)niif.huhttps://wiki.niif.hu/index.php?title=Debrecen2_GPU_klaszter_en&diff=4868Debrecen2 GPU klaszter en2021-06-24T10:02:30Z<p>Itamas(AT)niif.hu: </p>
<hr />
<div>{| class="wikitable" border="1" <br />
|- <br />
| Cluster<br />
| Debrecen2 (Leo)<br />
|- <br />
| Type<br />
| HP SL250s<br />
|- <br />
| Core / node<br />
| 2 × 8-core Xeon E5-2650v2 2.60GHz (16 cores)<br />
|- <br />
| GPU / node<br />
| 3 (68 nodes with 3 × Nvidia K20x, 16 nodes with 3 × Nvidia K40x)<br />
|- <br />
| # of compute nodes<br />
| 84<br />
|-<br />
| Max Walltime<br />
| 7-00:00:00<br />
|-<br />
| Max core / project<br />
| 336<br />
|-<br />
| Max mem / core<br />
| 7000 MB<br />
|}<br />
<br />
=== Requesting CPU time ===<br />
<br />
{{ATTENTION| When applying for CPU time, we expect a brief justification from the HPC project managers stating that the application to be run is capable of using a GPU (except when the purpose is to use licensed software – available on the machine – that is unable to use a GPU, e.g. Gaussian, Maple). This is necessary because most of the performance of this HPC resource comes from GPU acceleration, so a non-accelerated program that allocates CPUs would limit the use of the GPUs, leading to underutilization. NVIDIA released [http://www.nvidia.com/object/gpu-applications.html '''a list'''] of applications officially supported by NVIDIA GPUs, but of course other programs that use GPUs are also likely to perform well on the machine.<br />
<br />
* For those interested in GPU programming, we held a workshop whose video materials are available here: [http://videotorium.hu/hu/events/details/1864,GPU_programozas_workshop GPU programming workshop (videotorium)]<br />
}}<br />
<br />
=== Login ===<br />
<pre><br />
ssh USER@login.debrecen2.hpc.niif.hu<br />
</pre><br />
If a non-default key is used, it must be specified with the -i KEY option (SSH and SCP commands).<br />
<br />
=== Copying files with SCP ===<br />
Download from the HOME directory and upload to the HOME directory:<br />
<pre><br />
Up: scp FILE USER@login.debrecen2.hpc.niif.hu:FILE<br />
Down: scp USER@login.debrecen2.hpc.niif.hu:FILE FILE<br />
</pre><br />
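Whole directories can be copied the same way with the recursive <code>-r</code> option, e.g.:<br />
<pre><br />
scp -r DIRECTORY USER@login.debrecen2.hpc.niif.hu:DIRECTORY<br />
</pre><br />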
<br />
=== Data synchronization ===<br />
Larger files / directory structures can be synchronized using the following commands:<br />
<pre><br />
Up: rsync -a -e ssh DIRECTORY USER@login.debrecen2.hpc.niif.hu:/home/USER<br />
Down: rsync -a -e ssh USER@login.debrecen2.hpc.niif.hu:/home/USER/DIRECTORY .<br />
</pre><br />
The --delete option must be specified to synchronize deleted files.<br />
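For example, a download that also removes files deleted on the server side (note that <code>--delete</code> removes files on the receiving side, so use it with care):<br />
<pre><br />
rsync -a --delete -e ssh USER@login.debrecen2.hpc.niif.hu:/home/USER/DIRECTORY .<br />
</pre><br />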
<br />
== User interface ==<br />
<pre><br />
short form of CWD<br />
|<br />
DEBRECEN2[login] ~ (0)$<br />
| | |<br />
HPC station | |<br />
short machine name |<br />
exit code of the previous command<br />
</pre><br />
<br />
=== Module environment ===<br />
The list of available modules is obtained with the following command:<br />
<pre><br />
module avail<br />
</pre><br />
the list of already loaded modules:<br />
<pre><br />
module list<br />
</pre><br />
You can load an application with the following command:<br />
<pre><br />
module load APP<br />
</pre><br />
The environment variables set by KIFÜ are listed by the <code>nce</code> command.<br />
<br />
=== Data sharing for project members ===<br />
To share files / directories ACLs must be set. To make the HOME directory readable by another user (OTHER):<br />
<pre><br />
setfacl -m u:OTHER:rx $HOME<br />
</pre><br />
To make a specific directory (DIRECTORY) writable:<br />
<pre><br />
setfacl -m u:OTHER:rxw $HOME/DIRECTORY<br />
</pre><br />
You can list extended rights with the following command:<br />
<pre><br />
getfacl $HOME/DIRECTORY<br />
</pre><br />
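A previously granted right can be revoked by removing the user's ACL entry, e.g.:<br />
<pre><br />
setfacl -x u:OTHER $HOME/DIRECTORY<br />
</pre><br />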
<br />
== Using a shared home directory ==<br />
The common file system that is available for the login nodes of the supercomputers is accessible under the following path:<br />
<pre><br />
/mnt/fhgfs/home/$USER<br />
</pre><br />
Backups can be made into the shared directory with the following command:<br />
<pre><br />
rsync -avuP --delete $HOME/DIRECTORY /mnt/fhgfs/home/$USER<br />
</pre><br />
<br />
== Compiling applications ==<br />
<br />
Users are encouraged to try compiling needed applications in their own home directory first. If this fails for some reason, the next step is to ask the Hungarian supercomputer users, because there is a good chance that others have already run into the same problem. They can be reached at: <code>hpc-forum at listserv.niif.hu</code>. You can subscribe to this mailing list [https://listserv.niif.hu/mailman/listinfo/hpc-forum here]. It is also worth checking the archive when looking into the issue. KIFÜ HPC support has extremely limited capacity to handle individual compiling requests, but you may still contact <code>hpc-support at niif.hu</code> with your problem. In the latter case please be patient for a few days while waiting for a response.<br />
<br />
== Using the SLURM scheduler ==<br />
The supercomputer uses CPU-hour (machine time) based scheduling. The following command provides information about the status of the user's Slurm projects (Account):<br />
<pre><br />
sbalance<br />
</pre><br />
The second column (Usage) shows the machine time spent by each user, and the fourth column shows the total machine time of the account. The last two columns provide information about the maximum (Account Limit) and available machine time.<br />
<pre><br />
Scheduler Account Balance<br />
---------- ----------- + ---------------- ----------- + ------------- -----------<br />
User Usage | Account Usage | Account Limit Available (CPU hrs)<br />
---------- ----------- + ---------------- ----------- + ------------- -----------<br />
bob * 7 | foobar 7 | 1,000 993<br />
alice 0 | foobar 7 | 1,000 993<br />
</pre><br />
<br />
=== Estimating CPU time ===<br />
It is advisable to estimate the wall clock time before large-scale (production) runs. To do this, use the following command:<br />
<pre><br />
sestimate -N NODES -t WALLTIME<br />
</pre><br />
<br />
where <code>NODES</code> is the number of nodes to be reserved and <code>WALLTIME</code> is the maximum run time.<br />
<br />
'''It is important to specify the wall clock time you want to reserve as accurately as possible, as the scheduler also ranks the jobs waiting to run based on it. In general, a shorter job is scheduled sooner. It is advisable to check the actual run time afterwards with the <code>sacct</code> command.'''<br />
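For example, to estimate the cost of a 12-hour run on 2 nodes (hypothetical values):<br />
<pre><br />
sestimate -N 2 -t 12:00:00<br />
</pre><br />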
<br />
=== Status information ===<br />
The <code>squeue</code> command lists the jobs in the scheduler, while <code>sinfo</code> reports the general state of the cluster. Each submitted job is assigned a unique identification number (JOBID). Knowing this, we can ask for more information. Characteristics of a submitted or already running job:<br />
<pre><br />
scontrol show job JOBID<br />
</pre><br />
<br />
Each job is also put into a so-called accounting database. From this you can retrieve the characteristics of the jobs you have run and the statistics of resource usage. You can view detailed statistics with the following command:<br />
<pre><br />
sacct -l -j JOBID<br />
</pre><br />
<br />
The following command provides information about the memory used:<br />
<pre><br />
smemory JOBID<br />
</pre><br />
<br />
The next one shows disk usage:<br />
<pre><br />
sdisk JOBID<br />
</pre><br />
<br />
==== SLURM warnings ====<br />
<pre><br />
Resources/AssociationResourceLimit - Waiting for resources<br />
AssociationJobLimit/QOSJobLimit - Not enough CPU time left, or the maximum number of CPUs is already allocated<br />
Priority - Waiting due to low priority<br />
</pre><br />
In the latter case, the time requested by the job must be reduced. Jobs of a given project can run on at most 512 CPUs at any given time.<br />
<br />
==== Checking licenses ====<br />
The following command provides information about the available and currently used licenses:<br />
<pre><br />
slicenses<br />
</pre><br />
<br />
==== Checking maintenance ====<br />
In the maintenance time window the scheduler does not start new jobs, but jobs can still be submitted. The following command provides information on maintenance dates:<br />
<pre><br />
sreservations<br />
</pre><br />
<br />
==== Aggregate consumption ====<br />
The CPU minutes consumed during the past month can be retrieved with the following command:<br />
<pre><br />
susage<br />
</pre><br />
<br />
==== Total consumption ====<br />
If you want to know how much CPU time you have used since a given date, you can query it with this command:<br />
<pre><br />
sreport -t Hours Cluster AccountUtilizationByUser Accounts=ACCOUNT Start=2015-01-01<br />
</pre><br />
<br />
=== Submitting jobs ===<br />
Applications can be run on the supercomputers in batch mode. This means that for each run a job script must be created, containing the description of the requested resources and the commands needed for the run. Scheduler parameters (resource requirements) must be given with the <code>#SBATCH</code> directive.<br />
<br />
==== Mandatory parameters ====<br />
The following parameters must be specified in every case:<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A ACCOUNT<br />
#SBATCH --job-name=NAME<br />
#SBATCH --time=TIME<br />
</pre><br />
where <code>ACCOUNT</code> is the name of the account to be charged (the <code>sbalance</code> command lists the accounts available to you), <code>NAME</code> is a short name for the job, and <code>TIME</code> is the maximum walltime (<code>DD-HH:MM:SS</code>). The following time formats can be used: <br />
"minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds".<br />
<br />
==== Allocating GPUs ====<br />
GPUs are allocated with the following directive:<br />
<pre><br />
#SBATCH --gres=gpu:N<br />
</pre><br />
where <code>N</code> is the number of GPUs per node, which can be 1, 2 or at most 3.<br />
<br />
==== Interactive use ====<br />
Short interactive jobs can be submitted with the <code>srun</code> command, e.g.:<br />
<pre><br />
srun -l -n 1 -t TIME --gres=gpu:1 -A ACCOUNT APP<br />
</pre><br />
<br />
==== Starting batch jobs ====<br />
Jobs are submitted with the following command:<br />
<pre><br />
sbatch slurm.sh<br />
</pre><br />
<br />
On successful submission we get the following output:<br />
<pre><br />
Submitted batch job JOBID<br />
</pre><br />
where <code>JOBID</code> is the unique identification number of the job.<br />
<br />
A job can be cancelled with the following command:<br />
<pre><br />
scancel JOBID<br />
</pre><br />
<br />
==== Non-restarting jobs ====<br />
For jobs that must not be requeued, use the following directive:<br />
<pre><br />
#SBATCH --no-requeue<br />
</pre><br />
<br />
==== Job queues ====<br />
Two non-overlapping queues (partitions) are available on the supercomputer: the <code>prod-gpu-k40</code> queue and the <code>prod-gpu-k20</code> queue. Both are intended for production runs; the first contains compute nodes with Nvidia K40x GPUs, the second ones with Nvidia K20x GPUs. The default queue is <code>prod-gpu-k20</code>. The prod-gpu-k40 partition can be selected with the following directive:<br />
<pre><br />
#SBATCH --partition=prod-gpu-k40<br />
</pre><br />
<br />
==== Quality of service (QOS) ====<br />
The default quality of service is <code>normal</code>, which means the run cannot be preempted.<br />
<br />
===== High priority =====<br />
High-priority jobs can run for at most 24 hours and are accounted at twice the normal rate; in exchange, the scheduler moves these jobs ahead in the queue.<br />
<pre><br />
#SBATCH --qos=fast<br />
</pre><br />
<br />
===== Low priority =====<br />
It is also possible to submit low-priority jobs. Such jobs can be preempted at any time by any normal-priority job; in exchange, only half of the consumed machine time is billed. Preempted jobs are automatically rescheduled. It is important to submit with low priority only jobs that tolerate random interruptions, regularly save their state (checkpoint) and can restart quickly from it.<br />
<pre><br />
#SBATCH --qos=lowpri<br />
</pre><br />
<br />
==== Memory allocation ====<br />
By default 1000 MB of memory is assigned to each CPU core; more can be requested with the following directive:<br />
<pre><br />
#SBATCH --mem-per-cpu=MEMORY<br />
</pre><br />
where <code>MEMORY</code> is given in MB. The maximum memory per core is 7800 MB.<br />
<br />
==== Email notification ====<br />
To send an email when the job's state changes (start, stop, error):<br />
<pre><br />
#SBATCH --mail-type=ALL<br />
#SBATCH --mail-user=EMAIL<br />
</pre><br />
where <code>EMAIL</code> is the address to be notified.<br />
<br />
==== Array jobs (arrayjob) ====<br />
Array jobs are needed when a single-threaded (serial) application is to be run in many instances at once, each with different data. The scheduler stores a unique identifier for each instance in the <code>SLURM_ARRAY_TASK_ID</code> environment variable; querying it is how the tasks of an array job are told apart. The output of each task is written to the file <code>slurm-SLURM_ARRAY_JOB_ID-SLURM_ARRAY_TASK_ID.out</code>. The scheduler fills the nodes using tight packing, so in this case, too, it is advisable to choose the number of tasks as a multiple of the number of processors. [http://slurm.schedmd.com/job_array.html More details]<br />
<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A ACCOUNT<br />
#SBATCH --job-name=array<br />
#SBATCH --time=24:00:00<br />
#SBATCH --array=1-96<br />
srun envtest.sh<br />
</pre><br />
<br />
==== OpenMPI jobs ====<br />
For MPI jobs the number of MPI processes started per node must also be specified (<code>#SBATCH --ntasks-per-node=</code>). In the most common case this equals the number of CPU cores in a node. The parallel program must be started with the <code>mpirun</code> command.<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A ACCOUNT<br />
#SBATCH --job-name=mpi<br />
#SBATCH -N 2<br />
#SBATCH --ntasks-per-node=8<br />
#SBATCH --time=12:00:00<br />
mpirun --report-pid ${TMPDIR}/mpirun.pid PROGRAM<br />
</pre><br />
<br />
OpenMPI FAQ: http://www.open-mpi.org/faq<br />
<br />
==== OpenMP (OMP) jobs ====<br />
At most 1 node can be allocated for OpenMP parallel applications. The number of OMP threads must be given in the <code>OMP_NUM_THREADS</code> environment variable. The variable must either be written before the application (see the example) or exported before the launch command:<br />
<code><br />
export OMP_NUM_THREADS=8<br />
</code><br />
<br />
In the following example we assign 8 CPU cores to one task, and the 8 cores must be on one node. The <code>SLURM_CPUS_PER_TASK</code> variable holds the number of CPU cores, and it also sets the number of OMP threads.<br />
<br />
In this example user alice starts an 8-thread OMP application charged to the foobar account, for a maximum of 6 hours.<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A foobar<br />
#SBATCH --job-name=omp<br />
#SBATCH --time=06:00:00<br />
#SBATCH --ntasks=1<br />
#SBATCH --cpus-per-task=8<br />
OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK ./a.out<br />
</pre><br />
<br />
==== Hybrid MPI-OMP jobs ====<br />
We speak of hybrid MPI-OMP mode when the parallel application uses both MPI and OMP. It is worth knowing that the MKL calls of programs linked with Intel MKL are OpenMP capable. In general the following distribution is recommended: the number of MPI processes per node should be between 1 and the number of CPU sockets in a node, and the number of OMP threads accordingly the total number of CPU cores in a node, or half or a quarter of that (as appropriate). The parameters of the two modes above must be combined in the job script.<br />
<br />
In the following example we start 2 nodes with one task per node and 8 threads per task. User alice submitted a hybrid job to 2 nodes for 8 hours, charged to the foobar account. Only 1 MPI process runs on a node at a time, using 8 OMP threads; on the 2 machines a total of 2 MPI processes and 2 × 8 OMP threads run.<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A foobar<br />
#SBATCH --job-name=mpiomp<br />
#SBATCH --time=08:00:00<br />
#SBATCH -N 2<br />
#SBATCH --ntasks=2<br />
#SBATCH --ntasks-per-node=1<br />
#SBATCH --cpus-per-task=8<br />
#SBATCH -o slurm.out<br />
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK<br />
mpirun ./a.out<br />
</pre><br />
<br />
==== Maple Grid jobs ====<br />
Like OMP jobs, Maple can be run on 1 node. To use it, the maple module must also be loaded. Maple works in client-server mode, therefore the grid server must be started before running the Maple job (<code>${MAPLE}/toolbox/Grid/bin/startserver</code>). This application requires a license, which must be requested in the job script (<code>#SBATCH --licenses=maplegrid:1</code>). The Maple job must be launched with the <code>${MAPLE}/toolbox/Grid/bin/joblauncher</code> command.<br />
<br />
User alice starts the Maple Grid application for 6 hours, charged to the foobar account:<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A foobar<br />
#SBATCH --job-name=maple<br />
#SBATCH -N 1<br />
#SBATCH --ntasks-per-node=16<br />
#SBATCH --time=06:00:00<br />
#SBATCH -o slurm.out<br />
#SBATCH --licenses=maplegrid:1<br />
<br />
module load maple<br />
<br />
${MAPLE}/toolbox/Grid/bin/startserver<br />
${MAPLE}/toolbox/Grid/bin/joblauncher ${MAPLE}/toolbox/Grid/samples/Simple.mpl<br />
</pre><br />
<br />
<br />
[[Category: HPC]]</div>Itamas(AT)niif.huhttps://wiki.niif.hu/index.php?title=Debrecen2_GPU_klaszter_en&diff=4867Debrecen2 GPU klaszter en2021-06-24T08:24:35Z<p>Itamas(AT)niif.hu: </p>
<hr />
<div>{| class="wikitable" border="1" <br />
|- <br />
| Cluster<br />
| Debrecen2 (Leo)<br />
|- <br />
| Type<br />
| HP SL250s<br />
|- <br />
| Core / node<br />
| 2 × 8-core Xeon E5-2650v2 2.60GHz (16 cores)<br />
|- <br />
| GPU / node<br />
| 3 (68 nodes with 3 × Nvidia K20x, 16 nodes with 3 × Nvidia K40x)<br />
|- <br />
| # of compute nodes<br />
| 84<br />
|-<br />
| Max Walltime<br />
| 7-00:00:00<br />
|-<br />
| Max core / project<br />
| 336<br />
|-<br />
| Max mem / core<br />
| 7000 MB<br />
|}<br />
<br />
=== Requesting CPU time ===<br />
<br />
{{ATTENTION| When applying for CPU time, we expect a brief justification from the HPC project managers stating that the application to be run is capable of using a GPU (except when the purpose is to use licensed software – available on the machine – that is unable to use a GPU, e.g. Gaussian, Maple). This is necessary because most of the performance of this HPC resource comes from GPU acceleration, so a non-accelerated program that allocates CPUs would limit the use of the GPUs, leading to underutilization. NVIDIA released [http://www.nvidia.com/object/gpu-applications.html '''a list'''] of applications officially supported by NVIDIA GPUs, but of course other programs that use GPUs are also likely to perform well on the machine.<br />
<br />
* For those interested in GPU programming, we held a workshop whose video materials are available here: [http://videotorium.hu/hu/events/details/1864,GPU_programozas_workshop GPU programming workshop (videotorium)]<br />
}}<br />
<br />
=== Login ===<br />
<pre><br />
ssh USER@login.debrecen2.hpc.niif.hu<br />
</pre><br />
If a non-default key is used, it must be specified with the -i KEY option (SSH and SCP commands).<br />
<br />
=== Copying files with SCP ===<br />
Download from the HOME directory and upload to the HOME directory:<br />
<pre><br />
Up: scp FILE USER@login.debrecen2.hpc.niif.hu:FILE<br />
Down: scp USER@login.debrecen2.hpc.niif.hu:FILE FILE<br />
</pre><br />
<br />
=== Data synchronization ===<br />
Larger files / directory structures can be synchronized using the following commands:<br />
<pre><br />
Up: rsync -a -e ssh DIRECTORY USER@login.debrecen2.hpc.niif.hu:/home/USER<br />
Down: rsync -a -e ssh USER@login.debrecen2.hpc.niif.hu:/home/USER/DIRECTORY .<br />
</pre><br />
The --delete option must be specified to synchronize deleted files.<br />
<br />
== User interface ==<br />
<pre><br />
short form of CWD<br />
|<br />
DEBRECEN2[login] ~ (0)$<br />
| | |<br />
HPC station | |<br />
short machine name |<br />
exit code of the previous command<br />
</pre><br />
<br />
=== Module environment ===<br />
The list of available modules is obtained with the following command:<br />
<pre><br />
module avail<br />
</pre><br />
the list of already loaded modules:<br />
<pre><br />
module list<br />
</pre><br />
You can load an application with the following command:<br />
<pre><br />
module load APP<br />
</pre><br />
The environment variables set by KIFÜ are listed by the <code>nce</code> command.<br />
<br />
=== Data sharing for project members ===<br />
To share files / directories ACLs must be set. To make the HOME directory readable by another user (OTHER):<br />
<pre><br />
setfacl -m u:OTHER:rx $HOME<br />
</pre><br />
To make a specific directory (DIRECTORY) writable:<br />
<pre><br />
setfacl -m u:OTHER:rxw $HOME/DIRECTORY<br />
</pre><br />
You can list extended rights with the following command:<br />
<pre><br />
getfacl $HOME/DIRECTORY<br />
</pre><br />
<br />
== Using a shared home directory ==<br />
The common file system that is available for the login nodes of the supercomputers is accessible under the following path:<br />
<pre><br />
/mnt/fhgfs/home/$USER<br />
</pre><br />
Backups can be made into the shared directory with the following command:<br />
<pre><br />
rsync -avuP --delete $HOME/DIRECTORY /mnt/fhgfs/home/$USER<br />
</pre><br />
<br />
== Compiling applications ==<br />
<br />
Users are encouraged to try compiling needed applications in their own home directory first. If this fails for some reason, the next step is to ask the Hungarian supercomputer users, because there is a good chance that others have already run into the same problem. They can be reached at: <code>hpc-forum at listserv.niif.hu</code>. You can subscribe to this mailing list [https://listserv.niif.hu/mailman/listinfo/hpc-forum here]. It is also worth checking the archive when looking into the issue. KIFÜ HPC support has extremely limited capacity to handle individual compiling requests, but you may still contact <code>hpc-support at niif.hu</code> with your problem. In the latter case please be patient for a few days while waiting for a response.<br />
<br />
== Using the SLURM scheduler ==<br />
The supercomputer uses CPU-hour (machine time) based scheduling. The following command provides information about the status of the user's Slurm projects (Account):<br />
<pre><br />
sbalance<br />
</pre><br />
The second column (Usage) shows the machine time used by each user, while the fourth column shows the aggregated machine time of the account. The last two columns show the maximum (Account Limit) and the still available (Available) machine time.<br />
<pre><br />
Scheduler Account Balance<br />
---------- ----------- + ---------------- ----------- + ------------- -----------<br />
User Usage | Account Usage | Account Limit Available (CPU hrs)<br />
---------- ----------- + ---------------- ----------- + ------------- -----------<br />
bob * 7 | foobar 7 | 1,000 993<br />
alice 0 | foobar 7 | 1,000 993<br />
</pre><br />
<br />
=== Estimating CPU time ===<br />
It is advisable to estimate the machine time needed before large-scale (production) runs. To do this, use the following command:<br />
<pre><br />
sestimate -N NODES -t WALLTIME<br />
</pre><br />
where <code>NODES</code> is the number of nodes to be reserved and <code>WALLTIME</code> is the maximum runtime.<br />
<br />
'''It is important to specify the machine time to be reserved as accurately as possible, since the scheduler also ranks the waiting jobs based on it. In general, a shorter job is scheduled sooner. It is advisable to check the actual runtime of every run afterwards with the <code>sacct</code> command.'''<br />
<br />
=== Status information ===<br />
The <code>squeue</code> command lists the jobs in the scheduler, while <code>sinfo</code> reports the general state of the cluster. Each submitted job is assigned a unique identification number (JOBID). Knowing this, we can ask for more information. Characteristics of a submitted or already running job:<br />
<pre><br />
scontrol show job JOBID<br />
</pre><br />
<br />
Each job is also entered into an accounting database, from which the characteristics and resource-usage statistics of completed jobs can be retrieved. Detailed statistics can be viewed with the following command:<br />
<pre><br />
sacct -l -j JOBID<br />
</pre><br />
<br />
The following command provides information about the memory used:<br />
<pre><br />
smemory JOBID<br />
</pre><br />
<br />
and disk usage is shown by:<br />
<pre><br />
sdisk JOBID<br />
</pre><br />
<br />
==== SLURM warning messages ====<br />
<pre><br />
Resources/AssociationResourceLimit - Waiting for resources<br />
AssociationJobLimit/QOSJobLimit - Not enough CPU time left, or the maximum number of CPUs is already allocated<br />
Priority - Waiting due to low priority<br />
</pre><br />
In the latter case, the time requested by the job must be reduced. Jobs of a given project can run on at most 512 CPUs at any given time.<br />
<br />
==== Checking licenses ====<br />
The following command provides information about the available and currently used licenses:<br />
<pre><br />
slicenses<br />
</pre><br />
<br />
==== Checking maintenance ====<br />
In the maintenance time window the scheduler does not start new jobs, but jobs can still be submitted. The following command provides information on maintenance dates:<br />
<pre><br />
sreservations<br />
</pre><br />
<br />
==== Aggregate consumption ====<br />
The CPU minutes consumed during the past month can be retrieved with the following command:<br />
<pre><br />
susage<br />
</pre><br />
<br />
==== Total consumption ====<br />
If you want to know how much CPU time you have used since a given date, you can query it with this command:<br />
<pre><br />
sreport -t Hours Cluster AccountUtilizationByUser Accounts=ACCOUNT Start=2015-01-01<br />
</pre><br />
<br />
=== Submitting jobs ===<br />
Applications can be run on the supercomputers in batch mode. This means that for each run a job script must be created, containing the description of the requested resources and the commands needed for the run. Scheduler parameters (resource requirements) must be given with the <code>#SBATCH</code> directive.<br />
<br />
==== Mandatory parameters ====<br />
The following parameters must be specified in every case:<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A ACCOUNT<br />
#SBATCH --job-name=NAME<br />
#SBATCH --time=TIME<br />
</pre><br />
where <code>ACCOUNT</code> is the name of the account to be charged (the <code>sbalance</code> command lists the accounts available to you), <code>NAME</code> is a short name for the job, and <code>TIME</code> is the maximum walltime (<code>DD-HH:MM:SS</code>). The following time formats can be used: <br />
"minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds".<br />
<br />
==== Allocating GPUs ====<br />
GPUs are allocated with the following directive:<br />
<pre><br />
#SBATCH --gres=gpu:N<br />
</pre><br />
where <code>N</code> is the number of GPUs per node, which can be 1, 2 or at most 3.<br />
<br />
==== Interactive use ====<br />
Short interactive jobs can be submitted with the <code>srun</code> command, e.g.:<br />
<pre><br />
srun -l -n 1 -t TIME --gres=gpu:1 -A ACCOUNT APP<br />
</pre><br />
<br />
==== Starting batch jobs ====<br />
Jobs are submitted with the following command:<br />
<pre><br />
sbatch slurm.sh<br />
</pre><br />
<br />
On successful submission we get the following output:<br />
<pre><br />
Submitted batch job JOBID<br />
</pre><br />
where <code>JOBID</code> is the unique identification number of the job.<br />
<br />
A job can be cancelled with the following command:<br />
<pre><br />
scancel JOBID<br />
</pre><br />
<br />
==== Non-restarting jobs ====<br />
For jobs that must not be requeued, use the following directive:<br />
<pre><br />
#SBATCH --no-requeue<br />
</pre><br />
<br />
==== Job queues ====<br />
Two non-overlapping queues (partitions) are available on the supercomputer: the <code>prod-gpu-k40</code> queue and the <code>prod-gpu-k20</code> queue. Both are intended for production runs; the first contains compute nodes with Nvidia K40x GPUs, the second ones with Nvidia K20x GPUs. The default queue is <code>prod-gpu-k20</code>. The prod-gpu-k40 partition can be selected with the following directive:<br />
<pre><br />
#SBATCH --partition=prod-gpu-k40<br />
</pre><br />
<br />
==== Quality of service (QOS) ====<br />
The default quality of service is <code>normal</code>, which means the run cannot be preempted.<br />
<br />
===== High priority =====<br />
High-priority jobs can run for at most 24 hours and are accounted at twice the normal rate; in exchange, the scheduler moves these jobs ahead in the queue.<br />
<pre><br />
#SBATCH --qos=fast<br />
</pre><br />
<br />
===== Low priority =====<br />
It is also possible to submit low-priority jobs. Such jobs can be preempted at any time by any normal-priority job; in exchange, only half of the consumed machine time is billed. Preempted jobs are automatically rescheduled. It is important to submit with low priority only jobs that tolerate random interruptions, regularly save their state (checkpoint) and can restart quickly from it.<br />
<pre><br />
#SBATCH --qos=lowpri<br />
</pre><br />
<br />
==== Memory allocation ====<br />
By default 1000 MB of memory is assigned to each CPU core; more can be requested with the following directive:<br />
<pre><br />
#SBATCH --mem-per-cpu=MEMORY<br />
</pre><br />
where <code>MEMORY</code> is given in MB. The maximum memory per core is 7800 MB.<br />
<br />
==== Email notification ====<br />
To send an email when the job's state changes (start, stop, error):<br />
<pre><br />
#SBATCH --mail-type=ALL<br />
#SBATCH --mail-user=EMAIL<br />
</pre><br />
where <code>EMAIL</code> is the address to be notified.<br />
<br />
==== Array jobs (arrayjob) ====<br />
Array jobs are needed when a single-threaded (serial) application is to be run in many instances at once, each with different data. The scheduler stores a unique identifier for each instance in the <code>SLURM_ARRAY_TASK_ID</code> environment variable; querying it is how the tasks of an array job are told apart. The output of each task is written to the file <code>slurm-SLURM_ARRAY_JOB_ID-SLURM_ARRAY_TASK_ID.out</code>. The scheduler fills the nodes using tight packing, so in this case, too, it is advisable to choose the number of tasks as a multiple of the number of processors. [http://slurm.schedmd.com/job_array.html More details]<br />
<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A ACCOUNT<br />
#SBATCH --job-name=array<br />
#SBATCH --time=24:00:00<br />
#SBATCH --array=1-96<br />
srun envtest.sh<br />
</pre><br />
<br />
==== OpenMPI jobs ====<br />
For MPI jobs the number of MPI processes started per node must also be specified (<code>#SBATCH --ntasks-per-node=</code>). In the most common case this equals the number of CPU cores in a node. The parallel program must be started with the <code>mpirun</code> command.<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A ACCOUNT<br />
#SBATCH --job-name=mpi<br />
#SBATCH -N 2<br />
#SBATCH --ntasks-per-node=8<br />
#SBATCH --time=12:00:00<br />
mpirun --report-pid ${TMPDIR}/mpirun.pid PROGRAM<br />
</pre><br />
<br />
OpenMPI FAQ: http://www.open-mpi.org/faq<br />
<br />
==== OpenMP (OMP) jobs ====<br />
At most 1 node can be allocated for OpenMP parallel applications. The number of OMP threads must be given in the <code>OMP_NUM_THREADS</code> environment variable. The variable must either be written before the application (see the example) or exported before the launch command:<br />
<code><br />
export OMP_NUM_THREADS=8<br />
</code><br />
<br />
In the following example we assign 8 CPU cores to one task, and the 8 cores must be on one node. The <code>SLURM_CPUS_PER_TASK</code> variable holds the number of CPU cores, and it also sets the number of OMP threads.<br />
<br />
In this example user alice starts an 8-thread OMP application charged to the foobar account, for a maximum of 6 hours.<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A foobar<br />
#SBATCH --job-name=omp<br />
#SBATCH --time=06:00:00<br />
#SBATCH --ntasks=1<br />
#SBATCH --cpus-per-task=8<br />
OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK ./a.out<br />
</pre><br />
<br />
==== Hybrid MPI-OMP jobs ====<br />
We speak of hybrid MPI-OMP mode when the parallel application uses both MPI and OMP. It is worth knowing that the MKL calls of programs linked with Intel MKL are OpenMP capable. In general the following distribution is recommended: the number of MPI processes per node should be between 1 and the number of CPU sockets in a node, and the number of OMP threads accordingly the total number of CPU cores in a node, or half or a quarter of that (as appropriate). The parameters of the two modes above must be combined in the job script.<br />
<br />
In the following example we start 2 nodes with one task per node and 8 threads per task. User alice submitted a hybrid job to 2 nodes for 8 hours, charged to the foobar account. Only 1 MPI process runs on a node at a time, using 8 OMP threads; on the 2 machines a total of 2 MPI processes and 2 × 8 OMP threads run.<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A foobar<br />
#SBATCH --job-name=mpiomp<br />
#SBATCH --time=08:00:00<br />
#SBATCH -N 2<br />
#SBATCH --ntasks=2<br />
#SBATCH --ntasks-per-node=1<br />
#SBATCH --cpus-per-task=8<br />
#SBATCH -o slurm.out<br />
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK<br />
mpirun ./a.out<br />
</pre><br />
<br />
==== Maple Grid jobs ====<br />
Like OMP jobs, Maple can be run on 1 node. To use it, the maple module must also be loaded. Maple works in client-server mode, therefore the grid server must be started before running the Maple job (<code>${MAPLE}/toolbox/Grid/bin/startserver</code>). This application requires a license, which must be requested in the job script (<code>#SBATCH --licenses=maplegrid:1</code>). The Maple job must be launched with the <code>${MAPLE}/toolbox/Grid/bin/joblauncher</code> command.<br />
<br />
User alice starts the Maple Grid application for 6 hours, charged to the foobar account:<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A foobar<br />
#SBATCH --job-name=maple<br />
#SBATCH -N 1<br />
#SBATCH --ntasks-per-node=16<br />
#SBATCH --time=06:00:00<br />
#SBATCH -o slurm.out<br />
#SBATCH --licenses=maplegrid:1<br />
<br />
module load maple<br />
<br />
${MAPLE}/toolbox/Grid/bin/startserver<br />
${MAPLE}/toolbox/Grid/bin/joblauncher ${MAPLE}/toolbox/Grid/samples/Simple.mpl<br />
</pre><br />
<br />
<br />
[[Category: HPC]]</div>Itamas(AT)niif.huhttps://wiki.niif.hu/index.php?title=NIIF_HPC_(English)&diff=4866NIIF HPC (English)2021-06-15T11:19:36Z<p>Itamas(AT)niif.hu: /* More information */</p>
<hr />
<div>[[HPC|Magyar változat]]<br />
== Supercomputing ==<br />
NIIF Institute has been operating supercomputers since 2001. The service provides scientific computation and data storage facilities. The components of the supercomputing infrastructure are found in five locations:<br />
<br />
* NIIF centre Budapest<br />
* University of Debrecen<br />
* University of Pécs<br />
* University of Szeged<br />
* University of Miskolc<br />
<br />
<br />
[[Fájl:Orszagos-attekinto_hpc.png|center|700px|The NIIF supercomputers on a map]]<br />
<br />
<br />
<br />
Various supercomputer architectures are available at the different locations in order to support all kinds of scientific computing tasks. We have cluster-based supercomputers in three locations and two ccNUMA machines. The supercomputers are operated by the NIIF Institute staff. The latest development tools and scientific applications are available to the users. The documentation of the supercomputers can be found here.<br />
<br />
Any person or research group that has a contractual relationship with the NIIF Institute can use the service. Usage of the supercomputers is free for authorised users. New users should request an HPC project here. After the project is accepted, the principal investigator can send out invitations to group members. Invitations for the HPC portal can be managed [https://portal.hpc.niif.hu/ here].<br />
<br />
[http://hpc.niif.hu/index_en.php NIIF HPC website]<br />
<br />
==== NIIF HPC resources ====<br />
<br />
{| class="wikitable" border="1" <br />
|- <br />
| Location<br />
| Budapest<br />
| [[Budapest2_klaszter|Budapest2]]<br />
| Szeged<br />
| Debrecen<br />
| [[Debrecen2_GPU_klaszter|Debrecen2-GPU (LEO)]] <br />
| [[Debrecen2_Phi_klaszter|Debrecen3-Phi (Apollo) ]]<br />
| Pécs<br />
| [[Miskolc_UV_2000|Miskolc]]<br />
|- <br />
| Type<br />
| HP CP4000SL<br />
| HP SL250s <br />
| HP CP4000BL<br />
| SGI ICE8400EX<br />
| HP SL250s<br />
<br />
| HP Apollo 8000<br />
| SGI UV 1000<br />
<br />
| SGI UV 2000<br />
|- <br />
| CPU / node <br />
| 2<br />
| 2 <br />
| 4 <br />
| 2<br />
| 2<br />
<br />
| 2 <br />
| 192<br />
<br />
| 44 <br />
|- <br />
| Core / CPU <br />
| 12<br />
| 10<br />
| 12<br />
| 6<br />
| 8<br />
| 12<br />
| 6<br />
| 8<br />
|- <br />
| Memory / node <br />
| 66 GB<br />
| 63 GB <br />
| 132 GB<br />
| 47 GB<br />
| 125 GB<br />
| 125 GB<br />
| 6 TB<br />
| 1.4 TB<br />
|- <br />
| Memory / core <br />
| 2.6 GB<br />
| 3 GB <br />
| 2.6 GB<br />
| 2.6 GB<br />
| 7.5 GB<br />
| 7.5 GB<br />
| 5 GB<br />
| 4 GB<br />
|- <br />
| CPU <br />
| AMD Opteron 6174 @ 2.2GHz<br />
| Intel Xeon E5-2680 v2 @ 2.80GHz <br />
| AMD Opteron 6174 @ 2.2GHz <br />
| Intel Xeon X5680 @ 3.33 GHz<br />
| Intel Xeon E5-2650 v2 @ 2.60GHz <br />
<br />
| Intel Xeon E5-2670 v3 @ 2.30GHz<br />
| Intel Xeon X7542 @ 2.66 GHz<br />
| Intel Xeon E5-4627 v2 @ 3.33 GHz<br />
|- <br />
| GPU <br />
| -<br />
| - <br />
| 2 * 6 Nvidia M2070<br />
| -<br />
| 68 * 3 Nvidia K20x + 16 * 3 Nvidia K40x <br />
| -<br />
| -<br />
| -<br />
|-<br />
| Intel Xeon Phi (KNC)<br />
| -<br />
| 14 * 2 * Intel(R) Xeon Phi(TM) MIC SE10/7120<br />
| -<br />
| - <br />
| -<br />
<br />
| 45 * 2 * Intel(R) Xeon Phi(TM) MIC SE10/7120<br />
| -<br />
| -<br />
|- <br />
| Linpack Performance (Rmax)<br />
| 5 Tflops<br />
| 27 Tflops<br />
| 20 Tflops<br />
| 18 Tflops<br />
| 254 Tflops<br />
| ~106 Tflops<br />
| 10 Tflops<br />
| 8 Tflops<br />
|- <br />
| Compute nodes<br />
| 32<br />
| 14<br />
| 50<br />
| 128<br />
| 84<br />
| 45<br />
| 1<br />
<br />
| 1<br />
|- <br />
| Dedicated storage <br />
| 50 TB<br />
| 500 TB<br />
| 250 TB<br />
| 500 TB<br />
| 585 TB (combined with Phi)<br />
| 585 TB (combined with GPU)<br />
| 500 TB<br />
| 240 TB<br />
|- <br />
| Interconnect <br />
| IB QDR<br />
| IB NB FDR<br />
| IB QDR<br />
| IB QDR<br />
| IB NB FDR<br />
| IB NB FDR<br />
| Numalink 6<br />
| Numalink 6<br />
|- <br />
| Scheduler <br />
| [http://slurm.schedmd.com/ SLURM]<br />
| [http://slurm.schedmd.com/ SLURM] <br />
| [http://slurm.schedmd.com/ SLURM]<br />
| [http://slurm.schedmd.com/ SLURM]<br />
| [http://slurm.schedmd.com/ SLURM] <br />
| [http://slurm.schedmd.com/ SLURM]<br />
| [http://slurm.schedmd.com/ SLURM] <br />
| [http://slurm.schedmd.com/ SLURM] <br />
|- <br />
| MPI <br />
| OpenMPI (ompi)<br />
| IntelMPI (impi) <br />
| OpenMPI (ompi) <br />
| SGI MPT (mpt)<br />
| OpenMPI (ompi)<br />
<br />
| OpenMPI (ompi)<br />
| SGI MPT (mpt)<br />
<br />
| SGI MPT (mpt)<br />
|}<br />
<br />
==== Contact ====<br />
<br />
If you have any questions, send an email to the following address: hpc-support@niif.hu<br />
<br />
Or contact the HPC users in Hungary on the following mailing list: hpc-forum@listserv.niif.hu, you can subscribe [https://listserv.niif.hu/mailman/listinfo/hpc-forum here].<br />
<br />
== More information ==<br />
* [[Debrecen2 GPU klaszter en|Debrecen 2 GPU cluster (LEO)]]<br />
* [http://www.niif.hu/node/679 HPC information at NIIF.HU]<br />
* [[HPC_software|List of Software installed on HPCs]]<br />
<br />
* [[PRACE_User_Support|User Support (PRACE, and general)]]<br />
* [http://www.training.prace-ri.eu PRACE training portal]<br />
* [http://colfaxresearch.com/how-16-04/ Xeon Phi Workshop / Tutorial]<br />
<br />
[[Kategória: HPC]]<br />
[[Kategória: Összefoglaló lapok]]</div>Itamas(AT)niif.huhttps://wiki.niif.hu/index.php?title=Debrecen2_GPU_klaszter_en&diff=4865Debrecen2 GPU klaszter en2021-06-15T11:16:16Z<p>Itamas(AT)niif.hu: New page, content: „{| class="wikitable" border="1" |- | Cluster | Debrecen2 (Leo) |- | Type | HP SL250s |- | Core / node | 8 × 2 Xeon E5-2650v2 2.60GHz |- | GPU / node | 68 * 3 Nv…”</p>
<hr />
<div>{| class="wikitable" border="1" <br />
|- <br />
| Cluster<br />
| Debrecen2 (Leo)<br />
|- <br />
| Type<br />
| HP SL250s<br />
|- <br />
| Core / node<br />
| 2 × 8-core Xeon E5-2650v2 2.60GHz (16 cores)<br />
|- <br />
| GPU / node<br />
| 3 (68 nodes with 3 × Nvidia K20x, 16 nodes with 3 × Nvidia K40x)<br />
|- <br />
| # of compute nodes<br />
| 84<br />
|-<br />
| Max Walltime<br />
| 7-00:00:00<br />
|-<br />
| Max core / project<br />
| 336<br />
|-<br />
| Max mem / core<br />
| 7000 MB<br />
|}<br />
<br />
=== Requesting CPU time ===<br />
<br />
{{ATTENTION| When applying for CPU time, we expect a brief justification from the HPC project managers stating that the application to be run is capable of using a GPU (except when the purpose is to use licensed software – available on the machine – that is unable to use a GPU, e.g. Gaussian, Maple). This is necessary because most of the performance of this HPC resource comes from GPU acceleration, so a non-accelerated program that allocates CPUs would limit the use of the GPUs, leading to underutilization. NVIDIA released [http://www.nvidia.com/object/gpu-applications.html '''a list'''] of applications officially supported by NVIDIA GPUs, but of course other programs that use GPUs are also likely to perform well on the machine.<br />
<br />
* For those interested in GPU programming, we held a workshop whose video materials are available here: [http://videotorium.hu/hu/events/details/1864,GPU_programozas_workshop GPU programming workshop (videotorium)]<br />
}}<br />
<br />
=== Login ===<br />
<pre><br />
ssh USER@login.debrecen2.hpc.niif.hu<br />
</pre><br />
If a non-default key is used, it must be specified with the -i KEY option (SSH and SCP commands).<br />
<br />
=== Copying files with SCP ===<br />
Download from the HOME directory and upload to the HOME directory:<br />
<pre><br />
Up: scp FILE USER@login.debrecen2.hpc.niif.hu:FILE<br />
Down: scp USER@login.debrecen2.hpc.niif.hu:FILE FILE<br />
</pre><br />
<br />
=== Data synchronization ===<br />
Larger files / directory structures can be synchronized using the following commands:<br />
<pre><br />
Up: rsync -a -e ssh DIRECTORY USER@login.debrecen2.hpc.niif.hu:/home/USER<br />
Down: rsync -a -e ssh USER@login.debrecen2.hpc.niif.hu:/home/USER/DIRECTORY .<br />
</pre><br />
The --delete option must be specified to synchronize deleted files.<br />
<br />
== User interface ==<br />
<pre><br />
short form of CWD<br />
|<br />
DEBRECEN2[login] ~ (0)$<br />
| | |<br />
HPC station | |<br />
short machine name |<br />
exit code of the previous command<br />
</pre><br />
<br />
=== Module environment ===<br />
The list of available modules is obtained with the following command:<br />
<pre><br />
module avail<br />
</pre><br />
the list of already loaded modules:<br />
<pre><br />
module list<br />
</pre><br />
You can load an application with the following command:<br />
<pre><br />
module load APP<br />
</pre><br />
The environment variables set by NIIF are listed by the <code>nce</code> command.<br />
<br />
=== Data sharing for project members ===<br />
To share files or directories, ACLs must be set. To make the HOME directory readable by another user (OTHER):<br />
<pre><br />
setfacl -m u:OTHER:rx $HOME<br />
</pre><br />
To make a given directory (DIRECTORY) writable as well:<br />
<pre><br />
setfacl -m u:OTHER:rxw $HOME/DIRECTORY<br />
</pre><br />
The extended rights can be listed with the following command:<br />
<pre><br />
getfacl $HOME/DIRECTORY<br />
</pre><br />
<br />
== Using the shared home directory ==<br />
The common file system connecting the login machines of the supercomputers is found under the following directory:<br />
<pre><br />
/mnt/fhgfs/home/$USER<br />
</pre><br />
Backups can be made into the shared directory with the following command:<br />
<pre><br />
rsync -avuP --delete $HOME/DIRECTORY /mnt/fhgfs/home/$USER<br />
</pre><br />
<br />
== Compiling applications ==<br />
<br />
We ask everyone to first try to compile the needed application themselves. If this fails for some reason, the next step is to ask the Hungarian supercomputer users, as there is a good chance that others have already run into the same problem. They can be reached at: <code>hpc-forum at listserv.niif.hu</code>. You can subscribe to this mailing list [https://listserv.niif.hu/mailman/listinfo/hpc-forum here]. It is also worth searching the list archive for the question. The NIIF HPC support team has very limited capacity for individual compilation requests, but you may still contact <code>hpc-support at niif.hu</code> with your problem. In the latter case please allow a few days for our reply.<br />
<br />
== Using the SLURM scheduler ==<br />
The supercomputer uses CPU-hour (machine time) based scheduling. The following command provides information about the status of the user's Slurm projects (Account):<br />
<pre><br />
sbalance<br />
</pre><br />
The second column (Usage) shows the machine time used by each user, while the fourth column shows the aggregated machine time of the account. The last two columns show the maximum (Account Limit) and the still available (Available) machine time.<br />
<pre><br />
Scheduler Account Balance<br />
---------- ----------- + ---------------- ----------- + ------------- -----------<br />
User Usage | Account Usage | Account Limit Available (CPU hrs)<br />
---------- ----------- + ---------------- ----------- + ------------- -----------<br />
bob * 7 | foobar 7 | 1,000 993<br />
alice 0 | foobar 7 | 1,000 993<br />
</pre><br />
<br />
=== Estimating CPU time ===<br />
It is advisable to estimate the machine time needed before large-scale (production) runs. To do this, use the following command:<br />
<pre><br />
sestimate -N NODES -t WALLTIME<br />
</pre><br />
where <code>NODES</code> is the number of nodes to be reserved and <code>WALLTIME</code> is the maximum runtime.<br />
<br />
'''It is important to specify the machine time to be reserved as accurately as possible, since the scheduler also ranks the waiting jobs based on it. In general, a shorter job is scheduled sooner. It is advisable to check the actual runtime of every run afterwards with the <code>sacct</code> command.'''<br />
<br />
=== Status information ===<br />
The <code>squeue</code> command lists the jobs in the scheduler, while <code>sinfo</code> reports the general state of the cluster. Each submitted job is assigned a unique identification number (JOBID). Knowing this, we can ask for more information. Characteristics of a submitted or already running job:<br />
<pre><br />
scontrol show job JOBID<br />
</pre><br />
<br />
Each job is also entered into an accounting database, from which the characteristics and resource-usage statistics of completed jobs can be retrieved. Detailed statistics can be viewed with the following command:<br />
<pre><br />
sacct -l -j JOBID<br />
</pre><br />
<br />
The following command provides information about the memory used:<br />
<pre><br />
smemory JOBID<br />
</pre><br />
<br />
and disk usage is shown by:<br />
<pre><br />
sdisk JOBID<br />
</pre><br />
<br />
==== SLURM warning messages ====<br />
<pre><br />
Resources/AssociationResourceLimit - Waiting for resources<br />
AssociationJobLimit/QOSJobLimit - Not enough CPU time left, or the maximum number of CPUs is already allocated<br />
Priority - Waiting due to low priority<br />
</pre><br />
In the latter case, the time requested by the job must be reduced. Jobs of a given project can run on at most 512 CPUs at any given time.<br />
<br />
==== Licenszek ellenőrzése ====<br />
Az elérhető és éppen használt licenszekről a következő parancs ad információt:<br />
<pre><br />
slicenses<br />
</pre><br />
<br />
==== Karbantartás ellenőrzése ====<br />
A karbantartási időablakban az ütemező nem indít új jobokat, de beküldeni lehet. A karbantartások időpontjairól a következő parancs ad tájékoztatást:<br />
<pre><br />
sreservations<br />
</pre><br />
<br />
==== Összesített felhasználás ====<br />
Egy hónapra visszamenőleg az elfogyasztott CPU perceket a következő paranccsal kérhetjük le:<br />
<pre><br />
susage<br />
</pre><br />
<br />
==== Teljes fogyasztás ====<br />
Ha szeretnénk tájékozódni arról, hogy egy bizony idő óta mennyi a CPU idő felhasználásunk akkor azt ezzel paranccsal tudjuk lekérdezni:<br />
<pre><br />
sreport -t Hours Cluster AccountUtilizationByUser Accounts=ACCOUNT Start=2015-01-01<br />
</pre><br />
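An end date may also be given. A hypothetical query for a fixed period (the account name and dates are illustrative):<br />
<pre><br />
sreport -t Hours Cluster AccountUtilizationByUser Accounts=foobar Start=2015-01-01 End=2015-06-01<br />
</pre><br />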
<br />
=== Running jobs ===<br />
Applications can be run on the supercomputers in batch mode. This means that a job script has to be created for every run, containing the description of the requested resources and the commands needed for execution. Scheduler parameters (resource requests) must be specified with <code>#SBATCH</code> directives.<br />
<br />
==== Mandatory parameters ====<br />
The following parameters must be specified in every case:<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A ACCOUNT<br />
#SBATCH --job-name=NAME<br />
#SBATCH --time=TIME<br />
</pre><br />
where <code>ACCOUNT</code> is the name of the account to be charged (the <code>sbalance</code> command lists the accounts available to you), <code>NAME</code> is a short name for the job, and <code>TIME</code> is the maximum wall time (<code>DD-HH:MM:SS</code>). The following time formats can be used: <br />
"minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds".<br />
<br />
==== Allocating GPUs ====<br />
GPUs are allocated with the following directive:<br />
<pre><br />
#SBATCH --gres=gpu:N<br />
</pre><br />
where <code>N</code> is the number of GPUs per node; it can be 1, 2 or at most 3.<br />
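For example, to request all 3 GPUs on each of 2 nodes (illustrative values):<br />
<pre><br />
#SBATCH -N 2<br />
#SBATCH --gres=gpu:3<br />
</pre><br />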
<br />
==== Interactive use ====<br />
Short interactive jobs can be submitted with the <code>srun</code> command, e.g.:<br />
<pre><br />
srun -l -n 1 -t TIME --gres=gpu:1 -A ACCOUNT APP<br />
</pre><br />
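For example, a hypothetical 10-minute interactive run of <code>nvidia-smi</code> on one GPU, charged to the foobar account (all values are illustrative):<br />
<pre><br />
srun -l -n 1 -t 00:10:00 --gres=gpu:1 -A foobar nvidia-smi<br />
</pre><br />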
<br />
==== Submitting batch jobs ====<br />
Jobs are submitted with the following command:<br />
<pre><br />
sbatch slurm.sh<br />
</pre><br />
<br />
On successful submission the following output is printed:<br />
<pre><br />
Submitted batch job JOBID<br />
</pre><br />
where <code>JOBID</code> is the unique identifier of the job.<br />
<br />
A job can be cancelled with the following command:<br />
<pre><br />
scancel JOBID<br />
</pre><br />
<br />
==== Non-requeueing jobs ====<br />
For jobs that must not be requeued, use the following directive:<br />
<pre><br />
#SBATCH --no-requeue<br />
</pre><br />
<br />
==== Job queues ====<br />
Two non-overlapping queues (partitions) are available on the supercomputer: <code>prod-gpu-k40</code> and <code>prod-gpu-k20</code>. Both are intended for production runs; the first contains compute nodes (CN) with Nvidia K40x GPUs, the second nodes with Nvidia K20x GPUs. The default queue is <code>prod-gpu-k20</code>. The <code>prod-gpu-k40</code> partition can be selected with the following directive:<br />
<pre><br />
#SBATCH --partition=prod-gpu-k40<br />
</pre><br />
<br />
==== Quality of service (QOS) ====<br />
The default quality of service is <code>normal</code>, meaning the run cannot be preempted.<br />
<br />
===== High priority =====<br />
High-priority jobs can run for at most 24 hours and are charged at double the normal rate; in return, the scheduler moves them ahead in the queue.<br />
<pre><br />
#SBATCH --qos=fast<br />
</pre><br />
<br />
===== Low priority =====<br />
Low-priority jobs can also be submitted. Such jobs may be preempted at any time by any normal-priority job; in return, only half of the consumed machine time is charged. Preempted jobs are automatically rescheduled. It is important to submit at low priority only jobs that tolerate random interruptions, regularly save their state (checkpoint), and can restart quickly from it.<br />
<pre><br />
#SBATCH --qos=lowpri<br />
</pre><br />
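A minimal sketch of a checkpoint-friendly low-priority job script, assuming the application can save to and resume from a checkpoint file (the file name and the <code>--resume</code> flag are illustrative assumptions, not part of the cluster's tooling):<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A foobar<br />
#SBATCH --job-name=lowpri<br />
#SBATCH --time=24:00:00<br />
#SBATCH --qos=lowpri<br />
#SBATCH --requeue<br />
# resume from the last checkpoint if one exists (illustrative application logic)<br />
if [ -f checkpoint.dat ]; then<br />
    ./a.out --resume checkpoint.dat<br />
else<br />
    ./a.out<br />
fi<br />
</pre><br />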
<br />
==== Memory allocation ====<br />
By default 1000 MB of memory is assigned to each CPU core; more can be requested with the following directive:<br />
<pre><br />
#SBATCH --mem-per-cpu=MEMORY<br />
</pre><br />
where <code>MEMORY</code> is given in MB. The maximum memory per core is 7800 MB.<br />
<br />
==== Email notification ====<br />
To receive an email when the job's state changes (start, completion, failure):<br />
<pre><br />
#SBATCH --mail-type=ALL<br />
#SBATCH --mail-user=EMAIL<br />
</pre><br />
where <code>EMAIL</code> is the address to be notified.<br />
<br />
==== Array jobs ====<br />
Array jobs are needed when a single-threaded (serial) application has to be run in many instances at once, each on different data. The scheduler stores a unique identifier for each instance in the <code>SLURM_ARRAY_TASK_ID</code> environment variable; querying it is how the individual tasks of an array job are told apart. The output of each task is written to the file <code>slurm-SLURM_ARRAY_JOB_ID-SLURM_ARRAY_TASK_ID.out</code>. The scheduler packs the tasks onto the nodes tightly, so in this case too it is worth choosing the number of tasks as a multiple of the number of processor cores. A hypothetical sketch of the worker script follows the example below. [http://slurm.schedmd.com/job_array.html More details]<br />
<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A ACCOUNT<br />
#SBATCH --job-name=array<br />
#SBATCH --time=24:00:00<br />
#SBATCH --array=1-96<br />
srun envtest.sh<br />
</pre><br />
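The <code>envtest.sh</code> worker script itself is not shown on this page; a minimal hypothetical sketch that picks its input file based on the task ID could look like this (the file naming is an assumption):<br />
<pre><br />
#!/bin/bash<br />
# each array task processes the input file matching its own task ID<br />
INPUT=input.${SLURM_ARRAY_TASK_ID}.dat<br />
./a.out ${INPUT}<br />
</pre><br />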
<br />
==== OpenMPI jobs ====<br />
For MPI jobs, the number of MPI processes starting per node must also be specified (<code>#SBATCH --ntasks-per-node=</code>). In the most common case this is the number of CPU cores in a node. The parallel program must be launched with the <code>mpirun</code> command.<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A ACCOUNT<br />
#SBATCH --job-name=mpi<br />
#SBATCH -N 2<br />
#SBATCH --ntasks-per-node=8<br />
#SBATCH --time=12:00:00<br />
mpirun --report-pid ${TMPDIR}/mpirun.pid PROGRAM<br />
</pre><br />
<br />
OpenMPI FAQ: http://www.open-mpi.org/faq<br />
<br />
==== OpenMP (OMP) jobs ====<br />
At most 1 node can be allocated for OpenMP parallel applications. The number of OMP threads must be set via the <code>OMP_NUM_THREADS</code> environment variable, either by prefixing the application with it (see the example) or by exporting it before the launch command:<br />
<code><br />
export OMP_NUM_THREADS=8<br />
</code><br />
<br />
In the following example 8 CPU cores are assigned to one task, and all 8 cores must be on the same node. The number of CPU cores is held in the <code>SLURM_CPUS_PER_TASK</code> variable, and this is also used to set the number of OMP threads.<br />
<br />
User Alice starts an 8-thread OMP application for at most 6 hours, charged to the foobar account.<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A foobar<br />
#SBATCH --job-name=omp<br />
#SBATCH --time=06:00:00<br />
#SBATCH --ntasks=1<br />
#SBATCH --cpus-per-task=8<br />
OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK ./a.out<br />
</pre><br />
<br />
==== Hybrid MPI-OMP jobs ====<br />
We speak of hybrid MPI-OMP mode when the parallel application uses both MPI and OMP. It is worth knowing that the MKL calls of programs linked against Intel MKL are OpenMP-capable. In general the following distribution is recommended: the number of MPI processes per node should be between 1 and the number of CPU sockets in a node, with the number of OMP threads accordingly the total number of CPU cores in a node, or half or a quarter of it, as appropriate. The job script combines the parameters of the two modes above.<br />
<br />
In the following example we start 2 nodes with one task per node and 8 threads per task. User Alice submits a hybrid job to 2 nodes for 8 hours, charged to the foobar account. Only one MPI process runs on each node at a time, using 8 OMP threads per node; in total, 2 MPI processes and 2 x 8 OMP threads run on the two machines.<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A foobar<br />
#SBATCH --job-name=mpiomp<br />
#SBATCH --time=08:00:00<br />
#SBATCH -N 2<br />
#SBATCH --ntasks=2<br />
#SBATCH --ntasks-per-node=1<br />
#SBATCH --cpus-per-task=8<br />
#SBATCH -o slurm.out<br />
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK<br />
mpirun ./a.out<br />
</pre><br />
<br />
==== Maple Grid jobs ====<br />
Like OMP jobs, Maple can be run on 1 node. The maple module must also be loaded to use it. Maple works in client-server mode, so the grid server has to be started before running a Maple job (<code>${MAPLE}/toolbox/Grid/bin/startserver</code>). The application requires a license, which must be requested in the job script (<code>#SBATCH --licenses=maplegrid:1</code>). The Maple job must be launched with the <code>${MAPLE}/toolbox/Grid/bin/joblauncher</code> command.<br />
<br />
User Alice starts the Maple Grid application for 6 hours, charged to the foobar account:<br />
<pre><br />
#!/bin/bash<br />
#SBATCH -A foobar<br />
#SBATCH --job-name=maple<br />
#SBATCH -N 1<br />
#SBATCH --ntasks-per-node=16<br />
#SBATCH --time=06:00:00<br />
#SBATCH -o slurm.out<br />
#SBATCH --licenses=maplegrid:1<br />
<br />
module load maple<br />
<br />
${MAPLE}/toolbox/Grid/bin/startserver<br />
${MAPLE}/toolbox/Grid/bin/joblauncher ${MAPLE}/toolbox/Grid/samples/Simple.mpl<br />
</pre><br />
<br />
<br />
[[Category: HPC]]</div>Itamas(AT)niif.huhttps://wiki.niif.hu/index.php?title=HPC&diff=4864HPC2021-06-01T13:43:29Z<p>Itamas(AT)niif.hu: /* A KIFÜ HPC-k dokumentációja */</p>
<hr />
<div><br />
[[HPC_en|KIFÜ HPC in ENGLISH]]<br />
<br />
== About the supercomputers ==<br />
The supercomputing service of the Governmental Agency for IT Development (KIFÜ) is intended for running scientific computing workloads and for storing data for scientific purposes. The components of the currently integrated supercomputer system are located at five sites:<br />
<br />
* University of Debrecen<br />
* KIFÜ headquarters<br />
* University of Pécs<br />
* University of Szeged<br />
* University of Miskolc<br />
<br />
<br />
[[Fájl:Orszagos-attekinto_hpc.png|center|700px|alt=KIFÜ's supercomputers on a map]]<br />
<br />
<br />
To cover as wide a range of scientific computing workloads as possible, the sites host machines of different architectures: ccNUMA at two sites and "fat-node" clusters at three. The subsystems are integrated into a computing facility accessible along uniform principles, via KIFÜ's high-bandwidth, low-latency HBONE+ data network, the ARC grid middleware and harmonised user authentication. The resource is currently the largest in Hungary available for scientific computing, capable of performing a total of 50 trillion floating-point operations per second.<br />
The supercomputer system is operated and developed by KIFÜ. After registration, any person or research group whose institution has a membership contract with KIFÜ can access the system.<br />
State-of-the-art development tools and scientific applications are installed and can be run on the machines, and, with the exception of the subsystem hosted in the NIIF Program data centre, every subsystem is complemented with visualisation equipment for displaying the results of the computations.<br />
<br />
==== Comparison of our supercomputers ====<br />
<br />
{| class="wikitable" border="1" <br />
|- <br />
| Site<br />
| Budapest<br />
| [[Budapest2_klaszter|Budapest2]]<br />
| Szeged<br />
| Debrecen<br />
| [[Debrecen2_GPU_klaszter|Debrecen2-GPU (Leo)]] <br />
| [[Debrecen2_Phi_klaszter|Debrecen3-Phi (Apollo) ]] <br />
| Pécs<br />
<br />
| [[Miskolc_UV_2000|Miskolc]]<br />
|- <br />
| Type<br />
| HP CP4000SL<br />
| HP SL250s <br />
| HP CP4000BL<br />
| SGI ICE8400EX<br />
| HP SL250s<br />
<br />
| HP Apollo 8000<br />
| SGI UV 1000<br />
<br />
| SGI UV 2000<br />
|- <br />
| CPUs / node <br />
| 2<br />
| 2 <br />
| 4 <br />
| 2<br />
| 2<br />
<br />
| 2 <br />
| 192<br />
<br />
| 44 <br />
|- <br />
| Cores / CPU <br />
| 12<br />
| 10<br />
| 12<br />
| 6<br />
| 8<br />
| 12<br />
| 6<br />
| 8<br />
|- <br />
| Memory / node <br />
| 66 GB<br />
| 63 GB <br />
| 132 GB<br />
| 47 GB<br />
| 125 GB<br />
| 125 GB<br />
| 6 TB<br />
| 1.4 TB<br />
|- <br />
| Memory / core <br />
| 2.6 GB<br />
| 3 GB <br />
| 2.6 GB<br />
| 2.6 GB<br />
| 7 GB<br />
| 5 GB<br />
| 5 GB<br />
| 3.75 GB<br />
|- <br />
| CPU <br />
| AMD Opteron 6174 @ 2.2GHz<br />
| Intel Xeon E5-2680 v2 @ 2.80GHz <br />
| AMD Opteron 6174 @ 2.2GHz <br />
| Intel Xeon X5680 @ 3.33 GHz<br />
| Intel Xeon E5-2650 v2 @ 2.60GHz <br />
| Intel Xeon E5-2670 v3 @ 2.30GHz<br />
| Intel Xeon X7542 @ 2.66 GHz<br />
| Intel Xeon E5-4627 v2 @ 3.33 GHz<br />
|- <br />
| GPU <br />
| -<br />
| - <br />
| 2 * 6 Nvidia M2070<br />
| -<br />
| 68 * 3 Nvidia K20x + 16 * 3 Nvidia K40x <br />
| -<br />
| -<br />
| -<br />
|-<br />
| Intel Xeon Phi (KNC)<br />
| -<br />
| [[Intel_Xeon_Phi |14 * 2 * Intel(R) Xeon Phi(TM) MIC SE10/7120]]<br />
| -<br />
| - <br />
| -<br />
| [[Intel_Xeon_Phi |45 * 2 * Intel(R) Xeon Phi(TM) MIC SE10/7120]]<br />
| -<br />
| -<br />
|- <br />
| Linpack performance (Rmax)<br />
| 5 Tflops<br />
| 27 Tflops<br />
| 20 Tflops<br />
| 18 Tflops<br />
| 254 Tflops<br />
| ~106 Tflops<br />
| 10 Tflops<br />
| 8 Tflops<br />
|- <br />
| Number of compute nodes <br />
| 32<br />
| 14<br />
| 50<br />
| 128<br />
| 84<br />
| 45<br />
| 1<br />
| 1<br />
|- <br />
| Dedicated storage <br />
| 50 TB<br />
| 500 TB<br />
| 250 TB<br />
| 500 TB<br />
| 585 TB (shared with Phi)<br />
| 585 TB (shared with GPU)<br />
| 500 TB<br />
| 240 TB<br />
|- <br />
| Interconnect <br />
| IB QDR<br />
| IB NB FDR<br />
| IB QDR<br />
| IB QDR<br />
| IB NB FDR<br />
| IB NB FDR<br />
| Numalink 5<br />
| Numalink 6<br />
|- <br />
| Scheduler <br />
| [http://slurm.schedmd.com/ SLURM]<br />
| [http://slurm.schedmd.com/ SLURM] <br />
| [http://slurm.schedmd.com/ SLURM]<br />
| [http://slurm.schedmd.com/ SLURM]<br />
| [http://slurm.schedmd.com/ SLURM] <br />
| [http://slurm.schedmd.com/ SLURM]<br />
| [http://slurm.schedmd.com/ SLURM] <br />
| [http://slurm.schedmd.com/ SLURM] <br />
|- <br />
| MPI <br />
| OpenMPI (ompi)<br />
| IntelMPI (impi) <br />
| OpenMPI (ompi) <br />
| SGI MPT (mpt)<br />
| OpenMPI (ompi)<br />
<br />
| OpenMPI (ompi)<br />
| SGI MPT (mpt)<br />
<br />
| SGI MPT (mpt)<br />
|}<br />
<br />
== Further information ==<br />
=== Documentation of the KIFÜ HPCs ===<br />
* [[Budapest2_klaszter|Budapest2]]<br />
* [[Miskolc_UV_2000|Miskolc]]<br />
* [[Debrecen2_GPU_klaszter|Debrecen2 GPU cluster (LEO)]]<br />
* [[Debrecen2_Phi_klaszter|Debrecen3 Phi cluster (Apollo)]]<br />
<br />
=== Related pages ===<br />
* [https://hpc.kifu.hu/wp-content/uploads/2020/09/KIFU_HPC_Portal_manual.pdf HPC portal: managing projects and account numbers, downtimes, statistics]<br />
* [[NIIF_szuperszámítógépek_használata|Using the supercomputers]]<br />
* [[HPC-GYIK|FAQ]]<br />
* [[HPC_software|Installed HPC software]]<br />
* [https://wiki.niif.hu/index.php?title=HPC_felhasznaloi_alapelvek#Az_NIIF_Szupersz.C3.A1m.C3.ADt.C3.B3g.C3.A9p_Felhaszn.C3.A1l.C3.B3i_Szab.C3.A1lyzata The NIIF Supercomputer User Policy]<br />
* [https://wiki.niif.hu/index.php?title=HPC_konyvtarak#El.C3.A9rhet.C5.91_f.C3.A1jlrendszerek_a_HPC_k.C3.B6zpontokban Available file systems in the HPC centres]<br />
* [https://wiki.niif.hu/index.php?title=HPC_vizualizacio Accessing the HPC visualisation infrastructure]<br />
<br />
=== Our own HPC training materials ===<br />
<br />
* [http://videotorium.hu/hu/channels/details/1895 HPC training videos]<br />
* [http://conference.niif.hu/event/3/session/23/contribution/25 HPC tutorial (NWS 2015) materials]<br />
* [http://videotorium.hu/hu/events/details/1864,GPU_programozas_workshop GPU programming workshop (videotorium)]<br />
* [[Intel_Xeon_Phi |Using Xeon Phi cards and optimising applications]]<br />
<br />
=== English-language trainings, videos and materials ===<br />
* [http://www.training.prace-ri.eu PRACE training portal]<br />
* [[PRACE_User_Support|PRACE User Support]]<br />
* [http://colfaxresearch.com/how-16-04/ Xeon Phi Training]<br />
[[Kategória: HPC]]<br />
[[Kategória: Összefoglaló lapok]]</div>Itamas(AT)niif.hu