High performance computing (HPC), and in particular high performance scientific computing, is a complex and challenging field that constitutes an entire research area in its own right. Effectively using an HPC platform, such as INCLINE, to do research is an involved process. Without some preparation, you can spend a great deal of time and energy for no performance improvement at all. Worse, you could inadvertently tie up computing resources that other users need, resulting in a net loss for the user community. The purpose of this article is to help you understand the basics of HPC, in order to ensure that the university's HPC resources are used judiciously.
If you have never used an HPC resource before, you should take some time to understand everything involved in adapting your research problem to INCLINE. Below, we have put together some questions that you should work through before attempting to use INCLINE.
Can I use Linux?
INCLINE is reached through a Linux terminal. There is no graphical user interface (GUI), and so you must be comfortable working in a command line environment. Even if you are running a commercial code that has a GUI, you will not be able to use it on INCLINE. If you are not comfortable with Linux, you should familiarize yourself with the basic Linux commands.
How will my code benefit from INCLINE?
Different applications can benefit from HPC in different ways. Before you pursue HPC, you need to understand what you are trying to achieve. INCLINE is not a magic box that will make your code run faster without any effort on your part. Please consider the following questions to determine, quantitatively, how INCLINE will help your code perform.
- Job count vs job size:
  - Are you running a lot of small simulations and looking to increase throughput?
  - Are you running a single massive simulation that would take too long to run on a single computer?
- Memory requirements:
  - How much RAM do you expect your code to use in total?
  - How much RAM per processor do you expect your code to use?
- Processor requirements:
  - How many nodes do you need?
  - How many processors do you need?
- I/O requirements:
  - Is your code working with very large datasets? How large are these datasets, and how frequently are they accessed? If you are working with terabyte-sized datasets, you need to think about how this might slow down your code.
  - Does your code output very large datasets? How frequently and how much data will be generated?
- Compile-time optimization:
  - Are you building your code yourself, and if so, do you know how to enable processor-specific optimizations?
  - If you are using a pre-built code, has it been optimized for INCLINE's architecture?
- Quantitative metrics:
  - How many node-hours do you expect to use?
  - How long do you expect your jobs to run?
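The memory, processor, and node-hour questions above come down to simple arithmetic. The following is a minimal sketch of that arithmetic, not an official INCLINE tool. The function name `estimate_request` and the workload numbers are hypothetical. The figure of 128 processors per node is the one quoted later in this article.

```python
import math

# Processors per compute node, as stated in this article for INCLINE.
CORES_PER_NODE = 128


def estimate_request(total_cores, total_ram_gb, hours_per_job, job_count):
    """Return rough resource figures for a batch of identical jobs.

    Hypothetical helper for illustration only.
    """
    # Whole nodes needed to supply the requested number of processors.
    nodes = math.ceil(total_cores / CORES_PER_NODE)
    # Average memory each processor must be able to hold.
    ram_per_core_gb = total_ram_gb / total_cores
    # Node-hours: nodes occupied, times wall-clock hours, times number of jobs.
    node_hours = nodes * hours_per_job * job_count
    return {
        "nodes": nodes,
        "ram_per_core_gb": ram_per_core_gb,
        "node_hours": node_hours,
    }


# Hypothetical workload: 20 jobs, each using 256 processors and 512 GB for 6 hours.
estimate = estimate_request(total_cores=256, total_ram_gb=512, hours_per_job=6, job_count=20)
print(estimate)
```

Working through numbers like these before you submit anything makes it easier to see whether your plan is realistic.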
Does my code run in parallel?
One key to INCLINE's power is its ability to split calculations up among multiple processors, typically using the Message Passing Interface (MPI). Please look at the example code at https://rookiehpc.com/mpi/docs/mpi_iallgather.php, and other code in that part of the website. Does your code use this type of technology? If not, INCLINE might not be any more powerful than your laptop. If you are using a commercial code, ask the vendor if it is "parallel" or "MPI enabled."
If your code is compiled using MPI, then you can probably run it on INCLINE with multiple nodes. (Each node has 128 processors.) If not, it may still be able to run on INCLINE on a single node - see the next section.
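To give a sense of what "splitting calculations up" means in practice, here is a minimal sketch of block decomposition, one common way for MPI-style programs to divide work among processes ("ranks"). It is plain Python and does not call MPI. In a real MPI program the rank and the number of ranks would come from `MPI_Comm_rank` and `MPI_Comm_size`. The function name `block_range` and the problem size are hypothetical.

```python
def block_range(n_items, n_ranks, rank):
    """Return the half-open [start, stop) range of items owned by `rank`.

    Hypothetical helper: splits n_items as evenly as possible among n_ranks.
    """
    base, extra = divmod(n_items, n_ranks)
    # The first `extra` ranks take one additional item so the load stays balanced.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


# Hypothetical case: 10 grid cells shared among 4 ranks.
for rank in range(4):
    print(rank, block_range(10, 4, rank))
```

Each rank works only on its own range, and the ranks exchange results through MPI calls. A code that has no such division of work built in cannot make use of additional nodes.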
Is my code threaded?
Another key to INCLINE's power is its ability to split calculations up among multiple cores within a single CPU using threads, for example with OpenMP. Please look at the example code at https://rookiehpc.com/openmp/docs/firstprivate.php, and other code in that part of the site. Does your code use this type of technology, or a similar one? If not, and especially if your code is also not parallel, INCLINE will not be any more powerful than your laptop.
If your code is not compiled with threading or MPI, then it will not benefit from running on INCLINE's compute nodes. However, if it is GPU-enabled, you can take advantage of INCLINE's GPU nodes - see the next section.
Is my code GPU enabled?
INCLINE provides two GPU nodes, each with two NVIDIA A100 cards. GPUs can provide a significant additional performance boost. Many data-oriented and machine learning applications are GPU enabled. However, if your code does not contain the instructions that explicitly send calculations to the GPU, INCLINE's GPU capability will be of no benefit.
How does my code scale?
In HPC, understanding scaling is essential. The key concepts are:
- Speedup: if I run my code on N processors, how much faster will it run? (Usually the best you can hope for is that it runs N times faster, although in some cases, known as superlinear speedup, it can run even faster.)
- Problem scaling: what are the limits on the performance I can expect?
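Speedup can be measured directly by timing the same problem on different processor counts. The sketch below shows the standard definitions of speedup (serial time divided by parallel time) and parallel efficiency (speedup divided by processor count). The timings are hypothetical and were not measured on INCLINE.

```python
def speedup(t_serial, t_parallel):
    """Speedup: how many times faster the parallel run is than the serial run."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_procs):
    """Parallel efficiency: fraction of the ideal N-fold speedup actually achieved."""
    return speedup(t_serial, t_parallel) / n_procs


# Hypothetical wall-clock timings in seconds, keyed by processor count.
timings = {1: 1000.0, 2: 520.0, 4: 280.0, 8: 170.0}

t1 = timings[1]
for n_procs, t in timings.items():
    s = speedup(t1, t)
    e = efficiency(t1, t, n_procs)
    print(f"{n_procs:>3} processors: speedup {s:.2f}, efficiency {e:.0%}")
```

An efficiency that falls steadily as processors are added suggests that requesting more resources will yield diminishing returns.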
Even if your code is MPI enabled, multithreaded, or GPU enabled (or any combination of these), there are still limits to how well your code will perform on INCLINE. If you are using a pre-packaged or commercial code, you should consult with the developer to determine how well your code scales. Some codes gain little from scaling beyond one node. Some codes scale well up to thousands of processors ("massively parallel"). Some will scale up to a few cores, but exhibit reverse scaling beyond that (they run slower as more cores are added) due to communication overhead. Some are "embarrassingly parallel" and exhibit near perfect scaling for any number of cores. It is your responsibility to understand how your code scales so that you can request appropriate resources.
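One way to build intuition for these limits is Amdahl's law, which gives the best possible speedup when only a fraction of the run time can be parallelized. The sketch below implements that formula and then adds a deliberately simple overhead term to illustrate reverse scaling. The linear overhead model, the function names, and the parameter values are assumptions made for illustration. They do not describe any particular code or INCLINE itself.

```python
def amdahl_speedup(parallel_fraction, n_procs):
    """Amdahl's law: ideal speedup when a fixed fraction of the work is parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_procs)


def speedup_with_overhead(parallel_fraction, n_procs, overhead_per_proc):
    """Toy model (an assumption): communication cost grows linearly with processor count."""
    serial_fraction = 1.0 - parallel_fraction
    relative_time = (
        serial_fraction
        + parallel_fraction / n_procs
        + overhead_per_proc * (n_procs - 1)
    )
    return 1.0 / relative_time


# Hypothetical code that is 95% parallel, with a small per-processor overhead.
for n_procs in (1, 8, 128, 1024):
    ideal = amdahl_speedup(0.95, n_procs)
    modeled = speedup_with_overhead(0.95, n_procs, overhead_per_proc=0.001)
    print(f"{n_procs:>5} processors: Amdahl {ideal:6.2f}, with overhead {modeled:6.2f}")
```

In this toy model the speedup rises at first and then falls as the overhead term dominates, which is the reverse scaling behavior described above. Real codes should be characterized by measurement rather than by a model like this one.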