A high performance computing cluster has a complex infrastructure. All users are expected to have a basic understanding of this structure before deploying resources.
A node on INCLINE represents a computing unit comprising its own chassis, motherboard, processors, memory, etc. Some jobs may require only one node, some may require multiple nodes run together. INCLINE contains multiple different kinds of nodes as outlined below.
- Login Nodes (l001, l002)
- Compute Nodes (c001-c026)
- High Memory Nodes (b001, b002)
- GPU Nodes (g001, g002)
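The naming scheme above tells you what kind of node you are on. As an illustration only (this helper is not part of INCLINE's tooling, just a sketch based on the list above):

```shell
# Map a node name to its node type, following INCLINE's naming scheme.
node_type() {
  case "$1" in
    l*) echo "login node" ;;
    c*) echo "compute node" ;;
    b*) echo "high memory node" ;;
    g*) echo "GPU node" ;;
    *)  echo "unknown" ;;
  esac
}

node_type c014   # prints: compute node
node_type g001   # prints: GPU node
```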
INCLINE has two login nodes. When you first log in, you will be placed on one of them. You can check which node you are on with the "hostname" command; for example, a user whose "hostname" output is l001 is on the first login node.
It is important to remember that the login nodes are shared among all users. Limit your use of the login nodes to basic operations such as editing files, compiling code, and submitting jobs, and never run parallel or computationally intensive work on them.
INCLINE has 26 compute nodes. Each compute node has two AMD EPYC 7662 CPUs, for a total of 128 cores, and 256 GB of memory. Hyperthreading allows up to 256 effective threads, although only 128 MPI tasks per node are currently allowed.
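A useful back-of-the-envelope check when sizing jobs is the memory available to each MPI task, which is simply node memory divided by task count:

```shell
# Memory per MPI task when fully subscribing a compute node:
# 256 GB of RAM shared across 128 tasks.
mem_gb=256
tasks=128
echo "$((mem_gb / tasks)) GB per task"   # prints: 2 GB per task
```

If your tasks need more than this, run fewer tasks per node (or consider the high memory nodes described below).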
Most users will want to use compute nodes, unless their codes are specifically designed for GPUs or have special memory requirements.
INCLINE has two high memory nodes. These are identical to the compute nodes except that they have 2048 GB of memory.
Use the high memory nodes only if your application specifically requires a large amount of RAM. Please note that high memory is not equivalent to "big data": a big-data workload may process large quantities of data while keeping the majority of it on disk, which does not require unusually large amounts of RAM.
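The same per-core arithmetic as above shows what the high memory nodes buy you:

```shell
# 2048 GB across 128 cores on a high memory node, versus
# 256 GB across 128 cores on a standard compute node.
echo "$((2048 / 128)) GB per core on a high memory node"   # prints: 16 GB per core ...
echo "$((256 / 128)) GB per core on a compute node"        # prints: 2 GB per core ...
```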
INCLINE has two GPU nodes. Each GPU node has two AMD EPYC 7452 CPUs, for a total of 64 cores (128 threads), and 1024 GB of RAM.
Each GPU node also has two NVIDIA A100 GPUs.
You should only plan to use the GPU nodes if your code is specifically designed for GPU hardware. Many deep learning codes (for example, TensorFlow) are designed for GPUs. It is recommended that you test your code on a separate GPU platform before attempting to run on INCLINE.
There are two primary file locations available for your use. Note that both locations are mounted in a shared file system accessible across all nodes. In other words, even if you move from one node to another, the file system will appear exactly the same.
- Home directory (/home/username)
- High speed scratch file system (/mmfs1/home/username)
Your home directory is for small files, compiled code, input files, etc. You should not store large files in your home directory, nor should you use it as a location for large file IO during an HPC run. The home directory has a 100 GB limit.
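To see how close you are to the 100 GB limit, standard disk-usage tools work (du is a generic Linux utility, not an INCLINE-specific command):

```shell
# Summarize total home directory usage; compare against the 100 GB limit.
du -sh "$HOME"
```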
The high speed file system (HSFS) is specifically designed for parallel IO during HPC runs. If you are running large codes on multiple nodes with large amounts of output (for instance, a CFD simulation), you should set /mmfs1/home/username as your output location. This will significantly speed up your code, and will prevent the home file system from being bogged down. There are no current limits on how much data you store on the HSFS. However, please keep the following in mind:
- Any data that has been sitting on the HSFS for more than 30 days may be automatically deleted, so move any important data off as soon as possible.
- The HSFS is shared among all users, so please use consideration when storing your files on it.
- Your account may be suspended and data deleted without warning if your HSFS usage goes out of control. Make sure you know what you are doing.
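Because of the 30-day policy, it is worth periodically checking your scratch space for stale files. A sketch using the standard find utility (a temporary directory stands in for your scratch area so the example is self-contained; on INCLINE you would point it at /mmfs1/home/username):

```shell
# Find files not modified in the last 30 days -- candidates for the purge.
dir=$(mktemp -d)                               # stand-in for your scratch path
touch -d "40 days ago" "$dir/old_result.dat"   # simulates a stale file (GNU touch)
touch "$dir/fresh_result.dat"                  # recent file; not listed
find "$dir" -type f -mtime +30                 # prints only old_result.dat
rm -rf "$dir"
```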