HPC FAQ - Batch Jobs
Supercomputer documentation is always a work in progress! Please email questions, corrections, or suggestions to the HPC support team at email@example.com as usual. Thanks!
Frequently Asked Questions
- 1. What is a batch job?
- 2. How do I submit a batch job?
- 3. Can I run jobs on the login node?
- 4. What types of jobs must be run in batch?
- 5. Which batch queue should I use?
- 6. How do I check the status of my job?
- 7. Why has my job been Pending for so long?
- 8. How many jobs can I run at once?
- 9. Can I kill a batch job?
- 10. How do I checkpoint or restart a job?
- 11. What nodes is my job running on?
- 12. What are MOAB and SLURM?
- 13. Where can I get more information on MOAB and SLURM?
- 14. How do I do a timing run?
- 15. Can I run a job using multiple-level parallelism?
A batch job is a program or series of programs run on the cluster without manual intervention. Usually a script supplies the input data and parameters the job needs to run. The script is submitted to the batch scheduler and will be run when the resources are available. You don't need to specify which node or nodes the job will run on; the batch system selects appropriate nodes for the job. The batch system also allows for accounting and tracking of jobs.
A batch job is submitted to a batch queue using the sbatch command. For details on the commands for batch execution, see the examples (including batch job scripts) in /share/cluster/examples/dlx on the cluster.
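For illustration, here is a minimal batch script, assuming a typical SLURM setup; the job name, queue/partition name, resource requests, and program name are placeholders, so adapt them from the examples directory above.

    #!/bin/bash
    #SBATCH --job-name=myjob        # name shown in squeue output
    #SBATCH --partition=normal      # placeholder queue name; see sinfo for the real ones
    #SBATCH --ntasks=4              # number of parallel tasks requested
    #SBATCH --time=01:00:00         # wall-clock limit (HH:MM:SS)
    #SBATCH --output=myjob.%j.out   # output file; %j expands to the job id

    srun ./myprogram input.dat      # run the program on the allocated nodes

Submit the script with sbatch myjob.sh; sbatch prints the job id used by the status and cancel commands described below.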
The login nodes are intended for editing files, compiling code, running short tests, and similar activities. Please run your jobs as batch jobs whenever you can. As a special case, interactive jobs of up to 120 cpu-minutes may be run on the login node. If your job exceeds this limit it may be canceled. The login nodes are shared by all of the cluster users, so any job that does intensive computing, produces heavy I/O, or spawns a large number of processes will adversely affect the whole node. Please respect your fellow users!
The batch system (MOAB and SLURM) must be used whenever possible. Non-batch jobs on any node except the login node will be killed, unless special permission has been obtained in advance. Send email to the Help HPC list at firstname.lastname@example.org to make arrangements.
Use the sinfo command when logged in to the cluster to show the names of the queues and other information about them. Use sinfo --help or man sinfo to get more information about the command.
Note that the queues are defined with particular types of jobs in mind, to make sure jobs have the appropriate resources and don't interfere with one another. Please run your job in the appropriate queue! If you have questions, or you need to do something you can't do under the queuing system, please send email to the Help HPC list at email@example.com describing your problem or question.
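For example, assuming a standard SLURM installation, these commands summarize the queues:

    sinfo                       # list queues/partitions and node states
    sinfo -l                    # long format: time limits, node counts, and more
    scontrol show partition     # full configuration details for each queue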
Use the squeue command to show the status of batch jobs. The default is to show all pending and running jobs. Use squeue -u myloginid to see your own jobs. Use squeue --help or man squeue to get more information about the command.
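For example (the login id is a placeholder):

    squeue                      # all pending and running jobs
    squeue -u myloginid         # only your own jobs
    squeue -l                   # long format, including time limits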
Use checkjob -v jobid to see why your job is pending.
There may be pending jobs ahead of yours. You can see the pending jobs in your queue with the squeue -t pending command.
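For example, with a hypothetical job id of 12345:

    checkjob -v 12345           # Moab's detailed explanation of why the job is waiting
    squeue --start -j 12345     # SLURM's estimated start time, when one is available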
The batch system manages the flow of jobs, allowing them to run when system resources are available, in a fair and orderly fashion. When the system is less busy, your jobs may be scheduled and executed quickly. When the system is busier, your jobs could wait for longer periods. This interval will vary based on many factors.
If your job has been pending for longer than you expect, please try to find out why before you submit a trouble report. There are commands to get this information. More often than not, your job is just waiting on the resources you requested.
Depending on the factors involved, such as your job's anticipated run time, you might be able to pick a different queue for quicker turnaround. See question 5 above for more queue information.
Your job will be scheduled automatically when the requested resources are available.
Job queues and node allocations have been established to ensure equitable distribution of, and access to, the entire complex.
See the README First page for specific information.
Use scancel job_id to terminate a batch job. The time required for the job to actually terminate will vary somewhat, depending on how busy the system and the network are and on how many parallel processes the job is running. You can use the squeue command to find the job id (see above).
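For example, to find and cancel one of your own jobs (the login id and job id are placeholders):

    squeue -u myloginid         # note the JOBID column
    scancel 12345               # terminate job 12345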
There are two webpages covering this topic: one concerns checkpointing, the other concerns restarting checkpointed jobs. The checkpointing page is here; the restart page is here. There are several methods of checkpointing; read both pages before attempting to checkpoint or restart a job.
The squeue command shows the nodes allocated to each running batch job in its NODELIST column (see above).
MOAB is the batch job scheduler that decides which jobs should run next. SLURM is the Resource Manager that allocates compute nodes upon request. Together they submit the jobs, select the most suitable hosts, and interact with the individual tasks of parallel batch jobs.
A batch job is submitted to a queue with the sbatch command. The batch system then attends to all of the details associated with running it. A batch job may not run immediately after being submitted if the resources it needs (usually compute nodes) are not available. The job will wait until it reaches the front of the queue and the resources become available. At that time the batch job will be dispatched to the most suitable host or hosts available for execution.
The vendor's documentation on MOAB is at http://www.clusterresources.com/products/mwm/docs/.
Also see the Lawrence Livermore Moab Quick-Start User Guide at https://computing.llnl.gov/jobs/moab/QuickStartGuide.pdf.
The Lawrence Livermore document SLURM: A Highly Scalable Resource Manager is at https://computing.llnl.gov/linux/slurm/.
A timing run, a type of job used to determine how effectively a program or algorithm performs, needs to be run without sharing a cpu with any other job.
Otherwise, the measurements aren't valid because some cpu time will be lost to time-sharing with the other job(s); exactly how much would be lost is unpredictable and would vary between runs. The cluster is set up to allow timing runs on up to 32 processors; if you need a larger timing run, please send email to the Help HPC list at firstname.lastname@example.org with your request.
Note: Depending on the cluster load, larger timing runs may not be possible even given several weeks' prior notice.
Do not try a timing run until you are certain your program runs properly. Debug your program, algorithm, and test data using the normal batch queues before trying to do a timing run.
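As a sketch of a timing-run script, assuming your SLURM installation supports the --exclusive option (the resource requests and program name are placeholders):

    #!/bin/bash
    #SBATCH --job-name=timing
    #SBATCH --ntasks=32             # up to 32 processors are allowed for timing runs
    #SBATCH --exclusive             # do not share nodes with any other jobs
    #SBATCH --time=00:30:00

    time srun ./myprogram input.dat # report wall-clock time for the full run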
A multiple-level parallel job is an MPI job whose sub-processes use parallelized library routines, OpenMP, or loop-parallelism (also known as mixed-mode). Consult the following links for more MPI and Open MPI information; a sample batch script for a mixed-mode job follows the links.
Open MPI documentation: http://www.open-mpi.org/doc/v1.4/.
LLNL Tutorial: https://computing.llnl.gov/tutorials/mpi/.
Another tutorial: http://www.lam-mpi.org/tutorials/.
MPI FORUM: http://www.mpi-forum.org/.
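As a rough sketch of launching a mixed-mode (MPI plus OpenMP) job under SLURM, assuming the program is built with both MPI and OpenMP support (the program name, task count, and thread count are placeholders):

    #!/bin/bash
    #SBATCH --ntasks=4                            # number of MPI ranks
    #SBATCH --cpus-per-task=8                     # OpenMP threads per rank
    #SBATCH --time=01:00:00

    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   # one OpenMP thread per allocated cpu
    srun ./hybrid_program                         # srun starts one process per MPI rank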