PBS Examples
Table of Contents
- 1 Submitting an interactive job to the Batch Scheduler
- 2 Hello World
- 3 Submitting a PBS Script to the Batch Scheduler
- 4 Checking on the status of a job
- 5 Determining which nodes your job is using
- 6 Viewing output and error files
- 7 Multi-process Hello World (Single machine)
- 8 Multi-node Hello World
- 9 Multi-node MPI Hello World (from C and Fortran77 Source code)
1 Submitting an interactive job to the Batch Scheduler
2 Hello World
This is an extremely simple PBS script that will spawn a single process on a single node. In this case, it first determines the hostname, then it uses the echo command to print out "Hello World from host " followed by the hostname.
2.1 Bash Hello World PBS Script
This example uses the Bash shell to print a simple "Hello World" message. Note that it specifies the shell with the `-S' option. If you do not specify a shell using the `-S' option (either inside the PBS script or as an argument to qsub), then your default shell will be used.
#PBS -lnodes=1:ppn=4
#PBS -lwalltime=1:00

## Specify the shell to be bash
#PBS -S /bin/bash

# print out a hello message
# indicating the host this is running on
export THIS_HOST=`hostname`
echo Hello World from host $THIS_HOST
2.2 Tcsh Hello World PBS Script
This example uses the tcsh shell to print a simple "Hello World" message. Note that it specifies the shell with the `-S' option. If you do not specify a shell using `-S' (either inside the PBS script or as an argument to qsub), then your default shell will be used.
## Introductory Example
## Copyright (c) 2010 The Center for Advanced Research Computing
## at The University of New Mexico

#PBS -lnodes=1:ppn=4
#PBS -lwalltime=1:00

## Specify the shell to be tcsh
#PBS -S /bin/tcsh

# print out a hello message
# indicating the host this is running on
setenv THIS_HOST `hostname`
echo Hello World from host $THIS_HOST
3 Submitting a PBS Script to the Batch Scheduler
In order to run a PBS script on the cluster, we will need to submit it to the batch scheduler using the command qsub followed by the name of the script we would like to run.
In this example, we submit our Hello World PBS script to the batch scheduler using qsub. Notice that it returns the job identifier when the job is successfully submitted. You can use this job identifier to query the status of your job.
tcsh> qsub hello.pbs
64811.nano.nano.alliance.unm.edu
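When a job is submitted from another script, the identifier that qsub prints can be captured and trimmed to its numeric portion. The following is a minimal sketch rather than part of the original example: the variable names JOB_ID and JOB_NUM are illustrative, and the identifier is hard-coded from the session above so that the snippet runs without a scheduler.

```shell
#!/bin/sh
# On a cluster this would be: JOB_ID=$(qsub hello.pbs)
# The identifier from the session above is hard-coded here (an assumption
# made for illustration) so the snippet can run anywhere.
JOB_ID="64811.nano.nano.alliance.unm.edu"

# Strip everything from the first '.' onward to keep the numeric portion.
JOB_NUM="${JOB_ID%%.*}"

echo "Full identifier: $JOB_ID"
echo "Numeric portion: $JOB_NUM"
```

Either form of the identifier can then be passed to other PBS commands.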
4 Checking on the status of a job
If you would like to check the status of your job, you can use the qstat command. With the hello.pbs script, the job may run so quickly that it never appears in the qstat listing. The -a option causes PBS to display more information about the jobs currently in the scheduler.
If you would like to see just the status of this job, you would run the following from your shell:
shell> qstat 64811.nano.nano.alliance.unm.edu
Or, the shorter version with just the numeric portion of the job identifier:
shell> qstat 64811
My username is "download" and, for this example, the job identifier is 64811.nano.nano.alliance.unm.edu.
Note that your job can be in one of three states while it is in the scheduler: Running, Queued, or Exiting, denoted by R, Q, and E respectively in the job state column (the column labelled "S").
tcsh> qstat -a

nano.nano.alliance.unm.edu:
                                                                    Req'd Req'd    Elap
Job ID               Username Queue    Jobname    SessID   NDS TSK Memory  Time S  Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
64758.nano.nano.alli jruser   one_long frob0001     1049     1  --     -- 160:0 R 46:27
64760.nano.nano.alli jruser   one_long frob-1000    2037     1  --     -- 160:0 R 46:22
64761.nano.nano.alli jruser   one_long frob-3000    9944     1  --     -- 160:0 R 46:18
64762.nano.nano.alli jruser   one_long frob-6000   21219     1  --     -- 160:0 R 46:14
64763.nano.nano.alli jruser   one_long frob-12000     --     1  --     -- 160:0 Q    --
64764.nano.nano.alli jruser   one_long frob-18000     --     1  --     -- 160:0 Q    --
64765.nano.nano.alli jruser   one_long frob-28000     --     1  --     -- 160:0 Q    --
64766.nano.nano.alli jruser   one_long frob-38000     --     1  --     -- 160:0 Q    --
64770.nano.nano.alli alice    defaultq abcd        32682     4  --     -- 60:00 R 28:24
64797.nano.nano.alli bill     one_node blub11234   18940     1  --     -- 48:00 R 16:09
64799.nano.nano.alli fred     one_node blub112345  24055     1  --     -- 48:00 R 15:25
64800.nano.nano.alli fred     one_node blub112337  26151     1  --     -- 48:00 R 15:19
64801.nano.nano.alli bill     defaultq hoodger     24066     4  --     -- 80:00 R 06:41
64803.nano.nano.alli george   defaultq abc2        13111     2  --     -- 24:00 R 03:18
64804.nano.nano.alli george   defaultq abc4        16579     4  --     -- 24:00 R 03:17
64805.nano.nano.alli george   defaultq abc8           --     8  --     -- 24:00 Q    --
64811.nano.nano.alli download one_node hello.pbs      --     1  --     -- 00:01 Q    --
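The one-letter codes in the "S" column can be turned back into the state names given above. This is only a sketch: the function name state_name is made up for illustration, and it assumes that a row of the listing is whitespace-separated with the state in the tenth field, as in the sample shown. The row is pasted in as a string so the snippet runs without qstat.

```shell
#!/bin/sh
# Map a one-letter code from the "S" column to the state names used above.
# (state_name is a hypothetical helper, not a PBS command.)
state_name() {
    case "$1" in
        R) echo "Running" ;;
        Q) echo "Queued" ;;
        E) echo "Exiting" ;;
        *) echo "Unknown ($1)" ;;
    esac
}

# One row copied from the listing above; on a cluster it would come from
# the output of qstat -a.
ROW="64811.nano.nano.alli download one_node hello.pbs -- 1 -- -- 00:01 Q --"

# Split the row on whitespace into positional parameters,
# then pick out the job identifier (field 1) and the state (field 10).
set -- $ROW
JOB="$1"
STATE="${10}"

echo "Job ${JOB%%.*} is $(state_name "$STATE")"
```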
5 Determining which nodes your job is using
If you would like to check which nodes your job is using, you can pass the -n option to qstat. Note that if you currently have a job running on a node of the cluster, you may freely log into that node to check on the status of your job. When your job finishes, all of your processes on that node will be killed by the system.
tcsh> qstat -an

nano.nano.alliance.unm.edu:
                                                                    Req'd Req'd    Elap
Job ID               Username Queue    Jobname    SessID   NDS TSK Memory  Time S  Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
64758.nano.nano.alli jruser   one_long frob0001     1049     1  --     -- 160:0 R 46:27
   nano34+nano34+nano34+nano34
64760.nano.nano.alli jruser   one_long frob-1000    2037     1  --     -- 160:0 R 46:22
   nano28+nano28+nano28+nano28
64761.nano.nano.alli jruser   one_long frob-3000    9944     1  --     -- 160:0 R 46:18
   nano12+nano12+nano12+nano12
64762.nano.nano.alli jruser   one_long frob-6000   21219     1  --     -- 160:0 R 46:14
   nano11+nano11+nano11+nano11
64763.nano.nano.alli jruser   one_long frob-12000     --     1  --     -- 160:0 Q    --
   --
64764.nano.nano.alli jruser   one_long frob-18000     --     1  --     -- 160:0 Q    --
   --
64765.nano.nano.alli jruser   one_long frob-28000     --     1  --     -- 160:0 Q    --
   --
64766.nano.nano.alli jruser   one_long frob-38000     --     1  --     -- 160:0 Q    --
   --
64770.nano.nano.alli alice    defaultq abcd        32682     4  --     -- 60:00 R 28:24
   nano27+nano27+nano27+nano27+nano25+nano25+nano25+nano25+nano24+nano24+nano24
   +nano24+nano23+nano23+nano23+nano23
64797.nano.nano.alli fred     one_node blub11234   18940     1  --     -- 48:00 R 16:09
   nano20+nano20+nano20+nano20
64799.nano.nano.alli fred     one_node blub12345   24055     1  --     -- 48:00 R 15:25
   nano17+nano17+nano17+nano17
64800.nano.nano.alli fred     one_node blub12337   26151     1  --     -- 48:00 R 15:19
   nano16+nano16+nano16+nano16
64801.nano.nano.alli bill     defaultq hoodger     24066     4  --     -- 80:00 R 06:41
   nano26+nano26+nano26+nano26+nano22+nano22+nano22+nano22+nano19+nano19+nano19
   +nano19+nano18+nano18+nano18+nano18
64803.nano.nano.alli george   defaultq abc2        13111     2  --     -- 24:00 R 03:18
   nano32+nano32+nano32+nano32+nano31+nano31+nano31+nano31
64804.nano.nano.alli george   defaultq abc4        16579     4  --     -- 24:00 R 03:17
   nano29+nano29+nano29+nano29+nano21+nano21+nano21+nano21+nano15+nano15+nano15
   +nano15+nano14+nano14+nano14+nano14
64805.nano.nano.alli george   defaultq abc8           --     8  --     -- 24:00 Q    --
   --
64811.nano.nano.alli download one_node hello.pbs      --     1  --     -- 00:01 Q    --
   --
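Each node name in the listing is repeated once per processor, so a '+'-separated node list can be reduced to the distinct machines a job occupies before logging into one of them. A small sketch, with the node list for job 64803 pasted in as a string (the variable names EXEC_HOSTS and NODES are illustrative only):

```shell
#!/bin/sh
# Node list copied from job 64803 in the listing above.
EXEC_HOSTS="nano32+nano32+nano32+nano32+nano31+nano31+nano31+nano31"

# Put each entry on its own line, then sort and drop the repeats.
NODES=$(echo "$EXEC_HOSTS" | tr '+' '\n' | sort -u)

echo "$NODES"
```

For this job the result is two node names, nano31 and nano32.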
6 Viewing output and error files
Once your job has completed, you should see two files in the directory from which you submitted the job. By default, these are named <jobname>.oXXXXX and <jobname>.eXXXXX, where <jobname> defaults to the name of the PBS script (hello.pbs in this example) and XXXXX is the numerical portion of the job identifier returned by qsub. Any output from the job sent to "standard output" is written to the hello.pbs.oXXXXX file, and any output sent to "standard error" is written to the hello.pbs.eXXXXX file. These files are referred to as the "output file" and the "error file" respectively throughout this document.
For my Hello World job, the error file is empty and the output file contains the following:
Nano Portable Batch System Prologue
Job Id: 64811.nano.nano.alliance.unm.edu
Username: download
prologue running on host: nano10
Hello World from host nano10
Nano Portable Batch System Epilogue
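The two file names for the Hello World job can be derived from the script name and the job identifier. The sketch below assumes the default naming used in this example (script name, then ".o" or ".e", then the numeric job number); the variable names are illustrative, and the identifier is hard-coded so the snippet runs without a scheduler.

```shell
#!/bin/sh
# Values from the Hello World example; on a cluster JOB_ID would be the
# string printed by qsub.
SCRIPT="hello.pbs"
JOB_ID="64811.nano.nano.alliance.unm.edu"
JOB_NUM="${JOB_ID%%.*}"

# Default names of the output file and the error file for this job.
OUT_FILE="${SCRIPT}.o${JOB_NUM}"
ERR_FILE="${SCRIPT}.e${JOB_NUM}"

echo "output file: $OUT_FILE"
echo "error file:  $ERR_FILE"

# Show the output file only if the job has finished and written it
# into the current directory.
if [ -f "$OUT_FILE" ]; then
    cat "$OUT_FILE"
fi
```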
7 Multi-process Hello World (Single Machine)
In this example, we use the "mpirun" command to spawn the same process on each of the processors on the compute node. In this case, we spawn a shell on each of the processors available on the compute node that prints the MPI ID of the process and the number of total processes.
7.1 Bash
## Introductory Example
## Copyright (c) 2010 The Center for Advanced Research Computing
## at The University of New Mexico

#PBS -lnodes=1:ppn=4
#PBS -lwalltime=1:00

## Specify the shell to be bash
#PBS -S /bin/bash

# set up the PATH environment variable for the
# MX (Myrinet) version of mpirun
export PATH=/opt/local/mpich-mx-gnu-4.1.0/bin/:$PATH

# print out a hello message from each of the processors on this host
# indicating the host this is running on
export THIS_HOST=`hostname`
mpirun -np 4 -machinefile $PBS_NODEFILE /bin/sh -c \
  "echo Hello World from process \\\$MXMPI_ID of \\\$MXMPI_NP on host $THIS_HOST"
7.2 Tcsh
## Introductory Example
## Copyright (c) 2010 The Center for Advanced Research Computing
## at The University of New Mexico

#PBS -lnodes=1:ppn=4
#PBS -lwalltime=1:00

## Specify the shell to be tcsh
#PBS -S /bin/tcsh

# set up the PATH environment variable for the
# MX (Myrinet) version of mpirun
setenv PATH /opt/local/mpich-mx-gnu-4.1.0/bin/:$PATH

# print out a hello message from each of the processors on this host
# indicating the host this is running on
setenv THIS_HOST `hostname`
mpirun -np 4 -machinefile $PBS_NODEFILE /bin/sh -c \
  'echo Hello World from process \$MXMPI_ID of \$MXMPI_NP on host $THIS_HOST'
echo Hello World from host `hostname`
7.2.1 Output
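In this job's output file, you should see something like the following. Note that the sample shown was captured from a 16-process run spread over four hosts; a single-node run with -np 4 prints four "Hello World from process" lines, all from one host.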
Nano Portable Batch System Prologue
Job Id: 64829.nano.nano.alliance.unm.edu
Username: download
prologue running on host: nano09
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
Hello World from process 1 of 16 on host nano09
Hello World from process 2 of 16 on host nano09
Hello World from process 3 of 16 on host nano09
Hello World from process 0 of 16 on host nano09
Hello World from process 6 of 16 on host nano08
Hello World from process 5 of 16 on host nano08
Hello World from process 4 of 16 on host nano08
Hello World from process 7 of 16 on host nano08
Hello World from process 8 of 16 on host nano06
Hello World from process 11 of 16 on host nano06
Hello World from process 10 of 16 on host nano06
Hello World from process 9 of 16 on host nano06
Hello World from process 14 of 16 on host nano05
Hello World from process 12 of 16 on host nano05
Hello World from process 15 of 16 on host nano05
Hello World from process 13 of 16 on host nano05
Nano Portable Batch System Epilogue
8 Multi-Node Hello World
In this example, we use the "mpirun" command to spawn the same process on each of the processors on the four compute nodes we've requested. In this case, we spawn a shell on each of the processors available to the job that prints the MPI ID of the process and the number of total processes.
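The process count passed to mpirun with -np can also be computed from the node file instead of being typed in by hand. The sketch below rests on an assumption that is not stated in this document: that the file named by $PBS_NODEFILE contains one host name per processor slot, so that a request for four nodes with four processors each yields 16 lines. A stand-in file is built here so that the snippet runs outside of a job; the host names are borrowed from the sample output earlier in this document.

```shell
#!/bin/sh
# Stand-in for $PBS_NODEFILE (assumption: one host name per processor slot).
# Inside a real job the scheduler sets PBS_NODEFILE, and this block and the
# final rm should be left out.
PBS_NODEFILE=$(mktemp)
for host in nano09 nano08 nano06 nano05; do
    for slot in 1 2 3 4; do
        echo "$host"
    done
done > "$PBS_NODEFILE"

# Total process count: the number of lines in the node file.
NP=$(wc -l < "$PBS_NODEFILE" | tr -d ' ')

# Number of distinct hosts in the allocation.
NNODES=$(sort -u "$PBS_NODEFILE" | wc -l | tr -d ' ')

echo "processes: $NP, nodes: $NNODES"

# Remove the stand-in file.
rm -f "$PBS_NODEFILE"
```

Inside a job script, the computed value could replace the literal count, as in mpirun -np $NP -machinefile $PBS_NODEFILE.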
8.1 Bash
## Introductory Example
## Copyright (c) 2010 The Center for Advanced Research Computing
## at The University of New Mexico

#PBS -lnodes=4:ppn=4
#PBS -lwalltime=1:00

## Specify the shell to be bash
#PBS -S /bin/bash

# set up the PATH environment variable for the
# MX (Myrinet) version of mpirun
export PATH=/opt/local/mpich-mx-gnu-4.1.0/bin/:$PATH

# print out a hello message from each of the processors on this host
# indicating the host this is running on
mpirun -np 16 -machinefile $PBS_NODEFILE /bin/sh -c \
  "echo Hello World from process \\\$MXMPI_ID of \\\$MXMPI_NP on host `hostname`"
8.2 Tcsh
## Introductory Example
## Copyright (c) 2010 The Center for Advanced Research Computing
## at The University of New Mexico

#PBS -lnodes=4:ppn=4
#PBS -lwalltime=1:00

## Specify the shell to be tcsh
#PBS -S /bin/tcsh

# set up the PATH environment variable for the
# MX (Myrinet) version of mpirun
setenv PATH /opt/local/mpich-mx-gnu-4.1.0/bin/:$PATH

# print out a hello message from each of the processors on this host
# indicating the host this is running on
mpirun -np 16 -machinefile $PBS_NODEFILE /bin/sh -c \
  'echo Hello World from process \$MXMPI_ID of \$MXMPI_NP on host `hostname`'
9 Multi-Node MPI Hello World (from C and Fortran77 Source Code)
The following examples show how to run an MPI "Hello World" program compiled from either C or Fortran77 source code. These examples each consist of a source code file, a Makefile, and a PBS script.
The C and Fortran programs are very similar. Both call MPI_Init to initialize MPI communications, MPI_Comm_size to determine the number of processes in the computation, and MPI_Comm_rank to determine this process's rank in the computation, and then print a message indicating the rank and the computation size. The C program also calls gethostname and includes the hostname of the current machine in its message.
9.1 hello.c Source Code
/*
  Introductory Example
  Copyright (c) 2010 The Center for Advanced Research Computing
  at The University of New Mexico
*/

/* Include the MPI header file */
#include "mpi.h"
#include <stdio.h>   /* printf */
#include <unistd.h>  /* gethostname */

int main(int argc, char *argv[])
{
    int myid, numprocs, rc;
    char hostname[256];
    size_t len = 255;

    /* Initialize MPI */
    rc = MPI_Init(&argc, &argv);

    /* store the number of processes for this computation in numprocs */
    rc = MPI_Comm_size(MPI_COMM_WORLD, &numprocs);

    /* store the rank of this process in myid */
    rc = MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    /* store the hostname in hostname; terminate it explicitly in case
       the name was truncated */
    rc = gethostname(hostname, len);
    hostname[len] = '\0';

    printf("\nHello World from process: %d of %d on host: %s\n",
           myid, numprocs, hostname);

    /* Finalize MPI */
    rc = MPI_Finalize();

    return 0;
}
9.2 Makefile
## Introductory Example
## Copyright (c) 2010 The Center for Advanced Research Computing
## at The University of New Mexico

# Makefile to compile MPI Hello World C program: hello.c
# (each recipe line must begin with a tab character)

MPI_INCLUDE = -I/opt/local/mpich-mx-gnu-4.1.0/include
MPI_LIB_PATHS = -L/opt/local/mpich-mx-gnu-4.1.0/lib/ -L/opt/mx/lib
MPI_LIBRARIES = -lmpich -lmyriexpress

hello: hello.o
	gcc hello.o $(MPI_LIB_PATHS) $(MPI_LIBRARIES) -o hello

hello.o: hello.c
	gcc -c $(MPI_INCLUDE) hello.c -o hello.o

.PHONY: clean
clean:
	rm -f hello.o hello
9.3 PBS Script
# PBS Script for "Hello World" MPI job

## Introductory Example
## Copyright (c) 2010 The Center for Advanced Research Computing
## at The University of New Mexico

#PBS -lnodes=4:ppn=4
#PBS -lwalltime=1:00

## Specify the shell to be tcsh
#PBS -S /bin/tcsh

# set up the PATH environment variable for the
# MX (Myrinet) version of mpirun
setenv PATH /opt/local/mpich-mx-gnu-4.1.0/bin/:$PATH

# run hello on 16 processors
mpirun -np 16 -machinefile $PBS_NODEFILE hello
9.4 hello.f Source Code
! Introductory Example
! Copyright (c) 2010 The Center for Advanced Research Computing
! at The University of New Mexico

      program helloworld

      include 'mpif.h'
      integer comm, rank, numproc, ierror

! Initialize MPI.
      call MPI_INIT(ierror)
      call MPI_COMM_RANK(mpi_comm_world, rank, ierror)
      call MPI_COMM_SIZE(mpi_comm_world, numproc, ierror)

      print *,"Hello World from processor",rank,"of",numproc

      if (rank == 0) then
         print *,"Hello again from processor", rank
      endif

      call MPI_FINALIZE(ierror)

      end
9.5 Makefile
## Introductory Example
## Copyright (c) 2010 The Center for Advanced Research Computing
## at The University of New Mexico

# Makefile to compile MPI Hello World Fortran program: hello.f
# (each recipe line must begin with a tab character)

MPI_INCLUDE = -I/opt/local/mpich-mx-gnu-4.1.0/include
MPI_LIB_PATHS = -L/opt/local/mpich-mx-gnu-4.1.0/lib/ -L/opt/mx/lib
MPI_LIBRARIES = -lmpich -lmyriexpress

# "hello" is listed first so that a plain "make" builds the executable
hello: hello.o
	gfortran hello.o $(MPI_LIB_PATHS) $(MPI_LIBRARIES) -o hello

hello.o: hello.f
	gfortran -c $(MPI_INCLUDE) hello.f -o hello.o

.PHONY: clean
clean:
	rm -f hello.o hello
9.6 PBS Script
# PBS Script for "Hello World" MPI job

## Introductory Example
## Copyright (c) 2010 The Center for Advanced Research Computing
## at The University of New Mexico

#PBS -lnodes=4:ppn=4
#PBS -lwalltime=1:00

## Specify the shell to be tcsh
#PBS -S /bin/tcsh

# set up the PATH environment variable for the
# MX (Myrinet) version of mpirun
setenv PATH /opt/local/mpich-mx-gnu-4.1.0/bin/:$PATH

# run hello on 16 processors
mpirun -np 16 -machinefile $PBS_NODEFILE hello
Date: 2010-11-05 13:40:38 MDT
HTML generated by org-mode 7.01trans in emacs 24