University of Minnesota Institute of Technology
School of Mathematics

Tips for Running Large Computations

The Math Department's computer resources are limited to desktop computers, which are shared by a large number of users, and a few computation servers. As such, your large computations should be crafted to play well with others.

Access to the compute servers is limited, but email adm@math.umn.edu if you'd like to have access.

Dedicated research computing servers are available at the Minnesota Supercomputing Institute run by the Digital Technology Center.

Use ssh to run jobs on a remote machine using nice, nohup, and local temp space for data files. If the ssh client machine goes down, nohup keeps the computation running and sending its output to a file. For example,

ssh charger 'cd ~/magma; nohup nice /usr/bin/magma magma.script > /var/tmp/magma.out' &

General Info

Run your computations with a reduced scheduling priority.

Computations on shared workstations must use 'nice -n 19 COMMAND' where COMMAND is the command with arguments used to run your computation. Running your computations with the lowest scheduling priority should not significantly impact the run time of your computation unless another computation with a higher priority is running on the same workstation.

On lab machines, it's important to 'nice' commands so the computer is still usable for desktop users. If an instructional lab has a regularly scheduled class that uses it, any jobs on the computers in that lab may be killed if they adversely affect the computers' performance.

The top and renice commands can be used to change the nice value for a running job.
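
As a rough sketch of the renice approach, the helper below drops a job that is already running to the lowest priority. The function name lower_priority and the PID 12345 are made up for illustration; only the standard renice options -n and -p are assumed.

```shell
# Hypothetical helper: drop an already-running job to the lowest priority.
# Usage: lower_priority PID
lower_priority() {
    renice -n 19 -p "$1" > /dev/null
}

# Example (12345 is a placeholder PID; find yours with top or ps):
#   lower_priority 12345
```

An unprivileged user can raise a job's nice value but normally cannot lower it again, so it is simplest to go straight to 19.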

Include checkpoints in your code.

Workstations in the Math Department may need to be restarted at any time, which terminates your computation. Every workstation in the department is restarted for software installation purposes every week. Each workstation shows the date and time of its next scheduled restart on its login screen, as well as in the message shown when logging in via SSH. For large computations, you should include checkpoints in your computational code so that you can resume your computation with minimal data loss.
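
One possible shape for a checkpoint, sketched as a shell driver script: record the last completed step in a file and start from the step after it on the next run. The names CHECKPOINT, do_step and run_steps are placeholders invented for this sketch, and do_step stands in for one unit of your real computation.

```shell
# Hypothetical checkpoint file on local disk. /var/tmp is erased on
# reinstall, so copy the file to your home directory from time to time.
CHECKPOINT="${CHECKPOINT:-/var/tmp/${USER:-user}.checkpoint}"

# Placeholder for one unit of real work.
do_step() {
    echo "working on step $1"
}

# Usage: run_steps TOTAL_STEPS
run_steps() {
    local total=$1 last=0 i
    # Resume after the last completed step, if a checkpoint exists.
    [ -f "$CHECKPOINT" ] && last=$(cat "$CHECKPOINT")
    for (( i = last + 1; i <= total; i++ )); do
        do_step "$i"
        # Write to a temporary file and rename it, so that a restart in
        # the middle of a write never leaves a truncated checkpoint.
        echo "$i" > "$CHECKPOINT.tmp" && mv "$CHECKPOINT.tmp" "$CHECKPOINT"
    done
}
```

Calling run_steps 100 after a restart would skip the steps already recorded. A real computation would also need to save its intermediate data alongside the step number.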

Our staff has a pretty good idea of when each machine will be reinstalled, so contact adm@math.umn.edu for workstation suggestions or to find out when a particular workstation will be reinstalled.

Selecting Hardware Resources

Use a fast machine

Check the Math Computing Facilities page and the Minnesota Supercomputing Institute for lists of fast machines. (As of Mar 2008, the VinH 314 lab machines are the fastest.) If you have time, compare the performance of a short job on different types of machines.

Check system resources while running your code

No matter what programming language you develop in, you can use tools to check system-wide resource usage. Fedora includes the command gnome-system-monitor in the menu entry System Tools > System Monitor. You can also get OS-wide details with commands like top, free, uptime, iostat and mpstat.

For example, top and uptime show the current load average, but only as a single figure for the whole machine. The mpstat command can show the usage of each processor or core.

$ uptime
 12:20:08 up 7 days,  1:15,  1 user,  load average: 0.07, 0.02, 0.00
$ mpstat -P ALL
Linux 2.6.23.1-21.fc7 (vinh270b-2.math.umn.edu)         03/24/2008

12:20:10 PM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
12:20:10 PM  all    0.35    0.00    0.04    0.05    0.00    0.01    0.00   99.54   1044.94
12:20:10 PM    0    0.18    0.00    0.02    0.15    0.00    0.00    0.00   99.64    126.25
12:20:10 PM    1    0.23    0.00    0.01    0.01    0.00    0.00    0.00   99.74    125.61
12:20:10 PM    2    0.71    0.00    0.03    0.15    0.00    0.00    0.00   99.10    126.26
12:20:10 PM    3    0.26    0.00    0.02    0.03    0.00    0.00    0.00   99.69    125.62
12:20:10 PM    4    0.18    0.00    0.01    0.02    0.00    0.00    0.00   99.78    125.94
12:20:10 PM    5    0.15    0.00    0.01    0.02    0.00    0.00    0.00   99.82    125.61
12:20:10 PM    6    0.85    0.00    0.24    0.01    0.01    0.06    0.00   98.84    137.53
12:20:10 PM    7    0.25    0.00    0.01    0.03    0.00    0.00    0.00   99.71    125.60

For more info see Resource Monitoring for Red Hat Linux 9, and the more recent System Monitoring for Red Hat Enterprise Linux 5.
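
A quick way to interpret the load average is to compare it with the number of cores. The sketch below assumes a Linux machine with /proc/loadavg and the nproc command; the function name machine_has_spare_cpu is invented for this example.

```shell
# Succeeds when the 1-minute load average is below the number of cores.
# Arguments are optional and default to live values (Linux only).
# Usage: machine_has_spare_cpu [LOAD] [CORES]
machine_has_spare_cpu() {
    local load=${1:-$(cut -d' ' -f1 /proc/loadavg)}
    local cores=${2:-$(nproc)}
    # awk handles the floating-point comparison that the shell cannot.
    awk -v l="$load" -v c="$cores" 'BEGIN { exit !(l < c) }'
}

if machine_has_spare_cpu; then
    echo "load is below the core count"
else
    echo "all cores look busy; consider another machine"
fi
```

This is only a rule of thumb: a low load average does not guarantee that memory or disk are free.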

Use local disk, it's faster

Local file access can be much faster than network file access, about 10 times faster in the test below. If your computation needs to read or write data, this can save you time.

As an example, here are some times to write 50,000 blocks of zeroes (512 bytes each by default, about 25 MB in total) to local disk and to a home directory mounted using NFS.

$ for d in /var/tmp ~ ; do
	echo -en "\n$d";
	time dd count=50000 if=/dev/zero of=$d/zero_test 2>/dev/null ;
	rm $d/zero_test;
done

/var/tmp
real    0m0.260s
user    0m0.018s
sys     0m0.237s

/home/johndoe
real    0m2.580s
user    0m0.025s
sys     0m0.263s

Use local disk, it reduces load on file servers.

If your computation will be generating heavy I/O, you should be reading and writing to disk space local to the workstation. Use /var/tmp for this purpose, and move data to your home directory periodically or at the completion of your computation.

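The lsof command on Linux lists open file handles, so you can check where a running job is writing. In the output below, the w after the file descriptor number marks a file that is open for writing.

[jdoe@foo ~]$  lsof | grep jdoe | grep home | grep '[0-9]w'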
hb003      1924     jdoe     4w      REG       0,35   122840   20037696 /home/grad/jdoe/codes/data/DEF.dat (bar.example.edu:/home/grad/jdoe)
hb003      1924     jdoe     4w      REG       0,35   122840   20037696 /var/tmp/jdoe/codes/data/DEF.dat (bar.example.edu:/home/grad/jdoe)

The directory /var/tmp will be erased each time the workstation is reinstalled.

Every user's home directory is centralized on our primary file server. Each workstation mounts these directories over the network using NFS. Thus, every time you read from or write to your home directory, you produce network traffic. The command below runs a program that writes to local disk and, if it succeeds, moves the data file to the user's home directory.

/usr/bin/magma magma.script > /var/tmp/magma.out && nice mv /var/tmp/magma.out ~/magma.out
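
The same pattern can be wrapped in a small function. This is a sketch under assumptions rather than a department-provided tool: the names run_in_scratch, SCRATCH_ROOT and job.out are invented here, and the job is assumed to write its results to standard output.

```shell
# Run a command with its output in a private scratch directory on local
# disk, then move the output file to a destination directory.
# Usage: run_in_scratch DEST_DIR COMMAND [ARGS...]
run_in_scratch() {
    local dest=$1; shift
    local scratch
    scratch=$(mktemp -d "${SCRATCH_ROOT:-/var/tmp}/job.XXXXXX") || return 1
    # The job writes to local disk for as long as it runs.
    if nice -n 19 "$@" > "$scratch/job.out"; then
        # Only a finished result crosses the network to the home directory.
        mkdir -p "$dest" && mv "$scratch/job.out" "$dest/" && rmdir "$scratch"
    else
        echo "job failed; partial output left in $scratch" >&2
        return 1
    fi
}

# Example, using the magma command shown above:
#   run_in_scratch ~/magma_results /usr/bin/magma magma.script
```

A private directory from mktemp -d also avoids clashes with other users' files in /var/tmp.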

Job Control

Run the job noninteractively

Interactive environments make it easy to forget to document the steps that produced a result. Some environments manage their memory as a heap, which can become fragmented over a long session. To reduce complexity, try running computations with no human interaction. Each run then also starts with a clean environment.

For example,

nice /usr/local/bin/math < ~/commands.mathematica > /var/tmp/mathematica_results.txt

Use data export features for smaller output

Programming environments like Fortran, C, Matlab, Mathematica and Maple have different data import and export features. Check the support documents for more information.

In C you might use fopen and fprintf to write into a file, and in Fortran the OPEN and WRITE statements. Programming environments like Matlab, Mathematica or Maple provide more convenience with functions that can export to specific output formats like JPG, PDF, Excel, or XHTML.

Using data export functions can make smaller files, and your program can output to multiple files. Capturing standard output can be easier, but the files produced can be large and difficult to process afterwards.

Schedule the job to run later and for when the load isn't high

Jobs can be run later using the at and batch commands.

For example,

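echo "nice matlab -nodesktop < ~/my_simulation.m > /var/tmp/my_simulation_results.txt" | at 3:00
and to run when the load is below 0.8 try,
echo "nice maple < ~/commands.maple > /var/tmp/maple.out" | batch

In both cases nice goes inside the quoted command so that it applies to the scheduled job rather than to echo.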

Use nohup to run the job

Prefix a long job with nohup, and the job will continue to run after you log out.

nohup nice ./monte_carlo 1.7128 > /var/tmp/results.out

Remember to send the output to local disk to reduce network load.
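
To get your prompt back right away and keep track of the job, you can start it in the background and record its process ID. The helper name start_job below is invented for this sketch; it only combines nohup, nice, output redirection and the shell's $! variable.

```shell
# Hypothetical helper: start a command in the background so that it
# survives logout, runs at the lowest priority, and logs to a file.
# Prints the PID of the started process.
# Usage: start_job LOGFILE COMMAND [ARGS...]
start_job() {
    local log=$1; shift
    nohup nice -n 19 "$@" > "$log" 2>&1 < /dev/null &
    echo $!
}

# Example (monte_carlo is the program from the example above):
#   pid=$(start_job /var/tmp/results.out ./monte_carlo 1.7128)
#   kill -0 "$pid" && echo "still running"
```

Here kill -0 sends no signal; it only reports whether the process still exists.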

Optionally use screen to run the job

Using nohup is still a good idea, but if you want to see output from the job as it's running over multiple days you can use screen to detach a login shell and reattach at a later time. You can also run multiple shells from a single screen session, which is another nice feature if you're connecting from home to run or check the status of jobs. Screen commands start with a prefix (Ctrl-a by default) and then a command key (usually a single character).

For example, to start matlab in a screen session you could run...

screen nice matlab -nodesktop -nojvm -nosplash
Then press Ctrl-a d to detach the screen session from your terminal. You can log out and screen will keep matlab running.

The next day you could reattach to the running screen by running

screen -r
Inside of a screen session, the basic commands are listed below.
  • Ctrl-a d Detach from the screen session.
  • Ctrl-a c Create a new terminal inside screen.
  • Ctrl-a 0 Go to terminal 0.
  • Ctrl-a 1 Go to terminal 1.
  • Ctrl-a ? Show a list of commands.
In the help screen, the caret (or hat) character "^" means press the Ctrl key.

Customizing Screen Preferences

Here's a sample ".screenrc" so the screen hardstatus line always shows the hostname, a list of the virtual terminals and a clock.
#Keep the status line active
hardstatus alwayslastline

# Use the "pinfo screen" command to review the topic "string escapes"
hardstatus string '%{= kG}%H:%{w} %= %?%-Lw%?%{G}(%n%f %t%?(%u)%?%)%{w}%?%+Lw%?%?%= %Y%m%d %c'
#                           ^                    ^                                  ^
#                     tomato:             0$ sh  (1$ mail)  2$ sh                   20081023 10:00 
#                      green              white   green      white                      white
#                     Hostname            Shells Zero Through Nine                  Date and Time

# Default screens
term linux
screen -t sh    0
screen -t emacs 1 emacs -nw
screen -t sh    2

[Figure: Screen running in an 80x24 character xterm]

www.math.umn.edu/systems_guide/computation.html
Last Modified March 27, 2009
Contact the School of Mathematics
The University of Minnesota is an equal opportunity educator and employer.
© 2009, The Regents of the University of Minnesota