Out of Date Warning
Pretty much all of this information is out of date. You should see this page instead. Read on at your own risk...
- Out of Date Warning
- Installing batch mode capable Continuity from the binaries
- Running 'normal' svn installation in batch mode
- Installing batch mode capable Continuity from source
- Running parallel jobs in batch mode without SGE
- Cleaning up after yourself
- Dealing with common problems
Installing batch mode capable Continuity from the binaries
- Download and install Continuity.
- For example (replace 3173 with the latest release number):
wget http://continuity.nbcr.net/cont6_html/modelling/cont6/distrib/cont6_3_1_3173_linux.tgz
tar -xzvf cont6_3_1_3173_linux.tgz
mv continuity cont3173
cd cont3173
./setup
- Run Continuity like this:
cd pcty/
../mglpython ContinuityParallel.pyc --release --script examples/biomechanics01/example_nogui_parallel_full.py
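The download step above repeats the release number in several places. As a minimal sketch, the helper below builds the tarball URL from a release number. The function name continuity_tarball_url is hypothetical, and the sketch assumes that only the release number changes between releases while the cont6_3_1_ prefix stays fixed, which may not hold for newer versions.

```shell
# Hypothetical helper: print the tarball URL for a given release number.
# Assumption: the URL follows the same pattern as the 3173 example and
# only the release number varies.
continuity_tarball_url() {
    release=$1
    printf 'http://continuity.nbcr.net/cont6_html/modelling/cont6/distrib/cont6_3_1_%s_linux.tgz\n' "$release"
}
```

For example, wget "$(continuity_tarball_url 3173)" fetches the same file as the wget command above.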
Running 'normal' svn installation in batch mode
- Assuming you already have Continuity checked out on your Linux server, you can run a batch mode job as follows.
- First, make your installation capable of running the client:
cd cont_dev/linuxlib/
tar -xzvf mgltools.tar.gz
mv continuity/pcty/MglToolsLib/ ../pcty/MglToolsLib
cd ..
sh setup
cd pcty
- Now you can run a serial batch mode job like this:
../mglpython ContinuityClient.py --full --batch yourscript.py #to run non-parallel batch mode jobs
- Or a parallel batch mode job like this:
./ContinuityParallel.py --script examples/biomechanics01/example_nogui_parallel_full.py
Type ./ContinuityParallel.py --help to see other available flags
- If you want to try examples/biomechanics01/example_nogui_parallel.py, you will need to look for the "release" flag in the file, and change it from True to False
Installing batch mode capable Continuity from source
- If you are authorized to access Continuity source, you may wish to install batch mode Continuity this way.
Download and run getBatchContinuity.sh. You can download it using wget like this:
- Then run it like this:
chmod +x getBatchContinuity.sh
./getBatchContinuity.sh
- If everything has gone correctly, it should say:
listening on port 9970
Communications using threads
- At this point, press Ctrl-C to exit.
- Now you can test a parallel problem like this:
cd cont6_dev/pcty
../mglpython ContinuityClient.py --no-threads --batch examples/biomechanics01/example_nogui_parallel.py
- If it ran correctly, it should generate two *.xls files.
- Next you need to permanently fix LD_LIBRARY_PATH. I would recommend that you add the following line to your .bashrc file in your home directory:
- Of course, you'll need to modify your_user_name appropriately.
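To confirm that the parallel test run above produced its output, you can count the *.xls files. This is only a sketch: the function name count_xls is hypothetical, and it assumes the files are written to the directory you pass in (the current directory by default), which this page does not state.

```shell
# Hypothetical helper: count the *.xls files directly inside a directory.
# Assumption: the test run writes its *.xls output into that directory.
count_xls() {
    dir=${1:-.}
    n=0
    for f in "$dir"/*.xls; do
        # When nothing matches, the pattern stays literal and -e fails.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    printf '%s\n' "$n"
}
```

After the test run, count_xls should print 2 if you started from a directory with no older *.xls files in it; files left over from earlier runs are counted too.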
Running parallel jobs in batch mode without SGE
- Make sure that your parallel flag is set to 1 in the script you run. For me, I had to change biomechanics01/example_nogui.py like this:
self.Snonlin(... 'parallel': 1 ...)
- create a file with the list of nodes to use like this:
compute-1-1 0
compute-3-4 1 /home/lionetti/Cont6e/pcty/server/problem/Biomechanics/ParallelSolverBM.sh
compute-1-7 1 /home/lionetti/Cont6e/pcty/server/problem/Biomechanics/ParallelSolverBM.sh
compute-1-2 1 /home/lionetti/Cont6e/pcty/server/problem/Biomechanics/ParallelSolverBM.sh
Save the file under any name you like. I called mine test.mpich; that name is passed to the command below.
In this example, compute-1-1 is your "root" node and it's where you'll run the command below. The other nodes are arbitrary and must be assigned by the queuing system. You'll have to change the paths of course to point to your ParallelSolverBM.sh file.
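Writing the node list by hand is error-prone when the queuing system hands you different nodes each time. As a minimal sketch, the function below generates a file in the same layout as the example above: the root node with 0, then one line per remaining node with 1 and the solver path. The function name write_procgroup and its argument order are hypothetical.

```shell
# Hypothetical helper: write a node list file in the layout shown above.
# Usage: write_procgroup OUTFILE SOLVER_PATH ROOT_NODE OTHER_NODE...
write_procgroup() {
    outfile=$1
    solver=$2
    root=$3
    shift 3
    {
        # The root node comes first, with 0 and no program path.
        printf '%s 0\n' "$root"
        # Every other node gets 1 and the path to the solver script.
        for host in "$@"; do
            printf '%s 1 %s\n' "$host" "$solver"
        done
    } > "$outfile"
}
```

For example, write_procgroup test.mpich /home/lionetti/Cont6e/pcty/server/problem/Biomechanics/ParallelSolverBM.sh compute-1-1 compute-3-4 compute-1-7 compute-1-2 reproduces the file shown above.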
- ssh to your "root" node. For me, this was "ssh compute-1-1"
- Run Continuity as usual, but add "-p4pg test.mpich -p4wd /home/lionetti/Cont6e/pcty" to the command line, where test.mpich is the node list file you created above and the path is the path to your pcty directory. For example, I ran the command like this:
../mglpython ContinuityClient.py --full --batch examples/biomechanics01/example_nogui.py -p4pg test.mpich -p4wd /home/lionetti/Cont6e/pcty
If this command doesn't work, try this instead:
/opt/mpich/gnu/bin/mpirun -p4pg test.mpich /home/lionetti/Cont6e/mglpython ContinuityClient.py --full --batch examples/biomechanics01/example_nogui.py
but you'll have to replace "/opt/mpich/gnu/bin/mpirun" with the mpirun path on your system.
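One way to find the mpirun path on your system is sketched below. The function name find_mpirun is hypothetical. It prints the mpirun found on your PATH, if any, and otherwise falls back to a path you supply (defaulting to the one used above). Note the assumption: an mpirun found on PATH is not necessarily an MPICH build that understands -p4pg, so check the result before relying on it.

```shell
# Hypothetical helper: print the path of mpirun, or a fallback path.
# Assumption: whichever mpirun is found supports the -p4pg option.
find_mpirun() {
    fallback=${1:-/opt/mpich/gnu/bin/mpirun}
    if command -v mpirun >/dev/null 2>&1; then
        command -v mpirun
    else
        printf '%s\n' "$fallback"
    fi
}
```

You could then start the job with "$(find_mpirun)" in place of /opt/mpich/gnu/bin/mpirun in the command above.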
Cleaning up after yourself
After running a job, you should always run qstat | grep <your_user_name> to make sure it has finished properly.
- If it hasn't finished properly, run qdel job_id, where job_id is the job ID reported by qstat.
cluster-fork pkill -9 -u <your_user_name> will ensure that you don't have any zombie jobs running on the nodes
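The qstat check above can be narrowed to just the job IDs you need for qdel. This is a sketch under an assumption: it expects the default SGE qstat listing, with the job ID in the first column and the owner in the fourth, and the function name job_ids_for_user is hypothetical. Long user names may be shortened in the listing on some systems, in which case the match below would miss them.

```shell
# Hypothetical helper: read qstat output on stdin and print the job IDs
# owned by the given user.
# Assumption: job ID is column 1 and the user name is column 4.
job_ids_for_user() {
    awk -v user="$1" '$4 == user { print $1 }'
}
```

For example, qstat | job_ids_for_user <your_user_name> lists your job IDs one per line; each can then be passed to qdel.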
Dealing with common problems
- Most errors are caused by failing to properly clean up after yourself.
Warning: remote port forwarding failed for listen port 9974
- clean up after yourself (see above.)
- if this doesn't work, modify the port you are using.
- For example, in examples/biomechanics01/example_nogui_parallel.py, change the port number on the first line to 9925 or another unused port.
Data was empty! Probably disconnected from client!
Error in startupFile: list indices must be integers
- clean up after yourself (see above.)
- try again