Day 2 Laboratory: Getting Started with MPI

Your goal for today's class is to familiarize yourself with MPI by exploring working examples, creating a deadlocked program, and writing a general routine for local communication between neighbors on a grid.

Examples

I have checked a few of the mpi4py examples into the hpc@aub repository. You can get them with the following commands:

git clone git@bitbucket.org:ahmadia/hpc_at_aub.git
cd hpc_at_aub/day_2/

You should see a compute-pi directory and a hello directory.

Hello World!

Enter the hello directory and inspect the source files there. Try to compile the C and Fortran files using the MPI compiler wrappers:

mpicc -o hello helloworld.c

Then run them using mpirun:

mpirun -n 5 hello

Now run the Python demo (no need to compile!):

mpirun -n 5 python helloworld.py

Modify the Python code so that the processes print out their ranks in order.
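One possible approach to this exercise, sketched under assumptions rather than taken from helloworld.py: every process sends its greeting to rank 0 with a gather, and rank 0 alone does the printing. The names greetings_in_rank_order and main are hypothetical; Get_rank, Get_size, and gather are standard mpi4py communicator methods.

```python
def greetings_in_rank_order(comm):
    """Collect one greeting per process on rank 0, ordered by rank.

    `comm` is expected to be an mpi4py communicator such as MPI.COMM_WORLD.
    Returns the ordered list on rank 0 and None on every other rank.
    """
    rank = comm.Get_rank()
    size = comm.Get_size()
    message = "Hello from rank %d of %d" % (rank, size)
    # gather() returns a list indexed by rank on the root, None elsewhere.
    return comm.gather(message, root=0)


def main():
    # Imported here so the helper above can be exercised without MPI.
    from mpi4py import MPI

    lines = greetings_in_rank_order(MPI.COMM_WORLD)
    if lines is not None:  # only rank 0 prints, so the order is fixed
        for line in lines:
            print(line)
```

To try it, add a call to main() at the end of the file and launch it with mpirun in the same way as helloworld.py. Taking turns in a loop over ranks with a barrier between turns is another option, but output from different processes may still be interleaved by the launcher.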

Computing Pi

Now enter the compute-pi directory. You will see three different source files in there, representing collective communications, dynamic process management, and remote memory access.

Collective communications (cpi-cco.py) and remote memory access (cpi-rma.py) are two different paradigms for solving the same problem. Launch each of them using mpirun, in the same way as the Python hello demo.

If a program hangs, it is because you need a more robust way of passing input from the command line to process 0. Copy the raw_input function from cpi-dma.py into cpi-cco.py and cpi-rma.py so that you can test them out. Don't forget to import sys!

Dynamic process management (cpi-dma.py) should look familiar to those of you with experience in threaded programming. A client process launches several worker processes, or 'servers', to do the compute work for it. This dynamic process launch and execution strategy is the dominant pattern used in cloud computing, and is the paradigm for interactive parallel solutions such as MATLAB's Parallel Computing Toolbox.

Note that dynamic process management is currently not supported on many large supercomputers.
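As a rough illustration of the arithmetic behind these demos, here is a minimal sketch. It assumes they approximate pi by integrating 4/(1+x^2) over [0, 1] with the midpoint rule, with each process taking every size-th interval. That assumption, and the names partial_pi and simulated_reduce, are illustrative and not taken from the checked-in files. The sketch uses no MPI at all: a plain Python sum stands in for the reduction that would combine the partial results.

```python
import math


def partial_pi(n, rank, size):
    """Contribution of one process to the midpoint-rule estimate of pi.

    The interval [0, 1] is cut into n slices; this rank handles slices
    rank, rank + size, rank + 2*size, ...
    """
    h = 1.0 / n
    total = 0.0
    for i in range(rank, n, size):
        x = h * (i + 0.5)            # midpoint of slice i
        total += 4.0 / (1.0 + x * x)
    return total * h


def simulated_reduce(n, size):
    """Stand-in for a sum reduction over `size` ranks (no MPI involved)."""
    return sum(partial_pi(n, rank, size) for rank in range(size))


if __name__ == "__main__":
    estimate = simulated_reduce(1000, 4)
    print("estimate = %.10f, error = %.2e" % (estimate, abs(estimate - math.pi)))
```

The strided split keeps the work balanced for any number of processes, and the only data that has to move is one floating-point number per process.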

Parallel Game of Life

I've checked in (really this time) a simple Game of Life simulation in Python as life.py. Your mission is to come up with a parallelization strategy for computing 1000 iterations of a 256 x 256 grid in life.py as fast as possible. Here are some suggestions for your development strategy:

  • Start with a simple grid and 2 processors. How should they communicate?
  • Now think about a simple grid with 4 processors. What is the best way to divide it to minimize communication?
  • Choose a data exchange strategy. There are advantages to each approach for this problem.
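To make the communication question concrete, here is a serial sketch of one possible decomposition: the grid is cut into horizontal row blocks, and each block needs only the single edge row of the block above and below it (its "ghost" rows) to advance one generation. Everything here is an assumption for illustration. The function names are hypothetical, the rules assume cells outside the grid are dead (life.py may treat the boundary differently), and the ghost rows are copied directly where an MPI version would exchange them as messages.

```python
def step(grid):
    """One Game of Life generation; cells outside the grid count as dead."""
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            n = 0  # number of live neighbours of cell (r, c)
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if (dr or dc) and 0 <= r + dr < rows and 0 <= c + dc < cols:
                        n += grid[r + dr][c + dc]
            new[r][c] = 1 if n == 3 or (n == 2 and grid[r][c]) else 0
    return new


def step_block(block, ghost_above, ghost_below):
    """Advance one row block, given the neighbouring blocks' edge rows."""
    padded = [ghost_above] + block + [ghost_below]
    # Discard the ghost rows afterwards: they belong to the neighbours.
    return step(padded)[1:-1]


def step_decomposed(grid, nblocks):
    """Advance the whole grid block by block (nblocks must divide the row count)."""
    rows, cols = len(grid), len(grid[0])
    size = rows // nblocks
    blocks = [grid[i * size:(i + 1) * size] for i in range(nblocks)]
    dead = [0] * cols
    result = []
    for i, block in enumerate(blocks):
        # In an MPI version these two rows would arrive from ranks i-1 and i+1.
        above = blocks[i - 1][-1] if i > 0 else dead
        below = blocks[i + 1][0] if i < nblocks - 1 else dead
        result.extend(step_block(block, above, below))
    return result
```

With row blocks, each process exchanges two rows per generation regardless of how many rows it owns. Square tiles exchange less data per process as the process count grows, but each process then has more neighbours to talk to, including the diagonal ones.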

Feel free to commit your solution to your forked repository and issue a pull request if you'd like to share it with your classmates or the world.

Credits