Hi Cole, Jed

I don't have much direct experience with PETSc.
I have mostly troubleshot other people's PETSc programs
and observed their performance.
What I noticed is:
1) PETSc's learning curve is at least as steep as MPI's, if not steeper, and
2) PETSc codes seem to be slower (or carry more overhead)
than codes written directly in MPI.
Jed seems to have a different perception of PETSc, though,
and is more enthusiastic about it.

Admittedly, I don't have a direct comparison
(i.e. the exact same code implemented both in PETSc and in plain MPI)
to support what I said above.

However, it is true that PETSc comes with a big toolbox
and a large number of solvers.
PETSc is also very flexible: you can attach a variety of other
packages to it (most of the usual linear algebra, sparse matrices, etc.).

The PETSc fans here love the ability it gives them to
prototype complex algorithms in relatively little time.
I don't know whether it should be used for production code, though,
and I don't even know whether your code is meant to run efficiently
in production mode.

OTOH, if you already have a clean, working serial code,
I think it won't be a big deal to parallelize it directly
with MPI, assuming the core algorithm (your Gauss-Seidel solver)
fits into the rest of the code in a well-structured way.

As per the previous discussion on this thread,
if you don't plan to use a very large number of processors,
you can decompose the domain into X "books"
(see the IU and LLNL diffusion equation example I sent you),
which is probably the simplest approach,
and still get decent performance.
(For many processors / small subdomains, use XY "pencils" instead.)
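To make the bookkeeping concrete, here is a minimal sketch (in C) of how
each rank could figure out which X planes it owns. The grid sizes and the
way the remainder is spread are only placeholders, not something from your code:

/* Split the nx planes of an nx x ny x nz grid into contiguous X "books". */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    const int nx = 100, ny = 50, nz = 50;   /* global grid, just an example */
    int rank, nprocs;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int nx_local = nx / nprocs;
    int rem      = nx % nprocs;
    /* give one extra plane to the first 'rem' ranks */
    int x_start  = rank * nx_local + (rank < rem ? rank : rem);
    if (rank < rem) nx_local++;

    printf("rank %d owns X planes %d..%d (%d planes of %d x %d points)\n",
           rank, x_start, x_start + nx_local - 1, nx_local, ny, nz);

    MPI_Finalize();
    return 0;
}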

With X "book" subdomains you won't need to create MPI types.
You can use the very array sections (pointer and size, assuming
they sit in contiguous memory) in the MPI functions.

To read/write initial/final data you can use a master/slave scheme,
where process rank 0 (the master) reads the whole thing,
then uses MPI_Scatter[v] to distribute the data
to the other ranks.
(Rank 0 can still do part of
the computational work as well.)
At the end, all processes use MPI_Gather[v] to return the
results to rank 0, which writes them to disk.
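A rough sketch of that pattern, assuming the field is stored as a flat array
of doubles and that ranks may own different numbers of planes (the function
and variable names are just illustrative):

/* Rank 0 holds the whole field in 'global' (already read from disk);
   each rank gets its own slab in 'local', computes, and the results
   are gathered back on rank 0, which then writes them out. */
#include <mpi.h>
#include <stdlib.h>

void scatter_gather_io(double *global, double *local,
                       int nx_local, int ny, int nz, MPI_Comm comm)
{
    int rank, nprocs;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nprocs);

    int mycount = nx_local * ny * nz;
    int *counts = NULL, *displs = NULL;

    if (rank == 0) {
        counts = malloc(nprocs * sizeof(int));
        displs = malloc(nprocs * sizeof(int));
    }
    /* rank 0 learns how many points each rank owns */
    MPI_Gather(&mycount, 1, MPI_INT, counts, 1, MPI_INT, 0, comm);
    if (rank == 0) {
        displs[0] = 0;
        for (int p = 1; p < nprocs; p++)
            displs[p] = displs[p-1] + counts[p-1];
    }

    /* hand each rank its slab of the initial data */
    MPI_Scatterv(global, counts, displs, MPI_DOUBLE,
                 local, mycount, MPI_DOUBLE, 0, comm);

    /* ... time stepping happens here ... */

    /* collect the final slabs back on rank 0 for output */
    MPI_Gatherv(local, mycount, MPI_DOUBLE,
                global, counts, displs, MPI_DOUBLE, 0, comm);

    if (rank == 0) { free(counts); free(displs); }
}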

At each time step you exchange halo/ghost sections across
neighbor subdomains, using MPI_Send/MPI_Recv
or MPI_Sendrecv.
Even better, use the non-blocking calls
MPI_Isend/MPI_Irecv/MPI_Wait[all].
You can read about the advantages of non-blocking communication
in the "MPI: The Complete Reference, Vol. 1" book that I suggested
to you.
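For the X "book" case, a sketch of one such exchange, assuming the local
array u is stored with X slowest-varying, with one ghost plane on each side,
and that 'left'/'right' hold the neighbor ranks (MPI_PROC_NULL at the
physical boundaries, so the same code works there too):

/* Exchange one ghost plane with each X neighbor using non-blocking calls.
   u holds nx_local + 2 planes of ny*nz doubles: plane 0 and plane
   nx_local + 1 are the ghost planes, planes 1..nx_local are owned. */
#include <mpi.h>

void exchange_halos(double *u, int nx_local, int ny, int nz,
                    int left, int right, MPI_Comm comm)
{
    const int plane = ny * nz;        /* points in one YZ plane */
    MPI_Request req[4];

    /* post receives into my ghost planes first */
    MPI_Irecv(&u[0],                      plane, MPI_DOUBLE, left,  0, comm, &req[0]);
    MPI_Irecv(&u[(nx_local + 1) * plane], plane, MPI_DOUBLE, right, 1, comm, &req[1]);

    /* send my first and last owned planes to the neighbors */
    MPI_Isend(&u[1 * plane],              plane, MPI_DOUBLE, left,  1, comm, &req[2]);
    MPI_Isend(&u[nx_local * plane],       plane, MPI_DOUBLE, right, 0, comm, &req[3]);

    MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
}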

You can do the bookkeeping of "which subdomain/process rank is my
left neighbor?", etc., yourself, by creating neighbor
tables when the program initializes.
Alternatively, and more elegantly, you can use the MPI
Cartesian topology functions to take care of this for you.
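A minimal sketch of the Cartesian topology route for the 1-D (X book) split;
periods[0] = 0 makes it non-periodic, and MPI_Cart_shift returns
MPI_PROC_NULL at the ends, which the send/receive calls above accept
without any special-casing:

/* Let MPI build the 1-D process grid and find the X neighbors for us. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int nprocs, rank, left, right;
    int dims[1] = {0}, periods[1] = {0};   /* non-periodic in X */
    MPI_Comm cart;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    MPI_Dims_create(nprocs, 1, dims);              /* pick the 1-D layout */
    MPI_Cart_create(MPI_COMM_WORLD, 1, dims, periods, 1, &cart);
    MPI_Comm_rank(cart, &rank);
    /* ranks of my -X and +X neighbors; MPI_PROC_NULL at the boundaries */
    MPI_Cart_shift(cart, 0, 1, &left, &right);

    printf("rank %d: left=%d right=%d\n", rank, left, right);

    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}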

The description above is probably a very simple-minded approach to
parallelization.
However, it should work with decent performance,
and won't take too much effort or time to develop.

I hope this helps.

Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------

Cole, Derek E wrote:
I actually am only working on this code because it is what
someone wants. I would probably have chosen a
different solver as well.
We do have some vector machines, though, that this will probably run on.

I did a lot of memory work already in the serial code to speed
it up pretty significantly.
I also have to be a little careful about what other
libraries are introduced.
I am reading up on PETSc to see how flexible it is.

Thanks for the responses

-----Original Message-----
From: Jed Brown [mailto:five...@gmail.com] On Behalf Of Jed Brown
Sent: Thursday, March 11, 2010 1:00 PM
To: Cole, Derek E; us...@open-mpi.org
Subject: Re: [OMPI users] 3D domain decomposition with MPI

On Thu, 11 Mar 2010 12:44:01 -0500, "Cole, Derek E" <derek.e.c...@lmco.com> 
wrote:
I am replying to this via the daily-digest message I got. Sorry it wasn't sooner... I didn't realize I was getting replies until I got the digest. Does anyone know how to change it so I get the emails as you all send them?

Log in at the bottom and edit options:

  http://www.open-mpi.org/mailman/listinfo.cgi/users

I am doing a Red-black Gauss-Seidel algorithm.

Note that red-black Gauss-Seidel is a terrible algorithm on cache-based
hardware; it only makes sense on vector hardware.  The reason for this is that
the whole point is to approximate a dense action (the inverse of a matrix), but 
the red-black ordering causes this action to be purely local.  A sequential 
ordering, on the other hand, is like a dense lower-triangular operation, which 
tends to be much closer to a real inverse.  In parallel, you do these 
sequential sweeps on each process, and communicate when you're done.  The 
memory performance will be twice as good, and the algorithm will converge in 
fewer iterations.
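For illustration, one such per-process lexicographic sweep might look like the
following for a 7-point Poisson stencil with unit spacing (the array layout,
the IDX macro, and the names are only assumptions, not anyone's actual code):

/* One lexicographic Gauss-Seidel sweep over the locally owned points of a
   3-D 7-point stencil for  laplacian(u) = f  with unit grid spacing.
   u and f are (nx_local+2) x (ny+2) x (nz+2) arrays including ghost planes,
   which are assumed to have been exchanged before the sweep. */
#define IDX(i, j, k) (((i) * (ny + 2) + (j)) * (nz + 2) + (k))

static void gs_sweep(double *u, const double *f, int nx_local, int ny, int nz)
{
    for (int i = 1; i <= nx_local; i++)
        for (int j = 1; j <= ny; j++)
            for (int k = 1; k <= nz; k++)
                u[IDX(i, j, k)] = (u[IDX(i-1, j, k)] + u[IDX(i+1, j, k)] +
                                   u[IDX(i, j-1, k)] + u[IDX(i, j+1, k)] +
                                   u[IDX(i, j, k-1)] + u[IDX(i, j, k+1)] -
                                   f[IDX(i, j, k)]) / 6.0;
}
/* after the sweep, exchange the ghost planes with the neighbor ranks,
   then sweep again until convergence */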

The ghost points are what I was trying to figure out for moving this into the 3rd dimension. Thanks for adding some concreteness to my idea of exactly how much overhead is involved. The test domains I am computing on are on the order of 100*50*50 or so. This is why I am trying to limit the overhead of the MPI communication. I am in the process of finding out exactly how big the domains may become, so that I can adjust the algorithm accordingly.

The difficulty is for small subdomains.  If you have large subdomains, then 
parallel overhead will almost always be small.

If I understand what you mean by pencils versus books, I don't think that will work for these types of calculations, will it? Maybe a better explanation of what you mean by a pencil versus a book would help. If I were going to solve a sub-matrix of the XY plane for all Z values, what would that be considered?

That would be a "book" or "slab".

I still recommend using PETSc rather than reproducing standard code to call MPI
directly for this; it will handle all the decomposition and updates, and has
the advantage that you'll be able to use much better algorithms than 
Gauss-Seidel.

Jed
