Hi Jed, All

Many thanks to your informative replies on this and the PetSc list. I will make 
a decision based on your advice.

thanks
Sanjay



________________________________
From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf of Jed 
Brown [j...@59a2.org]
Sent: 06 March 2012 22:50
To: tpri...@computer.org; Open MPI Users
Subject: Re: [OMPI users] parallelising ADI

On Tue, Mar 6, 2012 at 16:23, Tim Prince <n...@aol.com<mailto:n...@aol.com>> 
wrote:
 On 03/06/2012 03:59 PM, Kharche, Sanjay wrote:
Hi

I am working on a 3D ADI solver for the heat equation. I have implemented it as 
serial. Would anybody be able to indicate the best and more straightforward way 
to parallelise it. Apologies if this is going to the wrong forum.


If it's to be implemented in parallelizable fashion (not SSOR style where each 
line uses updates from the previous line), it should be feasible to divide the 
outer loop into an appropriate number of blocks, or decompose the physical 
domain and perform ADI on individual blocks, then update and repeat.

True ADI has inherently high communication cost because a lot of data movement 
is needed to make the _fundamentally sequential_ tridiagonal solves local 
enough that latency doesn't kill you trying to keep those solves distributed. 
This also applies (albeit to a lesser degree) in serial due to way memory works.

If you only do non-overlapping subdomain solves, you must use a Krylov method 
just to ensure convergence. You can add overlap, but the Krylov method is still 
necessary for any practical convergence rate. The method will also require an 
iteration count proportional to the number of subdomains across the global 
domain times the square root of the number of elements across a subdomain. The 
constants may not be small and this asymptotic result is independent of what 
the subdomain solver is. You need a coarse level to improve this scaling.

Sanjay, as Matt and I recommended when you asked the same question on the PETSc 
list this morning, unless this is a homework assignment, you should solve your 
problem with multigrid instead of ADI. We pointed you to simple example code 
that scales well to many thousands of processes.

Reply via email to