With the example you have given, here is what I would like to do:

- 12 MPI ranks
- Each rank has one block (rank 0 has A, rank 1 has B, ..., rank 11 has L). To make the rest of this easier, I'll refer to the rank containing block A as "rank A", and so on.
- Ranks A, B, C, and D have w, x, y, and z respectively.
- The first step of the custom matvec implementation broadcasts w to rank E and rank I (similarly, x is broadcast to ranks F and J, and so on).
- At the end of the matvec computation, ranks D, H, and L have a, b, and c respectively.
Thanks,
Sreeram

On Tue, Sep 19, 2023 at 6:23 PM Barry Smith <[email protected]> wrote:

>    ( a )   ( A B C D ) ( w )
>    ( b ) = ( E F G H ) ( x )
>    ( c )   ( I J K L ) ( y )
>                        ( z )
>
> I have no idea what "The input vector is partitioned across each row, and
> the output vector is partitioned across each column" means.
>
> Anyways, the shell matrix needs to live on MPI_COMM_WORLD, as do both the
> (a, b, c) and (w, x, y, z) vectors.
>
> Now how many MPI ranks do you want to do the computation on? 12?
> Do you want one matrix A .. L on each rank?
>
> Do you want the (a, b, c) vector spread over all ranks? What about the
> (w, x, y, z) vector?
>
> Barry
>
> On Sep 19, 2023, at 4:42 PM, Sreeram R Venkat <[email protected]> wrote:
>
> I have a custom implementation of a matrix-vector product that inherently
> relies on a 2D processor partitioning of the matrix. That is, if the matrix
> looks like
>
>    A B C D
>    E F G H
>    I J K L
>
> in block form, we use 12 processors, each having one block. The input
> vector is partitioned across each row, and the output vector is partitioned
> across each column.
>
> Each processor has 3 communicators: WORLD_COMM, ROW_COMM, and COL_COMM.
> The ROW/COL communicators are used to do reductions over rows/columns of
> processors.
>
> With this setup, I am a bit confused about how to set up the matrix shell.
> The "MatCreateShell" function only accepts one communicator. If I give it
> WORLD_COMM, the local/global sizes won't match, since PETSc will compute
> local_size * total_processors instead of local_size * processors_per_row
> (or col). I have gotten around this temporarily by giving ROW_COMM here
> instead. What I think happens is that a different MatShell is created on
> each row, but when computing the matvec, they all work together.
>
> However, if I try to use KSP (CG) with this setup (giving ROW_COMM as the
> communicator), the process hangs. I believe this is due to the partitioning
> of the input/output vectors.
> The matvec itself is fine, but the inner products and other steps of CG
> fail. In fact, if I restrict to the case where I have only one row of
> processors, I am able to use KSP successfully.
>
> Is there a way to use KSP with this 2D partitioning setup when there are
> multiple rows of processors? I'd also prefer to work with one global
> MatShell object instead of the one-object-per-row approach I'm using
> right now.
>
> Thanks for your help,
> Sreeram
