Thank you for your help. I will try this solution.

Sreeram
On Wed, Sep 20, 2023 at 9:24 AM Barry Smith <[email protected]> wrote:

>    Use VecCreate(), VecSetSizes(), VecSetType() and MatCreate(),
> MatSetSizes(), and MatSetType() instead of the convenience functions
> VecCreateMPICUDA() and MatCreateShell().
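Here is a minimal sketch of that long-form setup as I understand it
(nloc_left/nloc_right, ctx, and MyMatMult are placeholder names, not
anything from this thread; it assumes a recent PETSc with PetscCall()):

    #include <petscmat.h>

    /* Placeholder matvec callback; the real one would do the 2D block
       multiply described below. */
    extern PetscErrorCode MyMatMult(Mat M, Vec in, Vec out);

    /* Long form of VecCreateMPICUDA()/MatCreateShell(): create, set the
       sizes (or layouts), then set the type last. nloc_left/nloc_right
       are this rank's local sizes; ctx is the user context. */
    PetscErrorCode CreateShellAndVec(PetscInt nloc_left, PetscInt nloc_right,
                                     void *ctx, Vec *x, Mat *M)
    {
      PetscFunctionBeginUser;
      PetscCall(VecCreate(PETSC_COMM_WORLD, x));
      PetscCall(VecSetSizes(*x, nloc_right, PETSC_DETERMINE));
      PetscCall(VecSetType(*x, VECCUDA));  /* replaces VecCreateMPICUDA() */

      PetscCall(MatCreate(PETSC_COMM_WORLD, M));
      PetscCall(MatSetSizes(*M, nloc_left, nloc_right, PETSC_DETERMINE,
                            PETSC_DETERMINE));
      PetscCall(MatSetType(*M, MATSHELL)); /* replaces MatCreateShell() */
      PetscCall(MatShellSetContext(*M, ctx));
      PetscCall(MatShellSetOperation(*M, MATOP_MULT, (void (*)(void))MyMatMult));
      PetscFunctionReturn(PETSC_SUCCESS);
    }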
> On Sep 19, 2023, at 8:44 PM, Sreeram R Venkat <[email protected]> wrote:
>
> Thank you for your reply.
>
> Let's call this matrix *M*:
>
>    (A B C D)
>    (E F G H)
>    (I J K L)
>
> Now, instead of doing KSP with just *M*, what if I want *M^TM*? In this
> case, the matvec implementation would be as follows:
>
>    - same partitioning of the blocks A, B, ..., L among the 12 MPI ranks
>    - the matvec looks like:
>
>         (a)             (w)
>         (b)  =  (M^TM)  (x)
>         (c)             (y)
>         (d)             (z)
>
>    - w, x, y, z are stored on ranks A, B, C, D (as before)
>    - a, b, c, d are now also stored on ranks A, B, C, D
>
> Based on your message, I believe a PetscLayout of (number of columns of
> A, number of columns of B, number of columns of C, number of columns of
> D, 0, 0, 0, 0, 0, 0, 0, 0) for both the (a,b,c,d) and (w,x,y,z) vectors
> should work.
>
> I see there are functions VecSetLayout() and MatSetLayouts() to set the
> PetscLayouts of the vectors and the matrix. When I create the vectors (I
> need VecCreateMPICUDA) or the matrix shell (with MatCreateShell), I need
> to pass the local and global sizes. I'm not sure what to do there.
>
> Thanks,
> Sreeram
>
> On Tue, Sep 19, 2023, 7:13 PM Barry Smith <[email protected]> wrote:
>
>>    The PetscLayout local sizes for the PETSc (a,b,c) vector: (0, 0, 0,
>> number of rows of D, 0, 0, 0, number of rows of H, 0, 0, 0, number of
>> rows of L).
>>
>>    The PetscLayout local sizes for the PETSc (w,x,y,z) vector: (number
>> of columns of A, number of columns of B, number of columns of C, number
>> of columns of D, 0, 0, 0, 0, 0, 0, 0, 0).
>>
>>    The left and right layouts of the shell matrix need to match the two
>> above.
>>
>>    There is a huge problem. KSP is written assuming that the left vector
>> layout is the same as the right vector layout, so it can do dot products
>> MPI rank by MPI rank without needing to send individual vector values
>> around.
>>
>>    I don't think it makes sense to use PETSc with such vector
>> decompositions as you would like.
>>
>>    Barry
>>
>> On Sep 19, 2023, at 7:44 PM, Sreeram R Venkat <[email protected]>
>> wrote:
>>
>> With the example you have given, here is what I would like to do:
>>
>>    - 12 MPI ranks
>>    - each rank has one block (rank 0 has A, rank 1 has B, ..., rank 11
>>      has L); to make the rest of this easier, I'll refer to the rank
>>      containing block A as "rank A", and so on
>>    - ranks A, B, C, and D have w, x, y, z respectively; the first step
>>      of the custom matvec implementation broadcasts w to ranks E and I
>>      (similarly, x is broadcast to ranks F and J, ...)
>>    - at the end of the matvec computation, ranks D, H, and L have a, b,
>>      and c respectively
>>
>> Thanks,
>> Sreeram
>>
>> On Tue, Sep 19, 2023 at 6:23 PM Barry Smith <[email protected]> wrote:
>>
>>>    ( a )     ( A B C D )  ( w )
>>>    ( b )  =  ( E F G H )  ( x )
>>>    ( c )     ( I J K L )  ( y )
>>>                           ( z )
>>>
>>>    I have no idea what "The input vector is partitioned across each
>>> row, and the output vector is partitioned across each column" means.
>>>
>>>    Anyways, the shell matrix needs to live on MPI_COMM_WORLD, as do
>>> both the (a,b,c) and (w,x,y,z) vectors.
>>>
>>>    Now, how many MPI ranks do you want to do the computation on? 12?
>>> Do you want one matrix A .. L on each rank?
>>>
>>>    Do you want the (a,b,c) vector spread over all ranks? What about
>>> the (w,x,y,z) vector?
>>>
>>>    Barry
>>>
>>> On Sep 19, 2023, at 4:42 PM, Sreeram R Venkat <[email protected]>
>>> wrote:
>>>
>>> I have a custom implementation of a matrix-vector product that
>>> inherently relies on a 2D processor partitioning of the matrix. That
>>> is, if the matrix looks like:
>>>
>>>    A B C D
>>>    E F G H
>>>    I J K L
>>>
>>> in block form, we use 12 processors, each having one block. The input
>>> vector is partitioned across each row, and the output vector is
>>> partitioned across each column.
>>>
>>> Each processor has 3 communicators: the WORLD_COMM, a ROW_COMM, and a
>>> COL_COMM. The ROW/COL communicators are used to do reductions over
>>> rows/columns of processors.
>>>
>>> With this setup, I am a bit confused about how to set up the matrix
>>> shell. The MatCreateShell() function only accepts one communicator. If
>>> I give it WORLD_COMM, the local/global sizes won't match, since PETSc
>>> will try to multiply local_size * total_processors instead of
>>> local_size * processors_per_row (or col). I have gotten around this
>>> temporarily by giving ROW_COMM here instead. What I think happens is
>>> that a different MatShell is created on each row, but when computing
>>> the matvec, they all work together.
>>>
>>> However, if I try to use KSP (CG) with this setup (giving ROW_COMM as
>>> the communicator), the process hangs. I believe this is due to the
>>> partitioning of the input/output vectors. The matvec itself is fine,
>>> but the inner products and other steps of CG fail. In fact, if I
>>> restrict to the case where I only have one row of processors, I am
>>> able to successfully use KSP.
>>>
>>> Is there a way to use KSP with this 2D partitioning setup when there
>>> are multiple rows of processors? I'd also prefer to work with one
>>> global MatShell object instead of the one-object-per-row thing I'm
>>> doing right now.
>>>
>>> Thanks for your help,
>>> Sreeram
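Putting Barry's layout prescription (his 7:13 PM message above) into
code, this is roughly what I plan to try for creating the two vectors on
MPI_COMM_WORLD. The rank-to-block mapping and the per-block dimensions
(nrows_block, ncols_block) are assumptions, not anything specified in
the thread:

    #include <petscvec.h>

    /* Per-rank local sizes for the 3x4 block grid, with everything on
       MPI_COMM_WORLD. PETSc sums the local sizes over the communicator
       (it does not multiply by the rank count), so zero-length local
       pieces on most ranks are fine. */
    PetscErrorCode CreateBlockVecs(PetscInt nrows_block, PetscInt ncols_block,
                                   Vec *in, Vec *out)
    {
      PetscMPIInt rank;
      PetscInt    row, col, nloc_in, nloc_out;

      PetscFunctionBeginUser;
      PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank));
      row = rank / 4; col = rank % 4; /* assumed: ranks 0..11 hold A..L row-major */

      nloc_in  = (row == 0) ? ncols_block : 0; /* (w,x,y,z) on ranks A,B,C,D */
      nloc_out = (col == 3) ? nrows_block : 0; /* (a,b,c) on ranks D,H,L */

      PetscCall(VecCreate(PETSC_COMM_WORLD, in));
      PetscCall(VecSetSizes(*in, nloc_in, PETSC_DETERMINE));
      PetscCall(VecSetType(*in, VECCUDA));

      PetscCall(VecCreate(PETSC_COMM_WORLD, out));
      PetscCall(VecSetSizes(*out, nloc_out, PETSC_DETERMINE));
      PetscCall(VecSetType(*out, VECCUDA));
      PetscFunctionReturn(PETSC_SUCCESS);
    }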

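And for the *M^TM* variant, where the input and output share one layout
(which satisfies KSP's matching-layout assumption), a sketch using the
VecSetLayout()/MatSetLayouts() route. MtM_Mult, ctx, and nloc are
placeholders, and the exact ordering of the calls here is my assumption,
worth checking against the PETSc docs:

    #include <petscksp.h>

    extern PetscErrorCode MtM_Mult(Mat MtM, Vec in, Vec out);

    /* One layout for both sides of the M^T M shell; nloc is this rank's
       share (number of columns of its block on the first row of ranks,
       0 elsewhere), as in the previous sketch. */
    PetscErrorCode CreateNormalShell(PetscInt nloc, void *ctx, Mat *MtM, KSP *ksp)
    {
      PetscLayout map;

      PetscFunctionBeginUser;
      PetscCall(PetscLayoutCreate(PETSC_COMM_WORLD, &map));
      PetscCall(PetscLayoutSetLocalSize(map, nloc));
      PetscCall(PetscLayoutSetUp(map));

      PetscCall(MatCreate(PETSC_COMM_WORLD, MtM));
      PetscCall(MatSetLayouts(*MtM, map, map)); /* left layout == right layout */
      PetscCall(MatSetType(*MtM, MATSHELL));
      PetscCall(MatShellSetContext(*MtM, ctx));
      PetscCall(MatShellSetOperation(*MtM, MATOP_MULT, (void (*)(void))MtM_Mult));
      PetscCall(PetscLayoutDestroy(&map));      /* the Mat keeps its own reference */

      PetscCall(KSPCreate(PETSC_COMM_WORLD, ksp));
      PetscCall(KSPSetOperators(*ksp, *MtM, *MtM));
      PetscCall(KSPSetType(*ksp, KSPCG));
      PetscFunctionReturn(PETSC_SUCCESS);
    }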