Use VecCreate(), VecSetSizes(), VecSetType() and MatCreate(), MatSetSizes(), and MatSetType() instead of the convenience functions VecCreateMPICUDA() and MatCreateShell().
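A minimal sketch of that create / set-sizes / set-type pattern (the sizes mlocal/nlocal/M/N, the context pointer, and the matvec callback MyMatMult are placeholders, and VECCUDA assumes a CUDA-enabled PETSc build; this is an illustration, not code from the thread):

#include <petscmat.h>

/* Placeholder: the user's custom matvec for the shell matrix. */
extern PetscErrorCode MyMatMult(Mat A, Vec x, Vec y);

PetscErrorCode BuildShellOperator(MPI_Comm comm, PetscInt mlocal, PetscInt nlocal,
                                  PetscInt M, PetscInt N, void *ctx,
                                  Mat *A, Vec *right, Vec *left)
{
  PetscFunctionBeginUser;
  /* Right-hand (input) vector: generic create, then sizes, then type */
  PetscCall(VecCreate(comm, right));
  PetscCall(VecSetSizes(*right, nlocal, N));
  PetscCall(VecSetType(*right, VECCUDA));      /* instead of VecCreateMPICUDA() */

  /* Left-hand (output) vector, with its own local sizes */
  PetscCall(VecCreate(comm, left));
  PetscCall(VecSetSizes(*left, mlocal, M));
  PetscCall(VecSetType(*left, VECCUDA));

  /* Shell matrix: generic create/sizes/type instead of MatCreateShell() */
  PetscCall(MatCreate(comm, A));
  PetscCall(MatSetSizes(*A, mlocal, nlocal, M, N));
  PetscCall(MatSetType(*A, MATSHELL));
  PetscCall(MatSetUp(*A));
  PetscCall(MatShellSetContext(*A, ctx));
  PetscCall(MatShellSetOperation(*A, MATOP_MULT, (void (*)(void))MyMatMult));
  PetscFunctionReturn(PETSC_SUCCESS);
}

The point of the generic path is that the per-rank local sizes (and hence the left/right PetscLayouts) are set explicitly, independently of the vector/matrix type.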
> On Sep 19, 2023, at 8:44 PM, Sreeram R Venkat <[email protected]> wrote:
>
> Thank you for your reply.
>
> Let's call this matrix M:
> (A B C D)
> (E F G H)
> (I J K L)
>
> Now, instead of doing KSP with just M, what if I want M^T M? In this case, the matvec implementation would be as follows:
>
> same partitioning of blocks A, B, ..., L among the 12 MPI ranks
> matvec looks like:
> (a)             (w)
> (b) = (M^T M)   (x)
> (c)             (y)
> (d)             (z)
> w, x, y, z stored on ranks A, B, C, D (as before)
> a, b, c, d now also stored on ranks A, B, C, D
> Based on your message, I believe using a PetscLayout of (number of columns of A, number of columns of B, number of columns of C, number of columns of D, 0, 0, 0, 0, 0, 0, 0, 0) for both the (a,b,c,d) and (w,x,y,z) vectors should work.
>
> I see there are functions "VecSetLayout" and "MatSetLayouts" to set the PetscLayouts of the matrix and vectors. When I create the vectors (I need VecCreateMPICUDA) or the matrix shell (with MatCreateShell), I need to pass the local and global sizes. I'm not sure what to do there.
>
> Thanks,
> Sreeram
>
> On Tue, Sep 19, 2023, 7:13 PM Barry Smith <[email protected]> wrote:
>>
>>    The PetscLayout local sizes for the PETSc (a,b,c) vector: (0, 0, 0, number of rows of D, 0, 0, 0, number of rows of H, 0, 0, 0, number of rows of L)
>>
>>    The PetscLayout local sizes for the PETSc (w,x,y,z) vector: (number of columns of A, number of columns of B, number of columns of C, number of columns of D, 0, 0, 0, 0, 0, 0, 0, 0)
>>
>>    The left and right layouts of the shell matrix need to match the two above.
>>
>>    There is a huge problem. KSP is written assuming that the left vector layout is the same as the right vector layout, so it can do dot products MPI rank by MPI rank without needing to send individual vector values around.
>>
>>    I don't think it makes sense to use PETSc with such vector decompositions as you would like.
>>
>>    Barry
>>
>>> On Sep 19, 2023, at 7:44 PM, Sreeram R Venkat <[email protected]> wrote:
>>>
>>> With the example you have given, here is what I would like to do:
>>> 12 MPI ranks
>>> Each rank has one block (rank 0 has A, rank 1 has B, ..., rank 11 has L) - to make the rest of this easier I'll refer to the rank containing block A as "rank A", and so on
>>> rank A, rank B, rank C, and rank D have w, x, y, z respectively - the first step of the custom matvec implementation broadcasts w to rank E and rank I (similarly x is broadcast to rank F and rank J ...)
>>> at the end of the matvec computation, ranks D, H, and L have a, b, and c respectively
>>>
>>> Thanks,
>>> Sreeram
>>>
>>> On Tue, Sep 19, 2023 at 6:23 PM Barry Smith <[email protected]> wrote:
>>>>
>>>>    ( a )   ( A B C D ) ( w )
>>>>    ( b ) = ( E F G H ) ( x )
>>>>    ( c )   ( I J K L ) ( y )
>>>>                        ( z )
>>>>
>>>>    I have no idea what "The input vector is partitioned across each row, and the output vector is partitioned across each column" means.
>>>>
>>>>    Anyways the shell matrix needs to live on MPI_COMM_WORLD, as do both the (a,b,c) and (w,x,y,z) vectors.
>>>>
>>>>    Now how many MPI ranks do you want to do the computation on? 12? Do you want one matrix block A .. L on each rank?
>>>>
>>>>    Do you want the (a,b,c) vector spread over all ranks? What about the (w,x,y,z) vector?
>>>>
>>>>    Barry
>>>>
>>>>> On Sep 19, 2023, at 4:42 PM, Sreeram R Venkat <[email protected]> wrote:
>>>>>
>>>>> I have a custom implementation of a matrix-vector product that inherently relies on a 2D processor partitioning of the matrix. That is, if the matrix looks like:
>>>>>
>>>>> A B C D
>>>>> E F G H
>>>>> I J K L
>>>>>
>>>>> in block form, we use 12 processors, each having one block. The input vector is partitioned across each row, and the output vector is partitioned across each column.
>>>>>
>>>>> Each processor has 3 communicators: the WORLD_COMM, a ROW_COMM, and a COL_COMM. The ROW/COL communicators are used to do reductions over rows/columns of processors.
>>>>>
>>>>> With this setup, I am a bit confused about how to set up the matrix shell. The "MatCreateShell" function only accepts one communicator. If I give the WORLD_COMM, the local/global sizes won't match since PETSc will try to multiply local_size * total_processors instead of local_size * processors_per_row (or col). I have gotten around this temporarily by giving ROW_COMM here instead. What I think happens is that a different MatShell is created on each row, but when computing the matvec, they all work together.
>>>>>
>>>>> However, if I try to use KSP (CG) with this setup (giving ROW_COMM as the communicator), the process hangs. I believe this is due to the partitioning of the input/output vectors. The matvec itself is fine, but the inner products and other steps of CG fail. In fact, if I restrict to the case where I only have one row of processors, I am able to successfully use KSP.
>>>>>
>>>>> Is there a way to use KSP with this 2D partitioning setup when there are multiple rows of processors? I'd also prefer to work with one global MatShell object instead of the one-object-per-row approach that I'm using right now.
>>>>>
>>>>> Thanks for your help,
>>>>> Sreeram
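For the M^T M case discussed above, the left and right layouts coincide, which is what KSP requires. A rough sketch of how that matching layout could be built on MPI_COMM_WORLD, under the assumptions from the thread (12 ranks, blocks A, B, C, D on ranks 0-3, which own all vector entries); my_block_cols, ctx, and the mult callback are placeholders, and VECCUDA assumes a CUDA build:

#include <petscksp.h>

PetscErrorCode CreateNormalShell(MPI_Comm comm, PetscInt my_block_cols, void *ctx,
                                 PetscErrorCode (*mult)(Mat, Vec, Vec),
                                 Mat *MtM, Vec *in, Vec *out)
{
  PetscMPIInt rank;
  PetscInt    nlocal;

  PetscFunctionBeginUser;
  PetscCallMPI(MPI_Comm_rank(comm, &rank));
  /* Only the four ranks holding A, B, C, D own vector entries; everyone else
     contributes zero local entries, matching the layout
     (cols A, cols B, cols C, cols D, 0, ..., 0). */
  nlocal = (rank < 4) ? my_block_cols : 0;

  PetscCall(VecCreate(comm, in));
  PetscCall(VecSetSizes(*in, nlocal, PETSC_DETERMINE));
  PetscCall(VecSetType(*in, VECCUDA));
  PetscCall(VecDuplicate(*in, out));           /* output gets the same layout */

  PetscCall(MatCreate(comm, MtM));
  PetscCall(MatSetSizes(*MtM, nlocal, nlocal, PETSC_DETERMINE, PETSC_DETERMINE));
  PetscCall(MatSetType(*MtM, MATSHELL));
  PetscCall(MatSetUp(*MtM));
  PetscCall(MatShellSetContext(*MtM, ctx));
  PetscCall(MatShellSetOperation(*MtM, MATOP_MULT, (void (*)(void))mult));
  PetscFunctionReturn(PETSC_SUCCESS);
}

Because the input and output vectors share one layout, the rank-local dot products inside CG work; the hang described earlier comes from running KSP on ROW_COMM with mismatched left/right layouts for the plain M operator.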
