Use VecCreate(), VecSetSizes(), and VecSetType(), and MatCreate(), MatSetSizes(), 
and MatSetType() instead of the convenience functions VecCreateMPICUDA() and 
MatCreateShell().
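
A minimal sketch of that sequence (a fragment, not a complete program: PetscCall() error checking is omitted, and localRows, localCols, ctx, and MyMatMult are placeholders for your per-rank block sizes, shell context, and multiply routine):

```c
#include <petsc.h>

/* Sketch: build the (w,x,y,z)-style vector and the shell matrix with
   explicit per-rank local sizes.  localCols/localRows are placeholders,
   nonzero only on the ranks that own a piece of the input/output vector.
   Supplying the local sizes here defines the PetscLayouts, so no separate
   VecSetLayout()/MatSetLayouts() calls are needed. */
Vec x, y;
Mat M;

VecCreate(PETSC_COMM_WORLD, &x);
VecSetSizes(x, localCols, PETSC_DECIDE);
VecSetType(x, VECMPICUDA);           /* replaces VecCreateMPICUDA() */

VecCreate(PETSC_COMM_WORLD, &y);
VecSetSizes(y, localRows, PETSC_DECIDE);
VecSetType(y, VECMPICUDA);

MatCreate(PETSC_COMM_WORLD, &M);
MatSetSizes(M, localRows, localCols, PETSC_DECIDE, PETSC_DECIDE);
MatSetType(M, MATSHELL);             /* replaces MatCreateShell() */
MatShellSetContext(M, ctx);
MatShellSetOperation(M, MATOP_MULT, (void (*)(void))MyMatMult);
```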


> On Sep 19, 2023, at 8:44 PM, Sreeram R Venkat <[email protected]> wrote:
> 
> Thank you for your reply.
> 
> Let's call this matrix M: 
> (A B C D)
> (E F G H)
> (I J K L)
> 
> Now, instead of doing KSP with just M, what if I want M^TM? In this case, the 
> matvec implementation would be as follows:
> 
> same partitioning of blocks A, B, ..., L among the 12 MPI ranks
> matvec looks like:
> (a)           (w)
> (b) = (M^T M) (x)
> (c)           (y)
> (d)           (z)
> w, x, y, z stored on ranks A, B, C, D (as before)
> a, b, c, d now also stored on ranks A, B, C, D
> Based on your message, I believe using a PetscLayout of (number of columns of A, 
> number of columns of B, number of columns of C, number of columns of D, 0, 0, 0, 
> 0, 0, 0, 0, 0) for both the (a,b,c,d) and (w,x,y,z) vectors should work.
> 
> 
> I see there are functions "VecSetLayout" and "MatSetLayouts" to set the 
> PetscLayouts of the matrix and vectors. When I create the vectors (I need 
> VecCreateMPICUDA) or matrix shell (with MatCreateShell), I need to pass the 
> local and global sizes. I'm not sure what to do there.
> 
> 
> Thanks,
> Sreeram
> 
> On Tue, Sep 19, 2023, 7:13 PM Barry Smith <[email protected] 
> <mailto:[email protected]>> wrote:
>> 
>>    The PetscLayout local sizes for the PETSc (a,b,c) vector: (0, 0, 0, number of 
>> rows of D, 0, 0, 0, number of rows of H, 0, 0, 0, number of rows of L)
>> 
>>    The PetscLayout local sizes for the PETSc (w,x,y,z) vector: (number of columns 
>> of A, number of columns of B, number of columns of C, number of columns of D, 0, 
>> 0, 0, 0, 0, 0, 0, 0)
>> 
>>    The left and right layouts of the shell matrix need to match the two 
>> above. 
>> 
>>    There is a huge problem. KSP is written assuming that the left vector 
>> layout is the same as the right vector layout. So it can do dot products MPI 
>> rank by MPI rank without needing to send individual vector values around.
>> 
>>    I don't think it makes sense to use PETSc with the vector decomposition you 
>> would like.
>> 
>>   Barry
>> 
>> 
>> 
>>> On Sep 19, 2023, at 7:44 PM, Sreeram R Venkat <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>> 
>>> With the example you have given, here is what I would like to do:
>>> 12 MPI ranks
>>> Each rank has one block (rank 0 has A, rank 1 has B, ..., rank 11 has L) - 
>>> to make the rest of this easier I'll refer to the rank containing block A 
>>> as "rank A", and so on
>>> rank A, rank B, rank C, and rank D have w, x, y, z respectively - the first 
>>> step of the custom matvec implementation broadcasts w to rank E and rank I 
>>> (similarly x is broadcast to rank F and rank J ...)
>>> at the end of the matvec computation, ranks D, H, and L have a, b, and c 
>>> respectively
>>> Thanks,
>>> Sreeram
>>> 
>>> 
>>> On Tue, Sep 19, 2023 at 6:23 PM Barry Smith <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>>> 
>>>>  ( a )   ( A B C D ) ( w )
>>>>  ( b ) = ( E F G H ) ( x )
>>>>  ( c )   ( I J K L ) ( y )
>>>>                      ( z )
>>>> 
>>>> I have no idea what "The input vector is partitioned across each row, and 
>>>> the output vector is partitioned across each column" means.
>>>> 
>>>> Anyway, the shell matrix needs to live on MPI_COMM_WORLD, as do both the 
>>>> (a,b,c) and (w,x,y,z) vectors. 
>>>> 
>>>> Now, how many MPI ranks do you want to do the computation on? 12?
>>>> Do you want one block A .. L on each rank?
>>>> 
>>>> Do you want the (a,b,c) vector spread over all ranks? What about the 
>>>> (w,x,y,z) vector?
>>>> 
>>>>   Barry
>>>> 
>>>> 
>>>> 
>>>>> On Sep 19, 2023, at 4:42 PM, Sreeram R Venkat <[email protected] 
>>>>> <mailto:[email protected]>> wrote:
>>>>> 
>>>>> I have a custom implementation of a matrix-vector product that inherently 
>>>>> relies on a 2D processor partitioning of the matrix. That is, if the 
>>>>> matrix looks like:
>>>>> 
>>>>> A B C D
>>>>> E F G H
>>>>> I J K L
>>>>> in block form, we use 12 processors, each having one block. The input 
>>>>> vector is partitioned across each row, and the output vector is 
>>>>> partitioned across each column.
>>>>> 
>>>>> Each processor has 3 communicators: the WORLD_COMM, a ROW_COMM, and a 
>>>>> COL_COMM. The ROW/COL communicators are used to do reductions over 
>>>>> rows/columns of processors.
>>>>> 
>>>>> With this setup, I am a bit confused about how to set up the matrix 
>>>>> shell. The "MatCreateShell" function only accepts one communicator. If I 
>>>>> give the WORLD_COMM, the local/global sizes won't match since PETSc will 
>>>>> try to multiply local_size * total_processors instead of local_size * 
>>>>> processors_per_row (or col). I have gotten around this temporarily by 
>>>>> giving ROW_COMM here instead. What I think happens is a different 
>>>>> MatShell is created on each row, but when computing the matvec, they all 
>>>>> work together. 
>>>>> 
>>>>> However, if I try to use KSP (CG) with this setup (giving ROW_COMM as the 
>>>>> communicator), the process hangs. I believe this is due to the 
>>>>> partitioning of the input/output vectors. The matvec itself is fine, but 
>>>>> the inner products and other steps of CG fail. In fact, if I restrict to 
>>>>> the case where I only have one row of processors, I am able to 
>>>>> successfully use KSP. 
>>>>> 
>>>>> Is there a way to use KSP with this 2D partitioning setup when there are 
>>>>> multiple rows of processors? I'd also prefer to work with one global 
>>>>> MatShell object instead of this one object per row thing that I'm doing 
>>>>> right now.
>>>>> 
>>>>> Thanks for your help,
>>>>> Sreeram
>>>> 
>> 
