The logic is wrong: it should check for the Mat type MATSEQAIJCUSPARSE, not the CUDA Vec types.
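
For concreteness, a minimal sketch of that fix, reusing amgx->localA and amgx->values from the snippet quoted below (PetscObjectTypeCompare and MATSEQAIJCUSPARSE are standard PETSc; the surrounding context is assumed from the quoted code):

    // Sketch: compare against the Mat type MATSEQAIJCUSPARSE rather than
    // the CUDA Vec types, then use the matching values accessor.
    PetscBool is_dev_ptrs;

    PetscCall(PetscObjectTypeCompare((PetscObject)amgx->localA, MATSEQAIJCUSPARSE, &is_dev_ptrs));
    if (is_dev_ptrs) {
      PetscCall(MatSeqAIJCUSPARSEGetArrayRead(amgx->localA, &amgx->values)); // device pointer
    } else {
      PetscCall(MatSeqAIJGetArrayRead(amgx->localA, &amgx->values)); // host pointer
    }
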
On Thu, Jun 23, 2022, 21:36 Mark Adams <[email protected]> wrote:

> On Thu, Jun 23, 2022 at 3:02 PM Barry Smith <[email protected]> wrote:
>
>> It looks like the current code copies the nonzero values to the CPU
>> from the MPI matrix (with the calls
>> PetscCall(MatSeqAIJGetArrayRead(mpimat->A,&aav)); and
>> PetscCall(MatSeqAIJGetArrayRead(mpimat->B,&bav));), then copies them
>> into the CPU memory of the Seq matrix. When the matrix entries are next
>> accessed on the GPU, they should automatically be copied down to the
>> GPU. So the code looks ok even for GPUs. We'll need to see the full
>> error message with what the "invalid pointer" is.
>
> I showed Matt how to peek into offloadmask and he found that it is in a
> host state, but this is not the issue. The access method should do the
> copy to the device.
>
> I am thinking the logic here might be wrong. (Matt fixed "VEC" --> "MAT"
> in the comparison below.)
>
> Matt, is the issue that you are calling MatSeqAIJCUSPARSEGetArrayRead
> and getting a host pointer?
>
> I think the state of amgx->localA after the call to
> MatSeqAIJCUSPARSEGetArrayRead should be "BOTH", because that call copies
> the data to the device, so both copies are valid and you should have
> device data.
>
>   PetscBool is_dev_ptrs;
>   PetscCall(PetscObjectTypeCompareAny((PetscObject)amgx->localA, &is_dev_ptrs, VECCUDA, VECMPICUDA, VECSEQCUDA, ""));
>
>   if (is_dev_ptrs) {
>     PetscCall(MatSeqAIJCUSPARSEGetArrayRead(amgx->localA, &amgx->values));
>   } else {
>     PetscCall(MatSeqAIJGetArrayRead(amgx->localA, &amgx->values));
>   }
>
>> Barry
>>
>> Yes, this routine is terribly inefficient for GPU matrices; it needs to
>> be specialized so it does not go through CPU memory, but that is a
>> separate issue from there being bugs in the current code.
>>
>> The code also seems to implicitly assume that the parallel matrix has
>> the same nonzero pattern on a reuse. This should be checked on each use
>> by stashing the nonzero state of the parallel matrix into the sequential
>> matrix and making sure the parallel matrix still has that stashed value
>> each time. Currently, if one changes the nonzero pattern of the parallel
>> matrix, one is likely to get random, confusing crashes due to memory
>> corruption. But that is likely not the problem here.
>>
>> On Jun 23, 2022, at 2:23 PM, Mark Adams <[email protected]> wrote:
>>
>> We have a bug in the AMGx test snes_tests-ex13_amgx in parallel.
>> Matt Martineau found that MatMPIAIJGetLocalMat worked in the first pass
>> through the code below, where the local matrix is created (INITIAL), but
>> in the next pass, when "REUSE" is used, he sees an invalid pointer.
>> Matt found that it does have offloadmask == CPU.
>> Maybe it is missing logic to put the output in the same state as the
>> input?
>>
>> Any ideas on this, or should I just dig into it?
>>
>> Thanks,
>>
>>   bool partial_setup_allowed = (pc->setupcalled && pc->flag != DIFFERENT_NONZERO_PATTERN);
>>   if (amgx->nranks > 1) {
>>     if (partial_setup_allowed) {
>>       PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_REUSE_MATRIX, &amgx->localA)); // This path does not seem to work by the time we reach the AmgX API
>>     } else {
>>       PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_INITIAL_MATRIX, &amgx->localA)); // This path works
>>     }
>>   } else {
>>     amgx->localA = Pmat;
>>   }
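
To make the stashed-nonzero-state idea concrete, here is a minimal sketch along the lines Barry suggests. MatGetNonzeroState is the real PETSc call; the saved_nonzero_state field on the amgx context is hypothetical, added here only for illustration:

    // Hypothetical guard for the REUSE path: record the parallel matrix's
    // nonzero state when the local matrix is created, and fall back to
    // MAT_INITIAL_MATRIX whenever that state has changed since.
    PetscObjectState nzstate;

    PetscCall(MatGetNonzeroState(Pmat, &nzstate));
    if (partial_setup_allowed && nzstate == amgx->saved_nonzero_state) {
      PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_REUSE_MATRIX, &amgx->localA));
    } else {
      if (amgx->localA && amgx->localA != Pmat) PetscCall(MatDestroy(&amgx->localA)); // drop any stale local matrix
      PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_INITIAL_MATRIX, &amgx->localA));
      amgx->saved_nonzero_state = nzstate; // hypothetical field on the amgx context
    }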

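For anyone who wants to reproduce the offloadmask peek mentioned above, a debugging-only sketch. The field is private, so this includes petsc/private/matimpl.h; the helper name is made up, and this is for diagnosis, not production code:

    // Hypothetical debug helper: print a Mat's (private) offload mask.
    #include <petsc/private/matimpl.h>

    static PetscErrorCode PrintOffloadMask(Mat A, const char *label)
    {
      PetscFunctionBegin;
      PetscCall(PetscPrintf(PETSC_COMM_SELF, "%s: offloadmask = %d (CPU=%d, GPU=%d, BOTH=%d)\n", label,
                            (int)A->offloadmask, (int)PETSC_OFFLOAD_CPU, (int)PETSC_OFFLOAD_GPU, (int)PETSC_OFFLOAD_BOTH));
      PetscFunctionReturn(0);
    }

Calling it as, e.g., PrintOffloadMask(amgx->localA, "after MatMPIAIJGetLocalMat") on both the INITIAL and REUSE paths would show whether the mask differs between the two.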