The logic is wrong: it should check for the Mat type MATSEQAIJCUSPARSE, not the CUDA Vec types.
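
For concreteness, a minimal sketch of that fix, reusing amgx->localA and amgx->values from the snippet quoted below (PetscObjectTypeCompare and MATSEQAIJCUSPARSE are standard PETSc; the surrounding context is assumed from the quoted code):

    // Sketch: compare against the Mat type MATSEQAIJCUSPARSE rather than
    // the CUDA Vec types, then use the matching values accessor.
    PetscBool is_dev_ptrs;

    PetscCall(PetscObjectTypeCompare((PetscObject)amgx->localA, MATSEQAIJCUSPARSE, &is_dev_ptrs));
    if (is_dev_ptrs) {
      PetscCall(MatSeqAIJCUSPARSEGetArrayRead(amgx->localA, &amgx->values)); // device pointer
    } else {
      PetscCall(MatSeqAIJGetArrayRead(amgx->localA, &amgx->values)); // host pointer
    }
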
On Thu, Jun 23, 2022, 21:36 Mark Adams <[email protected]> wrote:

> On Thu, Jun 23, 2022 at 3:02 PM Barry Smith <[email protected]> wrote:
>
>> It looks like the current code copies the nonzero values to the CPU
>> from the MPI matrix (with the calls
>> PetscCall(MatSeqAIJGetArrayRead(mpimat->A,&aav)); and
>> PetscCall(MatSeqAIJGetArrayRead(mpimat->B,&bav));), then copies them
>> into the CPU memory of the Seq matrix. When the matrix entries are next
>> accessed on the GPU, they should automatically be copied down to the
>> GPU. So the code looks ok even for GPUs. We'll need to see the full
>> error message with what the "invalid pointer" is.
>
> I showed Matt how to peek into offloadmask and he found that it is in a
> host state, but this is not the issue. The access method should do the
> copy to the device.
>
> I am thinking the logic here might be wrong. (Matt fixed "VEC" --> "MAT"
> in the comparison below.)
>
> Matt, is the issue that you are calling MatSeqAIJCUSPARSEGetArrayRead
> and getting a host pointer?
>
> I think the state of amgx->localA after the call to
> MatSeqAIJCUSPARSEGetArrayRead should be "BOTH", because that call copies
> the data to the device, so both copies are valid and you should have
> device data.
>
>   PetscBool is_dev_ptrs;
>   PetscCall(PetscObjectTypeCompareAny((PetscObject)amgx->localA, &is_dev_ptrs, VECCUDA, VECMPICUDA, VECSEQCUDA, ""));
>
>   if (is_dev_ptrs) {
>     PetscCall(MatSeqAIJCUSPARSEGetArrayRead(amgx->localA, &amgx->values));
>   } else {
>     PetscCall(MatSeqAIJGetArrayRead(amgx->localA, &amgx->values));
>   }
>
>> Barry
>>
>> Yes, this routine is terribly inefficient for GPU matrices; it needs to
>> be specialized so it does not go through CPU memory, but that is a
>> separate issue from there being bugs in the current code.
>>
>> The code also seems to implicitly assume that the parallel matrix has
>> the same nonzero pattern on a reuse. This should be checked on each use
>> by stashing the nonzero state of the parallel matrix into the sequential
>> matrix and making sure the parallel matrix still has that stashed value
>> each time. Currently, if one changes the nonzero pattern of the parallel
>> matrix, one is likely to get random, confusing crashes due to memory
>> corruption. But that is likely not the problem here.
>>
>> On Jun 23, 2022, at 2:23 PM, Mark Adams <[email protected]> wrote:
>>
>> We have a bug in the AMGx test snes_tests-ex13_amgx in parallel.
>> Matt Martineau found that MatMPIAIJGetLocalMat worked in the first pass
>> through the code below, where the local matrix is created (INITIAL), but
>> in the next pass, when "REUSE" is used, he sees an invalid pointer.
>> Matt found that it does have offloadmask == CPU.
>> Maybe it is missing logic to put the output in the same state as the
>> input?
>>
>> Any ideas on this, or should I just dig into it?
>>
>> Thanks,
>>
>>   bool partial_setup_allowed = (pc->setupcalled && pc->flag != DIFFERENT_NONZERO_PATTERN);
>>   if (amgx->nranks > 1) {
>>     if (partial_setup_allowed) {
>>       PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_REUSE_MATRIX, &amgx->localA)); // This path does not seem to work by the time we reach the AmgX API
>>     } else {
>>       PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_INITIAL_MATRIX, &amgx->localA)); // This path works
>>     }
>>   } else {
>>     amgx->localA = Pmat;
>>   }
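
To make the stashed-nonzero-state idea concrete, here is a minimal sketch along the lines Barry suggests. MatGetNonzeroState is the real PETSc call; the saved_nonzero_state field on the amgx context is hypothetical, added here only for illustration:

    // Hypothetical guard for the REUSE path: record the parallel matrix's
    // nonzero state when the local matrix is created, and fall back to
    // MAT_INITIAL_MATRIX whenever that state has changed since.
    PetscObjectState nzstate;

    PetscCall(MatGetNonzeroState(Pmat, &nzstate));
    if (partial_setup_allowed && nzstate == amgx->saved_nonzero_state) {
      PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_REUSE_MATRIX, &amgx->localA));
    } else {
      if (amgx->localA && amgx->localA != Pmat) PetscCall(MatDestroy(&amgx->localA)); // drop any stale local matrix
      PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_INITIAL_MATRIX, &amgx->localA));
      amgx->saved_nonzero_state = nzstate; // hypothetical field on the amgx context
    }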

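For anyone who wants to reproduce the offloadmask peek mentioned above, a debugging-only sketch. The field is private, so this includes petsc/private/matimpl.h; the helper name is made up, and this is for diagnosis, not production code:

    // Hypothetical debug helper: print a Mat's (private) offload mask.
    #include <petsc/private/matimpl.h>

    static PetscErrorCode PrintOffloadMask(Mat A, const char *label)
    {
      PetscFunctionBegin;
      PetscCall(PetscPrintf(PETSC_COMM_SELF, "%s: offloadmask = %d (CPU=%d, GPU=%d, BOTH=%d)\n", label,
                            (int)A->offloadmask, (int)PETSC_OFFLOAD_CPU, (int)PETSC_OFFLOAD_GPU, (int)PETSC_OFFLOAD_BOTH));
      PetscFunctionReturn(0);
    }

Calling it as, e.g., PrintOffloadMask(amgx->localA, "after MatMPIAIJGetLocalMat") on both the INITIAL and REUSE paths would show whether the mask differs between the two.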