I am rearranging the code for clarity from the repo, but I have:

PetscBool is_dev_ptrs;
PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_INITIAL_MATRIX, &amgx->localA));
PetscCall(PetscObjectTypeCompareAny((PetscObject)amgx->localA, &is_dev_ptrs, MATAIJCUSPARSE, MATSEQAIJCUSPARSE, MATMPIAIJCUSPARSE, ""));
PetscPrintf(PETSC_COMM_SELF, "*checking against mataijcusparse amgx->localA = %d*\n", is_dev_ptrs);
PetscCall(PetscObjectTypeCompareAny((PetscObject)Pmat, &is_dev_ptrs, MATAIJCUSPARSE, MATSEQAIJCUSPARSE, MATMPIAIJCUSPARSE, ""));
PetscPrintf(PETSC_COMM_SELF, "*checking against mataijcusparse Pmat = %d*\n", is_dev_ptrs);
And it seems to show that MatMPIAIJGetLocalMat() takes an mpiaijcusparse Mat and returns a seqaij Mat (see below):

AMGX version 2.2.0.132-opensource
Built on Jun 24 2022, 09:21:43
Compiled with CUDA Runtime 11.5, using CUDA driver 11.5
*checking against mataijcusparse amgx->localA = 0*
*checking against mataijcusparse Pmat = 1*
localA_name seqaij
Pmat_name mpiaijcusparse

Matt's existing testing code (below) shows the types that agree with these tests and with the prints that I added.

// XXX DEBUG REMOVE
const char *localA_name;
PetscObjectGetType((PetscObject)amgx->localA, &localA_name);
PetscPrintf(PETSC_COMM_SELF, "localA_name %s\n", localA_name);
const char *Pmat_name;
PetscObjectGetType((PetscObject)Pmat, &Pmat_name);
PetscPrintf(PETSC_COMM_SELF, "Pmat_name %s\n", Pmat_name);

On Fri, Jun 24, 2022 at 10:00 AM Barry Smith <[email protected]> wrote:

> On Jun 24, 2022, at 8:58 AM, Mark Adams <[email protected]> wrote:
>
> And before we move to the MR, I think Matt found a clear problem:
>
> * PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_REUSE_MATRIX, &amgx->localA)); returns "localA seqaij"
>
> * And, oddly, PetscCall(MatSeqAIJGetArrayRead(amgx->localA, &amgx->values)); returns: "it seems to detect that the pointer is a device mapped pointer but that it is invalid"
>
> It does not return a device mapped pointer; it returns a valid host pointer only. MatSeqAIJGetArrayRead() is intended to only return a host pointer, it cannot return a device pointer. MatSeqAIJCUSPARSEGetArrayRead() returns device pointers and should be used for this purpose.
>
> Matt, let's just comment out the REUSE line and add another INITIAL line (destroying the old Mat, of course), and let's press on.
>
> Looking at the code there is no way that simply using INITIAL instead of REUSE will make a code that does not work on the GPU run on the GPU.
> The MatMPIAIJGetLocalMat() returns only a MATSEQAIJ matrix regardless of INITIAL versus REUSE, and one can never get a device pointer from a non-GPU matrix.
>
> As noted by Stefano, the code either needs to use MatMPIAIJGetLocalMatMerge(), which does return a CUSPARSE matrix (but the columns are not supported), or MatMPIAIJGetLocalMat() needs to be updated to return a CUSPARSE matrix when the input MPI matrix is a CUSPARSE matrix.
>
> We can keep the debugging code for now.
>
> We (PETSc) can work on this independently,
>
> Thanks,
> Mark
>
> On Fri, Jun 24, 2022 at 8:51 AM Mark Adams <[email protected]> wrote:
>
>> I am not seeing this response, I see my "hstack" comment last.
>> https://gitlab.com/petsc/petsc/-/merge_requests/4323
>>
>> On Thu, Jun 23, 2022 at 4:37 PM Barry Smith <[email protected]> wrote:
>>
>>> I have responded in the MR, which has all the context and the code. Please move this conversation from petsc-dev to the MR. Note you can use the little cartoon cloud symbol (upper right of the sub window with my text) to reply to my post and keep everything in a thread for clarity.
>>>
>>> We are confused because it seems you are trying a variety of things and we don't know how the different things you tried resulted in the multiple errors you reported.
>>>
>>> On Jun 23, 2022, at 3:59 PM, Matthew Martineau <[email protected]> wrote:
>>>
>>> I checked in the changes and some debugging statements.
>>>
>>> PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_REUSE_MATRIX, &amgx->localA));
>>> PetscCall(PetscObjectTypeCompareAny((PetscObject)amgx->localA, &is_dev_ptrs, MATAIJCUSPARSE, MATSEQAIJCUSPARSE, MATMPIAIJCUSPARSE, ""));
>>>
>>> Then the call returns false. If we instead call PetscObjectTypeCompareAny on Pmat then it returns true.
>>> If you print the type of the matrices:
>>>
>>> localA seqaij
>>> Pmat mpiaijcusparse
>>>
>>> If you subsequently call MatSeqAIJCUSPARSEGetArrayRead on localA then it errors (presumably because of the type mismatch).
>>>
>>> If we call MatSeqAIJGetArrayRead on localA and then pass the `values` to AmgX, it seems to detect that the pointer is a device mapped pointer but that it is invalid.
>>>
>>> PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_REUSE_MATRIX, &amgx->localA));
>>> PetscCall(MatSeqAIJGetArrayRead(amgx->localA, &amgx->values)); // Seems to return invalid pointer, but I'll investigate more
>>>
>>> This doesn't reproduce if we call:
>>>
>>> PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_INITIAL_MATRIX, &amgx->localA));
>>> PetscCall(MatSeqAIJGetArrayRead(amgx->localA, &amgx->values)); // Pointer appears to be valid and we converge
>>>
>>> Essentially all I want to achieve is that when we are parallel, we fetch the local part of A and the device pointer to the matrix values from that structure so that we can pass to AmgX, preferring whichever API calls are the most efficient.
>>>
>>> From: Stefano Zampini <[email protected]>
>>> Sent: 23 June 2022 20:55
>>> To: Mark Adams <[email protected]>
>>> Cc: Barry Smith <[email protected]>; For users of the development version of PETSc <[email protected]>; Matthew Martineau <[email protected]>
>>> Subject: Re: [petsc-dev] MatMPIAIJGetLocalMat problem with GPUs
>>>
>>> External email: Use caution opening links or attachments
>>>
>>> The logic is wrong. It should check for MATSEQAIJCUSPARSE.
>>> On Thu, Jun 23, 2022, 21:36 Mark Adams <[email protected]> wrote:
>>>
>>> On Thu, Jun 23, 2022 at 3:02 PM Barry Smith <[email protected]> wrote:
>>>
>>> It looks like the current code copies the nonzero values to the CPU from the MPI matrix (with the calls PetscCall(MatSeqAIJGetArrayRead(mpimat->A,&aav)); PetscCall(MatSeqAIJGetArrayRead(mpimat->B,&bav));), then copies them into the CPU memory of the Seq matrix. When the matrix entries are next accessed on the GPU it should automatically copy them down to the GPU. So the code looks OK even for GPUs. We'll need to see the full error message with what the "invalid pointer" is.
>>>
>>> I showed Matt how to peek into offloadmask and he found that it is a host state, but this is not the issue. The access method should do the copy to the device.
>>>
>>> I am thinking the logic here might be wrong. (Matt fixed "VEC" --> "MAT" in the comparison below.)
>>>
>>> Matt, is the issue that you are calling MatSeqAIJCUSPARSEGetArrayRead and getting a host pointer?
>>>
>>> I think the state of amgx->localA after the call to MatSeqAIJCUSPARSEGetArrayRead should be "BOTH", because this copied the data to the device, so both copies are valid and you should have device data.
>>>
>>> PetscBool is_dev_ptrs;
>>> PetscCall(PetscObjectTypeCompareAny((PetscObject)amgx->localA, &is_dev_ptrs, VECCUDA, VECMPICUDA, VECSEQCUDA, ""));
>>>
>>> if (is_dev_ptrs) {
>>>   PetscCall(MatSeqAIJCUSPARSEGetArrayRead(amgx->localA, &amgx->values));
>>> } else {
>>>   PetscCall(MatSeqAIJGetArrayRead(amgx->localA, &amgx->values));
>>> }
>>>
>>> Barry
>>>
>>> Yes, this routine is terribly inefficient for GPU matrices; it needs to be specialized to not use the GPU memory, but that is a separate issue from there being bugs in the current code.
>>> The code also seems to implicitly assume the parallel matrix has the same nonzero pattern with a reuse. This should be checked with each use by stashing the nonzero state of the matrix into the sequential matrix and making sure the parallel matrix has that same stashed value each time. Currently, if one changes the nonzero pattern of the parallel matrix, one is likely to get random confusing crashes due to memory corruption. But likely not the problem here.
>>>
>>> On Jun 23, 2022, at 2:23 PM, Mark Adams <[email protected]> wrote:
>>>
>>> We have a bug in the AMGx test snes_tests-ex13_amgx in parallel.
>>> Matt Martineau found that MatMPIAIJGetLocalMat worked in the first pass in the code below, where the local matrix is created (INITIAL), but in the next pass, when "REUSE" is used, he sees an invalid pointer.
>>> Matt found that it does have offloadmask == CPU.
>>> Maybe it is missing logic to put the output in the same state as the input?
>>>
>>> Any ideas on this, or should I just dig into it?
>>>
>>> Thanks,
>>>
>>> bool partial_setup_allowed = (pc->setupcalled && pc->flag != DIFFERENT_NONZERO_PATTERN);
>>>
>>> if (amgx->nranks > 1) {
>>>   if (partial_setup_allowed) {
>>>     PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_REUSE_MATRIX, &amgx->localA)); // This path doesn't seem to work by the time we reach the AmgX API
>>>   } else {
>>>     PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_INITIAL_MATRIX, &amgx->localA)); // This path works
>>>   }
>>> } else {
>>>   amgx->localA = Pmat;
>>> }
