On Wed, Apr 17, 2024 at 7:51 AM Sreeram R Venkat <[email protected]> wrote:
> Do you know if there are plans for NCCL support in PETSc?
>
What is your need? Do you mean using NCCL for the MPI communication?

>
> On Tue, Apr 16, 2024, 10:41 PM Junchao Zhang <[email protected]>
> wrote:
>
>> Glad to hear you found a way. Did you use Frontera at TACC? If yes, I
>> could have a try.
>>
>> --Junchao Zhang
>>
>>
>> On Tue, Apr 16, 2024 at 8:35 PM Sreeram R Venkat <[email protected]>
>> wrote:
>>
>>> I finally figured out a way to make it work. I had to build PETSc and my
>>> application using the (non GPU-aware) Intel MPI. Then, before running, I
>>> switch to the MVAPICH2-GDR.
>>> I'm not sure why that works, but it's the only way I've found to compile
>>> and run successfully without throwing any errors about not having a
>>> GPU-aware MPI.
>>>
>>>
>>>
>>> On Fri, Dec 8, 2023 at 5:30 PM Mark Adams <[email protected]> wrote:
>>>
>>>> You may need to set some env variables. This can be system specific so
>>>> you might want to look at docs or ask TACC how to run with GPU-aware MPI.
>>>>
>>>> Mark
>>>>
>>>> On Fri, Dec 8, 2023 at 5:17 PM Sreeram R Venkat <[email protected]>
>>>> wrote:
>>>>
>>>>> Actually, when I compile my program with this build of PETSc and run,
>>>>> I still get the error:
>>>>>
>>>>> PETSC ERROR: PETSc is configured with GPU support, but your MPI is not
>>>>> GPU-aware. For better performance, please use a GPU-aware MPI.
>>>>>
>>>>> I have the mvapich2-gdr module loaded and MV2_USE_CUDA=1.
>>>>>
>>>>> Is there anything else I need to do?
>>>>>
>>>>> Thanks,
>>>>> Sreeram
>>>>>
>>>>> On Fri, Dec 8, 2023 at 3:29 PM Sreeram R Venkat <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Thank you, changing to CUDA 11.4 fixed the issue. The mvapich2-gdr
>>>>>> module didn't require CUDA 11.4 as a dependency, so I was using 12.0
>>>>>>
>>>>>> On Fri, Dec 8, 2023 at 1:15 PM Satish Balay <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Executing: mpicc -show
>>>>>>> stdout: icc -I/opt/apps/cuda/11.4/include
>>>>>>> -I/opt/apps/cuda/11.4/include -lcuda -L/opt/apps/cuda/11.4/lib64/stubs
>>>>>>> -L/opt/apps/cuda/11.4/lib64 -lcudart -lrt
>>>>>>> -Wl,-rpath,/opt/apps/cuda/11.4/lib64 -Wl,-rpath,XORIGIN/placeholder
>>>>>>> -Wl,--build-id -L/opt/apps/cuda/11.4/lib64/ -lm
>>>>>>> -I/opt/apps/intel19/mvapich2-gdr/2.3.7/include
>>>>>>> -L/opt/apps/intel19/mvapich2-gdr/2.3.7/lib64 -Wl,-rpath
>>>>>>> -Wl,/opt/apps/intel19/mvapich2-gdr/2.3.7/lib64 -Wl,--enable-new-dtags
>>>>>>> -lmpi
>>>>>>>
>>>>>>> Checking for program /opt/apps/cuda/12.0/bin/nvcc...found
>>>>>>>
>>>>>>> Looks like you are trying to mix in 2 different cuda versions in
>>>>>>> this build.
>>>>>>>
>>>>>>> Perhaps you need to use cuda-11.4 - with this install of mvapich..
>>>>>>>
>>>>>>> Satish
>>>>>>>
>>>>>>> On Fri, 8 Dec 2023, Matthew Knepley wrote:
>>>>>>>
>>>>>>> > On Fri, Dec 8, 2023 at 1:54 PM Sreeram R Venkat <[email protected]> wrote:
>>>>>>> >
>>>>>>> > > I am trying to build PETSc with CUDA using the CUDA-Aware MVAPICH2-GDR.
>>>>>>> > >
>>>>>>> > > Here is my configure command:
>>>>>>> > >
>>>>>>> > > ./configure PETSC_ARCH=linux-c-debug-mvapich2-gdr --download-hypre
>>>>>>> > > --with-cuda=true --cuda-dir=$TACC_CUDA_DIR --with-hdf5=true
>>>>>>> > > --with-hdf5-dir=$TACC_PHDF5_DIR --download-elemental --download-metis
>>>>>>> > > --download-parmetis --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90
>>>>>>> > >
>>>>>>> > > which errors with:
>>>>>>> > >
>>>>>>> > > UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for
>>>>>>> > > details):
>>>>>>> > >
>>>>>>> > > ---------------------------------------------------------------------------------------------
>>>>>>> > > CUDA compile failed with arch flags " -ccbin mpic++ -std=c++14
>>>>>>> > > -Xcompiler -fPIC
>>>>>>> > > -Xcompiler -fvisibility=hidden -g -lineinfo -gencode
>>>>>>> > > arch=compute_80,code=sm_80"
>>>>>>> > > generated from "--with-cuda-arch=80"
>>>>>>> > >
>>>>>>> > >
>>>>>>> > >
>>>>>>> > > The same configure command works when I use the Intel MPI and I can build
>>>>>>> > > with CUDA. The full config.log file is attached. Please let me know if you
>>>>>>> > > need any other information. I appreciate your help with this.
>>>>>>> > >
>>>>>>> >
>>>>>>> > The proximate error is
>>>>>>> >
>>>>>>> > Executing: nvcc -c -o /tmp/petsc-kn3f29gl/config.packages.cuda/conftest.o
>>>>>>> > -I/tmp/petsc-kn3f29gl/config.setCompilers
>>>>>>> > -I/tmp/petsc-kn3f29gl/config.types
>>>>>>> > -I/tmp/petsc-kn3f29gl/config.packages.cuda -ccbin mpic++ -std=c++14
>>>>>>> > -Xcompiler -fPIC -Xcompiler -fvisibility=hidden -g -lineinfo -gencode
>>>>>>> > arch=compute_80,code=sm_80 /tmp/petsc-kn3f29gl/config.packages.cuda/conftest.cu
>>>>>>> > stdout:
>>>>>>> > /opt/apps/cuda/11.4/include/crt/sm_80_rt.hpp(141): error: more than one
>>>>>>> > instance of overloaded function "__nv_associate_access_property_impl" has
>>>>>>> > "C" linkage
>>>>>>> > 1 error detected in the compilation of
>>>>>>> > "/tmp/petsc-kn3f29gl/config.packages.cuda/conftest.cu".
>>>>>>> > Possible ERROR while running compiler: exit code 1
>>>>>>> > stderr:
>>>>>>> > /opt/apps/cuda/11.4/include/crt/sm_80_rt.hpp(141): error: more than one
>>>>>>> > instance of overloaded function "__nv_associate_access_property_impl" has
>>>>>>> > "C" linkage
>>>>>>> >
>>>>>>> > 1 error detected in the compilation of
>>>>>>> > "/tmp/petsc-kn3f29gl/config.packages.cuda
>>>>>>> >
>>>>>>> > This looks like screwed up headers to me, but I will let someone that
>>>>>>> > understands CUDA compilation reply.
>>>>>>> >
>>>>>>> > Thanks,
>>>>>>> >
>>>>>>> > Matt
>>>>>>> >
>>>>>>> > > Thanks,
>>>>>>> > > Sreeram
>>>>>>> > >
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>
>>>>>>
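
Satish's diagnosis in the thread comes from comparing the CUDA paths printed by `mpicc -show` with the path of the `nvcc` that configure found. A minimal sketch of that comparison follows, assuming CUDA is installed under versioned directories shaped like /opt/apps/cuda/<version>/ as on this system. The names `cuda_versions` and `check_cuda_consistency` are hypothetical helpers for illustration, not part of PETSc or MVAPICH2, and the sample strings are short excerpts of the output quoted above.

```python
import re

# Assumption: CUDA lives under a versioned directory such as
# /opt/apps/cuda/11.4/ (the layout seen in this thread). Other layouts
# would need a different pattern.
CUDA_PATH = re.compile(r"/cuda/(\d+\.\d+)/")


def cuda_versions(text):
    """Return the set of CUDA versions named in install paths within text."""
    return set(CUDA_PATH.findall(text))


def check_cuda_consistency(mpicc_show, nvcc_path):
    """Compare CUDA versions in the MPI wrapper flags with the nvcc in use.

    'consistent' is True only when both sides name a version and the
    versions match; an empty set on either side counts as not consistent.
    """
    mpi = cuda_versions(mpicc_show)
    nvcc = cuda_versions(nvcc_path)
    return {"mpi": mpi, "nvcc": nvcc, "consistent": bool(mpi) and mpi == nvcc}


# Excerpts of the strings quoted earlier in the thread.
mpicc_show = (
    "icc -I/opt/apps/cuda/11.4/include -lcuda "
    "-L/opt/apps/cuda/11.4/lib64 -lcudart -lrt "
    "-Wl,-rpath,/opt/apps/cuda/11.4/lib64 -lmpi"
)
nvcc_path = "/opt/apps/cuda/12.0/bin/nvcc"

result = check_cuda_consistency(mpicc_show, nvcc_path)
print(result["consistent"])  # False: the wrapper uses 11.4, nvcc is from 12.0
```

On a live system the two inputs could come from `subprocess.run(["mpicc", "-show"], capture_output=True, text=True).stdout` and `shutil.which("nvcc")`. If `nvcc` resolves to an unversioned path, this sketch reports "not consistent" rather than guessing.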
