On Fri, Jan 7, 2022 at 11:17 AM Mark Adams <[email protected]> wrote:

> These are cuda/cusparse tests. The Kokkos versions are fine and cusparse
> w/o a Kokkos build is fine.
>
> I do have some #ifdefs in the code. Maybe something snuck into the #ifdef
> KOKKOS, but I can't imagine what that could even be.
>
> I have had problems with very large "cuda" jobs (on Summit with 21 MPI
> processes per GPU) running out of "resources" with a Kokkos build, that
> went away with a pure CUDA build (i.e., w/o Kokkos), but these are tiny tests.
>
If Kokkos is initialized on MPI ranks, then each rank will consume resources on the GPU.
>
> I will try it again.
>
> Thanks,
>
>
> On Fri, Jan 7, 2022 at 12:06 PM Junchao Zhang <[email protected]> wrote:
>
>> It failed when you did not even pass any vec/mat kokkos options? It does
>> not make sense and you need to double check that.
>> --Junchao Zhang
>>
>>
>> On Thu, Jan 6, 2022 at 9:33 PM Mark Adams <[email protected]> wrote:
>>
>>> I seem to have a regression with using aijcusparse in a kokkos build.
>>> It's OK with a straight CUDA build.
>>>
>>> # [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>> # [0]PETSC ERROR: GPU error
>>> # [0]PETSC ERROR: cuBLAS error 13 (CUBLAS_STATUS_EXECUTION_FAILED)
>>> # [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
>>> # [0]PETSC ERROR: Petsc Development GIT revision: v3.16.3-511-g96172674f3 GIT Date: 2022-01-06 23:44:32 +0000
>>> # [0]PETSC ERROR: /global/u2/m/madams/petsc_install/petsc/arch-perlmutter-opt-gcc-kokkos-cuda/tests/ts/utils/dmplexlandau/tutorials/runex1_cuda/../ex1 on a arch-perlmutter-opt-gcc-kokkos-cuda named nid003188 by madams Thu Jan 6 19:29:06 2022
>>> # [0]PETSC ERROR: Configure options --CFLAGS=" -g -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --CXXFLAGS=" -g -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --CUDAFLAGS="-g -Xcompiler -rdynamic -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --with-cc=cc --with-cxx=CC --with-fc=ftn --LDFLAGS=-lmpifort_gnu_91 --with-cudac=/global/common/software/nersc/cos1.3/cuda/11.3.0/bin/nvcc --COPTFLAGS=" -O3" --CXXOPTFLAGS=" -O3" --FOPTFLAGS=" -O3" --with-debugging=0 --download-metis --download-parmetis --with-cuda=1 --with-cuda-arch=80 --with-mpiexec=srun --with-batch=0 --download-p4est=1 --with-zlib=1 --download-kokkos --download-kokkos-kernels --with-kokkos-kernels-tpl=0 --with-make-np=8 PETSC_DIR=/global/homes/m/madams/petsc_install/petsc PETSC_ARCH=arch-perlmutter-opt-gcc-kokkos-cuda
>>> # [0]PETSC ERROR: #1 VecNorm_SeqCUDA() at /global/u2/m/madams/petsc_install/petsc/src/vec/vec/impls/seq/seqcuda/veccuda2.cu:994
>>> # [0]PETSC ERROR: #2 VecNorm() at /global/u2/m/madams/petsc_install/petsc/src/vec/vec/interface/rvector.c:228
>>> # [0]PETSC ERROR: #3 SNESSolve_NEWTONLS() at /global/u2/m/madams/petsc_install/petsc/src/snes/impls/ls/ls.c:179
>>> # [0]PETSC ERROR: #4 SNESSolve() at /global/u2/m/madams/petsc_install/petsc/src/snes/interface/snes.c:4810
>>> # [0]PETSC ERROR: #5 TSStep_ARKIMEX() at /global/u2/m/madams/petsc_install/petsc/src/ts/impls/arkimex/arkimex.c:845
>>> # [0]PETSC ERROR: #6 TSStep() at /global/u2/m/madams/petsc_install/petsc/src/ts/interface/ts.c:3572
>>> # [0]PETSC ERROR: #7 TSSolve() at /global/u2/m/madams/petsc_install/petsc/src/ts/interface/ts.c:3971
>>> # [0]PETSC ERROR: #8 main() at /global/u2/m/madams/petsc_install/petsc/src/ts/utils/dmplexlandau/tutorials/ex1.c:45
>>> # [0]PETSC ERROR: PETSc Option Table entries:
>>> # [0]PETSC ERROR: -check_pointer_intensity 0
>>> # [0]PETSC ERROR: -dm_landau_amr_levels_max 2,1
>>> # [0]PETSC ERROR: -dm_landau_device_type cuda
>>> # [0]PETSC ERROR: -dm_landau_ion_charges 1,18
>>> # [0]PETSC ERROR: -dm_landau_ion_masses 2,4
>>> # [0]PETSC ERROR: -dm_landau_n 1.00018,1,1e-5
>>> # [0]PETSC ERROR: -dm_landau_n_0 1e20
>>> # [0]PETSC ERROR: -dm_landau_num_species_grid 1,2
>>> # [0]PETSC ERROR: -dm_landau_thermal_temps 5,5,.5
>>> # [0]PETSC ERROR: -dm_landau_type p4est
>>> # [0]PETSC ERROR: -dm_mat_type aijcusparse
>>> # [0]PETSC ERROR: -dm_preallocate_only false
>>> # [0]PETSC ERROR: -dm_vec_type cuda
>>> # [0]PETSC ERROR: -error_output_stdout
>>> # [0]PETSC ERROR: -ksp_type preonly
>>> # [0]PETSC ERROR: -malloc_dump
>>> # [0]PETSC ERROR: -mat_cusparse_use_cpu_solve
>>> # [0]PETSC ERROR: -nox
>>> # [0]PETSC ERROR: -nox_warning
>>> # [0]PETSC ERROR: -pc_type lu
>>> # [0]PETSC ERROR: -petscspace_degree 3
>>> # [0]PETSC ERROR: -petscspace_poly_tensor 1
>>> # [0]PETSC ERROR: -snes_converged_reason
>>> # [0]PETSC ERROR: -snes_monitor
>>> # [0]PETSC ERROR: -snes_rtol 1.e-14
>>> # [0]PETSC ERROR: -snes_stol 1.e-14
>>> # [0]PETSC ERROR: -ts_adapt_clip .5,1.25
>>> # [0]PETSC ERROR: -ts_adapt_scale_solve_failed 0.75
>>> # [0]PETSC ERROR: -ts_adapt_time_step_increase_delay 5
>>> # [0]PETSC ERROR: -ts_arkimex_type 1bee
>>> # [0]PETSC ERROR: -ts_dt 1.e-1
>>> # [0]PETSC ERROR: -ts_max_snes_failures -1
>>> # [0]PETSC ERROR: -ts_max_steps 1
>>> # [0]PETSC ERROR: -ts_max_time 1
>>> # [0]PETSC ERROR: -ts_monitor
>>> # [0]PETSC ERROR: -ts_rtol 1e-1
>>> # [0]PETSC ERROR: -ts_type arkimex
>>> # [0]PETSC ERROR: -use_gpu_aware_mpi 0
>>> # [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to [email protected]
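
Editor's sketch, not part of the original messages: the question of whether any vec/mat Kokkos options were passed can be checked mechanically against the "PETSc Option Table entries" block of an error message like the one quoted above. The helper names `parse_option_table` and `kokkos_options` are hypothetical and not part of PETSc, and the sample text is a short excerpt of the quoted log. The parsing assumes the `[rank]PETSC ERROR: -name value` layout seen in this thread.

```python
import re

# Matches one option-table entry, e.g. "# [0]PETSC ERROR: -dm_vec_type cuda".
# Any email quoting ('>>> # ') in front of the entry is ignored by search().
OPTION_RE = re.compile(r"\[\d+\]PETSC ERROR: (-[A-Za-z][\w-]*)(.*)$")


def parse_option_table(log_text):
    """Return {option_name: value} for the option table in a PETSc error log.

    Hypothetical helper: assumes the table starts after the line containing
    'PETSc Option Table entries:' and ends at the first non-option line.
    """
    options = {}
    in_table = False
    for line in log_text.splitlines():
        if "PETSc Option Table entries:" in line:
            in_table = True
            continue
        if not in_table:
            continue
        match = OPTION_RE.search(line)
        if match is None:
            break  # e.g. the 'End of Error Message' line closes the table
        options[match.group(1)] = match.group(2).strip()
    return options


def kokkos_options(options):
    """Return the subset of options whose name or value mentions Kokkos."""
    return {
        name: value
        for name, value in options.items()
        if "kokkos" in name.lower() or "kokkos" in value.lower()
    }


# Excerpt of the option table quoted in this thread.
SAMPLE_LOG = """\
>>> # [0]PETSC ERROR: PETSc Option Table entries:
>>> # [0]PETSC ERROR: -dm_landau_device_type cuda
>>> # [0]PETSC ERROR: -dm_mat_type aijcusparse
>>> # [0]PETSC ERROR: -dm_vec_type cuda
>>> # [0]PETSC ERROR: -malloc_dump
>>> # [0]PETSC ERROR: -use_gpu_aware_mpi 0
>>> # [0]PETSC ERROR: ----------------End of Error Message -------
"""

if __name__ == "__main__":
    opts = parse_option_table(SAMPLE_LOG)
    print("mat type:", opts.get("-dm_mat_type"))
    print("vec type:", opts.get("-dm_vec_type"))
    print("Kokkos-related options:", kokkos_options(opts))
```

This only inspects the runtime options recorded in the log. It says nothing about what the Kokkos-enabled build (`--download-kokkos` in the configure line) initializes on its own, which is the concern raised in the reply.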
