Just an update: the hang is not deterministic. I am thinking that Perlmutter's MPI might be an issue.
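To compare where ranks get stuck from one run to the next, the -log_trace output can be scanned for events that begin but never end. The following is only a minimal sketch: it assumes the trace line format seen in the quoted output ("[rank] time Event begin|end: name"), and the function name open_events is made up for illustration, not part of PETSc.

```python
import re
from collections import defaultdict

# Matches -log_trace lines such as "[177] 21.4441 Event begin: BuildTwoSided".
# The format is assumed from the trace quoted in this thread.
TRACE_RE = re.compile(r"\[(\d+)\]\s+([\d.]+)\s+Event (begin|end):\s+(\S+)")

def open_events(trace_text):
    """Return {rank: [events begun but not ended, outermost first]}."""
    stacks = defaultdict(list)
    for rank, _time, kind, name in TRACE_RE.findall(trace_text):
        stack = stacks[int(rank)]
        if kind == "begin":
            stack.append(name)
        elif stack and stack[-1] == name:
            # Only pop when the end matches the innermost open event.
            stack.pop()
    # Keep only ranks that still have something open.
    return {rank: stack for rank, stack in stacks.items() if stack}

# Tail of the rank-177 trace from the hanging run.
sample = """\
[177] 21.5954 Event begin: MatCUSPARSCopyTo
[177] 21.5955 Event end: MatCUSPARSCopyTo
[177] 21.6208 Event begin: SFSetGraph
[177] 21.6208 Event end: SFSetGraph
[177] 21.6208 Event begin: SFSetUp
[177] 21.6208 Event begin: BuildTwoSided
"""
print(open_events(sample))
```

Run on the full trace of all ranks from several runs, this would show whether every rank stops inside the same BuildTwoSided call each time or whether the stuck location moves around.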
On Sat, Oct 16, 2021 at 4:47 PM Mark Adams <[email protected]> wrote:

> I am running snes/ex13 on Perlmutter and doing a scaling study in a
> script. The first case runs fine:
>
> + srun -G 32 -n 256 --cpu-bind=cores --ntasks-per-core=1
> /global/homes/m/madams/mps-wrapper.sh ../ex13 -dm_plex_box_faces 4,8,8
> -petscpartitioner_simple_node_grid 2,2,2 *-dm_refine 3 *-dm_mat_type
> aijcusparse -dm_vec_type cuda -dm_view -ksp_max_it 15 -log_view
>
> but the next case *with another levels of refinement*:
>
> + srun -G 32 -n 256 --cpu-bind=cores --ntasks-per-core=1
> /global/homes/m/madams/mps-wrapper.sh ../ex13 -dm_plex_box_faces 4,8,8
> -petscpartitioner_simple_node_grid 2,2,2 *-dm_refine 4 *-dm_mat_type
> aijcusparse -dm_vec_type cuda -dm_view -ksp_max_it 15 -log_
>
> hangs in BuildTwoSided. With log_trace I see this (grepping on 177). Args
> appended.
>
> Any ideas?
> Thanks
>
>
> [177] 21.4441 Event begin: MatGetBrAoCol
> [177] 21.4441 Event begin: SFSetUp
> [177] 21.4441 Event begin: BuildTwoSided
> [177] 21.5443 Event end: BuildTwoSided
> [177] 21.5443 Event end: SFSetUp
> [177] 21.5443 Event begin: MatAssemblyBegin
> [177] 21.5444 Event end: MatAssemblyBegin
> [177] 21.5444 Event begin: MatAssemblyEnd
> [177] 21.5444 Event end: MatAssemblyEnd
> [177] 21.5444 Event end: MatGetBrAoCol
> [177] 21.5444 Event begin: MatGetLocalMat
> [177] 21.5569 Event begin: MatCUSPARSCopyTo
> [177] 21.557 Event end: MatCUSPARSCopyTo
> [177] 21.557 Event begin: MatCUSPARSCopyTo
> [177] 21.5571 Event end: MatCUSPARSCopyTo
> [177] 21.5571 Event end: MatGetLocalMat
> [177] 21.5571 Event begin: MatCUSPARSCopyTo
> [177] 21.5571 Event end: MatCUSPARSCopyTo
> [177] 21.5698 Event begin: MatCUSPARSCopyTo
> [177] 21.5698 Event end: MatCUSPARSCopyTo
> [177] 21.5827 Event begin: MatConvert
> [177] 21.5954 Event end: MatConvert
> [177] 21.5954 Event begin: MatCUSPARSCopyTo
> [177] 21.5954 Event end: MatCUSPARSCopyTo
> [177] 21.5954 Event begin: MatCUSPARSCopyTo
> [177] 21.5955 Event end: MatCUSPARSCopyTo
> [177] 21.6208 Event begin: SFSetGraph
> [177] 21.6208 Event end: SFSetGraph
> [177] 21.6208 Event begin: SFSetUp
> [177] 21.6208 Event begin: BuildTwoSided
>
>
>
> #PETSc Option Table entries:
> -benchmark_it 10
> -dm_distribute
> -dm_mat_type aijcusparse
> -dm_plex_box_faces 4,8,8
> -dm_plex_box_lower 0,0,0
> -dm_plex_box_upper 2,4,4
> -dm_plex_dim 3
> -dm_plex_simplex 0
> -dm_refine 3
> -dm_vec_type cuda
> -dm_view
> -ksp_max_it 15
> -ksp_monitor_short
> -ksp_norm_type unpreconditioned
> -ksp_rtol 1.e-12
> -ksp_type cg
> -log_view
> -matptap_via scalable
> -mg_levels_esteig_ksp_max_it 5
> -mg_levels_esteig_ksp_type cg
> -mg_levels_ksp_max_it 2
> -mg_levels_ksp_type richardson
> -mg_levels_pc_type jacobi
> -options_left
> -pc_gamg_coarse_eq_limit 100
> -pc_gamg_coarse_grid_layout_type spread
> -pc_gamg_esteig_ksp_max_it 5
> -pc_gamg_esteig_ksp_type cg
> -pc_gamg_process_eq_limit 100
> -pc_gamg_repartition false
> -pc_gamg_reuse_interpolation true
> -pc_gamg_square_graph 1
> -pc_gamg_threshold 0.01
> -pc_gamg_threshold_scale .5
> -pc_type gamg
> -petscpartitioner_simple_node_grid 2,2,2
> -petscpartitioner_simple_process_grid 2,4,4
> -petscpartitioner_type simple
> -potential_petscspace_degree 2
> -snes_max_it 1
> -snes_rtol 1.e-8
> -snes_type ksponly
> -use_gpu_aware_mpi 0
> #End of PETSc Option Table entries
> Compiled without FORTRAN kernels
> Compiled with full precision matrices (default)
> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
> sizeof(PetscScalar) 8 sizeof(PetscInt) 4
> Configure options: --CFLAGS=" -g -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10
> -DLANDAU_MAX_Q=4" --CXXFLAGS=" -g -DLANDAU_DIM=2 -DLANDAU_MAX_SPECIES=10
> -DLANDAU_MAX_Q=4" --CUDAFLAGS="-g -Xcompiler -rdynamic -DLANDAU_DIM=2
> -DLANDAU_MAX_SPECIES=10 -DLANDAU_MAX_Q=4" --with-cc=cc --with-cxx=CC
> --with-fc=ftn
> --with-cudac=/global/common/software/nersc/cos1.3/cuda/11.3.0/bin/nvcc
> --FFLAGS=" -g " --COPTFLAGS=" -O" --CXXOPTFLAGS=" -O" --FOPTFLAGS="
> -O" --with-debugging=0 --with-cuda=1 --with-cuda-arch=80
> --with-mpiexec=srun --with-batch=0 --download-p4est=1 --with-zlib=1
> PETSC_ARCH=arch-perlmutter-opt-nvidia-cuda
> -----------------------------------------
> Libraries compiled on 2021-10-16 18:33:45 on login02
>
