Well, NaNs are a clear sign that something is very wrong.

On Tue, Apr 26, 2022 at 11:52 AM Jacob Faibussowitsch <[email protected]> wrote:
> There is an automatic warning that shows when you do run with
> `-log_view_gpu_time`, but perhaps there should also be an automatic warning
> when *not* running with it. It is unfortunate that NaN is the value printed,
> as this implies a bug, but AFAIK it is unavoidable (Barry can say more on
> this though).
>
> Best regards,
>
> Jacob Faibussowitsch
> (Jacob Fai - booss - oh - vitch)
>
> > On Apr 26, 2022, at 09:48, Jose E. Roman <[email protected]> wrote:
> >
> > You have to add -log_view_gpu_time
> > See https://gitlab.com/petsc/petsc/-/merge_requests/5056
> >
> > Jose
> >
> >
> >> On Apr 26, 2022, at 16:39, Mark Adams <[email protected]> wrote:
> >>
> >> I'm seeing this on Perlmutter with Kokkos-CUDA. NaNs in most log timing data except the two 'Solve' lines.
> >> Just cg/jacobi on snes/ex56.
> >>
> >> Any ideas?
> >>
> >> VecTDot                2 1.0 nan nan 1.20e+01 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
> >> VecNorm                2 1.0 nan nan 1.00e+01 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
> >> VecCopy                2 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 0
> >> VecSet                 5 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 0
> >> VecAXPY                4 1.0 nan nan 2.40e+01 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
> >> VecPointwiseMult       1 1.0 nan nan 3.00e+00 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 100
> >> KSPSetUp               1 1.0 nan nan 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  -nan    -nan      0 0.00e+00    0 0.00e+00 0
> >> KSPSolve               1 1.0 4.0514e-04 1.0 5.50e+01 1.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   2  0  0  0  0     0    -nan      0 0.00e+00    0 0.00e+00 100
> >> SNESSolve              1 1.0 2.2128e-02 1.0 5.55e+05 1.0 0.0e+00 0.0e+00 0.0e+00 72 56  0  0  0 100100  0  0  0    25    -nan      0 0.00e+00    0 0.00e+00 0
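
Until there is an automatic warning for the case where `-log_view_gpu_time` is missing, a check on the user side is possible. The following is a minimal Python sketch, not part of PETSc, that scans saved `-log_view` text for event rows whose time column is NaN. The function name `events_with_nan_time` is hypothetical, and the column layout (event name, count, ratio, time, ...) is an assumption taken from the rows quoted in this thread.

```python
import math


def events_with_nan_time(log_text):
    """Return names of event rows whose time column parses as NaN.

    Assumes each event row is whitespace-separated as:
    <name> <count> <ratio> <time> <ratio> ...  (layout inferred from the thread)
    """
    flagged = []
    for line in log_text.splitlines():
        fields = line.split()
        # Skip anything that does not look like an event row.
        if len(fields) < 4 or not fields[1].isdigit():
            continue
        try:
            time_value = float(fields[3])
        except ValueError:
            continue
        if math.isnan(time_value):
            flagged.append(fields[0])
    return flagged


# Two rows copied from the log quoted in this thread.
sample = """\
VecTDot 2 1.0 nan nan 1.20e+01 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 -nan -nan 0 0.00e+00 0 0.00e+00 100
KSPSolve 1 1.0 4.0514e-04 1.0 5.50e+01 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 2 0 0 0 0 0 -nan 0 0.00e+00 0 0.00e+00 100
"""

nan_events = events_with_nan_time(sample)
if nan_events:
    print("NaN timings for:", ", ".join(nan_events))
    print("Consider rerunning with -log_view_gpu_time")
```

If the returned list is non-empty, rerunning with `-log_view_gpu_time` added, as Jose suggests, is the first thing to try.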
