Do you have something like valgrind on your machine? If so, then why not launch
your apps under valgrind - e.g., "mpirun valgrind my_app"?
If your app is segfaulting, there isn't much OMPI can do to tell you why. All
we can do is tell you that your app was hit with a SIGTERM.
Did you talk t
Hi David,
Unfortunately, there is no information about the error in the rsl.out.*,
rsl.error and wrf.out files. The error message mentioned in the previous
email appeared in the wrf.err file. Both rsl.out and rsl.error show
integration stopping at the time of the crash, and that is it. I am just
wonderin
That error is from one of the processes that was working when another
one died. It is not an indication that MPI had problems, but that you
had one of the wrf processes (#45) crash. You need to look at what
happened to process 45. What do the rsl.out and rsl.error files for #45
say?
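
A minimal sketch of how one might pull up the tail of the log files for a
single rank. It assumes WRF's usual per-rank naming with a four-digit,
zero-padded rank (rsl.out.0045, rsl.error.0045); the helper names rsl_paths,
tail and show_rank_logs are hypothetical, made up for this illustration.

```python
from pathlib import Path


def rsl_paths(rank, run_dir="."):
    """Return the expected rsl.out / rsl.error paths for one MPI rank.

    Assumption: logs are named rsl.out.NNNN and rsl.error.NNNN, where NNNN
    is the rank, zero-padded to four digits.
    """
    suffix = f"{rank:04d}"
    base = Path(run_dir)
    return [base / f"rsl.out.{suffix}", base / f"rsl.error.{suffix}"]


def tail(path, n=20):
    """Return the last n lines of a text file, or [] if it does not exist."""
    p = Path(path)
    if not p.exists():
        return []
    return p.read_text(errors="replace").splitlines()[-n:]


def show_rank_logs(rank, run_dir=".", n=20):
    """Print the last n lines of both log files for the given rank."""
    for path in rsl_paths(rank, run_dir):
        print(f"==> {path} <==")
        for line in tail(path, n):
            print(line)
```

Calling show_rank_logs(45) from the run directory would print the last 20
lines of each file for process 45, if those files exist under that naming.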
On 08/04/
If you want to debug this on BGP, you could set BG_COREDUMPONERROR=1
and look at the backtrace in the lightweight core files
(you probably need to recompile everything with -g).
A.Chan
- Original Message -
> Hi Dmitry,
> Thanks for a prompt and fairly detailed response. I have also
> fo
Signal 15 is usually SIGTERM on Linux, meaning that some external entity
probably killed the job.
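
As a rough illustration of the signal number, the sketch below starts a child
process that sends SIGTERM to itself (standing in for an external kill) and
reports how it ended. It assumes a POSIX system such as Linux, where Python's
subprocess module reports death by signal N as a return code of -N; the
function names run_and_describe and demo_sigterm are hypothetical.

```python
import signal
import subprocess
import sys


def run_and_describe(cmd):
    """Run cmd and describe whether it exited normally or was killed."""
    rc = subprocess.run(cmd).returncode
    if rc < 0:
        # On POSIX, a negative return code means "terminated by signal -rc".
        return f"killed by {signal.Signals(-rc).name} (signal {-rc})"
    return f"exited with status {rc}"


def demo_sigterm():
    """Launch a child that sends SIGTERM to itself and report the outcome."""
    child = (
        "import os, signal, time; "
        "os.kill(os.getpid(), signal.SIGTERM); "
        "time.sleep(30)"  # never reached if the signal terminates the child
    )
    return run_and_describe([sys.executable, "-c", child])


if __name__ == "__main__":
    print(int(signal.SIGTERM))
    print(demo_sigterm())
```

The same kind of check on a job's exit status can help tell an external kill
(for example by a batch scheduler) apart from a crash inside the application.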
The OMPI error message you describe is also typical for that kind of scenario
-- i.e., a process exiting without calling MPI_Finalize could mean that it
called exit() or some external process killed
Hi Dmitry,
Thanks for a prompt and fairly detailed response. I have also forwarded
the email to the wrf community in the hope that somebody will have a
straightforward solution. I will try to debug the error as you suggested
if I do not have much luck on the wrf forum.
Cheers,
---
Basit
Basit Ali,
Signal 15 apparently means that one of WRF's MPI processes has been
unexpectedly terminated, maybe by the program's own decision. Whether or
not it is OpenMPI-specific, the issue needs to be tracked down to get
more details about it. Ideally, the best thing is to get a debugger attached
once the pro
I am trying to run a rather heavy wrf simulation with spectral nudging but the
simulation crashes after 1.8 minutes of integration.
The simulation has two domains, with d01 = 601x601 and d02 = 721x721, and 51
vertical levels. I tried this simulation on two different systems, but the result
was mor