Re: [OMPI users] OpenMPI causing WRF to crash

2011-08-06 Thread Ralph Castain
Do you have something like valgrind on your machine? If so, why not launch your apps under valgrind, e.g., "mpirun valgrind my_app"? If your app is segfaulting, there isn't much OMPI can do to tell you why. All we can do is tell you that your app was hit with a SIGTERM. Did you talk t
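A minimal sketch of the "mpirun valgrind my_app" launch suggested above, built as a Python command list. The rank count (4), the application path ./my_app, and the log file name are placeholders chosen for illustration, not values from the thread. The only options used are mpirun's -np and valgrind's --log-file, where %p expands to the process ID so each rank writes its own log.

```python
import shlex


def valgrind_mpirun_cmd(app, nprocs, app_args=()):
    """Build an mpirun command line that runs every rank under valgrind."""
    return [
        "mpirun", "-np", str(nprocs),
        "valgrind",
        "--log-file=valgrind.%p.log",  # %p is replaced by each process's PID
        app, *app_args,
    ]


# Hypothetical application name and rank count, for illustration only.
cmd = valgrind_mpirun_cmd("./my_app", 4)
print(shlex.join(cmd))
# To actually launch it: subprocess.run(cmd, check=False)
```

Expect a large slowdown under valgrind, so a shortened run that still reaches the crash is usually the practical choice.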

Re: [OMPI users] OpenMPI causing WRF to crash

2011-08-06 Thread BasitAli Khan
Hi David, Unfortunately there is no information about the error in the rsl.out.*, rsl.error and wrf.out files. The error message mentioned in the previous email appeared in the wrf.err file. Both rsl.out and rsl.error show the integration stopping at the time of the crash, and that is it. I am just wonderin

Re: [OMPI users] OpenMPI causing WRF to crash

2011-08-05 Thread David Warren
That error is from one of the processes that was working when another one died. It is not an indication that MPI had problems, but that one of the WRF processes (#45) crashed. You need to look at what happened to process 45. What do the rsl.out and rsl.error files for #45 say? On 08/04/
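To act on the advice above, a small Python sketch can print the last lines of the per-process log files for process 45. It assumes WRF's usual naming with a four-digit, zero-padded suffix (rsl.out.0045, rsl.error.0045), that the number reported in the error message matches that suffix, and that the files sit in the current run directory; check all three against the actual run. The helper names rsl_files and tail are hypothetical.

```python
from pathlib import Path


def rsl_files(rank, run_dir="."):
    """Return the (rsl.out, rsl.error) paths for one process.

    Assumes the four-digit zero-padded suffix, e.g. rsl.error.0045.
    """
    run_dir = Path(run_dir)
    suffix = f"{rank:04d}"
    return run_dir / f"rsl.out.{suffix}", run_dir / f"rsl.error.{suffix}"


def tail(path, n=20):
    """Return the last n lines of a text file, or [] if it does not exist."""
    path = Path(path)
    if not path.exists():
        return []
    return path.read_text(errors="replace").splitlines()[-n:]


# Show the end of both logs for the process that died (#45 in the thread).
for log_path in rsl_files(45):
    print(f"== {log_path} ==")
    for line in tail(log_path):
        print(line)
```

If both files simply stop mid-integration with no message, as reported elsewhere in this thread, the process was most likely killed before it could log anything.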

Re: [OMPI users] OpenMPI causing WRF to crash

2011-08-04 Thread Anthony Chan
If you want to debug this on BGP, you could set BG_COREDUMPONERROR=1 and look at the backtrace in the lightweight core files (you probably need to recompile everything with -g). A.Chan - Original Message - > Hi Dmitry, > Thanks for a prompt and fairly detailed response. I have also > fo

Re: [OMPI users] OpenMPI causing WRF to crash

2011-08-04 Thread Jeff Squyres
Signal 15 is usually SIGTERM on Linux, meaning that some external entity probably killed the job. The OMPI error message you describe is also typical for that kind of scenario -- i.e., a process exited without calling MPI_Finalize could mean that it called exit() or some external process killed
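The signal number mentioned above can be checked with a short Python sketch. It starts a throwaway child process, sends it SIGTERM the way an external entity (a batch scheduler, for instance) might, and reads the signal back from the exit status. It assumes a POSIX system such as Linux, where a child killed by a signal reports a negative return code; the sleeping child is only a stand-in for a real MPI rank.

```python
import signal
import subprocess
import sys

# Stand-in for an application process: a child that just sleeps.
child = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(60)"]
)

# Simulate an external entity killing the job.
child.terminate()  # sends SIGTERM on POSIX
child.wait()

# On POSIX, a negative return code -N means "killed by signal N".
sig = signal.Signals(-child.returncode)
print(f"child killed by signal {sig.value} ({sig.name})")
```

A process killed this way never reaches MPI_Finalize, which is consistent with the OMPI message described in the post.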

Re: [OMPI users] OpenMPI causing WRF to crash

2011-08-03 Thread BasitAli Khan
Hi Dmitry, Thanks for a prompt and fairly detailed response. I have also forwarded the email to the WRF community in the hope that somebody has a straightforward solution. I will try to debug the error as you suggested if I do not have much luck on the WRF forum. Cheers, --- Basit

Re: [OMPI users] OpenMPI causing WRF to crash

2011-08-03 Thread Dmitry N. Mikushin
BasitAli, Signal 15 apparently means that one of WRF's MPI processes has been unexpectedly terminated, maybe by program decision. Whether or not it is OpenMPI-specific, the issue needs to be tracked down somehow to get more details about it. Ideally, the best thing is to get a debugger attached once the pro

[OMPI users] OpenMPI causing WRF to crash

2011-08-03 Thread BasitAli Khan
I am trying to run a rather heavy WRF simulation with spectral nudging, but the simulation crashes after 1.8 minutes of integration. The simulation has two domains, with d01 = 601x601 and d02 = 721x721, and 51 vertical levels. I tried this simulation on two different systems but the result was mor