Durga,

You might want to restore the signal handlers for the other signals as well
(SIGSEGV, SIGBUS, ...).

    ompi_info --all | grep opal_signal

lists the signals whose handlers you should restore.
Only one backtrace component is built (out of several candidates: execinfo,
none, printstack).

    nm -l libopen-pal.so | grep backtrace

will hint at which component was built; your two similar distros might have
different backtrace components.

Gus, a btr file is a plain-text file with a backtrace "a la" gdb.

Nathan, I did a 'grep btr' and could not find anything :-(
opal_backtrace_buffer and opal_backtrace_print are only used with stderr,
so I am puzzled about who creates the trace-file name, and where ...
Also, no stack is printed by default unless opal_abort_print_stack is true.

Cheers,

Gilles

On Wed, May 11, 2016 at 3:43 PM, dpchoudh . <dpcho...@gmail.com> wrote:
> Hello Nathan
>
> Thank you for your response. Could you please be more specific? Adding the
> following after MPI_Init() does not seem to make a difference:
>
> MPI_Init(&argc, &argv);
> signal(SIGABRT, SIG_DFL);
> signal(SIGTERM, SIG_DFL);
>
> I also find it puzzling that a nearly identical OMPI distro running on a
> different machine shows different behaviour.
>
> Best regards
> Durga
>
> The surgeon general advises you to eat right, exercise regularly and quit
> ageing.
>
> On Tue, May 10, 2016 at 10:02 AM, Hjelm, Nathan Thomas <hje...@lanl.gov>
> wrote:
>>
>> btr files are indeed created by Open MPI's backtrace mechanism. I think we
>> should revisit it at some point, but for now the only effective way I have
>> found to prevent it is to restore the default signal handlers after
>> MPI_Init.
>>
>> Excuse the quoting style. Good sucks.
>>
>> ________________________________________
>> From: users on behalf of dpchoudh .
>> Sent: Monday, May 09, 2016 2:59:37 PM
>> To: Open MPI Users
>> Subject: Re: [OMPI users] No core dump in some cases
>>
>> Hi Gus
>>
>> Thanks for your suggestion. But I am not using any resource manager (i.e.
>> I am launching mpirun from the bash shell).
>> In fact, both of the clusters I talked about run CentOS 7, and I launch
>> the job the same way on both of them, yet one creates standard core files
>> and the other creates the '.btr' files. The strange thing is, I could not
>> find anything about the .btr (= backtrace?) files on Google, which is why
>> I asked on this forum.
>>
>> Best regards
>> Durga
>>
>> On Mon, May 9, 2016 at 12:04 PM, Gus Correa <g...@ldeo.columbia.edu>
>> wrote:
>> Hi Durga
>>
>> Just in case ...
>> If you're using a resource manager to start the jobs (Torque, etc.),
>> you need to have it set the limits (for coredump size, stack size,
>> locked memory size, etc.).
>> This way the jobs will inherit the limits from the resource manager
>> daemon.
>> On Torque (which I use) I do this in the pbs_mom daemon init script
>> (I am still before the systemd era, that lovely POS).
>> And set the hard/soft limits in /etc/security/limits.conf as well.
>>
>> I hope this helps,
>> Gus Correa
>>
>> On 05/07/2016 12:27 PM, Jeff Squyres (jsquyres) wrote:
>> I'm afraid I don't know what a .btr file is -- that is not something
>> that is controlled by Open MPI.
>>
>> You might want to look into your OS settings to see if it has some kind
>> of alternate corefile mechanism...?
>>
>> On May 6, 2016, at 8:58 PM, dpchoudh . <dpcho...@gmail.com> wrote:
>>
>> Hello all
>>
>> I run MPI jobs (for test purposes only) on two different 'clusters'.
>> Both 'clusters' have two nodes only, connected back-to-back. The two are
>> very similar, but not identical, both software- and hardware-wise.
>>
>> Both have ulimit -c set to unlimited. However, only one of the two
>> creates core files when an MPI job crashes.
>> The other creates a text file named something like
>>
>> <program_name_that_crashed>.80s-<a-number-that-looks-like-a-PID>,<hostname-where-the-crash-happened>.btr
>>
>> I'd much prefer a core file, because that allows me to debug with a lot
>> more options than a static text file with addresses. How do I get a core
>> file in all situations? I am using MPI source from the master branch.
>>
>> Thanks in advance
>> Durga
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post:
>> http://www.open-mpi.org/community/lists/users/2016/05/29124.php
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post:
>> http://www.open-mpi.org/community/lists/users/2016/05/29141.php
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post:
>> http://www.open-mpi.org/community/lists/users/2016/05/29154.php
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/05/29169.php