Hello Nathan

Thank you for your response. Could you please be more specific? Adding the following after MPI_Init() does not seem to make a difference.

    MPI_Init(&argc, &argv);
    signal(SIGABRT, SIG_DFL);
    signal(SIGTERM, SIG_DFL);

I also find it puzzling that a nearly identical OMPI distro running on a different machine shows different behaviour.

Best regards
Durga

The surgeon general advises you to eat right, exercise regularly and quit ageing.

On Tue, May 10, 2016 at 10:02 AM, Hjelm, Nathan Thomas <hje...@lanl.gov> wrote:

> btr files are indeed created by open mpi's backtrace mechanism. I think we
> should revisit it at some point but for now the only effective way i have
> found to prevent it is to restore the default signal handlers after
> MPI_Init.
>
> Excuse the quoting style. Good sucks.
>
>
> ________________________________________
> From: users on behalf of dpchoudh .
> Sent: Monday, May 09, 2016 2:59:37 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] No core dump in some cases
>
> Hi Gus
>
> Thanks for your suggestion. But I am not using any resource manager (i.e.
> I am launching mpirun from the bash shell.). In fact, both of the two
> clusters I talked about run CentOS 7 and I launch the job the same way on
> both of these, yet one of them creates standard core files and the other
> creates the 'btr' files. Strange thing is, I could not find anything on the
> .btr (= Backtrace?) files on Google, which is why I asked on this forum.
>
> Best regards
> Durga
>
> The surgeon general advises you to eat right, exercise regularly and quit
> ageing.
>
> On Mon, May 9, 2016 at 12:04 PM, Gus Correa <g...@ldeo.columbia.edu<mailto:
> g...@ldeo.columbia.edu>> wrote:
> Hi Durga
>
> Just in case ...
> If you're using a resource manager to start the jobs (Torque, etc),
> you need to have them set the limits (for coredump size, stacksize, locked
> memory size, etc).
> This way the jobs will inherit the limits from the
> resource manager daemon.
> On Torque (which I use) I do this on the pbs_mom daemon
> init script (I am still before the systemd era, that lovely POS).
> And set the hard/soft limits on /etc/security/limits.conf as well.
>
> I hope this helps,
> Gus Correa
>
> On 05/07/2016 12:27 PM, Jeff Squyres (jsquyres) wrote:
> I'm afraid I don't know what a .btr file is -- that is not something that
> is controlled by Open MPI.
>
> You might want to look into your OS settings to see if it has some kind of
> alternate corefile mechanism...?
>
>
> On May 6, 2016, at 8:58 PM, dpchoudh . <dpcho...@gmail.com<mailto:
> dpcho...@gmail.com>> wrote:
>
> Hello all
>
> I run MPI jobs (for test purpose only) on two different 'clusters'. Both
> 'clusters' have two nodes only, connected back-to-back. The two are very
> similar, but not identical, both software and hardware wise.
>
> Both have ulimit -c set to unlimited. However, only one of the two creates
> core files when an MPI job crashes. The other creates a text file named
> something like
>
> <program_name_that_crashed>.80s-<a-number-that-looks-like-a-PID>,<hostname-where-the-crash-happened>.btr
>
> I'd much prefer a core file because that allows me to debug with a lot
> more options than a static text file with addresses. How do I get a core
> file in all situations? I am using MPI source from the master branch.
>
> Thanks in advance
> Durga
>
> The surgeon general advises you to eat right, exercise regularly and quit
> ageing.
> _______________________________________________
> users mailing list
> us...@open-mpi.org<mailto:us...@open-mpi.org>
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/05/29124.php
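
For anyone trying the workaround Nathan describes (restoring the default signal handlers after MPI_Init), here is a minimal sketch of what that might look like. The snippet in this message resets only SIGABRT and SIGTERM, whereas a segfault is delivered as SIGSEGV, so that may be why it made no difference. The list of signals below is an assumption chosen for illustration, not something confirmed in this thread, and the helper names restore_default_handlers() and has_default_handler() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Signals to reset. This list is an assumption for illustration:
 * these are the signals that typically accompany a crash. */
static const int crash_signals[] = {
    SIGABRT, SIGBUS, SIGFPE, SIGILL, SIGSEGV
};

/* Reset every listed signal to its default disposition so that the
 * kernel, not a library-installed handler, deals with the crash.
 * Returns the number of signals that were successfully reset. */
int restore_default_handlers(void)
{
    struct sigaction sa;
    size_t i;
    int ok = 0;

    sa.sa_handler = SIG_DFL;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);

    for (i = 0; i < sizeof crash_signals / sizeof crash_signals[0]; ++i) {
        if (sigaction(crash_signals[i], &sa, NULL) == 0)
            ++ok;
    }
    return ok;
}

/* Query the current disposition of signo without changing it.
 * Returns 1 if it is SIG_DFL, 0 if it is something else, -1 on error. */
int has_default_handler(int signo)
{
    struct sigaction cur;

    if (sigaction(signo, NULL, &cur) != 0)
        return -1;
    return cur.sa_handler == SIG_DFL;
}
```

The intended use would be to call restore_default_handlers() on the line right after MPI_Init(&argc, &argv) in each rank. has_default_handler(SIGSEGV) can be called before and after to check whether a handler had actually been installed. Whether a core file is then written still depends on the core size limit and the OS core-dump settings on each node.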