.btr files are indeed created by Open MPI's backtrace mechanism. I think we should revisit it at some point, but for now the only effective way I have found to prevent it is to restore the default signal handlers after MPI_Init.
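A minimal sketch of that workaround, in case it helps. The helper name restore_default_signal_handlers is hypothetical, and the list of signals (SIGABRT, SIGBUS, SIGFPE, SIGSEGV) is my assumption about which ones the backtrace handler catches; adjust it to match your build.

```c
#include <signal.h>
#include <stddef.h>

/* Assumed set of signals hooked by the backtrace handler (not verified
 * against any particular Open MPI build). */
static const int crash_signals[] = { SIGABRT, SIGBUS, SIGFPE, SIGSEGV };

/* Hypothetical helper: reset each signal above to its default action so
 * that a crash produces a regular core file (subject to ulimit -c).
 * Returns the number of signals that could not be reset. */
int restore_default_signal_handlers(void)
{
    int failures = 0;
    size_t count = sizeof crash_signals / sizeof crash_signals[0];

    for (size_t i = 0; i < count; ++i) {
        if (signal(crash_signals[i], SIG_DFL) == SIG_ERR) {
            ++failures;
        }
    }
    return failures;
}

/* Intended use in an MPI program (requires <mpi.h>):
 *
 *     MPI_Init(&argc, &argv);
 *     restore_default_signal_handlers();   // must come after MPI_Init
 */
```

The call has to come after MPI_Init, since the handlers are installed during initialization; resetting them earlier would simply be overwritten.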
Excuse the quoting style. Good sucks.

________________________________________
From: users on behalf of dpchoudh .
Sent: Monday, May 09, 2016 2:59:37 PM
To: Open MPI Users
Subject: Re: [OMPI users] No core dump in some cases

Hi Gus

Thanks for your suggestion. But I am not using any resource manager (i.e., I am launching mpirun from the bash shell). In fact, both of the clusters I talked about run CentOS 7, and I launch the job the same way on both, yet one of them creates standard core files and the other creates the '.btr' files. The strange thing is, I could not find anything about .btr (= Backtrace?) files on Google, which is why I asked on this forum.

Best regards
Durga

The surgeon general advises you to eat right, exercise regularly and quit ageing.

On Mon, May 9, 2016 at 12:04 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:

Hi Durga

Just in case ... If you're using a resource manager to start the jobs (Torque, etc.), you need to have it set the limits (for coredump size, stacksize, locked memory size, etc.). This way the jobs will inherit the limits from the resource manager daemon. On Torque (which I use) I do this in the pbs_mom daemon init script (I am still before the systemd era, that lovely POS). And set the hard/soft limits in /etc/security/limits.conf as well.

I hope this helps,
Gus Correa

On 05/07/2016 12:27 PM, Jeff Squyres (jsquyres) wrote:

I'm afraid I don't know what a .btr file is -- that is not something that is controlled by Open MPI. You might want to look into your OS settings to see if it has some kind of alternate corefile mechanism...?

On May 6, 2016, at 8:58 PM, dpchoudh . <dpcho...@gmail.com> wrote:

Hello all

I run MPI jobs (for test purposes only) on two different 'clusters'. Both 'clusters' have two nodes only, connected back-to-back. The two are very similar, but not identical, both software- and hardware-wise. Both have ulimit -c set to unlimited.
However, only one of the two creates core files when an MPI job crashes. The other creates a text file named something like

<program_name_that_crashed>.80s-<a-number-that-looks-like-a-PID>,<hostname-where-the-crash-happened>.btr

I'd much prefer a core file because that allows me to debug with a lot more options than a static text file with addresses. How do I get a core file in all situations? I am using MPI source from the master branch.

Thanks in advance
Durga

The surgeon general advises you to eat right, exercise regularly and quit ageing.

_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: http://www.open-mpi.org/community/lists/users/2016/05/29124.php

Link to this post: http://www.open-mpi.org/community/lists/users/2016/05/29141.php