Hi Durga

Just in case ...
If you're using a resource manager (Torque, etc.) to start the jobs,
you need to have it set the limits (core dump size, stack size, locked memory size, etc.).
This way the jobs will inherit the limits from the
resource manager daemon.
On Torque (which I use) I do this in the pbs_mom daemon
init script (I am still before the systemd era, that lovely POS).
I also set the hard/soft limits in /etc/security/limits.conf.
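
To see which limits a job actually inherits, something like the following could be run from inside a batch job. This is only a minimal sketch using Python's standard `resource` module; the helper names `fmt` and `report_limits` are made up for illustration and are not part of Torque or Open MPI.

```python
import resource

# The three limits mentioned above. RLIMIT_MEMLOCK is the "locked memory" limit.
LIMITS = {
    "core file size": resource.RLIMIT_CORE,
    "stack size": resource.RLIMIT_STACK,
    "locked memory": resource.RLIMIT_MEMLOCK,
}

def fmt(value):
    """Render a limit value (in bytes), mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)

def report_limits():
    """Return one line per limit with the soft and hard values of this process."""
    lines = []
    for name, res in LIMITS.items():
        soft, hard = resource.getrlimit(res)
        lines.append(f"{name}: soft={fmt(soft)} hard={fmt(hard)}")
    return lines

if __name__ == "__main__":
    for line in report_limits():
        print(line)
```

If the values printed inside a job differ from what an interactive login shell reports, the job is most likely picking up the daemon's limits rather than the ones from the login session.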

I hope this helps,
Gus Correa

On 05/07/2016 12:27 PM, Jeff Squyres (jsquyres) wrote:
I'm afraid I don't know what a .btr file is -- that is not something that is 
controlled by Open MPI.

You might want to look into your OS settings to see if it has some kind of 
alternate corefile mechanism...?
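
One OS setting worth checking on Linux is /proc/sys/kernel/core_pattern: if its value starts with a pipe character, the kernel hands core dumps to a helper program instead of writing a core file. The sketch below only reads and interprets that setting. The function names are hypothetical, and whether this setting has anything to do with the .btr file is an assumption that would need to be verified.

```python
from pathlib import Path

def describe_core_pattern(pattern):
    """Say whether a core_pattern value names a file template or a pipe helper."""
    pattern = pattern.strip()
    if pattern.startswith("|"):
        return f"piped to helper program: {pattern[1:].strip()}"
    return f"written to file using template: {pattern}"

def read_core_pattern(path="/proc/sys/kernel/core_pattern"):
    """Return the raw setting, or None if the file is missing (non-Linux systems)."""
    p = Path(path)
    return p.read_text() if p.exists() else None

if __name__ == "__main__":
    raw = read_core_pattern()
    if raw is None:
        print("core_pattern not available on this system")
    else:
        print(describe_core_pattern(raw))
```

Comparing the output on the two clusters would show whether they handle core dumps differently at the kernel level.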


On May 6, 2016, at 8:58 PM, dpchoudh . <dpcho...@gmail.com> wrote:

Hello all

I run MPI jobs (for testing purposes only) on two different 'clusters'. Both 
'clusters' have only two nodes, connected back-to-back. The two are very 
similar, but not identical, in both software and hardware.

Both have ulimit -c set to unlimited. However, only one of the two creates core 
files when an MPI job crashes. The other creates a text file named something 
like
<program_name_that_crashed>.80s-<a-number-that-looks-like-a-PID>,<hostname-where-the-crash-happened>.btr

I'd much prefer a core file because that allows me to debug with a lot more 
options than a static text file with addresses. How do I get a core file in all 
situations? I am using MPI source from the master branch.

Thanks in advance
Durga

The surgeon general advises you to eat right, exercise regularly and quit 
ageing.
_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2016/05/29124.php


