Durga,

you might want to restore the default handler for other signals as well
(SIGSEGV, SIGBUS, ...).
ompi_info --all | grep opal_signal
lists the signals whose handlers you should restore.
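
a rough sketch of what that could look like (not taken from the Open MPI
sources): the helper below resets a handful of signals to SIG_DFL. the helper
name restore_default_handlers and the exact signal list are assumptions for
illustration only; use whatever ompi_info reports on your build.

```c
#include <signal.h>
#include <stddef.h>

/* Signals to hand back to the OS. This list is an assumption for
 * illustration; use whatever `ompi_info --all | grep opal_signal` reports. */
static const int signals_to_reset[] = { SIGABRT, SIGBUS, SIGFPE, SIGSEGV };

/* Hypothetical helper: reset each listed signal to its default action
 * and return how many resets succeeded. */
int restore_default_handlers(void)
{
    size_t n = sizeof signals_to_reset / sizeof signals_to_reset[0];
    int restored = 0;

    for (size_t i = 0; i < n; i++) {
        if (signal(signals_to_reset[i], SIG_DFL) != SIG_ERR)
            restored++;
    }
    return restored;
}

/* Intended use, right after MPI has installed its own handlers:
 *
 *     MPI_Init(&argc, &argv);
 *     restore_default_handlers();
 */
```

on Linux the default action for these four signals is to terminate the
process and dump core, provided ulimit -c allows it.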


only one backtrace component is built (out of several candidates:
execinfo, none, printstack).
nm -l libopen-pal.so | grep backtrace
will hint at which component was built.

your two similar distros might have been built with different backtrace
components.



Gus,

the .btr file is a plain text file with a backtrace, "a la" gdb.



Nathan,

I did a 'grep btr' and could not find anything :-(
opal_backtrace_buffer and opal_backtrace_print are only used with stderr,
so I am puzzled about who creates the tracefile name, and where ...
also, no stack is printed by default unless opal_abort_print_stack is true.

Cheers,

Gilles


On Wed, May 11, 2016 at 3:43 PM, dpchoudh . <dpcho...@gmail.com> wrote:
> Hello Nathan
>
> Thank you for your response. Could you please be more specific? Adding the
> following after MPI_Init() does not seem to make a difference.
>
>   MPI_Init(&argc, &argv);
>   signal(SIGABRT, SIG_DFL);
>   signal(SIGTERM, SIG_DFL);
>
> I also find it puzzling that nearly identical OMPI distro running on a
> different machine shows different behaviour.
>
> Best regards
> Durga
>
> The surgeon general advises you to eat right, exercise regularly and quit
> ageing.
>
> On Tue, May 10, 2016 at 10:02 AM, Hjelm, Nathan Thomas <hje...@lanl.gov>
> wrote:
>>
>> btr files are indeed created by open mpi's backtrace mechanism. I think we
>> should revisit it at some point but for now the only effective way i have
>> found to prevent it is to restore the default signal handlers after
>> MPI_Init.
>>
>> Excuse the quoting style. Good sucks.
>>
>>
>> ________________________________________
>> From: users on behalf of dpchoudh .
>> Sent: Monday, May 09, 2016 2:59:37 PM
>> To: Open MPI Users
>> Subject: Re: [OMPI users] No core dump in some cases
>>
>> Hi Gus
>>
>> Thanks for your suggestion. But I am not using any resource manager (i.e.
>> I am launching mpirun from the bash shell.). In fact, both of the two
>> clusters I talked about run CentOS 7 and I launch the job the same way on
>> both of these, yet one of them creates standard core files and the other
>> creates the 'btr' files. Strange thing is, I could not find anything on the
>> .btr (= Backtrace?) files on Google, which is why I asked on this forum.
>>
>> Best regards
>> Durga
>>
>> The surgeon general advises you to eat right, exercise regularly and quit
>> ageing.
>>
>> On Mon, May 9, 2016 at 12:04 PM, Gus Correa
>> <g...@ldeo.columbia.edu> wrote:
>> Hi Durga
>>
>> Just in case ...
>> If you're using a resource manager to start the jobs (Torque, etc),
>> you need to have them set the limits (for coredump size, stacksize, locked
>> memory size, etc).
>> This way the jobs will inherit the limits from the
>> resource manager daemon.
>> On Torque (which I use) I do this on the pbs_mom daemon
>> init script (I am still before the systemd era, that lovely POS).
>> And set the hard/soft limits on /etc/security/limits.conf as well.
>>
>> I hope this helps,
>> Gus Correa
>>
>> On 05/07/2016 12:27 PM, Jeff Squyres (jsquyres) wrote:
>> I'm afraid I don't know what a .btr file is -- that is not something that
>> is controlled by Open MPI.
>>
>> You might want to look into your OS settings to see if it has some kind of
>> alternate corefile mechanism...?
>>
>>
>> On May 6, 2016, at 8:58 PM, dpchoudh .
>> <dpcho...@gmail.com> wrote:
>>
>> Hello all
>>
>> I run MPI jobs (for test purpose only) on two different 'clusters'. Both
>> 'clusters' have two nodes only, connected back-to-back. The two are very
>> similar, but not identical, both software and hardware wise.
>>
>> Both have ulimit -c set to unlimited. However, only one of the two creates
>> core files when an MPI job crashes. The other creates a text file named
>> something like
>>
>> <program_name_that_crashed>.80s-<a-number-that-looks-like-a-PID>,<hostname-where-the-crash-happened>.btr
>>
>> I'd much prefer a core file because that allows me to debug with a lot
>> more options than a static text file with addresses. How do I get a core
>> file in all situations? I am using MPI source from the master branch.
>>
>> Thanks in advance
>> Durga
>>
>> The surgeon general advises you to eat right, exercise regularly and quit
>> ageing.
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post:
>> http://www.open-mpi.org/community/lists/users/2016/05/29124.php
