Note that the psm library sets its own signal handler, possibly after the Open MPI one.

That can be disabled with

export IPATH_NO_BACKTRACE=1

Cheers,

Gilles
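
(A minimal sketch of applying this, assuming a bash shell and the btrtest program discussed below; mpirun's -x option exports an environment variable to the launched processes:

    # disable PSM's own backtrace/signal handler for all subsequent runs
    export IPATH_NO_BACKTRACE=1
    mpirun -np 2 ./btrtest

    # or set it only for one job
    mpirun -x IPATH_NO_BACKTRACE=1 -np 2 ./btrtest
)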


On 5/12/2016 11:34 AM, dpchoudh . wrote:
Hello Gilles

Thank you for your continued support. With your help, I have a better understanding of what is happening. Here are the details.

1. Yes, I am sure that ulimit -c is 'unlimited' (and for the test in question, I am running it on a single node, so there are no other nodes)

2. The command I am running is possibly the simplest MPI command:
mpirun -np 2 <program>

It looks to me, after running your test code, that what is crashing is MPI_Init() itself. The output from your code (I called it 'btrtest') is as follows:

[durga@smallMPI ~]$ mpirun -np 2 ./btrtest
before MPI_Init : -1 -1
before MPI_Init : -1 -1

btrtest:7275 terminated with signal 11 at PC=7f401f49e7d8 SP=7ffec47e7578. Backtrace:
/lib64/libc.so.6(+0x3ba7d8)[0x7f401f49e7d8]

btrtest:7274 terminated with signal 11 at PC=7f1ba21897d8 SP=7ffc516ac8d8. Backtrace:
/lib64/libc.so.6(+0x3ba7d8)[0x7f1ba21897d8]
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[7936,1],1]
  Exit code:    1
--------------------------------------------------------------------------

So obviously the code does not make it past MPI_Init().

This is an issue that I have been observing for quite a while in different forms and have reported on the forum a few times also. Let me elaborate:

Both my 'well-behaving' and crashing clusters run CentOS 7 (the crashing one has the latest updates; the well-behaving one does not, as I am not allowed to apply updates on it). They both have OMPI from the master branch, compiled from source. Both consist of 64-bit Dell servers, although not identical models (I doubt that matters).

The only significant difference between the two is this:

The well-behaved one (if it does dump core, it is because there is a bug in the MPI app) has very simple network hardware: two different NICs (one Broadcom GbE, one proprietary NIC that is currently exposed as an IP interface). There is no RDMA capability there at all.

The crashing one has 4 different NICs:
1. Broadcom GbE
2. Chelsio T3 based 10Gb iWARP NIC
3. QLogic 20Gb Infiniband (PSM capable)
4. LSI logic Fibre channel

In this situation, WITH ALL BUT THE GbE LINK DOWN (the GbE connects the machine to the WAN link), it seems that just the presence of these NICs matters.
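
(One way to test whether the mere presence of these NICs' transports is what matters would be to exclude the components that most likely drive them; the '^' prefix negates an MCA component list. A hedged sketch, where the component names -- psm for the QLogic HCA, the verbs/iWARP path left out of the BTL list -- are my assumption about which ones map to this hardware:

    mpirun -np 2 \
        -mca pml ob1 \
        -mca mtl ^psm \
        -mca btl self,sm,tcp \
        ./btrtest

If the psm library installs its signal handler as soon as its component is merely opened, though, this alone may not stop the .btr behaviour.)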

Here are the various commands and outputs:

[durga@smallMPI ~]$ mpirun -np 2 ./btrtest
before MPI_Init : -1 -1
before MPI_Init : -1 -1

btrtest:10032 terminated with signal 11 at PC=7f6897c197d8 SP=7ffcae2b2ef8. Backtrace:
/lib64/libc.so.6(+0x3ba7d8)[0x7f6897c197d8]

btrtest:10033 terminated with signal 11 at PC=7fb035c3e7d8 SP=7ffe61a92088. Backtrace:
/lib64/libc.so.6(+0x3ba7d8)[0x7fb035c3e7d8]
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[9294,1],0]
  Exit code:    1
--------------------------------------------------------------------------

[durga@smallMPI ~]$ mpirun -np 2 -mca pml ob1 ./btrtest
before MPI_Init : -1 -1
before MPI_Init : -1 -1

btrtest:10076 terminated with signal 11 at PC=7fa92d20b7d8 SP=7ffebb106028. Backtrace:
/lib64/libc.so.6(+0x3ba7d8)[0x7fa92d20b7d8]

btrtest:10077 terminated with signal 11 at PC=7f5012fa57d8 SP=7ffea4f4fdf8. Backtrace:
/lib64/libc.so.6(+0x3ba7d8)[0x7f5012fa57d8]
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[9266,1],0]
  Exit code:    1
--------------------------------------------------------------------------

[durga@smallMPI ~]$ mpirun -np 2 -mca pml ob1 -mca btl self,sm ./btrtest
before MPI_Init : -1 -1
before MPI_Init : -1 -1

btrtest:10198 terminated with signal 11 at PC=400829 SP=7ffe6e148870. Backtrace:

btrtest:10197 terminated with signal 11 at PC=400829 SP=7ffe87be6cd0. Backtrace:
./btrtest[0x400829]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f9473bbeb15]
./btrtest[0x4006d9]
./btrtest[0x400829]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fdfe2d8eb15]
./btrtest[0x4006d9]
after MPI_Init : -1 -1
after MPI_Init : -1 -1
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[9384,1],1]
  Exit code:    1
--------------------------------------------------------------------------


[durga@smallMPI ~]$ ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 216524
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
[durga@smallMPI ~]$


I do realize that my setup is very unusual (I am a quasi-developer of MPI, whereas most other folks on this list are likely end-users), but simply disabling this 'execinfo' MCA component would allow me to make progress (and also find out why/where MPI_Init() is crashing!). Is there any way I can do that?
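
(One hedged possibility, since only one backtrace component gets built into libopen-pal: rebuild Open MPI with that component excluded at configure time. The exact option spelling should be verified against ./configure --help of your tree; I believe it is:

    # rebuild without the execinfo backtrace component
    ./configure --enable-mca-no-build=backtrace-execinfo <your usual options>
    make && make install

Afterwards, ompi_info | grep backtrace should no longer report execinfo.)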

Thank you
Durga

The surgeon general advises you to eat right, exercise regularly and quit ageing.

On Wed, May 11, 2016 at 8:58 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:

    Are you sure ulimit -c unlimited is *really* applied on all hosts?


    Can you please run the simple program below and confirm that?


    Cheers,


    Gilles


    #include <sys/time.h>
    #include <sys/resource.h>
    #include <poll.h>
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        struct rlimit rlim;
        char * c = (char *)0;
        getrlimit(RLIMIT_CORE, &rlim);
        /* cast to int so that RLIM_INFINITY prints as -1 */
        printf ("before MPI_Init : %d %d\n", (int)rlim.rlim_cur, (int)rlim.rlim_max);
        MPI_Init(&argc, &argv);
        getrlimit(RLIMIT_CORE, &rlim);
        printf ("after MPI_Init : %d %d\n", (int)rlim.rlim_cur, (int)rlim.rlim_max);
        /* deliberate NULL dereference: raise SIGSEGV so a core dump (or .btr) is produced */
        *c = 0;
        MPI_Finalize();
        return 0;
    }
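
    (For reference, a minimal way to build and run this, assuming the file
    is saved as btrtest.c as in the runs above:

        mpicc -o btrtest btrtest.c
        mpirun -np 2 ./btrtest
    )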


    On 5/12/2016 4:22 AM, dpchoudh . wrote:
    Hello Gilles

    Thank you for the advice. However, that did not seem to make any
    difference. Here is what I did (on the cluster that generates
    .btr files instead of core dumps):

    [durga@smallMPI git]$ ompi_info --all | grep opal_signal
               MCA opal base: parameter "opal_signal" (current value:
    "6,7,8,11", data source: default, level: 3 user/all, type: string)
    [durga@smallMPI git]$


    According to <bits/signum.h>, signals 6, 7, 8, 11 are these:

    #define SIGABRT    6     /* Abort (ANSI).  */
    #define SIGBUS     7     /* BUS error (4.2 BSD).  */
    #define SIGFPE     8     /* Floating-point exception (ANSI).  */
    #define SIGSEGV    11    /* Segmentation violation (ANSI).  */

    And thus I added the following just after MPI_Init()

    MPI_Init(&argc, &argv);
        signal(SIGABRT, SIG_DFL);
        signal(SIGBUS, SIG_DFL);
        signal(SIGFPE, SIG_DFL);
        signal(SIGSEGV, SIG_DFL);
        signal(SIGTERM, SIG_DFL);

    (I added the 'SIGTERM' part later, just in case it would make a
    difference; it didn't.)

    The resulting code still generates .btr files instead of core files.

    It looks like the 'execinfo' MCA component is being used as the
    backtrace mechanism:

    [durga@smallMPI git]$ ompi_info | grep backtrace
               MCA backtrace: execinfo (MCA v2.1.0, API v2.0.0,
    Component v3.0.0)

    However, I could not find any way to choose 'none' instead of
    'execinfo'.

    And the strange thing is, on the cluster where regular core dumps
    are happening, the output of
    $ ompi_info | grep backtrace
    is identical to the above. (Which kind of makes sense, because
    the two builds were created from the same source with the same
    configure options.)

    Sorry to harp on this, but without a core file it is hard to
    debug the application (e.g. examine stack variables).

    Thank you
    Durga


    The surgeon general advises you to eat right, exercise regularly
    and quit ageing.

    On Wed, May 11, 2016 at 3:37 AM, Gilles Gouaillardet
    <gilles.gouaillar...@gmail.com> wrote:

        Durga,

        you might want to try restoring the signal handlers for other
        signals as well
        (SIGSEGV, SIGBUS, ...)
        ompi_info --all | grep opal_signal
        lists the signals whose handlers you should restore


        only one backtrace component is built (out of several
        candidates: execinfo, none, printstack)
        nm -l libopen-pal.so | grep backtrace
        will hint at which component was built

        your two similar distros might have different backtrace components



        Gus,

        btr is a plain text file with a back trace "ala" gdb



        Nathan,

        I did a 'grep btr' and could not find anything :-(
        opal_backtrace_buffer and opal_backtrace_print are only used
        with stderr,
        so I am puzzled as to who creates the tracefile name, and where ...
        also, no stack is printed by default unless
        opal_abort_print_stack is true

        Cheers,

        Gilles


        On Wed, May 11, 2016 at 3:43 PM, dpchoudh .
        <dpcho...@gmail.com> wrote:
        > Hello Nathan
        >
        > Thank you for your response. Could you please be more
        specific? Adding the
        > following after MPI_Init() does not seem to make a difference.
        >
        >     MPI_Init(&argc, &argv);
        >   signal(SIGABRT, SIG_DFL);
        >   signal(SIGTERM, SIG_DFL);
        >
        > I also find it puzzling that a nearly identical OMPI distro
        > running on a different machine shows different behaviour.
        >
        > Best regards
        > Durga
        >
        > The surgeon general advises you to eat right, exercise
        regularly and quit
        > ageing.
        >
        > On Tue, May 10, 2016 at 10:02 AM, Hjelm, Nathan Thomas
        > <hje...@lanl.gov>
        > wrote:
        >>
        >> btr files are indeed created by Open MPI's backtrace
        >> mechanism. I think we should revisit it at some point, but
        >> for now the only effective way I have found to prevent it
        >> is to restore the default signal handlers after MPI_Init.
        >>
        >> Excuse the quoting style. Good sucks.
        >>
        >>
        >> ________________________________________
        >> From: users on behalf of dpchoudh .
        >> Sent: Monday, May 09, 2016 2:59:37 PM
        >> To: Open MPI Users
        >> Subject: Re: [OMPI users] No core dump in some cases
        >>
        >> Hi Gus
        >>
        >> Thanks for your suggestion. But I am not using any
        resource manager (i.e.
        >> I am launching mpirun from the bash shell.). In fact, both
        of the two
        >> clusters I talked about run CentOS 7 and I launch the job
        the same way on
        >> both of these, yet one of them creates standard core files
        and the other
        >> creates the 'btr' files. Strange thing is, I could not find
        >> anything on the .btr (= Backtrace?) files on Google, which
        >> is why I asked on this forum.
        >>
        >> Best regards
        >> Durga
        >>
        >> The surgeon general advises you to eat right, exercise
        regularly and quit
        >> ageing.
        >>
        >> On Mon, May 9, 2016 at 12:04 PM, Gus Correa
        >> <g...@ldeo.columbia.edu> wrote:
        >> Hi Durga
        >>
        >> Just in case ...
        >> If you're using a resource manager to start the jobs
        (Torque, etc),
        >> you need to have them set the limits (for coredump size,
        stacksize, locked
        >> memory size, etc).
        >> This way the jobs will inherit the limits from the
        >> resource manager daemon.
        >> On Torque (which I use) I do this on the pbs_mom daemon
        >> init script (I am still before the systemd era, that
        lovely POS).
        >> And set the hard/soft limits on /etc/security/limits.conf
        as well.
        >>
        >> I hope this helps,
        >> Gus Correa
        >>
        >> On 05/07/2016 12:27 PM, Jeff Squyres (jsquyres) wrote:
        >> I'm afraid I don't know what a .btr file is -- that is not
        something that
        >> is controlled by Open MPI.
        >>
        >> You might want to look into your OS settings to see if it
        has some kind of
        >> alternate corefile mechanism...?
        >>
        >>
        >> On May 6, 2016, at 8:58 PM, dpchoudh .
        >> <dpcho...@gmail.com> wrote:
        >>
        >> Hello all
        >>
        >> I run MPI jobs (for test purpose only) on two different
        'clusters'. Both
        >> 'clusters' have two nodes only, connected back-to-back.
        The two are very
        >> similar, but not identical, both software and hardware wise.
        >>
        >> Both have ulimit -c set to unlimited. However, only one of
        the two creates
        >> core files when an MPI job crashes. The other creates a
        text file named
        >> something like
        >>
        >> <program_name_that_crashed>.80s-<a-number-that-looks-like-a-PID>,<hostname-where-the-crash-happened>.btr
        >>
        >> I'd much prefer a core file because that allows me to
        debug with a lot
        >> more options than a static text file with addresses. How
        do I get a core
        >> file in all situations? I am using MPI source from the
        master branch.
        >>
        >> Thanks in advance
        >> Durga
        >>
        >> The surgeon general advises you to eat right, exercise
        regularly and quit
        >> ageing.
        >> _______________________________________________
        >> users mailing list
        >> us...@open-mpi.org
        >> Subscription:
        https://www.open-mpi.org/mailman/listinfo.cgi/users
        >> Link to this post:
        >>
        http://www.open-mpi.org/community/lists/users/2016/05/29124.php
        >>
        >>
        >>
        >> _______________________________________________
        >> users mailing list
        >> us...@open-mpi.org
        >> Subscription:
        https://www.open-mpi.org/mailman/listinfo.cgi/users
        >> Link to this post:
        >>
        http://www.open-mpi.org/community/lists/users/2016/05/29141.php
        >>
        >> _______________________________________________
        >> users mailing list
        >> us...@open-mpi.org
        >> Subscription:
        https://www.open-mpi.org/mailman/listinfo.cgi/users
        >> Link to this post:
        >>
        http://www.open-mpi.org/community/lists/users/2016/05/29154.php
        >
        >
        >
        > _______________________________________________
        > users mailing list
        > us...@open-mpi.org
        > Subscription:
        https://www.open-mpi.org/mailman/listinfo.cgi/users
        > Link to this post:
        > http://www.open-mpi.org/community/lists/users/2016/05/29169.php
        _______________________________________________
        users mailing list
        us...@open-mpi.org
        Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
        Link to this post:
        http://www.open-mpi.org/community/lists/users/2016/05/29170.php




    _______________________________________________
    users mailing list
    us...@open-mpi.org
    Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
    Link to this post: http://www.open-mpi.org/community/lists/users/2016/05/29176.php


    _______________________________________________
    users mailing list
    us...@open-mpi.org
    Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
    Link to this post:
    http://www.open-mpi.org/community/lists/users/2016/05/29177.php




_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: http://www.open-mpi.org/community/lists/users/2016/05/29178.php
