This is a known problem - I committed the fix for PSM with a link down just 
today.


> On May 11, 2016, at 7:34 PM, dpchoudh . <dpcho...@gmail.com> wrote:
> 
> Hello Gilles
> 
> Thank you for your continued support. With your help, I have a better 
> understanding of what is happening. Here are the details.
> 
> 1. Yes, I am sure that ulimit -c is 'unlimited' (and for the test in 
> question, I am running it on a single node, so there are no other nodes)
> 
> 2. The command I am running is possibly the simplest MPI command:
> mpirun -np 2 <program>
> 
> It looks to me, after running your test code, that what is crashing is 
> MPI_Init() itself. The output from your code (I called it 'btrtest') is as 
> follows:
> 
> [durga@smallMPI ~]$ mpirun -np 2 ./btrtest
> before MPI_Init : -1 -1
> before MPI_Init : -1 -1
> 
> btrtest:7275 terminated with signal 11 at PC=7f401f49e7d8 SP=7ffec47e7578.  
> Backtrace:
> /lib64/libc.so.6(+0x3ba7d8)[0x7f401f49e7d8]
> 
> btrtest:7274 terminated with signal 11 at PC=7f1ba21897d8 SP=7ffc516ac8d8.  
> Backtrace:
> /lib64/libc.so.6(+0x3ba7d8)[0x7f1ba21897d8]
> -------------------------------------------------------
> Primary job  terminated normally, but 1 process returned
> a non-zero exit code. Per user-direction, the job has been aborted.
> -------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun detected that one or more processes exited with non-zero status, thus 
> causing
> the job to be terminated. The first process to do so was:
> 
>   Process name: [[7936,1],1]
>   Exit code:    1
> --------------------------------------------------------------------------
> 
> So obviously the code does not make it past MPI_Init()
> 
> This is an issue that I have been observing for quite a while in different 
> forms and have reported on the forum a few times also. Let me elaborate:
> 
> Both my 'well-behaved' and crashing clusters run CentOS 7 (the crashing one 
> has the latest updates; the well-behaved one does not, as I am not allowed to 
> apply updates there). Both have OMPI compiled from source from the master 
> branch. Both consist of 64-bit Dell servers, although not identical models 
> (I doubt that matters).
> 
> The only significant difference between the two is this:
> 
> The well-behaved one (when it does dump core, it is because of a bug in the 
> MPI app) has very simple network hardware: two different NICs (one 
> Broadcom GbE, one proprietary NIC that is currently exposed as an IP 
> interface). There is no RDMA capability there at all.
> 
> The crashing one has 4 different NICs:
> 1. Broadcom GbE
> 2. Chelsio T3 based 10Gb iWARP NIC
> 3. QLogic 20Gb Infiniband (PSM capable)
> 4. LSI logic Fibre channel
> 
> In this situation, WITH ALL BUT THE GbE LINK DOWN (the GbE connects the 
> machine to the WAN link), it seems the mere presence of these NICs matters.
> 
> Here are the various commands and outputs:
> 
> [durga@smallMPI ~]$ mpirun -np 2 ./btrtest
> before MPI_Init : -1 -1
> before MPI_Init : -1 -1
> 
> btrtest:10032 terminated with signal 11 at PC=7f6897c197d8 SP=7ffcae2b2ef8.  
> Backtrace:
> /lib64/libc.so.6(+0x3ba7d8)[0x7f6897c197d8]
> 
> btrtest:10033 terminated with signal 11 at PC=7fb035c3e7d8 SP=7ffe61a92088.  
> Backtrace:
> /lib64/libc.so.6(+0x3ba7d8)[0x7fb035c3e7d8]
> -------------------------------------------------------
> Primary job  terminated normally, but 1 process returned
> a non-zero exit code. Per user-direction, the job has been aborted.
> -------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun detected that one or more processes exited with non-zero status, thus 
> causing
> the job to be terminated. The first process to do so was:
> 
>   Process name: [[9294,1],0]
>   Exit code:    1
> --------------------------------------------------------------------------
> 
> [durga@smallMPI ~]$ mpirun -np 2 -mca pml ob1 ./btrtest
> before MPI_Init : -1 -1
> before MPI_Init : -1 -1
> 
> btrtest:10076 terminated with signal 11 at PC=7fa92d20b7d8 SP=7ffebb106028.  
> Backtrace:
> /lib64/libc.so.6(+0x3ba7d8)[0x7fa92d20b7d8]
> 
> btrtest:10077 terminated with signal 11 at PC=7f5012fa57d8 SP=7ffea4f4fdf8.  
> Backtrace:
> /lib64/libc.so.6(+0x3ba7d8)[0x7f5012fa57d8]
> -------------------------------------------------------
> Primary job  terminated normally, but 1 process returned
> a non-zero exit code. Per user-direction, the job has been aborted.
> -------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun detected that one or more processes exited with non-zero status, thus 
> causing
> the job to be terminated. The first process to do so was:
> 
>   Process name: [[9266,1],0]
>   Exit code:    1
> --------------------------------------------------------------------------
> 
> [durga@smallMPI ~]$ mpirun -np 2 -mca pml ob1 -mca btl self,sm ./btrtest
> before MPI_Init : -1 -1
> before MPI_Init : -1 -1
> 
> btrtest:10198 terminated with signal 11 at PC=400829 SP=7ffe6e148870.  
> Backtrace:
> 
> btrtest:10197 terminated with signal 11 at PC=400829 SP=7ffe87be6cd0.  
> Backtrace:
> ./btrtest[0x400829]
> /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f9473bbeb15]
> ./btrtest[0x4006d9]
> ./btrtest[0x400829]
> /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fdfe2d8eb15]
> ./btrtest[0x4006d9]
> after MPI_Init : -1 -1
> after MPI_Init : -1 -1
> -------------------------------------------------------
> Primary job  terminated normally, but 1 process returned
> a non-zero exit code. Per user-direction, the job has been aborted.
> -------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun detected that one or more processes exited with non-zero status, thus 
> causing
> the job to be terminated. The first process to do so was:
> 
>   Process name: [[9384,1],1]
>   Exit code:    1
> --------------------------------------------------------------------------
> 
> 
> [durga@smallMPI ~]$ ulimit -a
> core file size          (blocks, -c) unlimited
> data seg size           (kbytes, -d) unlimited
> scheduling priority             (-e) 0
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 216524
> max locked memory       (kbytes, -l) unlimited
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 1024
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) 8192
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 4096
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
> [durga@smallMPI ~]$ 
> 
> 
> I do realize that my setup is very unusual (I am a quasi-developer of MPI, 
> whereas most other folks on this list are likely end-users), but simply 
> disabling this 'execinfo' MCA component would allow me to make progress (and 
> also find out why/where MPI_Init() is crashing!). Is there any way I can do that?
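> 
> A minimal sketch of the kind of thing I was hoping for, assuming the usual 
> OMPI_MCA_<framework> environment-variable mechanism and assuming a 'none' 
> backtrace component is actually built (which may well not be the case here):
> 
> #include <stdlib.h>
> #include <mpi.h>
> 
> int main(int argc, char *argv[]) {
>     /* Hypothetical: request the 'none' backtrace component instead of
>      * 'execinfo'; this can only take effect if such a component exists. */
>     setenv("OMPI_MCA_backtrace", "none", 1);
>     MPI_Init(&argc, &argv);
>     /* ... application code ... */
>     MPI_Finalize();
>     return 0;
> }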
> 
> Thank you
> Durga
> 
> The surgeon general advises you to eat right, exercise regularly and quit 
> ageing.
> 
> On Wed, May 11, 2016 at 8:58 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
> Are you sure ulimit -c unlimited is *really* applied on all hosts?
> 
> 
> Can you please run the simple program below and confirm that?
> 
> 
> Cheers,
> 
> 
> Gilles
> 
> 
> #include <sys/time.h>
> #include <sys/resource.h>
> #include <stdio.h>
> #include <mpi.h>
> 
> int main(int argc, char *argv[]) {
>     struct rlimit rlim;
>     char * c = (char *)0;
>     getrlimit(RLIMIT_CORE, &rlim);
>     /* rlim_t is wider than int; cast for printf (-1 means unlimited) */
>     printf ("before MPI_Init : %lld %lld\n",
>             (long long)rlim.rlim_cur, (long long)rlim.rlim_max);
>     MPI_Init(&argc, &argv);
>     getrlimit(RLIMIT_CORE, &rlim);
>     printf ("after MPI_Init : %lld %lld\n",
>             (long long)rlim.rlim_cur, (long long)rlim.rlim_max);
>     *c = 0;    /* deliberate NULL-pointer write to trigger a core dump */
>     MPI_Finalize();
>     return 0;
> }
> 
> 
> On 5/12/2016 4:22 AM, dpchoudh . wrote:
>> Hello Gilles
>> 
>> Thank you for the advice. However, that did not seem to make any difference. 
>> Here is what I did (on the cluster that generates .btr files for core dumps):
>> 
>> [durga@smallMPI git]$ ompi_info --all | grep opal_signal
>>            MCA opal base: parameter "opal_signal" (current value: 
>> "6,7,8,11", data source: default, level: 3 user/all, type: string)
>> [durga@smallMPI git]$ 
>> 
>> 
>> According to <bits/signum.h>, signals 6, 7, 8 and 11 are:
>> 
>> #define    SIGABRT        6    /* Abort (ANSI).  */
>> #define    SIGBUS        7    /* BUS error (4.2 BSD).  */
>> #define    SIGFPE        8    /* Floating-point exception (ANSI).  */
>> #define    SIGSEGV        11    /* Segmentation violation (ANSI).  */
>> 
>> And thus I added the following just after MPI_Init()
>> 
>>     MPI_Init(&argc, &argv);
>>     signal(SIGABRT, SIG_DFL);
>>     signal(SIGBUS, SIG_DFL);
>>     signal(SIGFPE, SIG_DFL);
>>     signal(SIGSEGV, SIG_DFL);
>>     signal(SIGTERM, SIG_DFL);
>> 
>> (I added the 'SIGTERM' part later, just in case it would make a difference; 
>> it didn't.)
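>> 
>> For reference, an equivalent sketch of the same idea using sigaction() for 
>> the four signals in the opal_signal list above (just a sketch, called right 
>> after MPI_Init):
>> 
>> #include <signal.h>
>> 
>> /* Restore the default disposition of the signals Open MPI intercepts
>>    (SIGABRT, SIGBUS, SIGFPE, SIGSEGV). */
>> static void restore_default_handlers(void)
>> {
>>     int sigs[] = { SIGABRT, SIGBUS, SIGFPE, SIGSEGV };
>>     struct sigaction sa;
>>     int i;
>> 
>>     sa.sa_handler = SIG_DFL;
>>     sa.sa_flags = 0;
>>     sigemptyset(&sa.sa_mask);
>>     for (i = 0; i < (int)(sizeof(sigs) / sizeof(sigs[0])); i++)
>>         sigaction(sigs[i], &sa, NULL);
>> }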
>> 
>> The resulting code still generates .btr files instead of core files.
>> 
>> It looks like the 'execinfo' MCA component is being used as the backtrace 
>> mechanism:
>> 
>> [durga@smallMPI git]$ ompi_info | grep backtrace
>>            MCA backtrace: execinfo (MCA v2.1.0, API v2.0.0, Component v3.0.0)
>> 
>> However, I could not find any way to choose 'none' instead of 'execinfo'
>> 
>> And the strange thing is, on the cluster where regular core dumps are 
>> happening, the output of 
>> $ ompi_info | grep backtrace
>> is identical to the above. (Which kind of makes sense, because both were 
>> built from the same source with the same configure options.)
>> 
>> Sorry to harp on this, but without a core file it is hard to debug the 
>> application (e.g. examine stack variables).
>> 
>> Thank you
>> Durga
>> 
>> 
>> The surgeon general advises you to eat right, exercise regularly and quit 
>> ageing.
>> 
>> On Wed, May 11, 2016 at 3:37 AM, Gilles Gouaillardet 
>> <gilles.gouaillar...@gmail.com> wrote:
>> Durga,
>> 
>> you might want to try restoring the signal handlers for other signals as well
>> (SIGSEGV, SIGBUS, ...)
>> ompi_info --all | grep opal_signal
>> lists the signals whose handlers you should restore
>> 
>> 
>> only one backtrace component is built (out of several candidates:
>> execinfo, none, printstack)
>> nm -l libopen-pal.so | grep backtrace
>> will hint at which component was built
>> 
>> your two similar distros might have different backtrace components
>> 
>> 
>> 
>> Gus,
>> 
>> a .btr file is a plain text file with a backtrace, a la gdb
>> 
>> 
>> 
>> Nathan,
>> 
>> I did a 'grep btr' and could not find anything :-(
>> opal_backtrace_buffer and opal_backtrace_print are only used with stderr,
>> so I am puzzled as to who creates the tracefile name, and where ...
>> Also, no stack is printed by default unless opal_abort_print_stack is true.
>> 
>> Cheers,
>> 
>> Gilles
>> 
>> 
>> On Wed, May 11, 2016 at 3:43 PM, dpchoudh . <dpcho...@gmail.com> 
>> wrote:
>> > Hello Nathan
>> >
>> > Thank you for your response. Could you please be more specific? Adding the
>> > following after MPI_Init() does not seem to make a difference.
>> >
>> >     MPI_Init(&argc, &argv);
>> >     signal(SIGABRT, SIG_DFL);
>> >     signal(SIGTERM, SIG_DFL);
>> >
>> > I also find it puzzling that a nearly identical OMPI distro running on a
>> > different machine shows different behaviour.
>> >
>> > Best regards
>> > Durga
>> >
>> > The surgeon general advises you to eat right, exercise regularly and quit
>> > ageing.
>> >
>> > On Tue, May 10, 2016 at 10:02 AM, Hjelm, Nathan Thomas <hje...@lanl.gov>
>> > wrote:
>> >>
>> >> btr files are indeed created by Open MPI's backtrace mechanism. I think we
>> >> should revisit it at some point, but for now the only effective way I have
>> >> found to prevent it is to restore the default signal handlers after
>> >> MPI_Init.
>> >>
>> >> Excuse the quoting style. Good sucks.
>> >>
>> >>
>> >> ________________________________________
>> >> From: users on behalf of dpchoudh .
>> >> Sent: Monday, May 09, 2016 2:59:37 PM
>> >> To: Open MPI Users
>> >> Subject: Re: [OMPI users] No core dump in some cases
>> >>
>> >> Hi Gus
>> >>
>> >> Thanks for your suggestion. But I am not using any resource manager (i.e.
>> >> I am launching mpirun from the bash shell). In fact, both of the two
>> >> clusters I talked about run CentOS 7 and I launch the job the same way on
>> >> both of them, yet one of them creates standard core files and the other
>> >> creates the '.btr' files. The strange thing is, I could not find anything
>> >> on the .btr (= backtrace?) files on Google, which is why I asked on this
>> >> forum.
>> >>
>> >> Best regards
>> >> Durga
>> >>
>> >> The surgeon general advises you to eat right, exercise regularly and quit
>> >> ageing.
>> >>
>> >> On Mon, May 9, 2016 at 12:04 PM, Gus Correa
>> >> <g...@ldeo.columbia.edu> wrote:
>> >> Hi Durga
>> >>
>> >> Just in case ...
>> >> If you're using a resource manager to start the jobs (Torque, etc.),
>> >> you need to have it set the limits (for coredump size, stack size, locked
>> >> memory size, etc.).
>> >> This way the jobs will inherit the limits from the
>> >> resource manager daemon.
>> >> On Torque (which I use) I do this in the pbs_mom daemon
>> >> init script (I am still before the systemd era, that lovely POS),
>> >> and set the hard/soft limits in /etc/security/limits.conf as well.
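>> >>
>> >> The same thing can also be done from inside the program itself; a minimal
>> >> sketch (plain POSIX, nothing Open MPI specific) that raises the soft
>> >> core-file limit to whatever hard limit the process inherited:
>> >>
>> >> #include <sys/resource.h>
>> >>
>> >> /* Raise the soft RLIMIT_CORE to the hard limit so a crash can produce
>> >>    a core dump; this cannot exceed the hard limit inherited from the
>> >>    daemon or shell that launched the process. */
>> >> static void enable_core_dumps(void)
>> >> {
>> >>     struct rlimit rlim;
>> >>     if (getrlimit(RLIMIT_CORE, &rlim) == 0) {
>> >>         rlim.rlim_cur = rlim.rlim_max;
>> >>         setrlimit(RLIMIT_CORE, &rlim);
>> >>     }
>> >> }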
>> >>
>> >> I hope this helps,
>> >> Gus Correa
>> >>
>> >> On 05/07/2016 12:27 PM, Jeff Squyres (jsquyres) wrote:
>> >> I'm afraid I don't know what a .btr file is -- that is not something that
>> >> is controlled by Open MPI.
>> >>
>> >> You might want to look into your OS settings to see if it has some kind of
>> >> alternate corefile mechanism...?
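>> >>
>> >> On Linux, one such setting worth checking is /proc/sys/kernel/core_pattern:
>> >> if it starts with a '|', core dumps are piped to a helper (e.g. abrt on
>> >> some CentOS setups) instead of being written as files. A minimal sketch to
>> >> print it from the application (just an illustration, not an Open MPI
>> >> feature):
>> >>
>> >> #include <stdio.h>
>> >>
>> >> /* Print the kernel's core_pattern; a leading '|' means core dumps are
>> >>    piped to a handler rather than written to a core file. */
>> >> static void show_core_pattern(void)
>> >> {
>> >>     char buf[256];
>> >>     FILE *f = fopen("/proc/sys/kernel/core_pattern", "r");
>> >>     if (f != NULL) {
>> >>         if (fgets(buf, sizeof(buf), f) != NULL)
>> >>             printf("core_pattern: %s", buf);
>> >>         fclose(f);
>> >>     }
>> >> }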
>> >>
>> >>
>> >> On May 6, 2016, at 8:58 PM, dpchoudh .
>> >> <dpcho...@gmail.com> wrote:
>> >>
>> >> Hello all
>> >>
>> >> I run MPI jobs (for test purpose only) on two different 'clusters'. Both
>> >> 'clusters' have two nodes only, connected back-to-back. The two are very
>> >> similar, but not identical, both software and hardware wise.
>> >>
>> >> Both have ulimit -c set to unlimited. However, only one of the two creates
>> >> core files when an MPI job crashes. The other creates a text file named
>> >> something like
>> >>
>> >> <program_name_that_crashed>.80s-<a-number-that-looks-like-a-PID>,<hostname-where-the-crash-happened>.btr
>> >>
>> >> I'd much prefer a core file because that allows me to debug with a lot
>> >> more options than a static text file with addresses. How do I get a core
>> >> file in all situations? I am using MPI source from the master branch.
>> >>
>> >> Thanks in advance
>> >> Durga
>> >>
>> >> The surgeon general advises you to eat right, exercise regularly and quit
>> >> ageing.
>> 
>> 
>> 
> 
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2016/05/29178.php
