Mellanox -- What would cause a CQ to fail to be created?
On Jun 11, 2014, at 3:42 PM, "Fischer, Greg A." <fisch...@westinghouse.com> wrote:

> Is there any other workaround that I might try? Something that avoids UDCM?
>
> -----Original Message-----
> From: Fischer, Greg A.
> Sent: Tuesday, June 10, 2014 2:59 PM
> To: Nathan Hjelm
> Cc: Open MPI Users; Fischer, Greg A.
> Subject: RE: [OMPI users] openib segfaults with Torque
>
> [binf316:fischega] $ ulimit -m
> unlimited
>
> Greg
>
> -----Original Message-----
> From: Nathan Hjelm [mailto:hje...@lanl.gov]
> Sent: Tuesday, June 10, 2014 2:58 PM
> To: Fischer, Greg A.
> Cc: Open MPI Users
> Subject: Re: [OMPI users] openib segfaults with Torque
>
> Out of curiosity, what is the mlock limit on your system? If it is too low,
> that can cause ibv_create_cq to fail. To check, run "ulimit -m".
>
> -Nathan Hjelm
> Application Readiness, HPC-5, LANL
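For reference, the locked-memory limit that verbs objects such as CQs are charged against is RLIMIT_MEMLOCK ("ulimit -l"); "ulimit -m" reports the maximum resident-set size. Below is a minimal standalone diagnostic, assuming libibverbs is installed (compile with -libverbs); the device choice and CQ depth are illustrative. It prints the limit and attempts the same kind of CQ creation that udcm performs:

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/resource.h>
    #include <infiniband/verbs.h>

    int main(void)
    {
        /* 1. Report the locked-memory limit verbs resources count against. */
        struct rlimit rl;
        if (0 == getrlimit(RLIMIT_MEMLOCK, &rl)) {
            printf("RLIMIT_MEMLOCK: soft=%lld hard=%lld\n",
                   (long long) rl.rlim_cur, (long long) rl.rlim_max);
        }

        /* 2. Attempt a CQ creation similar to what udcm_module_init does. */
        int num_devices = 0;
        struct ibv_device **devs = ibv_get_device_list(&num_devices);
        if (NULL == devs || 0 == num_devices) {
            fprintf(stderr, "no verbs devices found\n");
            return 1;
        }
        struct ibv_context *ctx = ibv_open_device(devs[0]);
        if (NULL == ctx) {
            fprintf(stderr, "ibv_open_device: %s\n", strerror(errno));
            return 1;
        }
        struct ibv_comp_channel *ch = ibv_create_comp_channel(ctx);
        struct ibv_cq *cq = ch ? ibv_create_cq(ctx, 256, NULL, ch, 0) : NULL;
        if (NULL == cq) {
            /* ENOMEM here is the classic symptom of a too-low memlock limit. */
            fprintf(stderr, "CQ creation failed: %s\n", strerror(errno));
            return 1;
        }
        printf("CQ created OK\n");

        ibv_destroy_cq(cq);
        ibv_destroy_comp_channel(ch);
        ibv_close_device(ctx);
        ibv_free_device_list(devs);
        return 0;
    }

Running this once inside an interactive Torque job and again from a plain login shell will show whether the resource manager is propagating a lower limit to compute-node processes.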
> On Tue, Jun 10, 2014 at 02:53:58PM -0400, Fischer, Greg A. wrote:
>> Yes, this fails on all nodes on the system, except for the head node.
>>
>> The uptime of the system isn't significant: maybe one week, and it has
>> received basically no use.
>>
>> -----Original Message-----
>> From: Nathan Hjelm [mailto:hje...@lanl.gov]
>> Sent: Tuesday, June 10, 2014 2:49 PM
>> To: Fischer, Greg A.
>> Cc: Open MPI Users
>> Subject: Re: [OMPI users] openib segfaults with Torque
>>
>> Well, that's interesting. The output shows that ibv_create_cq is failing.
>> Strange, since an identical call had just succeeded (udcm creates two
>> completion queues). Some questions that might indicate where the failure
>> lies:
>>
>> Does this fail on any other node in your system?
>>
>> How long has the node been up?
>>
>> -Nathan Hjelm
>> Application Readiness, HPC-5, LANL
>>
>> On Tue, Jun 10, 2014 at 02:06:54PM -0400, Fischer, Greg A. wrote:
>>> Jeff/Nathan,
>>>
>>> I ran the following with my debug build of Open MPI 1.8.1, after opening
>>> a terminal on a compute node with "qsub -l nodes=2 -I":
>>>
>>> mpirun -mca btl openib,self -mca btl_base_verbose 100 -np 2 ring_c &> output.txt
>>>
>>> Output and backtrace are attached. Let me know if I can provide anything
>>> else.
>>>
>>> Thanks for looking into this,
>>> Greg
>>>
>>> -----Original Message-----
>>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres (jsquyres)
>>> Sent: Tuesday, June 10, 2014 10:31 AM
>>> To: Nathan Hjelm
>>> Cc: Open MPI Users
>>> Subject: Re: [OMPI users] openib segfaults with Torque
>>>
>>> Greg:
>>>
>>> Can you run with "--mca btl_base_verbose 100" on your debug build so that
>>> we can get some additional output to see why UDCM is failing to set up
>>> properly?
>>>
>>> On Jun 10, 2014, at 10:25 AM, Nathan Hjelm <hje...@lanl.gov> wrote:
>>>
>>>> On Tue, Jun 10, 2014 at 12:10:28AM +0000, Jeff Squyres (jsquyres) wrote:
>>>>> I seem to recall that you have an IB-based cluster, right?
>>>>>
>>>>> From a *very quick* glance at the code, it looks like this might be a
>>>>> simple incorrect-finalization issue. That is:
>>>>>
>>>>> - you run the job on a single server
>>>>> - openib disqualifies itself because you're running on a single server
>>>>> - openib then goes to finalize/close itself
>>>>> - but openib didn't fully initialize itself (because it disqualified
>>>>>   itself early in the initialization process), and something in the
>>>>>   finalization process didn't take that into account
>>>>>
>>>>> Nathan -- is that anywhere close to correct?
>>>>
>>>> Nope. udcm_module_finalize is being called because there was an error
>>>> setting up the udcm state. See btl_openib_connect_udcm.c:476. The
>>>> opal_list_t destructor is getting an assert failure, probably because
>>>> the constructor wasn't called. I can rearrange the constructors to be
>>>> called first, but there appears to be a deeper issue with the user's
>>>> system: udcm_module_init should not be failing! It creates a couple of
>>>> CQs, allocates a small number of registered buffers, and starts
>>>> monitoring the fd for the completion channel. All of these things are
>>>> also done in the setup of the openib btl itself. Keep in mind that the
>>>> openib btl will not disqualify itself when running on a single server;
>>>> openib may be used to communicate on-node and is needed for the
>>>> dynamics case.
>>>>
>>>> The user might try adding -mca btl_base_verbose 100 to shed some light
>>>> on what the real issue is.
>>>>
>>>> BTW, I no longer monitor the user mailing list. If something needs my
>>>> attention, forward it to me directly.
>>>>
>>>> -Nathan
>>>
>>> --
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
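A condensed sketch of the init/finalize hazard Nathan describes; this is not the actual udcm source, and the udcm_like_module_t type is invented for illustration. In a debug build, OBJ_DESTRUCT asserts on a magic id that only OBJ_CONSTRUCT sets, so destructing a list whose constructor never ran aborts, as in the backtrace that follows:

    /* Condensed sketch of the hazard (hypothetical types; not the real
     * udcm code). Requires the Open MPI source-tree headers. */
    #include "opal/constants.h"
    #include "opal/class/opal_list.h"
    #include <infiniband/verbs.h>

    typedef struct {
        opal_list_t cm_recv_msg_queue;   /* the list named in the assert */
        struct ibv_cq *cq;
    } udcm_like_module_t;

    /* Buggy ordering: fallible verbs setup runs before the constructor. */
    static int module_init_buggy(udcm_like_module_t *m, struct ibv_context *ctx)
    {
        m->cq = ibv_create_cq(ctx, 256, NULL, NULL, 0);
        if (NULL == m->cq) {
            return OPAL_ERROR;  /* bail out: cm_recv_msg_queue never constructed */
        }
        OBJ_CONSTRUCT(&m->cm_recv_msg_queue, opal_list_t);
        return OPAL_SUCCESS;
    }

    /* Runs on the error path too: in debug builds OBJ_DESTRUCT checks the
     * object's magic id and asserts, exactly as in the backtrace below. */
    static void module_finalize(udcm_like_module_t *m)
    {
        OBJ_DESTRUCT(&m->cm_recv_msg_queue);
    }

    /* The rearrangement Nathan mentions: construct everything first, so
     * finalize can always destruct safely no matter where init fails. */
    static int module_init_fixed(udcm_like_module_t *m, struct ibv_context *ctx)
    {
        OBJ_CONSTRUCT(&m->cm_recv_msg_queue, opal_list_t);
        m->cq = ibv_create_cq(ctx, 256, NULL, NULL, 0);
        return (NULL == m->cq) ? OPAL_ERROR : OPAL_SUCCESS;
    }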
>>> Core was generated by `ring_c'.
>>> Program terminated with signal 6, Aborted.
>>> #0  0x00007f8b6ae1cb55 in raise () from /lib64/libc.so.6
>>> #1  0x00007f8b6ae1e0c5 in abort () from /lib64/libc.so.6
>>> #2  0x00007f8b6ae15a10 in __assert_fail () from /lib64/libc.so.6
>>> #3  0x00007f8b664b684b in udcm_module_finalize (btl=0x717060, cpc=0x7190c0)
>>>     at ../../../../../openmpi-1.8.1/ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c:734
>>> #4  0x00007f8b664b5474 in udcm_component_query (btl=0x717060, cpc=0x718a48)
>>>     at ../../../../../openmpi-1.8.1/ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c:476
>>> #5  0x00007f8b664ae316 in ompi_btl_openib_connect_base_select_for_local_port (btl=0x717060)
>>>     at ../../../../../openmpi-1.8.1/ompi/mca/btl/openib/connect/btl_openib_connect_base.c:273
>>> #6  0x00007f8b66497817 in btl_openib_component_init (num_btl_modules=0x7fffe34cebe0,
>>>     enable_progress_threads=false, enable_mpi_threads=false)
>>>     at ../../../../../openmpi-1.8.1/ompi/mca/btl/openib/btl_openib_component.c:2703
>>> #7  0x00007f8b6b43fa5e in mca_btl_base_select (enable_progress_threads=false,
>>>     enable_mpi_threads=false) at ../../../../openmpi-1.8.1/ompi/mca/btl/base/btl_base_select.c:108
>>> #8  0x00007f8b666d9d42 in mca_bml_r2_component_init (priority=0x7fffe34cecb4,
>>>     enable_progress_threads=false, enable_mpi_threads=false)
>>>     at ../../../../../openmpi-1.8.1/ompi/mca/bml/r2/bml_r2_component.c:88
>>> #9  0x00007f8b6b43ed1b in mca_bml_base_init (enable_progress_threads=false,
>>>     enable_mpi_threads=false) at ../../../../openmpi-1.8.1/ompi/mca/bml/base/bml_base_init.c:69
>>> #10 0x00007f8b655ff739 in mca_pml_ob1_component_init (priority=0x7fffe34cedf0,
>>>     enable_progress_threads=false, enable_mpi_threads=false)
>>>     at ../../../../../openmpi-1.8.1/ompi/mca/pml/ob1/pml_ob1_component.c:271
>>> #11 0x00007f8b6b4659b2 in mca_pml_base_select (enable_progress_threads=false,
>>>     enable_mpi_threads=false) at ../../../../openmpi-1.8.1/ompi/mca/pml/base/pml_base_select.c:128
>>> #12 0x00007f8b6b3d233c in ompi_mpi_init (argc=1, argv=0x7fffe34cf0e8, requested=0,
>>>     provided=0x7fffe34cef98) at ../../openmpi-1.8.1/ompi/runtime/ompi_mpi_init.c:604
>>> #13 0x00007f8b6b407386 in PMPI_Init (argc=0x7fffe34cefec, argv=0x7fffe34cefe0) at pinit.c:84
>>> #14 0x000000000040096f in main (argc=1, argv=0x7fffe34cf0e8) at ring_c.c:19
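Frames #3 and #4 show the failure path: udcm_component_query calls udcm_module_finalize after udcm_module_init fails, and the abort comes from the debug-build object check. A toy model of that mechanism (hypothetical names, not OPAL's actual implementation, which lives in opal/class/opal_object.h) shows where the 0xdeafbeed constant in the assertion text below comes from:

    #include <assert.h>
    #include <stdint.h>

    /* Same constant as in the assertion message in the log below. */
    #define MAGIC_ID ((0xdeafbeedULL << 32) + 0xdeafbeedULL)

    typedef struct {
        uint64_t obj_magic_id;
    } toy_object_t;

    static void toy_construct(toy_object_t *o)
    {
        o->obj_magic_id = MAGIC_ID;   /* stamped only by the constructor */
    }

    static void toy_destruct(toy_object_t *o)
    {
        /* Fires if the object was never constructed (or destructed twice). */
        assert(MAGIC_ID == o->obj_magic_id);
        o->obj_magic_id = 0;
    }

    int main(void)
    {
        toy_object_t good;
        toy_construct(&good);
        toy_destruct(&good);   /* fine: construct preceded destruct */

        toy_object_t bad = { 0 };
        toy_destruct(&bad);    /* aborts, mirroring udcm_module_finalize */
        return 0;
    }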
>>> [binf316:24591] mca: base: components_register: registering btl components
>>> [binf316:24591] mca: base: components_register: found loaded component openib
>>> [binf316:24592] mca: base: components_register: registering btl components
>>> [binf316:24592] mca: base: components_register: found loaded component openib
>>> [binf316:24591] mca: base: components_register: component openib register function successful
>>> [binf316:24591] mca: base: components_register: found loaded component self
>>> [binf316:24591] mca: base: components_register: component self register function successful
>>> [binf316:24591] mca: base: components_open: opening btl components
>>> [binf316:24591] mca: base: components_open: found loaded component openib
>>> [binf316:24591] mca: base: components_open: component openib open function successful
>>> [binf316:24591] mca: base: components_open: found loaded component self
>>> [binf316:24591] mca: base: components_open: component self open function successful
>>> [binf316:24592] mca: base: components_register: component openib register function successful
>>> [binf316:24592] mca: base: components_register: found loaded component self
>>> [binf316:24592] mca: base: components_register: component self register function successful
>>> [binf316:24592] mca: base: components_open: opening btl components
>>> [binf316:24592] mca: base: components_open: found loaded component openib
>>> [binf316:24592] mca: base: components_open: component openib open function successful
>>> [binf316:24592] mca: base: components_open: found loaded component self
>>> [binf316:24592] mca: base: components_open: component self open function successful
>>> [binf316:24591] select: initializing btl component openib
>>> [binf316:24592] select: initializing btl component openib
>>> [binf316][[17980,1],0][../../../../../openmpi-1.8.1/ompi/mca/btl/openib/btl_openib_ip.c:364:add_rdma_addr] Adding addr 9.9.10.75 (0x4b0a0909) subnet 0x9090000 as mlx4_0:1
>>> [binf316][[17980,1],1][../../../../../openmpi-1.8.1/ompi/mca/btl/openib/btl_openib_ip.c:364:add_rdma_addr] Adding addr 9.9.10.75 (0x4b0a0909) subnet 0x9090000 as mlx4_0:1
>>> [binf316][[17980,1],0][../../../../../openmpi-1.8.1/ompi/mca/btl/openib/btl_openib_component.c:686:init_one_port] looking for mlx4_0:1 GID index 0
>>> [binf316][[17980,1],0][../../../../../openmpi-1.8.1/ompi/mca/btl/openib/btl_openib_component.c:717:init_one_port] my IB subnet_id for HCA mlx4_0 port 1 is fe80000000000000
>>> [binf316][[17980,1],0][../../../../../openmpi-1.8.1/ompi/mca/btl/openib/btl_openib_component.c:1294:setup_qps] pp: rd_num is 256 rd_low is 192 rd_win 128 rd_rsv 4
>>> [binf316][[17980,1],0][../../../../../openmpi-1.8.1/ompi/mca/btl/openib/btl_openib_component.c:1339:setup_qps] srq: rd_num is 1024 rd_low is 1008 sd_max is 64 rd_max is 256 srq_limit is 48
>>> [binf316][[17980,1],0][../../../../../openmpi-1.8.1/ompi/mca/btl/openib/btl_openib_component.c:1339:setup_qps] srq: rd_num is 1024 rd_low is 1008 sd_max is 64 rd_max is 256 srq_limit is 48
>>> [binf316][[17980,1],0][../../../../../openmpi-1.8.1/ompi/mca/btl/openib/btl_openib_component.c:1339:setup_qps] srq: rd_num is 1024 rd_low is 1008 sd_max is 64 rd_max is 256 srq_limit is 48
>>> [binf316][[17980,1],0][../../../../../openmpi-1.8.1/ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:1840:rdmacm_component_query] rdmacm_component_query
>>> [binf316][[17980,1],0][../../../../../openmpi-1.8.1/ompi/mca/btl/openib/btl_openib_ip.c:132:mca_btl_openib_rdma_get_ipv4addr] Looking for mlx4_0:1 in IP address list
>>> [binf316][[17980,1],0][../../../../../openmpi-1.8.1/ompi/mca/btl/openib/btl_openib_ip.c:141:mca_btl_openib_rdma_get_ipv4addr] FOUND: mlx4_0:1 is 9.9.10.75 (0x4b0a0909)
>>> [binf316][[17980,1],0][../../../../../openmpi-1.8.1/ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:1750:ipaddrcheck] Found device mlx4_0:1 = IP address 9.9.10.75 (0x4b0a0909):51845
>>> [binf316][[17980,1],0][../../../../../openmpi-1.8.1/ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:1776:ipaddrcheck] creating new server to listen on 9.9.10.75 (0x4b0a0909):51845
>>> [binf316:24591] openib BTL: rdmacm CPC available for use on mlx4_0:1
>>> [binf316][[17980,1],0][../../../../../openmpi-1.8.1/ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c:542:udcm_module_init] created cpc module 0x719220 for btl 0x716ee0
>>> [binf316][[17980,1],1][../../../../../openmpi-1.8.1/ompi/mca/btl/openib/btl_openib_component.c:686:init_one_port] looking for mlx4_0:1 GID index 0
>>> [binf316][[17980,1],1][../../../../../openmpi-1.8.1/ompi/mca/btl/openib/btl_openib_component.c:717:init_one_port] my IB subnet_id for HCA mlx4_0 port 1 is fe80000000000000
>>> [binf316][[17980,1],0][../../../../../openmpi-1.8.1/ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c:565:udcm_module_init] error creating ud send completion queue
>>> ring_c: ../../../../../openmpi-1.8.1/ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c:734: udcm_module_finalize: Assertion `((0xdeafbeedULL << 32) + 0xdeafbeedULL) == ((opal_object_t *) (&m->cm_recv_msg_queue))->obj_magic_id' failed.
>>> [binf316:24591] *** Process received signal ***
>>> [binf316:24591] Signal: Aborted (6)
>>> [binf316:24591] Signal code: (-6)
>>> [binf316][[17980,1],1][../../../../../openmpi-1.8.1/ompi/mca/btl/openib/btl_openib_component.c:1294:setup_qps] pp: rd_num is 256 rd_low is 192 rd_win 128 rd_rsv 4
>>> [binf316][[17980,1],1][../../../../../openmpi-1.8.1/ompi/mca/btl/openib/btl_openib_component.c:1339:setup_qps] srq: rd_num is 1024 rd_low is 1008 sd_max is 64 rd_max is 256 srq_limit is 48
>>> [binf316][[17980,1],1][../../../../../openmpi-1.8.1/ompi/mca/btl/openib/btl_openib_component.c:1339:setup_qps] srq: rd_num is 1024 rd_low is 1008 sd_max is 64 rd_max is 256 srq_limit is 48
>>> [binf316][[17980,1],1][../../../../../openmpi-1.8.1/ompi/mca/btl/openib/btl_openib_component.c:1339:setup_qps] srq: rd_num is 1024 rd_low is 1008 sd_max is 64 rd_max is 256 srq_limit is 48
>>> [binf316][[17980,1],1][../../../../../openmpi-1.8.1/ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:1840:rdmacm_component_query] rdmacm_component_query
>>> [binf316][[17980,1],1][../../../../../openmpi-1.8.1/ompi/mca/btl/openib/btl_openib_ip.c:132:mca_btl_openib_rdma_get_ipv4addr] Looking for mlx4_0:1 in IP address list
>>> [binf316][[17980,1],1][../../../../../openmpi-1.8.1/ompi/mca/btl/openib/btl_openib_ip.c:141:mca_btl_openib_rdma_get_ipv4addr] FOUND: mlx4_0:1 is 9.9.10.75 (0x4b0a0909)
>>> [binf316][[17980,1],1][../../../../../openmpi-1.8.1/ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:1750:ipaddrcheck] Found device mlx4_0:1 = IP address 9.9.10.75 (0x4b0a0909):57734
>>> [binf316][[17980,1],1][../../../../../openmpi-1.8.1/ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:1776:ipaddrcheck] creating new server to listen on 9.9.10.75 (0x4b0a0909):57734
>>> [binf316:24592] openib BTL: rdmacm CPC available for use on mlx4_0:1
>>> [binf316][[17980,1],1][../../../../../openmpi-1.8.1/ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c:542:udcm_module_init] created cpc module 0x7190c0 for btl 0x717060
>>> [binf316][[17980,1],1][../../../../../openmpi-1.8.1/ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c:565:udcm_module_init] error creating ud send completion queue
>>> ring_c: ../../../../../openmpi-1.8.1/ompi/mca/btl/openib/connect/btl_openib_connect_udcm.c:734: udcm_module_finalize: Assertion `((0xdeafbeedULL << 32) + 0xdeafbeedULL) == ((opal_object_t *) (&m->cm_recv_msg_queue))->obj_magic_id' failed.
>>> [binf316:24592] *** Process received signal ***
>>> [binf316:24592] Signal: Aborted (6)
>>> [binf316:24592] Signal code: (-6)
>>> [binf316:24591] [ 0] /lib64/libpthread.so.0(+0xf7c0)[0x7fb35959c7c0]
>>> [binf316:24591] [ 1] /lib64/libc.so.6(gsignal+0x35)[0x7fb359248b55]
>>> [binf316:24591] [ 2] /lib64/libc.so.6(abort+0x181)[0x7fb35924a131]
>>> [binf316:24591] [ 3] /lib64/libc.so.6(__assert_fail+0xf0)[0x7fb359241a10]
>>> [binf316:24591] [ 4] /xxxx/yyyy_ib/gcc-4.8.3/toolset/openmpi-1.8.1_debug/lib/openmpi/mca_btl_openib.so(+0x3784b)[0x7fb3548e284b]
>>> [binf316:24591] [ 5] /xxxx/yyyy_ib/gcc-4.8.3/toolset/openmpi-1.8.1_debug/lib/openmpi/mca_btl_openib.so(+0x36474)[0x7fb3548e1474]
>>> [binf316:24591] [ 6] /xxxx/yyyy_ib/gcc-4.8.3/toolset/openmpi-1.8.1_debug/lib/openmpi/mca_btl_openib.so(ompi_btl_openib_connect_base_select_for_local_port+0x15b)[0x7fb3548da316]
>>> [binf316:24591] [ 7] /xxxx/yyyy_ib/gcc-4.8.3/toolset/openmpi-1.8.1_debug/lib/openmpi/mca_btl_openib.so(+0x18817)[0x7fb3548c3817]
>>> [binf316:24591] [ 8] [binf316:24592] [ 0] /lib64/libpthread.so.0(+0xf7c0)[0x7f8b6b1707c0]
>>> [binf316:24592] [ 1] /xxxx/yyyy_ib/gcc-4.8.3/toolset/openmpi-1.8.1_debug/lib/libmpi.so.1(mca_btl_base_select+0x1b2)[0x7fb35986ba5e]
>>> [binf316:24591] [ 9] /lib64/libc.so.6(gsignal+0x35)[0x7f8b6ae1cb55]
>>> [binf316:24592] [ 2] /xxxx/yyyy_ib/gcc-4.8.3/toolset/openmpi-1.8.1_debug/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x20)[0x7fb354b05d42]
>>> [binf316:24591] [10] /lib64/libc.so.6(abort+0x181)[0x7f8b6ae1e131]
>>> [binf316:24592] [ 3] /xxxx/yyyy_ib/gcc-4.8.3/toolset/openmpi-1.8.1_debug/lib/libmpi.so.1(mca_bml_base_init+0xd6)[0x7fb35986ad1b]
>>> [binf316:24591] [11] /xxxx/yyyy_ib/gcc-4.8.3/toolset/openmpi-1.8.1_debug/lib/openmpi/mca_pml_ob1.so(+0x7739)[0x7fb353a2b739]
>>> [binf316:24591] [12] /lib64/libc.so.6(__assert_fail+0xf0)[0x7f8b6ae15a10]
>>> [binf316:24592] [ 4] /xxxx/yyyy_ib/gcc-4.8.3/toolset/openmpi-1.8.1_debug/lib/openmpi/mca_btl_openib.so(+0x3784b)[0x7f8b664b684b]
>>> [binf316:24592] [ 5] /xxxx/yyyy_ib/gcc-4.8.3/toolset/openmpi-1.8.1_debug/lib/openmpi/mca_btl_openib.so(+0x36474)[0x7f8b664b5474]
>>> [binf316:24592] [ 6] /xxxx/yyyy_ib/gcc-4.8.3/toolset/openmpi-1.8.1_debug/lib/libmpi.so.1(mca_pml_base_select+0x26e)[0x7fb3598919b2]
>>> [binf316:24591] [13] /xxxx/yyyy_ib/gcc-4.8.3/toolset/openmpi-1.8.1_debug/lib/openmpi/mca_btl_openib.so(ompi_btl_openib_connect_base_select_for_local_port+0x15b)[0x7f8b664ae316]
>>> [binf316:24592] [ 7] /xxxx/yyyy_ib/gcc-4.8.3/toolset/openmpi-1.8.1_debug/lib/openmpi/mca_btl_openib.so(+0x18817)[0x7f8b66497817]
>>> [binf316:24592] [ 8] /xxxx/yyyy_ib/gcc-4.8.3/toolset/openmpi-1.8.1_debug/lib/libmpi.so.1(ompi_mpi_init+0x5f6)[0x7fb3597fe33c]
>>> [binf316:24591] [14] /xxxx/yyyy_ib/gcc-4.8.3/toolset/openmpi-1.8.1_debug/lib/libmpi.so.1(mca_btl_base_select+0x1b2)[0x7f8b6b43fa5e]
>>> [binf316:24592] [ 9] /xxxx/yyyy_ib/gcc-4.8.3/toolset/openmpi-1.8.1_debug/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x20)[0x7f8b666d9d42]
>>> [binf316:24592] [10] /xxxx/yyyy_ib/gcc-4.8.3/toolset/openmpi-1.8.1_debug/lib/libmpi.so.1(MPI_Init+0x17e)[0x7fb359833386]
>>> [binf316:24591] [15] ring_c[0x40096f]
>>> [binf316:24591] [16] /xxxx/yyyy_ib/gcc-4.8.3/toolset/openmpi-1.8.1_debug/lib/libmpi.so.1(mca_bml_base_init+0xd6)[0x7f8b6b43ed1b]
>>> [binf316:24592] [11] /xxxx/yyyy_ib/gcc-4.8.3/toolset/openmpi-1.8.1_debug/lib/openmpi/mca_pml_ob1.so(+0x7739)[0x7f8b655ff739]
>>> [binf316:24592] [12] /xxxx/yyyy_ib/gcc-4.8.3/toolset/openmpi-1.8.1_debug/lib/libmpi.so.1(mca_pml_base_select+0x26e)[0x7f8b6b4659b2]
>>> [binf316:24592] [13] /lib64/libc.so.6(__libc_start_main+0xe6)[0x7fb359234c36]
>>> [binf316:24591] [17] ring_c[0x400889]
>>> [binf316:24591] *** End of error message ***
>>> /xxxx/yyyy_ib/gcc-4.8.3/toolset/openmpi-1.8.1_debug/lib/libmpi.so.1(ompi_mpi_init+0x5f6)[0x7f8b6b3d233c]
>>> [binf316:24592] [14] /xxxx/yyyy_ib/gcc-4.8.3/toolset/openmpi-1.8.1_debug/lib/libmpi.so.1(MPI_Init+0x17e)[0x7f8b6b407386]
>>> [binf316:24592] [15] ring_c[0x40096f]
>>> [binf316:24592] [16] /lib64/libc.so.6(__libc_start_main+0xe6)[0x7f8b6ae08c36]
>>> [binf316:24592] [17] ring_c[0x400889]
>>> [binf316:24592] *** End of error message ***
>>> --------------------------------------------------------------------------
>>> mpirun noticed that process rank 0 with PID 24591 on node xxxx316 exited
>>> on signal 6 (Aborted).
>>> --------------------------------------------------------------------------
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2014/06/24632.php

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/