Hi, I am running the following script on my SCG cluster (submitted with qsub):
#!/bin/bash
#$ -cwd
#$ -S /bin/bash
#$ -V
#$ -q normal
#$ -pe mpi 40
#$ -P Lab219
#$ -o output
#$ -e error
module load PhyML/3.3
mpirun --mca pml yalla -np 40 phyml-mpi -i proteic -b 10 -d aa
where phyml-mpi is the Open MPI (parallel) build of PhyML. The "--mca pml yalla"
option is there to use MXM (I have Mellanox OFED installed).
It produces many errors related to KNEM (see the qsub error and output files
below). However, I did specify the KNEM directory when installing Open MPI.
I can't make sense of these errors and would appreciate any hints. I have run
Open MPI with a script of my own (just a loop running something like:
command --help) and got no errors.
Thanks in advance.
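The messages below name two devices that could not be opened: /dev/knem (Errno 2, "No such file or directory") and uverbs0. Here is a minimal bash sketch that could be run on each compute node (NODE2, NODE3, NODE7) to report whether those device nodes exist and are accessible to the job's user, and whether the knem kernel module shows up in /proc/modules. It assumes that uverbs0 lives at the conventional path /dev/infiniband/uverbs0, which the log does not state. The helper name check_dev is hypothetical.

```bash
#!/bin/bash
# Sketch only: report whether the device nodes named in the log are usable
# by the current user on this node.
# ASSUMPTION: "uverbs0" is at /dev/infiniband/uverbs0; adjust if it differs.

# check_dev (hypothetical helper): prints one status line for a device path
# and returns 0 only if it exists, is a character device, and is read/write.
check_dev() {
  local dev="$1"
  if [ ! -e "$dev" ]; then
    echo "$dev: missing"
    return 1
  elif [ ! -c "$dev" ]; then
    echo "$dev: not a character device"
    return 1
  elif [ ! -r "$dev" ] || [ ! -w "$dev" ]; then
    echo "$dev: no read/write permission for $(id -un)"
    return 1
  fi
  echo "$dev: ok"
}

echo "node: $(uname -n)"
for dev in /dev/knem /dev/infiniband/uverbs0; do
  check_dev "$dev"
done

# Loaded kernel modules are listed in /proc/modules (the file lsmod reads).
if grep -q '^knem ' /proc/modules 2>/dev/null; then
  echo "knem kernel module: loaded"
else
  echo "knem kernel module: not loaded (or /proc/modules unreadable)"
fi
```

Comparing the output across nodes can show whether the problem is specific to some hosts or common to all of them.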
--------------------------------------------------------------------------
WARNING: Open MPI failed to open the /dev/knem device due to a local
error. Please check with your system administrator to get the problem
fixed, or set the btl_vader_single_copy_mechanism MCA variable to none
to silence this warning and run without knem support.
The vader shared memory BTL will fall back on another single-copy
mechanism if one is available. This may result in lower performance.
Local host: NODE3
Errno: 2 (No such file or directory)
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: Linux kernel knem support was requested via the
btl_vader_single_copy_mechanism MCA parameter, but Knem support was either not
compiled into this Open MPI installation, or Knem support was unable
to be activated in this process.
The vader BTL will fall back on another single-copy mechanism if one
is available. This may result in lower performance.
Local host: NODE3
--------------------------------------------------------------------------
[1484652493.596258] [NODE3:185606:0] ib_dev.c:159 MXM ERROR Failed to open uverbs0
[1484652493.596275] [NODE3:185604:0] ib_dev.c:159 MXM ERROR Failed to open uverbs0
[1484652493.596270] [NODE3:185607:0] ib_dev.c:159 MXM ERROR Failed to open uverbs0
[1484652493.596332] [NODE3:185605:0] ib_dev.c:159 MXM ERROR Failed to open uverbs0
[1484652493.596546] [NODE3:185608:0] ib_dev.c:159 MXM ERROR Failed to open uverbs0
[1484652493.597514] [NODE3:185610:0] ib_dev.c:159 MXM ERROR Failed to open uverbs0
[1484652493.599711] [NODE3:185613:0] ib_dev.c:159 MXM ERROR Failed to open uverbs0
[1484653451.955637] [NODE7:155953:0] ib_dev.c:159 MXM ERROR Failed to open uverbs0
[1484653451.955696] [NODE7:155953:0] shm.c:69 MXM WARN Unable to close the KNEM device file
[1484652493.601480] [NODE3:185616:0] ib_dev.c:159 MXM ERROR Failed to open uverbs0
[1484653451.955739] [NODE7:155954:0] ib_dev.c:159 MXM ERROR Failed to open uverbs0
[1484653451.955783] [NODE7:155954:0] shm.c:69 MXM WARN Unable to close the KNEM device file
[1484653451.958300] [NODE7:155955:0] ib_dev.c:159 MXM ERROR Failed to open uverbs0
[1484653451.960995] [NODE7:155956:0] ib_dev.c:159 MXM ERROR Failed to open uverbs0
[1484653451.961026] [NODE7:155956:0] shm.c:69 MXM WARN Unable to close the KNEM device file
[1484653451.960948] [NODE7:155960:0] ib_dev.c:159 MXM ERROR Failed to open uverbs0
[1484653451.960979] [NODE7:155960:0] shm.c:69 MXM WARN Unable to close the KNEM device file
[1484653451.961482] [NODE7:155957:0] ib_dev.c:159 MXM ERROR Failed to open uverbs0
[1484652512.164585] [NODE2:22129:0] ib_dev.c:159 MXM ERROR Failed to open uverbs0
[1484652512.165089] [NODE2:22130:0] ib_dev.c:159 MXM ERROR Failed to open uverbs0
[1484652512.168559] [NODE2:22131:0] ib_dev.c:159 MXM ERROR Failed to open uverbs0
[1484652512.169180] [NODE2:22132:0] ib_dev.c:159 MXM ERROR Failed to open uverbs0
[1484652512.172381] [NODE2:22134:0] ib_dev.c:159 MXM ERROR Failed to open uverbs0
[1484652512.173899] [NODE2:22137:0] ib_dev.c:159 MXM ERROR Failed to open uverbs0
[1484652512.176080] [NODE2:22140:0] ib_dev.c:159 MXM ERROR Failed to open uverbs0
[1484652512.177936] [NODE2:22144:0] ib_dev.c:159 MXM ERROR Failed to open uverbs0
[1484652512.181078] [NODE2:22146:0] ib_dev.c:159 MXM ERROR Failed to open uverbs0
[1484652512.182780] [NODE2:22150:0] ib_dev.c:159 MXM ERROR Failed to open uverbs0
[1484652512.183750] [NODE2:22154:0] ib_dev.c:159 MXM ERROR Failed to open uverbs0
Warning: Conflicting CPU frequencies detected, using: 2001.000000.
Warning: Conflicting CPU frequencies detected, using: 2001.000000.
Warning: Conflicting CPU frequencies detected, using: 2001.000000.
Warning: Conflicting CPU frequencies detected, using: 2001.000000.
MXM: Got signal 11 (Segmentation fault)
==== backtrace ====
MXM: Got signal 11 (Segmentation fault)
==== backtrace ====
MXM: Got signal 11 (Segmentation fault)
==== backtrace ====
Warning: Conflicting CPU frequencies detected, using: 2001.000000.
MXM: Got signal 11 (Segmentation fault)
==== backtrace ====
MXM: Got signal 11 (Segmentation fault)
==== backtrace ====
0 /lib/x86_64-linux-gnu/libc.so.6(+0x360b0) [0x7f554717a0b0]
1 /usr/lib/libibverbs.so.1(ibv_dealloc_pd+0x3) [0x7f553f5db0c3]
2 /opt/mellanox/mxm/lib/libmxm.so.2(mxm_ib_init_devices+0x401)
[0x7f553c9f90a1]
3 /opt/mellanox/mxm/lib/libmxm.so.2(+0x158df) [0x7f553c9f88df]
4 /opt/mellanox/mxm/lib/libmxm.so.2(mxm_components_init+0x43)
[0x7f553c9fab63]
5 /opt/mellanox/mxm/lib/libmxm.so.2(mxm_init+0x137) [0x7f553c9fda57]
6
/opt/openmpi-2.0.1/lib/openmpi/mca_pml_yalla.so(mca_pml_yalla_open+0x107)
[0x7f553d714577]
7
/opt/openmpi-2.0.1/lib/libopen-pal.so.20(mca_base_framework_components_open+0xc5)
[0x7f5546c2a3a5]
8 /opt/openmpi-2.0.1/lib/libmpi.so.20(+0x9ee09) [0x7f55477bde09]
9 /opt/openmpi-2.0.1/lib/libopen-pal.so.20(mca_base_framework_open+0x9d)
[0x7f5546c3322d]
0 /lib/x86_64-linux-gnu/libc.so.6(+0x360b0) [0x7f0201df10b0]
1 /usr/lib/libibverbs.so.1(ibv_dealloc_pd+0x3) [0x7f01fe3900c3]
2 /opt/mellanox/mxm/lib/libmxm.so.2(mxm_ib_init_devices+0x401)
[0x7f01f77310a1]
3 /opt/mellanox/mxm/lib/libmxm.so.2(+0x158df) [0x7f01f77308df]
4 /opt/mellanox/mxm/lib/libmxm.so.2(mxm_components_init+0x43)
[0x7f01f7732b63]
5 /opt/mellanox/mxm/lib/libmxm.so.2(mxm_init+0x137) [0x7f01f7735a57]
6
/opt/openmpi-2.0.1/lib/openmpi/mca_pml_yalla.so(mca_pml_yalla_open+0x107)
[0x7f01fc4c9577]
7
/opt/openmpi-2.0.1/lib/libopen-pal.so.20(mca_base_framework_components_open+0xc5)
[0x7f02018a13a5]
8 /opt/openmpi-2.0.1/lib/libmpi.so.20(+0x9ee09) [0x7f0202434e09]
9 /opt/openmpi-2.0.1/lib/libopen-pal.so.20(mca_base_framework_open+0x9d)
[0x7f02018aa22d]
10 /opt/openmpi-2.0.1/lib/libmpi.so.20(ompi_mpi_init+0x4b6) [0x7f02023dae66]
11 /opt/openmpi-2.0.1/lib/libmpi.so.20(MPI_Init+0x8b) [0x7f02023f9fbb]
12 phyml-mpi() [0x401cf9]
13 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f0201ddc7ed]
14 phyml-mpi() [0x4028f9]
===================
0 /lib/x86_64-linux-gnu/libc.so.6(+0x360b0) [0x7f13a17e60b0]
1 /usr/lib/libibverbs.so.1(ibv_dealloc_pd+0x3) [0x7f139dd850c3]
2 /opt/mellanox/mxm/lib/libmxm.so.2(mxm_ib_init_devices+0x401)
[0x7f13970e00a1]
3 /opt/mellanox/mxm/lib/libmxm.so.2(+0x158df) [0x7f13970df8df]
4 /opt/mellanox/mxm/lib/libmxm.so.2(mxm_components_init+0x43)
[0x7f13970e1b63]
5 /opt/mellanox/mxm/lib/libmxm.so.2(mxm_init+0x137) [0x7f13970e4a57]
6
/opt/openmpi-2.0.1/lib/openmpi/mca_pml_yalla.so(mca_pml_yalla_open+0x107)
[0x7f1397dfb577]
7
/opt/openmpi-2.0.1/lib/libopen-pal.so.20(mca_base_framework_components_open+0xc5)
[0x7f13a12963a5]
8 /opt/openmpi-2.0.1/lib/libmpi.so.20(+0x9ee09) [0x7f13a1e29e09]
9 /opt/openmpi-2.0.1/lib/libopen-pal.so.20(mca_base_framework_open+0x9d)
[0x7f13a129f22d]
10 /opt/openmpi-2.0.1/lib/libmpi.so.20(ompi_mpi_init+0x4b6) [0x7f13a1dcfe66]
11 /opt/openmpi-2.0.1/lib/libmpi.so.20(MPI_Init+0x8b) [0x7f13a1deefbb]
12 phyml-mpi() [0x401cf9]
13 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f13a17d17ed]
14 phyml-mpi() [0x4028f9]
===================
0 /lib/x86_64-linux-gnu/libc.so.6(+0x360b0) [0x7fc3a87530b0]
1 /usr/lib/libibverbs.so.1(ibv_dealloc_pd+0) [0x7fc3a4cf20c0]
2 /opt/mellanox/mxm/lib/libmxm.so.2(mxm_ib_init_devices+0x401)
[0x7fc39e0bb0a1]
3 /opt/mellanox/mxm/lib/libmxm.so.2(+0x158df) [0x7fc39e0ba8df]
4 /opt/mellanox/mxm/lib/libmxm.so.2(mxm_components_init+0x43)
[0x7fc39e0bcb63]
5 /opt/mellanox/mxm/lib/libmxm.so.2(mxm_init+0x137) [0x7fc39e0bfa57]
6
/opt/openmpi-2.0.1/lib/openmpi/mca_pml_yalla.so(mca_pml_yalla_open+0x107)
[0x7fc39edd6577]
7
/opt/openmpi-2.0.1/lib/libopen-pal.so.20(mca_base_framework_components_open+0xc5)
[0x7fc3a82033a5]
8 /opt/openmpi-2.0.1/lib/libmpi.so.20(+0x9ee09) [0x7fc3a8d96e09]
9 /opt/openmpi-2.0.1/lib/libopen-pal.so.20(mca_base_framework_open+0x9d)
[0x7fc3a820c22d]
10 /opt/openmpi-2.0.1/lib/libmpi.so.20(ompi_mpi_init+0x4b6) [0x7fc3a8d3ce66]
11 /opt/openmpi-2.0.1/lib/libmpi.so.20(MPI_Init+0x8b) [0x7fc3a8d5bfbb]
12 phyml-mpi() [0x401cf9]
13 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7fc3a873e7ed]
14 phyml-mpi() [0x4028f9]
===================
10 /opt/openmpi-2.0.1/lib/libmpi.so.20(ompi_mpi_init+0x4b6) [0x7f5547763e66]
11 /opt/openmpi-2.0.1/lib/libmpi.so.20(MPI_Init+0x8b) [0x7f5547782fbb]
12 phyml-mpi() [0x401cf9]
13 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f55471657ed]
14 phyml-mpi() [0x4028f9]
===================
0 /lib/x86_64-linux-gnu/libc.so.6(+0x360b0) [0x7f16aa05b0b0]
1 /usr/lib/libibverbs.so.1(ibv_dealloc_pd+0x3) [0x7f16a65fa0c3]
2 /opt/mellanox/mxm/lib/libmxm.so.2(mxm_ib_init_devices+0x401)
[0x7f169f9c10a1]
3 /opt/mellanox/mxm/lib/libmxm.so.2(+0x158df) [0x7f169f9c08df]
4 /opt/mellanox/mxm/lib/libmxm.so.2(mxm_components_init+0x43)
[0x7f169f9c2b63]
5 /opt/mellanox/mxm/lib/libmxm.so.2(mxm_init+0x137) [0x7f169f9c5a57]
6
/opt/openmpi-2.0.1/lib/openmpi/mca_pml_yalla.so(mca_pml_yalla_open+0x107)
[0x7f16a4733577]
7
/opt/openmpi-2.0.1/lib/libopen-pal.so.20(mca_base_framework_components_open+0xc5)
[0x7f16a9b0b3a5]
8 /opt/openmpi-2.0.1/lib/libmpi.so.20(+0x9ee09) [0x7f16aa69ee09]
9 /opt/openmpi-2.0.1/lib/libopen-pal.so.20(mca_base_framework_open+0x9d)
[0x7f16a9b1422d]
10 /opt/openmpi-2.0.1/lib/libmpi.so.20(ompi_mpi_init+0x4b6) [0x7f16aa644e66]
11 /opt/openmpi-2.0.1/lib/libmpi.so.20(MPI_Init+0x8b) [0x7f16aa663fbb]
12 phyml-mpi() [0x401cf9]
13 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f16aa0467ed]
14 phyml-mpi() [0x4028f9]
===================
Warning: Conflicting CPU frequencies detected, using: 2001.000000.
MXM: Got signal 11 (Segmentation fault)
==== backtrace ====
0 /lib/x86_64-linux-gnu/libc.so.6(+0x360b0) [0x7f0cdb7040b0]
1 /usr/lib/libibverbs.so.1(ibv_dealloc_pd+0x3) [0x7f0cd3bf20c3]
2 /opt/mellanox/mxm/lib/libmxm.so.2(mxm_ib_init_devices+0x401)
[0x7f0cd10100a1]
3 /opt/mellanox/mxm/lib/libmxm.so.2(+0x158df) [0x7f0cd100f8df]
4 /opt/mellanox/mxm/lib/libmxm.so.2(mxm_components_init+0x43)
[0x7f0cd1011b63]
5 /opt/mellanox/mxm/lib/libmxm.so.2(mxm_init+0x137) [0x7f0cd1014a57]
6
/opt/openmpi-2.0.1/lib/openmpi/mca_pml_yalla.so(mca_pml_yalla_open+0x107)
[0x7f0cd1d2b577]
7
/opt/openmpi-2.0.1/lib/libopen-pal.so.20(mca_base_framework_components_open+0xc5)
[0x7f0cdb1b43a5]
8 /opt/openmpi-2.0.1/lib/libmpi.so.20(+0x9ee09) [0x7f0cdbd47e09]
9 /opt/openmpi-2.0.1/lib/libopen-pal.so.20(mca_base_framework_open+0x9d)
[0x7f0cdb1bd22d]
10 /opt/openmpi-2.0.1/lib/libmpi.so.20(ompi_mpi_init+0x4b6) [0x7f0cdbcede66]
11 /opt/openmpi-2.0.1/lib/libmpi.so.20(MPI_Init+0x8b) [0x7f0cdbd0cfbb]
12 phyml-mpi() [0x401cf9]
13 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f0cdb6ef7ed]
14 phyml-mpi() [0x4028f9]
===================
Warning: Conflicting CPU frequencies detected, using: 2001.000000.
MXM: Got signal 11 (Segmentation fault)
==== backtrace ====
0 /lib/x86_64-linux-gnu/libc.so.6(+0x360b0) [0x7f14f8a580b0]
1 /usr/lib/libibverbs.so.1(ibv_dealloc_pd+0x3) [0x7f14f4ff70c3]
2 /opt/mellanox/mxm/lib/libmxm.so.2(mxm_ib_init_devices+0x401)
[0x7f14ee2be0a1]
3 /opt/mellanox/mxm/lib/libmxm.so.2(+0x158df) [0x7f14ee2bd8df]
4 /opt/mellanox/mxm/lib/libmxm.so.2(mxm_components_init+0x43)
[0x7f14ee2bfb63]
5 /opt/mellanox/mxm/lib/libmxm.so.2(mxm_init+0x137) [0x7f14ee2c2a57]
6
/opt/openmpi-2.0.1/lib/openmpi/mca_pml_yalla.so(mca_pml_yalla_open+0x107)
[0x7f14eefd9577]
7
/opt/openmpi-2.0.1/lib/libopen-pal.so.20(mca_base_framework_components_open+0xc5)
[0x7f14f85083a5]
8 /opt/openmpi-2.0.1/lib/libmpi.so.20(+0x9ee09) [0x7f14f909be09]
9 /opt/openmpi-2.0.1/lib/libopen-pal.so.20(mca_base_framework_open+0x9d)
[0x7f14f851122d]
10 /opt/openmpi-2.0.1/lib/libmpi.so.20(ompi_mpi_init+0x4b6) [0x7f14f9041e66]
11 /opt/openmpi-2.0.1/lib/libmpi.so.20(MPI_Init+0x8b) [0x7f14f9060fbb]
12 phyml-mpi() [0x401cf9]
13 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f14f8a437ed]
14 phyml-mpi() [0x4028f9]
===================
[NODE7:155949] [[55778,0],2] usock_peer_send_blocking: send() to socket 56 failed: Broken pipe (32)
[NODE7:155949] [[55778,0],2] ORTE_ERROR_LOG: Unreachable in file oob_usock_connection.c at line 315
[NODE7:155949] [[55778,0],2]-[[55778,1],16] usock_peer_accept: usock_peer_send_connect_ack failed
[NODE7:155949] [[55778,0],2] usock_peer_send_blocking: send() to socket 59 failed: Broken pipe (32)
[NODE7:155949] [[55778,0],2] ORTE_ERROR_LOG: Unreachable in file oob_usock_connection.c at line 315
[NODE7:155949] [[55778,0],2]-[[55778,1],17] usock_peer_accept: usock_peer_send_connect_ack failed
Warning: Conflicting CPU frequencies detected, using: 2201.000000.
Warning: Conflicting CPU frequencies detected, using: 2001.000000.
MXM: Got signal 11 (Segmentation fault)
==== backtrace ====
[NODE7:155953] PML yalla cannot be selected
Warning: Conflicting CPU frequencies detected, using: 2201.000000.
[NODE7:155954] PML yalla cannot be selected
Warning: Conflicting CPU frequencies detected, using: 2201.000000.
[NODE7:155955] PML yalla cannot be selected
Warning: Conflicting CPU frequencies detected, using: 2201.000000.
Warning: Conflicting CPU frequencies detected, using: 2201.000000.
Warning: Conflicting CPU frequencies detected, using: 2201.000000.
[NODE2:22125] [[55778,0],1] usock_peer_send_blocking: send() to socket 51 failed: Broken pipe (32)
[NODE2:22125] [[55778,0],1] ORTE_ERROR_LOG: Unreachable in file oob_usock_connection.c at line 315
[NODE2:22125] [[55778,0],1]-[[55778,1],1] usock_peer_accept: usock_peer_send_connect_ack failed
[NODE2:22125] [[55778,0],1] usock_peer_send_blocking: send() to socket 57 failed: Broken pipe (32)
[NODE2:22125] [[55778,0],1] ORTE_ERROR_LOG: Unreachable in file oob_usock_connection.c at line 315
[NODE2:22125] [[55778,0],1]-[[55778,1],2] usock_peer_accept: usock_peer_send_connect_ack failed
[NODE2:22125] [[55778,0],1] usock_peer_send_blocking: send() to socket 59 failed: Broken pipe (32)
[NODE2:22125] [[55778,0],1] ORTE_ERROR_LOG: Unreachable in file oob_usock_connection.c at line 315
[NODE2:22125] [[55778,0],1]-[[55778,1],3] usock_peer_accept: usock_peer_send_connect_ack failed
[NODE2:22125] [[55778,0],1] usock_peer_send_blocking: send() to socket 68 failed: Broken pipe (32)
[NODE2:22125] [[55778,0],1] ORTE_ERROR_LOG: Unreachable in file oob_usock_connection.c at line 315
[NODE2:22125] [[55778,0],1]-[[55778,1],4] usock_peer_accept: usock_peer_send_connect_ack failed
[NODE2:22125] [[55778,0],1] usock_peer_send_blocking: send() to socket 72 failed: Broken pipe (32)
[NODE2:22125] [[55778,0],1] ORTE_ERROR_LOG: Unreachable in file oob_usock_connection.c at line 315
[NODE2:22125] [[55778,0],1]-[[55778,1],5] usock_peer_accept: usock_peer_send_connect_ack failed
[NODE2:22125] [[55778,0],1] usock_peer_send_blocking: send() to socket 75 failed: Broken pipe (32)
[NODE2:22125] [[55778,0],1] ORTE_ERROR_LOG: Unreachable in file oob_usock_connection.c at line 315
[NODE2:22125] [[55778,0],1]-[[55778,1],6] usock_peer_accept: usock_peer_send_connect_ack failed
[NODE2:22125] [[55778,0],1] usock_peer_send_blocking: send() to socket 35 failed: Broken pipe (32)
[NODE2:22125] [[55778,0],1] ORTE_ERROR_LOG: Unreachable in file oob_usock_connection.c at line 315
[NODE2:22125] [[55778,0],1]-[[55778,1],7] usock_peer_accept: usock_peer_send_connect_ack failed
[NODE2:22125] [[55778,0],1] usock_peer_send_blocking: send() to socket 62 failed: Broken pipe (32)
[NODE2:22125] [[55778,0],1] ORTE_ERROR_LOG: Unreachable in file oob_usock_connection.c at line 315
[NODE2:22125] [[55778,0],1]-[[55778,1],9] usock_peer_accept: usock_peer_send_connect_ack failed
[NODE2:22125] [[55778,0],1] usock_peer_send_blocking: send() to socket 54 failed: Broken pipe (32)
[NODE2:22125] [[55778,0],1] ORTE_ERROR_LOG: Unreachable in file oob_usock_connection.c at line 315
[NODE2:22125] [[55778,0],1]-[[55778,1],10] usock_peer_accept: usock_peer_send_connect_ack failed
Warning: Conflicting CPU frequencies detected, using: 1900.000000.
MXM: Got signal 11 (Segmentation fault)
==== backtrace ====
Warning: Conflicting CPU frequencies detected, using: 1900.000000.
[NODE2:22130] PML yalla cannot be selected
Warning: Conflicting CPU frequencies detected, using: 1900.000000.
MXM: Got signal 11 (Segmentation fault)
==== backtrace ====
0 /lib/x86_64-linux-gnu/libc.so.6(+0x36150) [0x7fd3a1cb2150]
1 /usr/lib/libibverbs.so.1(ibv_dealloc_pd+0) [0x7fd39de2e0c0]
2 /opt/mellanox/mxm/lib/libmxm.so.2(mxm_ib_init_devices+0x401)
[0x7fd39751e0a1]
3 /opt/mellanox/mxm/lib/libmxm.so.2(+0x158df) [0x7fd39751d8df]
4 /opt/mellanox/mxm/lib/libmxm.so.2(mxm_components_init+0x43)
[0x7fd39751fb63]
5 /opt/mellanox/mxm/lib/libmxm.so.2(mxm_init+0x137) [0x7fd397522a57]
6
/opt/openmpi-2.0.1/lib/openmpi/mca_pml_yalla.so(mca_pml_yalla_open+0x107)
[0x7fd39c17f577]
7
/opt/openmpi-2.0.1/lib/libopen-pal.so.20(mca_base_framework_components_open+0xc5)
[0x7fd3a17623a5]
8 /opt/openmpi-2.0.1/lib/libmpi.so.20(+0x9ee09) [0x7fd3a22f6e09]
9 /opt/openmpi-2.0.1/lib/libopen-pal.so.20(mca_base_framework_open+0x9d)
[0x7fd3a176b22d]
10 /opt/openmpi-2.0.1/lib/libmpi.so.20(ompi_mpi_init+0x4b6) [0x7fd3a229ce66]
11 /opt/openmpi-2.0.1/lib/libmpi.so.20(MPI_Init+0x8b) [0x7fd3a22bbfbb]
12 phyml-mpi() [0x401cf9]
13 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7fd3a1c9d76d]
14 phyml-mpi() [0x4028f9]
===================
Warning: Conflicting CPU frequencies detected, using: 1900.000000.
[NODE2:22132] PML yalla cannot be selected
Warning: Conflicting CPU frequencies detected, using: 1900.000000.
[NODE2:22134] PML yalla cannot be selected
Warning: Conflicting CPU frequencies detected, using: 1900.000000.
[NODE2:22137] PML yalla cannot be selected
Warning: Conflicting CPU frequencies detected, using: 2001.000000.
MXM: Got signal 11 (Segmentation fault)
==== backtrace ====
0 /lib/x86_64-linux-gnu/libc.so.6(+0x36150) [0x7fdf6f257150]
1 /usr/lib/libibverbs.so.1(ibv_dealloc_pd+0) [0x7fdf673bb0c0]
2 /opt/mellanox/mxm/lib/libmxm.so.2(mxm_ib_init_devices+0x401)
[0x7fdf64bfa0a1]
3 /opt/mellanox/mxm/lib/libmxm.so.2(+0x158df) [0x7fdf64bf98df]
4 /opt/mellanox/mxm/lib/libmxm.so.2(mxm_components_init+0x43)
[0x7fdf64bfbb63]
5 /opt/mellanox/mxm/lib/libmxm.so.2(mxm_init+0x137) [0x7fdf64bfea57]
6
/opt/openmpi-2.0.1/lib/openmpi/mca_pml_yalla.so(mca_pml_yalla_open+0x107)
[0x7fdf6570c577]
7
/opt/openmpi-2.0.1/lib/libopen-pal.so.20(mca_base_framework_components_open+0xc5)
[0x7fdf6ed073a5]
8 /opt/openmpi-2.0.1/lib/libmpi.so.20(+0x9ee09) [0x7fdf6f89be09]
9 /opt/openmpi-2.0.1/lib/libopen-pal.so.20(mca_base_framework_open+0x9d)
[0x7fdf6ed1022d]
10 /opt/openmpi-2.0.1/lib/libmpi.so.20(ompi_mpi_init+0x4b6) [0x7fdf6f841e66]
11 /opt/openmpi-2.0.1/lib/libmpi.so.20(MPI_Init+0x8b) [0x7fdf6f860fbb]
12 phyml-mpi() [0x401cf9]
13 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7fdf6f24276d]
14 phyml-mpi() [0x4028f9]
===================
Warning: Conflicting CPU frequencies detected, using: 2001.000000.
[NODE2:22144] PML yalla cannot be selected
Warning: Conflicting CPU frequencies detected, using: 2001.000000.
MXM: Got signal 11 (Segmentation fault)
==== backtrace ====
0 /lib/x86_64-linux-gnu/libc.so.6(+0x36150) [0x7ff41e98a150]
1 /usr/lib/libibverbs.so.1(ibv_dealloc_pd+0) [0x7ff4169b90c0]
2 /opt/mellanox/mxm/lib/libmxm.so.2(mxm_ib_init_devices+0x401)
[0x7ff4141f80a1]
3 /opt/mellanox/mxm/lib/libmxm.so.2(+0x158df) [0x7ff4141f78df]
4 /opt/mellanox/mxm/lib/libmxm.so.2(mxm_components_init+0x43)
[0x7ff4141f9b63]
5 /opt/mellanox/mxm/lib/libmxm.so.2(mxm_init+0x137) [0x7ff4141fca57]
6
/opt/openmpi-2.0.1/lib/openmpi/mca_pml_yalla.so(mca_pml_yalla_open+0x107)
[0x7ff414d0a577]
7
/opt/openmpi-2.0.1/lib/libopen-pal.so.20(mca_base_framework_components_open+0xc5)
[0x7ff41e43a3a5]
8 /opt/openmpi-2.0.1/lib/libmpi.so.20(+0x9ee09) [0x7ff41efcee09]
9 /opt/openmpi-2.0.1/lib/libopen-pal.so.20(mca_base_framework_open+0x9d)
[0x7ff41e44322d]
10 /opt/openmpi-2.0.1/lib/libmpi.so.20(ompi_mpi_init+0x4b6) [0x7ff41ef74e66]
11 /opt/openmpi-2.0.1/lib/libmpi.so.20(MPI_Init+0x8b) [0x7ff41ef93fbb]
12 phyml-mpi() [0x401cf9]
13 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7ff41e97576d]
14 phyml-mpi() [0x4028f9]
===================
Warning: Conflicting CPU frequencies detected, using: 2001.000000.
[NODE2:22150] PML yalla cannot be selected
Warning: Conflicting CPU frequencies detected, using: 2001.000000.
[NODE2:22154] PML yalla cannot be selected
[NODE2:21943] 11 more processes have sent help message help-btl-vader.txt / knem fail open
[NODE2:21943] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[NODE2:21943] 11 more processes have sent help message help-btl-vader.txt / knem requested but not available
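Two of the messages above name MCA parameters directly. The knem warning suggests setting btl_vader_single_copy_mechanism to none, and the help-aggregation note suggests setting orte_base_help_aggregate to 0 so that every message is printed. Below is a sketch of the same job script with both settings passed on the mpirun line. Whether this helps is an assumption. It should only silence the knem warnings and un-aggregate the help output. It is not expected to fix the MXM segmentation fault, which the backtraces place in mxm_ib_init_devices (called from MPI_Init via mca_pml_yalla_open) after the "Failed to open uverbs0" errors.

```bash
#!/bin/bash
#$ -cwd
#$ -S /bin/bash
#$ -V
#$ -q normal
#$ -pe mpi 40
#$ -P Lab219
#$ -o output
#$ -e error
module load PhyML/3.3
# Same run as before, plus two MCA settings taken from the log messages:
#   btl_vader_single_copy_mechanism none : run vader without knem (silences the knem warnings)
#   orte_base_help_aggregate 0           : print every help/error message instead of a summary
mpirun --mca pml yalla \
       --mca btl_vader_single_copy_mechanism none \
       --mca orte_base_help_aggregate 0 \
       -np 40 phyml-mpi -i proteic -b 10 -d aa
```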
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users