Dear users,

I am completely stuck with Open MPI. I have two versions installed on my machine, 1.8.1 and 2.0.0, and neither of them works. When I use the *1.8.1 version* of mpirun, I get the following errors:

librdmacm: Fatal: unable to open RDMA device
librdmacm: Fatal: unable to open RDMA device
librdmacm: Fatal: unable to open RDMA device
librdmacm: Fatal: unable to open RDMA device
librdmacm: Fatal: unable to open RDMA device
--------------------------------------------------------------------------
Open MPI failed to open the /dev/knem device due to a local error.
Please check with your system administrator to get the problem fixed,
or set the btl_sm_use_knem MCA parameter to 0 to run without /dev/knem
support.

  Local host: MYMACHINE
  Errno:      2 (No such file or directory)
--------------------------------------------------------------------------
--------------------------------------------------------------------------
Open MPI failed to open an OpenFabrics device.  This is an unusual
error; the system reported the OpenFabrics device as being present,
but then later failed to access it successfully.  This usually
indicates either a misconfiguration or a failed OpenFabrics hardware
device.

All OpenFabrics support has been disabled in this MPI process; your
job may or may not continue.

  Hostname:    MYMACHINE
  Device name: mlx4_0
  Errror (22): Invalid argument
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[[60527,1],4]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: usNIC
  Host: MYMACHINE

When I use the *2.0.0 version*, I get something strange: it seems that openmpi-2.0.0 is looking for the openmpi-1.8.1 libraries:

A requested component was not found, or was unable to be opened.  This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded).  Note that
Open MPI stopped checking at the first component that it did not find.

Host:      MYMACHINE
Framework: ess
Component: pmi
--------------------------------------------------------------------------
[MYMACHINE:126820] *** Process received signal ***
[MYMACHINE:126820] Signal: Segmentation fault (11)
[MYMACHINE:126820] Signal code: Address not mapped (1)
[MYMACHINE:126820] Failing at address: 0x1c0
[MYMACHINE:126820] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7f39b2ec4cb0]
[MYMACHINE:126820] [ 1] /opt/openmpi-1.8.1/lib/libopen-pal.so.6(opal_libevent2021_event_add+0x10)[0x7f39b23e7430]
[MYMACHINE:126820] [ 2] /opt/openmpi-1.8.1/lib/libopen-rte.so.7(+0x25a57)[0x7f39b2676a57]
[MYMACHINE:126820] [ 3] /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_show_help_norender+0x197)[0x7f39b2676fb7]
[MYMACHINE:126820] [ 4] /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_show_help+0x10f)[0x7f39b267718f]
[MYMACHINE:126820] [ 5] /opt/openmpi-1.8.1/lib/libopen-pal.so.6(+0x41f2a)[0x7f39b23c5f2a]
[MYMACHINE:126820] [ 6] /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_components_filter+0x273)[0x7f39b23c70c3]
[MYMACHINE:126820] [ 7] /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_framework_components_open+0x58)[0x7f39b23c8278]
[MYMACHINE:126820] [ 8] /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_framework_open+0x7c)[0x7f39b23d1e6c]
[MYMACHINE:126820] [ 9] /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_init+0x111)[0x7f39b2666e21]
[MYMACHINE:126820] [10] /opt/openmpi-1.8.1/lib/libmpi.so.1(ompi_mpi_init+0x1c2)[0x7f39b3115c92]
[MYMACHINE:126820] [11] /opt/openmpi-1.8.1/lib/libmpi.so.1(MPI_Init+0x1ab)[0x7f39b31387bb]
[MYMACHINE:126820] [12] mb[0x402024]
[MYMACHINE:126820] [13] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7f39b2b187ed]
[MYMACHINE:126820] [14] mb[0x402111]
[MYMACHINE:126820] *** End of error message ***
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened.  This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded).  Note that
Open MPI stopped checking at the first component that it did not find.

Host:      MYMACHINE
Framework: ess
Component: pmi
--------------------------------------------------------------------------
[MYMACHINE:126821] *** Process received signal ***
[MYMACHINE:126821] Signal: Segmentation fault (11)
[MYMACHINE:126821] Signal code: Address not mapped (1)
[MYMACHINE:126821] Failing at address: 0x1c0
[MYMACHINE:126821] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7fed834bbcb0]
[MYMACHINE:126821] [ 1] /opt/openmpi-1.8.1/lib/libopen-pal.so.6(opal_libevent2021_event_add+0x10)[0x7fed829de430]
[MYMACHINE:126821] [ 2] /opt/openmpi-1.8.1/lib/libopen-rte.so.7(+0x25a57)[0x7fed82c6da57]
[MYMACHINE:126821] [ 3] /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_show_help_norender+0x197)[0x7fed82c6dfb7]
[MYMACHINE:126821] [ 4] /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_show_help+0x10f)[0x7fed82c6e18f]
[MYMACHINE:126821] [ 5] /opt/openmpi-1.8.1/lib/libopen-pal.so.6(+0x41f2a)[0x7fed829bcf2a]
[MYMACHINE:126821] [ 6] /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_components_filter+0x273)[0x7fed829be0c3]
[MYMACHINE:126821] [ 7] /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_framework_components_open+0x58)[0x7fed829bf278]
[MYMACHINE:126821] [ 8] /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_framework_open+0x7c)[0x7fed829c8e6c]
[MYMACHINE:126821] [ 9] /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_init+0x111)[0x7fed82c5de21]
[MYMACHINE:126821] [10] /opt/openmpi-1.8.1/lib/libmpi.so.1(ompi_mpi_init+0x1c2)[0x7fed8370cc92]
[MYMACHINE:126821] [11] /opt/openmpi-1.8.1/lib/libmpi.so.1(MPI_Init+0x1ab)[0x7fed8372f7bb]
[MYMACHINE:126821] [12] mb[0x402024]
[MYMACHINE:126821] [13] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7fed8310f7ed]
[MYMACHINE:126821] [14] mb[0x402111]
[MYMACHINE:126821] *** End of error message ***
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened.  This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded).  Note that
Open MPI stopped checking at the first component that it did not find.

Host:      MYMACHINE
Framework: ess
Component: pmi
--------------------------------------------------------------------------
[MYMACHINE:126822] *** Process received signal ***
[MYMACHINE:126822] Signal: Segmentation fault (11)
[MYMACHINE:126822] Signal code: Address not mapped (1)
[MYMACHINE:126822] Failing at address: 0x1c0
[MYMACHINE:126822] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7f0174bc0cb0]
[MYMACHINE:126822] [ 1] /opt/openmpi-1.8.1/lib/libopen-pal.so.6(opal_libevent2021_event_add+0x10)[0x7f01740e3430]
[MYMACHINE:126822] [ 2] /opt/openmpi-1.8.1/lib/libopen-rte.so.7(+0x25a57)[0x7f0174372a57]
[MYMACHINE:126822] [ 3] /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_show_help_norender+0x197)[0x7f0174372fb7]
[MYMACHINE:126822] [ 4] /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_show_help+0x10f)[0x7f017437318f]
[MYMACHINE:126822] [ 5] /opt/openmpi-1.8.1/lib/libopen-pal.so.6(+0x41f2a)[0x7f01740c1f2a]
[MYMACHINE:126822] [ 6] /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_components_filter+0x273)[0x7f01740c30c3]
[MYMACHINE:126822] [ 7] /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_framework_components_open+0x58)[0x7f01740c4278]
[MYMACHINE:126822] [ 8] /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_framework_open+0x7c)[0x7f01740cde6c]
[MYMACHINE:126822] [ 9] /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_init+0x111)[0x7f0174362e21]
[MYMACHINE:126822] [10] /opt/openmpi-1.8.1/lib/libmpi.so.1(ompi_mpi_init+0x1c2)[0x7f0174e11c92]
[MYMACHINE:126822] [11] /opt/openmpi-1.8.1/lib/libmpi.so.1(MPI_Init+0x1ab)[0x7f0174e347bb]
[MYMACHINE:126822] [12] mb[0x402024]
[MYMACHINE:126822] [13] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7f01748147ed]
[MYMACHINE:126822] [14] mb[0x402111]
[MYMACHINE:126822] *** End of error message ***
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened.  This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded).  Note that
Open MPI stopped checking at the first component that it did not find.

Host:      MYMACHINE
Framework: ess
Component: pmi
--------------------------------------------------------------------------
[MYMACHINE:126823] *** Process received signal ***
[MYMACHINE:126823] Signal: Segmentation fault (11)
[MYMACHINE:126823] Signal code: Address not mapped (1)
[MYMACHINE:126823] Failing at address: 0x1c0
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened.  This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded).  Note that
Open MPI stopped checking at the first component that it did not find.

Host:      MYMACHINE
Framework: ess
Component: pmi
--------------------------------------------------------------------------
[MYMACHINE:126823] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7fcd9cb58cb0]
[MYMACHINE:126823] [ 1] /opt/openmpi-1.8.1/lib/libopen-pal.so.6(opal_libevent2021_event_add+0x10)[0x7fcd9c07b430]
[MYMACHINE:126823] [ 2] /opt/openmpi-1.8.1/lib/libopen-rte.so.7(+0x25a57)[0x7fcd9c30aa57]
[MYMACHINE:126823] [ 3] /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_show_help_norender+0x197)[0x7fcd9c30afb7]
[MYMACHINE:126823] [ 4] /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_show_help+0x10f)[0x7fcd9c30b18f]
[MYMACHINE:126823] [ 5] /opt/openmpi-1.8.1/lib/libopen-pal.so.6(+0x41f2a)[0x7fcd9c059f2a]
[MYMACHINE:126823] [ 6] /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_components_filter+0x273)[0x7fcd9c05b0c3]
[MYMACHINE:126823] [ 7] /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_framework_components_open+0x58)[0x7fcd9c05c278]
[MYMACHINE:126823] [ 8] /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_framework_open+0x7c)[0x7fcd9c065e6c]
[MYMACHINE:126823] [ 9] /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_init+0x111)[0x7fcd9c2fae21]
[MYMACHINE:126823] [10] /opt/openmpi-1.8.1/lib/libmpi.so.1(ompi_mpi_init+0x1c2)[0x7fcd9cda9c92]
[MYMACHINE:126823] [11] /opt/openmpi-1.8.1/lib/libmpi.so.1(MPI_Init+0x1ab)[0x7fcd9cdcc7bb]
[MYMACHINE:126823] [12] mb[0x402024]
[MYMACHINE:126823] [13] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7fcd9c7ac7ed]
[MYMACHINE:126823] [14] mb[0x402111]
[MYMACHINE:126823] *** End of error message ***
[MYMACHINE:126824] *** Process received signal ***
[MYMACHINE:126824] Signal: Segmentation fault (11)
[MYMACHINE:126824] Signal code: Address not mapped (1)
[MYMACHINE:126824] Failing at address: 0x1c0
[MYMACHINE:126824] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7f2f0c611cb0]
[MYMACHINE:126824] [ 1] /opt/openmpi-1.8.1/lib/libopen-pal.so.6(opal_libevent2021_event_add+0x10)[0x7f2f0bb34430]
[MYMACHINE:126824] [ 2] /opt/openmpi-1.8.1/lib/libopen-rte.so.7(+0x25a57)[0x7f2f0bdc3a57]
[MYMACHINE:126824] [ 3] /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_show_help_norender+0x197)[0x7f2f0bdc3fb7]
[MYMACHINE:126824] [ 4] /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_show_help+0x10f)[0x7f2f0bdc418f]
[MYMACHINE:126824] [ 5] /opt/openmpi-1.8.1/lib/libopen-pal.so.6(+0x41f2a)[0x7f2f0bb12f2a]
[MYMACHINE:126824] [ 6] /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_components_filter+0x273)[0x7f2f0bb140c3]
[MYMACHINE:126824] [ 7] /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_framework_components_open+0x58)[0x7f2f0bb15278]
[MYMACHINE:126824] [ 8] /opt/openmpi-1.8.1/lib/libopen-pal.so.6(mca_base_framework_open+0x7c)[0x7f2f0bb1ee6c]
[MYMACHINE:126824] [ 9] /opt/openmpi-1.8.1/lib/libopen-rte.so.7(orte_init+0x111)[0x7f2f0bdb3e21]
[MYMACHINE:126824] [10] /opt/openmpi-1.8.1/lib/libmpi.so.1(ompi_mpi_init+0x1c2)[0x7f2f0c862c92]
[MYMACHINE:126824] [11] /opt/openmpi-1.8.1/lib/libmpi.so.1(MPI_Init+0x1ab)[0x7f2f0c8857bb]
[MYMACHINE:126824] [12] mb[0x402024]
[MYMACHINE:126824] [13] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7f2f0c2657ed]
[MYMACHINE:126824] [14] mb[0x402111]
[MYMACHINE:126824] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 2 with PID 0 on node MYMACHINE exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
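One way to check the suspicion that the 2.0.0 run picks up the 1.8.1 libraries is to count which Open MPI install prefix each backtrace frame resolves to. The following is a minimal Python sketch, assuming the backtrace has been saved as plain text and that installs follow the `/opt/openmpi-<version>/lib` layout seen in the log; the helper name `openmpi_prefixes` is hypothetical.

```python
import re
from collections import Counter

# Captures the install prefix of an Open MPI shared library in a frame,
# e.g. "/opt/openmpi-1.8.1" from "/opt/openmpi-1.8.1/lib/libmpi.so.1(...)".
# Assumption: installs live in directories named "openmpi-<version>".
PREFIX_RE = re.compile(r"(/[^\s()\[\]]*openmpi-[^/\s]+)/lib/")

# A few frames copied from the backtrace of the 2.0.0 run above.
SAMPLE = """\
[MYMACHINE:126820] [ 1] /opt/openmpi-1.8.1/lib/libopen-pal.so.6(opal_libevent2021_event_add+0x10)[0x7f39b23e7430]
[MYMACHINE:126820] [10] /opt/openmpi-1.8.1/lib/libmpi.so.1(ompi_mpi_init+0x1c2)[0x7f39b3115c92]
[MYMACHINE:126820] [13] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7f39b2b187ed]
"""


def openmpi_prefixes(trace: str) -> Counter:
    """Count backtrace frames per Open MPI install prefix (hypothetical helper)."""
    return Counter(PREFIX_RE.findall(trace))


if __name__ == "__main__":
    for prefix, frames in openmpi_prefixes(SAMPLE).items():
        print(f"{prefix}: {frames} frame(s)")
```

If a job launched with the 2.0.0 `mpirun` reports only `/opt/openmpi-1.8.1` frames, that would suggest the `mb` binary resolves its MPI libraries from the 1.8.1 tree at run time (for example through `LD_LIBRARY_PATH` or the way it was linked), rather than from the 2.0.0 installation.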

I am running my script with *mpirun* on a *single node of an SGE cluster*.

I would be very grateful if somebody could give me some hints to solve this issue.

Thanks a lot in advance
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users