We seem to have an issue similar to the one in this thread:
"Bug in openmpi 1.5.4 in paffinity"
http://www.open-mpi.org/community/lists/users/2011/09/17151.php
We are using the following version of hwloc (from the EPEL repo; we run CentOS 5.6):
$ hwloc-info --version
hwloc-info 1.1rc6
A simple "mpi_hello" program works fine with cpusets and Open MPI 1.4.2, but
with Open MPI 1.5.3 and cpusets we get the following segfault (it works fine
on the node without cpusets enabled):
[red2:28263] *** Process received signal ***
[red2:28263] Signal: Segmentation fault (11)
[red2:28263] Signal code: Address not mapped (1)
[red2:28263] Failing at address: 0x8
[red2:28263] [ 0] /lib64/libpthread.so.0 [0x2b3dce315b10]
[red2:28263] [ 1] /opt/sharcnet/openmpi/1.5.4/intel/lib/openmpi/mca_paffinity_hwloc.so(opal_paffinity_hwloc_bitmap_or+0x142) [0x2b3dcef75cb2]
[red2:28263] [ 2] /opt/sharcnet/openmpi/1.5.4/intel/lib/openmpi/mca_paffinity_hwloc.so [0x2b3dcef71404]
[red2:28263] [ 3] /opt/sharcnet/openmpi/1.5.4/intel/lib/openmpi/mca_paffinity_hwloc.so [0x2b3dcef6bb26]
[red2:28263] [ 4] /opt/sharcnet/openmpi/1.5.4/intel/lib/openmpi/mca_paffinity_hwloc.so(opal_paffinity_hwloc_topology_load+0xe2) [0x2b3dcef6e0b2]
[red2:28263] [ 5] /opt/sharcnet/openmpi/1.5.4/intel/lib/openmpi/mca_paffinity_hwloc.so [0x2b3dcef68b72]
[red2:28263] [ 6] /opt/sharcnet/openmpi/1.5.4/intel/lib/libopen-rte.so.3(mca_base_components_open+0x302) [0x2b3dcd2b08f2]
[red2:28263] [ 7] /opt/sharcnet/openmpi/1.5.4/intel/lib/libopen-rte.so.3(opal_paffinity_base_open+0x67) [0x2b3dcd2d3a87]
[red2:28263] [ 8] /opt/sharcnet/openmpi/1.5.4/intel/lib/libopen-rte.so.3(opal_init+0x71) [0x2b3dcd28bfb1]
[red2:28263] [ 9] /opt/sharcnet/openmpi/1.5.4/intel/lib/libopen-rte.so.3(orte_init+0x23) [0x2b3dcd2318f3]
[red2:28263] [10] /opt/sharcnet/openmpi/1.5.4/intel/bin/mpirun [0x4049b5]
[red2:28263] [11] /opt/sharcnet/openmpi/1.5.4/intel/bin/mpirun [0x404388]
[red2:28263] [12] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b3dce540994]
[red2:28263] [13] /opt/sharcnet/openmpi/1.5.4/intel/bin/mpirun [0x4042b9]
[red2:28263] *** End of error message ***
/var/spool/torque/mom_priv/jobs/968.SC: line 3: 28263 Segmentation fault /opt/sharcnet/openmpi/1.5.4/intel/bin/mpirun -np 2 ./a.out
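In case it helps narrow things down, here is a minimal, hypothetical diagnostic sketch (not part of Open MPI or hwloc, and not the code path that crashes) that reports which CPUs a process inside the job is allowed to run on. It assumes Linux and Python 3.3 or later for `os.sched_getaffinity`, which the stock CentOS 5 Python does not provide, so it would need a newer interpreter on the node. The helper names `format_cpu_list` and `report_affinity` are made up for illustration.

```python
import os


def format_cpu_list(cpus):
    """Render CPU ids in the kernel's compact list style, e.g. {0, 1, 2, 5} -> "0-2,5"."""
    ordered = sorted(cpus)
    parts = []
    i = 0
    while i < len(ordered):
        start = end = ordered[i]
        # Extend the current run while the next id is consecutive.
        while i + 1 < len(ordered) and ordered[i + 1] == end + 1:
            i += 1
            end = ordered[i]
        parts.append(str(start) if start == end else "%d-%d" % (start, end))
        i += 1
    return ",".join(parts)


def report_affinity():
    """Print and return the set of CPUs this process may run on (Linux only)."""
    # sched_getaffinity(0) returns the calling process's affinity mask, which the
    # kernel keeps within the CPUs of the cpuset the process was placed in.
    allowed = os.sched_getaffinity(0)
    total = os.cpu_count() or 0  # cpu_count() may return None
    print("allowed CPUs: %s (%d of %d CPUs)" % (format_cpu_list(allowed), len(allowed), total))
    return allowed


if __name__ == "__main__":
    report_affinity()
```

Running it once in a Torque job with cpusets enabled and once without would show whether the failing case sees a restricted CPU set.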
Please let me know if you need more information about this issue.
Thanks,
-k