We seem to have an issue similar to the one reported in this thread:

"Bug in openmpi 1.5.4 in paffinity"
http://www.open-mpi.org/community/lists/users/2011/09/17151.php

We are using the following version of hwloc (from the EPEL repo; we run CentOS 5.6):

$ hwloc-info --version
hwloc-info 1.1rc6

A simple "mpi_hello" program works fine with cpusets and Open MPI 1.4.2, but with Open MPI 1.5.3 and cpusets we get the following segfault (it works fine on the node without cpusets enabled):

[red2:28263] *** Process received signal ***
[red2:28263] Signal: Segmentation fault (11)
[red2:28263] Signal code: Address not mapped (1)
[red2:28263] Failing at address: 0x8
[red2:28263] [ 0] /lib64/libpthread.so.0 [0x2b3dce315b10]
[red2:28263] [ 1] /opt/sharcnet/openmpi/1.5.4/intel/lib/openmpi/mca_paffinity_hwloc.so(opal_paffinity_hwloc_bitmap_or+0x142) [0x2b3dcef75cb2]
[red2:28263] [ 2] /opt/sharcnet/openmpi/1.5.4/intel/lib/openmpi/mca_paffinity_hwloc.so [0x2b3dcef71404]
[red2:28263] [ 3] /opt/sharcnet/openmpi/1.5.4/intel/lib/openmpi/mca_paffinity_hwloc.so [0x2b3dcef6bb26]
[red2:28263] [ 4] /opt/sharcnet/openmpi/1.5.4/intel/lib/openmpi/mca_paffinity_hwloc.so(opal_paffinity_hwloc_topology_load+0xe2) [0x2b3dcef6e0b2]
[red2:28263] [ 5] /opt/sharcnet/openmpi/1.5.4/intel/lib/openmpi/mca_paffinity_hwloc.so [0x2b3dcef68b72]
[red2:28263] [ 6] /opt/sharcnet/openmpi/1.5.4/intel/lib/libopen-rte.so.3(mca_base_components_open+0x302) [0x2b3dcd2b08f2]
[red2:28263] [ 7] /opt/sharcnet/openmpi/1.5.4/intel/lib/libopen-rte.so.3(opal_paffinity_base_open+0x67) [0x2b3dcd2d3a87]
[red2:28263] [ 8] /opt/sharcnet/openmpi/1.5.4/intel/lib/libopen-rte.so.3(opal_init+0x71) [0x2b3dcd28bfb1]
[red2:28263] [ 9] /opt/sharcnet/openmpi/1.5.4/intel/lib/libopen-rte.so.3(orte_init+0x23) [0x2b3dcd2318f3]
[red2:28263] [10] /opt/sharcnet/openmpi/1.5.4/intel/bin/mpirun [0x4049b5]
[red2:28263] [11] /opt/sharcnet/openmpi/1.5.4/intel/bin/mpirun [0x404388]
[red2:28263] [12] /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b3dce540994]
[red2:28263] [13] /opt/sharcnet/openmpi/1.5.4/intel/bin/mpirun [0x4042b9]
[red2:28263] *** End of error message ***
/var/spool/torque/mom_priv/jobs/968.SC: line 3: 28263 Segmentation fault /opt/sharcnet/openmpi/1.5.4/intel/bin/mpirun -np 2 ./a.out
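To compare the cpuset node with the unrestricted one, a small diagnostic run from the same Torque job script just before mpirun could show which CPUs the launching process is actually allowed to use. This is a minimal sketch under stated assumptions: it requires Linux and Python 3.3+ (for os.sched_getaffinity), it is not part of Open MPI or hwloc, and the helper names allowed_cpus and status_cpus_allowed_list are hypothetical.

```python
import os


def allowed_cpus():
    """Return the sorted list of CPU ids the calling process may run on.

    Hypothetical helper. Inside a cpuset this is normally a subset of
    the node's CPUs; on an unrestricted node it is usually all of them.
    """
    # pid 0 means "the calling process" (Linux-only API, Python 3.3+).
    return sorted(os.sched_getaffinity(0))


def status_cpus_allowed_list():
    """Return the Cpus_allowed_list value from /proc/self/status, or None.

    Hypothetical helper. Some older kernels do not expose this field,
    in which case None is returned.
    """
    try:
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("Cpus_allowed_list:"):
                    return line.split(":", 1)[1].strip()
    except OSError:
        pass
    return None


if __name__ == "__main__":
    print("sched_getaffinity:", allowed_cpus())
    print("Cpus_allowed_list:", status_cpus_allowed_list())
    # os.cpu_count() reports all CPUs in the system, not just the allowed ones.
    print("os.cpu_count():", os.cpu_count())
```

On a cpuset-restricted node the affinity list should be shorter than os.cpu_count(), while on the node without cpusets the two would typically match.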

Please let me know if you need more information about this issue.

Thanks,
-k
