Hello,

Could you log in again on this node (with the same cgroups enabled), run hwloc-gather-topology <name>, and send the resulting <name>.output and <name>.tar.bz2?
Send them to the hwloc-devel list or open a ticket at https://svn.open-mpi.org/trac/hwloc (or send them to me privately if you don't want to subscribe).

Thanks,
Brice

On 04/09/2011 22:00, Ake Sandgren wrote:
> Hi!
>
> I'm getting a segfault in hwloc_setup_distances_from_os_matrix in the
> call to hwloc_bitmap_or due to objs or objs[i]->cpuset being freed and
> containing garbage, objs[i]->cpuset has infinite < 0.
>
> I only get this when using slurm with cgroups, asking for 2 nodes with 1
> cpu each. The cpuset is then already set when mpiexec starts and
> something breaks down.
>
> valgrind on mpiexec says:
> ==27540== Invalid read of size 8
> ==27540==    at 0x7178F79: opal_paffinity_hwloc_finalize_logical_distances (distances.c:412)
> ==27540==    by 0x7172C1E: hwloc_discover (topology.c:1805)
> ==27540==    by 0x71745F2: opal_paffinity_hwloc_topology_load (topology.c:2244)
> ==27540==    by 0x7164FB4: hwloc_open (paffinity_hwloc_component.c:93)
> ==27540==    by 0x4F98D2E: mca_base_components_open (mca_base_components_open.c:214)
> ==27540==    by 0x500084B: opal_paffinity_base_open (paffinity_base_open.c:120)
> ==27540==    by 0x4F525BB: opal_init (opal_init.c:307)
> ==27540==    by 0x4E50CA8: orte_init (orte_init.c:78)
> ==27540==    by 0x403C8F: orterun (orterun.c:615)
> ==27540==    by 0x4032C3: main (main.c:13)
> ==27540==  Address 0x6e38380 is 160 bytes inside a block of size 248 free'd
> ==27540==    at 0x4C270BD: free (vg_replace_malloc.c:366)
> ==27540==    by 0x716B6A1: unlink_and_free_object_and_children (topology.c:1131)
> ==27540==    by 0x716BB35: remove_empty (topology.c:1150)
> ==27540==    by 0x7170CBB: hwloc_discover (topology.c:1768)
> ==27540==    by 0x71745F2: opal_paffinity_hwloc_topology_load (topology.c:2244)
> ==27540==    by 0x7164FB4: hwloc_open (paffinity_hwloc_component.c:93)
> ==27540==    by 0x4F98D2E: mca_base_components_open (mca_base_components_open.c:214)
> ==27540==    by 0x500084B: opal_paffinity_base_open (paffinity_base_open.c:120)
> ==27540==    by 0x4F525BB: opal_init (opal_init.c:307)
> ==27540==    by 0x4E50CA8: orte_init (orte_init.c:78)
> ==27540==    by 0x403C8F: orterun (orterun.c:615)
> ==27540==    by 0x4032C3: main (main.c:13)
>
> I hope the above info is enough and that you can fix it :-)
>
> /Åke S.
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users