Le 05/09/2011 21:29, Brice Goglin a écrit : > Dear Ake, > Could you try the attached patch? It's not optimized, but it's probably > going in the right direction. > (and don't forget to remove the above comment-out if you tried it).
Actually, now that I've seen your entire topology, I found out that the real fix is the attached patch. This is actually a Magny-Cours specific problem (having 2 NUMA nodes inside each socket is quite unusual). I've already committed this patch to hwloc trunk and backported to the v1.2 branch. It could be applied in OMPI 1.5.5. The patch that I sent earlier is not needed as long as cgroups don't reduce the available memory (your cgroups don't). I'll fix this other bug properly soon. Brice
commit e4d23aea23d8f4e96859e9fd88d565388b4a5763 Author: bgoglin <bgoglin@4b44e086-7f34-40ce-a3bd-00e031736276> List-Post: users@lists.open-mpi.org Date: Mon Sep 5 21:21:09 2011 +0000 Backport trunk commits r3764-3765 into v1.3 branch: * Don't remove empty objects if they contain some NUMA nodes We don't want to remove empty numanode. Make sure we don't remove them by mistake when removing empty sockets (each contains two NUMA nodes) on AMD magny-cours machines. * Add a testcase with cgroup on Magny-Cours It's interesting because we want to make sure we don't remove empty NUMA node that are below sockets. git-svn-id: https://svn.open-mpi.org/svn/hwloc/branches/v1.2@3767 4b44e086-7f34-40ce-a3bd-00e031736276 diff --git a/src/topology.c b/src/topology.c index 8d688fb..74f16eb 100644 --- a/src/topology.c +++ b/src/topology.c @@ -1142,12 +1142,12 @@ remove_empty(hwloc_topology_t topology, hwloc_obj_t *pobj) remove_empty(topology, pchild); if (obj->type != HWLOC_OBJ_NODE - && obj->cpuset /* FIXME: needed for PCI devices? */ + && !obj->first_child /* only remove if all children were removed above, so that we don't remove parents of NUMAnode */ && hwloc_bitmap_iszero(obj->cpuset)) { /* Remove empty children */ hwloc_debug("%s", "\nRemoving empty object "); print_object(topology, 0, obj); - unlink_and_free_object_and_children(pobj); + unlink_and_free_single_object(pobj); } }