On Jan 31, 2008 5:42 AM, Brice Goglin <[EMAIL PROTECTED]> wrote:
> Paul Mundt wrote:
> > On Wed, Jan 30, 2008 at 07:48:13PM -0500, Chris Snook wrote:
> >
> >> While pondering ways to optimize I/O and swapping on large NUMA
> >> machines, I noticed that the numa_node field in struct device isn't
> >> actually used anywhere. We just have a couple dozen lines of code to
> >> conditionally create a sysfs file that will always return -1. Is
> >> anyone even working on code to actually use this field? I think it's
> >> a good piece of information to keep track of, so I'm not suggesting
> >> we remove it, but I want to make sure I'm not stepping on toes or
> >> duplicating effort if I try to make it useful.
> >>
> > It's manipulated with accessors. If you look at the users of
> > dev_to_node()/set_dev_node() you can see where it's being used. It's
> > primarily used in allocation paths for node locality, and the
> > existing set_dev_node() callsites are places where node locality
> > information already exists (ie, which node a given controller sits
> > on). You can see this in places like PCI (pcibus_to_node()) and USB,
> > with node allocation hints used in places like the dmapool and skb
> > alloc paths.
> >
> > The in-kernel use looks perfectly sane in that regard, though I'm
> > not sure what the point of exporting this as a RO attribute to
> > userspace is. Presumably someone has a tool somewhere that cares
> > about this.
>
> I added the numa_node sysfs attribute in the beginning to make it
> easier to bind processes near some devices. So yes, I have a
> user-space tool using it. It is much easier to use than the
> local_cpus field on large machines, especially when you use the
> libnuma interface to bind things, since you don't have to translate
> numa_node from/to cpumasks.
>
> It works fine on regular machines such as dual Opterons. However, I
> noticed recently that it was wrong on some quad-Opteron machines (see
> http://marc.info/?l=linux-pci&m=119072400008538&w=2) because something
> is not initialized in the right order. But I haven't tested 2.6.24 on
> this hardware yet, and I don't know if things have changed regarding
> this.
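For reference, a minimal sketch of the in-kernel accessor pattern Paul
describes: bus code records locality once, and driver allocation paths
consume it. The function names example_bus_setup() and example_alloc()
are hypothetical; set_dev_node(), dev_to_node(), and kzalloc_node() are
the actual kernel interfaces.

#include <linux/device.h>
#include <linux/slab.h>

/* Bus probe path: record which node the controller sits on
 * (for PCI this value would come from pcibus_to_node()). */
static void example_bus_setup(struct device *dev, int node)
{
	set_dev_node(dev, node);	/* -1 means "no information" */
}

/* Driver allocation path: prefer memory local to the device.
 * kzalloc_node() tolerates a -1 node and falls back to allocating
 * from any node, so unset locality is harmless here. */
static void *example_alloc(struct device *dev, size_t size)
{
	return kzalloc_node(size, GFP_KERNEL, dev_to_node(dev));
}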
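And a sketch of the kind of user-space binding Brice mentions: read a
device's numa_node attribute and bind to that node through libnuma.
numa_available() and numa_run_on_node() are the real libnuma calls
(link with -lnuma); the device path is only an example.

#include <stdio.h>
#include <numa.h>

int main(void)
{
	/* Example device path; substitute the device you care about. */
	const char *attr = "/sys/bus/pci/devices/0000:01:00.0/numa_node";
	FILE *f = fopen(attr, "r");
	int node = -1;

	if (f) {
		if (fscanf(f, "%d", &node) != 1)
			node = -1;
		fclose(f);
	}

	/* -1 means the kernel has no locality information for this
	 * device, so don't restrict placement in that case. */
	if (node >= 0 && numa_available() != -1)
		numa_run_on_node(node);

	/* ... run the I/O-intensive workload here ... */
	return 0;
}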
That will depend on whether your DSDT has _PXM for your PCI root bus;
otherwise you will get all -1.

I have a patchset locally, called bus_numa, that can get that from PCI
config space on AMD64-based machines. So you can use it on AMD64
systems without _PXM for the PCI root bus, or even with acpi=off. Let
me know if you want to test it.

YH
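As a quick way to check which case a given machine falls into, a small
sketch along the lines of YH's point: scan the numa_node attributes and
see whether anything other than -1 ever shows up. If not, the DSDT most
likely lacks _PXM on the root bus (or ACPI is off). The program is
illustrative only.

#include <stdio.h>
#include <glob.h>

int main(void)
{
	glob_t g;
	size_t i;
	int seen = 0;

	if (glob("/sys/bus/pci/devices/*/numa_node", 0, NULL, &g) != 0)
		return 1;
	for (i = 0; i < g.gl_pathc; i++) {
		FILE *f = fopen(g.gl_pathv[i], "r");
		int node = -1;

		if (f && fscanf(f, "%d", &node) == 1 && node >= 0) {
			printf("%s -> node %d\n", g.gl_pathv[i], node);
			seen = 1;
		}
		if (f)
			fclose(f);
	}
	if (!seen)
		printf("all numa_node attributes are -1\n");
	globfree(&g);
	return 0;
}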