On 28/02/2014 21:30, Gus Correa wrote:
> Hi Brice
>
> The (pdf) output of lstopo shows one L1d (16k) for each core,
> and one L1i (64k) for each *pair* of cores.
> Is this wrong?

It's correct. AMD uses this "dual-core compute unit" where L2 and L1i are
shared but L1d isn't.

> BTW, if there are any helpful web links, or references, or graphs
> about the AMD cache structure, I would love to know.

Unfortunately, I don't know of a single place that has all this information.
Cache sizes are easy to find, but sharing isn't always specified. I often
end up reading early processor reviews on tech sites such as
http://www.anandtech.com/show/4955/the-bulldozer-review-amd-fx8150-tested

> I am a bit skeptical that the BIOS is the culprit because I replaced
> two motherboards (node14 and node16), and only node14 doesn't pass
> the hwloc-gather-topology test.
> Just in case, I attach the diagnostic for node16 also,

Hmmm that's very interesting. I assume you have the same kernels on all
these machines? I have seen a couple of cases where the kernel would change
the topology for the same version of the BIOS (for instance old kernels
didn't know that L1i is shared by pairs of cores on your CPU), but I have
never seen a case where a kernel change *breaks* things.

Can you compare the output of "dmesg | grep SRAT" (or grep SRAT
/var/log/dmesg or kern.log or whatever on your distro) on these nodes?
SRAT is the hardware table that the kernel reads before filling sysfs.
You'll see
    [ 0.000000] SRAT: PXM 0 -> APIC 0x07 -> Node 0
which basically means that CPU7 is close to NUMA node 0. If you only see
Nodes 0-1 on node14, and Nodes 0-3 on node15 and node16, that would at
least confirm that the bug is in the hardware.

One last idea could be a different BIOS config, with the BIOS being buggy
in only one of these configs. I've seen that with the "interleaved" NUMA
memory config in a Supermicro BIOS several years ago.

Brice

> if you want to take a look. :)
>
> FYI, the two new motherboards (nodes 14 and 16)
> have a *newer* BIOS version (AMI, version 3.5, 11/25/2013)
> than the one in the
> original nodes (node15 below) (AMI, version 3.0, 08/31/2012).
> I even thought of upgrading the old nodes' BIOSes ...
> ... but now I am not so sure about this ... :(
>
> New motherboards:
>
> [root@node14 ~]# dmidecode -s bios-vendor
> American Megatrends Inc.
> [root@node14 ~]# dmidecode -s bios-version
> 3.5
> [root@node14 ~]# dmidecode -s bios-release-date
> 11/25/2013
>
> **
>
> [root@node16 ~]# dmidecode -s bios-vendor
> American Megatrends Inc.
> [root@node16 ~]# dmidecode -s bios-version
> 3.5
> [root@node16 ~]# dmidecode -s bios-release-date
> 11/25/2013
>
> **
>
> Original motherboard:
>
> [root@node15 ~]# dmidecode -s bios-vendor
> American Megatrends Inc.
> [root@node15 ~]# dmidecode -s bios-version
> 3.0
> [root@node15 ~]# dmidecode -s bios-release-date
> 11/25/2013 is not the date here; it reads 08/31/2012
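
The SRAT comparison suggested above ("dmesg | grep SRAT" on each node) can
also be scripted. What follows is only a rough sketch in Python, not
something taken from hwloc. It assumes the kernel prints SRAT entries
exactly in the form quoted above ("SRAT: PXM 0 -> APIC 0x07 -> Node 0"),
which may differ on other kernel versions, and that the dmesg output of each
node has been saved to a file. The function names and file names are made up
for illustration.

```python
import re
import sys
from collections import defaultdict

# Assumed line format, copied from the example in this thread:
#   [ 0.000000] SRAT: PXM 0 -> APIC 0x07 -> Node 0
SRAT_RE = re.compile(r"SRAT: PXM (\d+) -> APIC (0x[0-9a-fA-F]+) -> Node (\d+)")


def srat_nodes(dmesg_text):
    """Return {numa_node: sorted APIC ids} for the SRAT lines in dmesg_text."""
    nodes = defaultdict(set)
    for line in dmesg_text.splitlines():
        match = SRAT_RE.search(line)
        if match:
            apic_id = int(match.group(2), 16)
            nodes[int(match.group(3))].add(apic_id)
    return {node: sorted(apics) for node, apics in sorted(nodes.items())}


def numa_node_ids(logs):
    """Map each host name to the sorted list of NUMA node ids in its SRAT.

    logs is a dict {host_name: dmesg_text}.
    """
    return {host: sorted(srat_nodes(text)) for host, text in logs.items()}


if __name__ == "__main__":
    # Each argument is a saved dmesg file; the file name is used as host label.
    logs = {}
    for path in sys.argv[1:]:
        with open(path) as handle:
            logs[path] = handle.read()
    for host, node_ids in numa_node_ids(logs).items():
        print(f"{host}: NUMA nodes {node_ids}")
```

Possible usage, with hypothetical file names holding each node's dmesg
output:

    python3 srat_compare.py node14.dmesg node15.dmesg node16.dmesg

If one host lists fewer NUMA nodes than the others, its SRAT differs, which
points at the BIOS or its configuration rather than at the kernel.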