On Thu, Nov 12, 2020 at 10:10:57AM +0100, Brice Goglin wrote: > > Le 12/11/2020 à 07:42, Greg Kroah-Hartman a écrit : > > On Thu, Nov 12, 2020 at 07:19:48AM +0100, Brice Goglin wrote: > >> Le 07/10/2020 à 07:15, Greg Kroah-Hartman a écrit : > >>> On Tue, Oct 06, 2020 at 08:14:47PM -0700, Ricardo Neri wrote: > >>>> On Tue, Oct 06, 2020 at 09:37:44AM +0200, Greg Kroah-Hartman wrote: > >>>>> On Mon, Oct 05, 2020 at 05:57:36PM -0700, Ricardo Neri wrote: > >>>>>> On Sat, Oct 03, 2020 at 10:53:45AM +0200, Greg Kroah-Hartman wrote: > >>>>>>> On Fri, Oct 02, 2020 at 06:17:42PM -0700, Ricardo Neri wrote: > >>>>>>>> Hybrid CPU topologies combine CPUs of different microarchitectures > >>>>>>>> in the > >>>>>>>> same die. Thus, even though the instruction set is compatible among > >>>>>>>> all > >>>>>>>> CPUs, there may still be differences in features (e.g., some CPUs may > >>>>>>>> have counters that others CPU do not). There may be applications > >>>>>>>> interested in knowing the type of micro-architecture topology of the > >>>>>>>> system to make decisions about process affinity. > >>>>>>>> > >>>>>>>> While the existing sysfs for capacity (/sys/devices/system/cpu/cpuX/ > >>>>>>>> cpu_capacity) may be used to infer the types of micro-architecture > >>>>>>>> of the > >>>>>>>> CPUs in the platform, it may not be entirely accurate. For instance, > >>>>>>>> two > >>>>>>>> subsets of CPUs with different types of micro-architecture may have > >>>>>>>> the > >>>>>>>> same capacity due to power or thermal constraints. > >>>>>>>> > >>>>>>>> Create the new directory /sys/devices/system/cpu/types. Under such > >>>>>>>> directory, create individual subdirectories for each type of CPU > >>>>>>>> micro- > >>>>>>>> architecture. Each subdirectory will have cpulist and cpumap files. > >>>>>>>> This > >>>>>>>> makes it convenient for user space to read all the CPUs of the same > >>>>>>>> type > >>>>>>>> at once without having to inspect each CPU individually. > >>>>>>>> > >>>>>>>> Implement a generic interface using weak functions that > >>>>>>>> architectures can > >>>>>>>> override to indicate a) support for CPU types, b) the CPU type > >>>>>>>> number, and > >>>>>>>> c) a string to identify the CPU vendor and type. > >>>>>>>> > >>>>>>>> For example, an x86 system with one Intel Core and four Intel Atom > >>>>>>>> CPUs > >>>>>>>> would look like this (other architectures have the hooks to use > >>>>>>>> whatever > >>>>>>>> directory naming convention below "types" that meets their needs): > >>>>>>>> > >>>>>>>> user@host:~$: ls /sys/devices/system/cpu/types > >>>>>>>> intel_atom_0 intel_core_0 > >>>>>>>> > >>>>>>>> user@host:~$ ls /sys/devices/system/cpu/types/intel_atom_0 > >>>>>>>> cpulist cpumap > >>>>>>>> > >>>>>>>> user@host:~$ ls /sys/devices/system/cpu/types/intel_core_0 > >>>>>>>> cpulist cpumap > >>>>>>>> > >>>>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpumap > >>>>>>>> 0f > >>>>>>>> > >>>>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpulist > >>>>>>>> 0-3 > >>>>>>>> > >>>>>>>> user@ihost:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpumap > >>>>>>>> 10 > >>>>>>>> > >>>>>>>> user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpulist > >>>>>>>> 4 > >>>>>> Thank you for the quick and detailed Greg! > >>>>>> > >>>>>>> The output of 'tree' sometimes makes it easier to see here, or: > >>>>>>> grep -R . * > >>>>>>> also works well. > >>>>>> Indeed, this would definitely make it more readable. > >>>>>> > >>>>>>>> On non-hybrid systems, the /sys/devices/system/cpu/types directory > >>>>>>>> is not > >>>>>>>> created. Add a hook for this purpose. > >>>>>>> Why should these not show up if the system is not "hybrid"? > >>>>>> My thinking was that on a non-hybrid system, it does not make sense to > >>>>>> create this interface, as all the CPUs will be of the same type. > >>>>> Why not just have this an attribute type in the existing cpuX directory? > >>>>> Why do this have to be a totally separate directory and userspace has to > >>>>> figure out to look in two different spots for the same cpu to determine > >>>>> what it is? > >>>> But if the type is located under cpuX, usespace would need to traverse > >>>> all the CPUs and create its own cpu masks. Under the types directory it > >>>> would only need to look once for each type of CPU, IMHO. > >>> What does a "mask" do? What does userspace care about this? You would > >>> have to create it by traversing the directories you are creating anyway, > >>> so it's not much different, right? > >> > >> Hello > >> > >> Sorry for the late reply. As the first userspace consumer of this > >> interface [1], I can confirm that reading a single file to get the mask > >> would be better, at least for performance reason. On large platforms, we > >> already have to read thousands of sysfs files to get CPU topology and > >> cache information, I'd be happy not to read one more file per cpu. > >> > >> Reading these sysfs files is slow, and it does not scale well when > >> multiple processes read them in parallel. > > Really? Where is the slowdown? Would something like readfile() work > > better for you for that? > > > > https://lore.kernel.org/linux-api/20200704140250.423345-1-gre...@linuxfoundation.org/ > > > I guess readfile would improve the sequential case by avoiding syscalls > but it would not improve the parallel case since syscalls shouldn't have > any parallel issue?
syscalls should not have parallel issues at all. > We've been watching the status of readfile() since it was posted on LKML > 6 months ago, but we were actually wondering if it would end up being > included at some point. It needs a solid reason to be merged. My "test" benchmarks are fun to run, but I have yet to find a real need for it anywhere as the open/read/close syscall overhead seems to be lost in the noise on any real application workload that I can find. If you have a real need, and it reduces overhead and cpu usage, I'm more than willing to update the patchset and resubmit it. > > How does multiple processes slow anything down, there shouldn't be any > > shared locks here. > > > When I benchmarked this in 2016, reading a single (small) sysfs file was > 41x slower when running 64 processes simultaneously on a 64-core Knights > Landing than reading from a single process. On a SGI Altix UV with 12x > 8-core CPUs, reading from one process per CPU (12 total) was 60x slower > (which could mean NUMA affinity matters), and reading from one process > per core (96 total) was 491x slower. > > I will try to find some time to dig further on recent kernels with perf > and readfile (both machines were running RHEL7). 2016 was a long time ago in kernel-land, please retest on a kernel.org release, not a RHEL monstrosity. > >> There are ways to avoid this > >> multiple discoveries by sharing hwloc info through XML or shmem, but it > >> will take years before all developers of different runtimes all > >> implement this :) > > I don't understand, what exactly are you suggesting we do here instead? > > > I was just saying userspace has ways to mitigate the issue but it will > take time because many different projects are involved. I still don't understand, what issue are you referring to? thanks, greg k-h