On Fri, Nov 13, 2020 at 2:15 PM Brice Goglin <brice.gog...@gmail.com> wrote:
>
> Le 12/11/2020 à 11:49, Greg Kroah-Hartman a écrit :
> > On Thu, Nov 12, 2020 at 10:10:57AM +0100, Brice Goglin wrote:
> > Le 12/11/2020 à 07:42, Greg Kroah-Hartman a écrit :
> > On Thu, Nov 12, 2020 at 07:19:48AM +0100, Brice Goglin wrote:
> > Le 07/10/2020 à 07:15, Greg Kroah-Hartman a écrit :
> > On Tue, Oct 06, 2020 at 08:14:47PM -0700, Ricardo Neri wrote:
> > On Tue, Oct 06, 2020 at 09:37:44AM +0200, Greg Kroah-Hartman wrote:
> > On Mon, Oct 05, 2020 at 05:57:36PM -0700, Ricardo Neri wrote:
> > On Sat, Oct 03, 2020 at 10:53:45AM +0200, Greg Kroah-Hartman wrote:
> > On Fri, Oct 02, 2020 at 06:17:42PM -0700, Ricardo Neri wrote:
> >
> > Hybrid CPU topologies combine CPUs of different microarchitectures in the
> > same die. Thus, even though the instruction set is compatible among all
> > CPUs, there may still be differences in features (e.g., some CPUs may
> > have counters that other CPUs do not). There may be applications
> > interested in knowing the types of micro-architecture in the system to
> > make decisions about process affinity.
> >
> > While the existing sysfs for capacity (/sys/devices/system/cpu/cpuX/
> > cpu_capacity) may be used to infer the types of micro-architecture of the
> > CPUs in the platform, it may not be entirely accurate. For instance, two
> > subsets of CPUs with different types of micro-architecture may have the
> > same capacity due to power or thermal constraints.
> >
> > Create the new directory /sys/devices/system/cpu/types. Under such
> > directory, create individual subdirectories for each type of CPU micro-
> > architecture. Each subdirectory will have cpulist and cpumap files. This
> > makes it convenient for user space to read all the CPUs of the same type
> > at once without having to inspect each CPU individually.
> >
> > Implement a generic interface using weak functions that architectures can
> > override to indicate a) support for CPU types, b) the CPU type number, and
> > c) a string to identify the CPU vendor and type.
> >
> > For example, an x86 system with one Intel Core and four Intel Atom CPUs
> > would look like this (other architectures have the hooks to use whatever
> > directory naming convention below "types" that meets their needs):
> >
> > user@host:~$ ls /sys/devices/system/cpu/types
> > intel_atom_0  intel_core_0
> >
> > user@host:~$ ls /sys/devices/system/cpu/types/intel_atom_0
> > cpulist  cpumap
> >
> > user@host:~$ ls /sys/devices/system/cpu/types/intel_core_0
> > cpulist  cpumap
> >
> > user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpumap
> > 0f
> >
> > user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpulist
> > 0-3
> >
> > user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpumap
> > 10
> >
> > user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpulist
> > 4
> >
> > Thank you for the quick and detailed feedback, Greg!
> >
> > The output of 'tree' sometimes makes it easier to see here, or:
> >   grep -R . *
> > also works well.
> >
> > Indeed, this would definitely make it more readable.
> >
> > On non-hybrid systems, the /sys/devices/system/cpu/types directory is not
> > created. Add a hook for this purpose.
> >
> > Why should these not show up if the system is not "hybrid"?
> >
> > My thinking was that on a non-hybrid system, it does not make sense to
> > create this interface, as all the CPUs will be of the same type.
> >
> > Why not just make this a "type" attribute in the existing cpuX directory?
> > Why does this have to be a totally separate directory, so that userspace
> > has to look in two different spots for the same cpu to determine what it
> > is?
> >
> > But if the type is located under cpuX, userspace would need to traverse
> > all the CPUs and create its own cpu masks. Under the types directory it
> > would only need to look once for each type of CPU, IMHO.
> >
> > What does a "mask" do?
> > What does userspace care about this? You would
> > have to create it by traversing the directories you are creating anyway,
> > so it's not much different, right?
> >
> > Hello
> >
> > Sorry for the late reply. As the first userspace consumer of this
> > interface [1], I can confirm that reading a single file to get the mask
> > would be better, at least for performance reasons. On large platforms, we
> > already have to read thousands of sysfs files to get CPU topology and
> > cache information; I'd be happy not to read one more file per cpu.
> >
> > Reading these sysfs files is slow, and it does not scale well when
> > multiple processes read them in parallel.
> >
> > Really? Where is the slowdown? Would something like readfile() work
> > better for you for that?
> > https://lore.kernel.org/linux-api/20200704140250.423345-1-gre...@linuxfoundation.org/
> >
> > I guess readfile would improve the sequential case by avoiding syscalls,
> > but it would not improve the parallel case, since syscalls shouldn't have
> > any parallel issue?
> >
> > syscalls should not have parallel issues at all.
> >
> > We've been watching the status of readfile() since it was posted on LKML
> > 6 months ago, but we were actually wondering if it would end up being
> > included at some point.
> >
> > It needs a solid reason to be merged. My "test" benchmarks are fun to
> > run, but I have yet to find a real need for it anywhere, as the
> > open/read/close syscall overhead seems to be lost in the noise on any
> > real application workload that I can find.
> >
> > If you have a real need, and it reduces overhead and cpu usage, I'm more
> > than willing to update the patchset and resubmit it.
> >
> > Good, I'll give it a try.
> >
> > How do multiple processes slow anything down? There shouldn't be any
> > shared locks here.
> >
> > When I benchmarked this in 2016, reading a single (small) sysfs file was
> > 41x slower when running 64 processes simultaneously on a 64-core Knights
> > Landing than reading from a single process.
> > On an SGI Altix UV with 12x 8-core CPUs, reading from one process per
> > CPU (12 total) was 60x slower (which could mean NUMA affinity matters),
> > and reading from one process per core (96 total) was 491x slower.
> >
> > I will try to find some time to dig further on recent kernels with perf
> > and readfile (both machines were running RHEL7).
> >
> > 2016 was a long time ago in kernel-land, please retest on a kernel.org
> > release, not a RHEL monstrosity.
>
> Quick test on 5.8.14 from Debian (fairly close to mainline) on a server
> with 2x20 cores.
>
> I am measuring the time to do open+read+close of
> /sys/devices/system/cpu/cpu15/topology/die_id 1000 times.
>
> With a single process, it takes 2ms (2us per open+read+close, looks OK).
>
> With one process per core (with careful binding, etc.), it jumps from
> 2ms to 190ms (without much variation).
>
> It looks like locks in kernfs_iop_permission and kernfs_dop_revalidate
> are causing the issue.
>
> I am attaching the perf report callgraph output below.
>
> > There are ways to avoid these multiple discoveries by sharing hwloc
> > info through XML or shmem, but it will take years before all developers
> > of different runtimes implement this :)
> >
> > I don't understand, what exactly are you suggesting we do here instead?
> >
> > I was just saying userspace has ways to mitigate the issue, but it will
> > take time because many different projects are involved.
> >
> > I still don't understand, what issue are you referring to?
>
> Reading many sysfs files causes application startup to take many
> seconds when launching multiple processes at the same time.
>
> Brice
>
>
> # To display the perf.data header info, please use --header/--header-only options.
> #
> #
> # Total Lost Samples: 0
> #
> # Samples: 7K of event 'cycles'
> # Event count (approx.): 5291578622
> #
> # Children      Self  Command        Shared Object      Symbol
> # ........  ........  .............  .................  .......................................
> #
>     99.91%     0.00%  fops_overhead  [kernel.kallsyms]  [k] entry_SYSCALL_64_after_hwframe
>             |
>             ---entry_SYSCALL_64_after_hwframe
>                do_syscall_64
>                |
>                |--98.69%--__x64_sys_openat
>                |          |
>                |           --98.67%--do_sys_openat2
>                |                     |
>                |                      --98.57%--do_filp_open
>                |                                path_openat
>                |                                |
>                |                                |--81.83%--link_path_walk.part.0
>                |                                |          |
>                |                                |          |--52.19%--inode_permission.part.0
>                |                                |          |          |
>                |                                |          |           --51.86%--kernfs_iop_permission
>                |                                |          |                     |
>                |                                |          |                     |--50.92%--__mutex_lock.constprop.0
>                |                                |          |                     |          |
>                |                                |          |                     |           --49.58%--osq_lock
>                |                                |          |                     |
>                |                                |          |                      --0.59%--mutex_unlock
>                |                                |          |
>                |                                |           --29.47%--walk_component
>                |                                |                     |
>                |                                |                      --29.10%--lookup_fast
>                |                                |                                |
>                |                                |                                 --28.76%--kernfs_dop_revalidate
>                |                                |                                           |
>                |                                |                                            --28.29%--__mutex_lock.constprop.0
>                |                                |                                                      |
>                |                                |                                                       --27.65%--osq_lock
>                |                                |
>                |                                |--9.60%--lookup_fast
>                |                                |          |
>                |                                |           --9.50%--kernfs_dop_revalidate
>                |                                |                     |
>                |                                |                      --9.35%--__mutex_lock.constprop.0
>                |                                |                                |
>                |                                |                                 --9.18%--osq_lock
>                |                                |
>                |                                |--6.17%--may_open
>                |                                |          |
>                |                                |           --6.13%--inode_permission.part.0
>                |                                |                     |
>                |                                |                      --6.10%--kernfs_iop_permission
>                |                                |                                |
>                |                                |                                 --5.90%--__mutex_lock.constprop.0
>                |                                |                                           |
>                |                                |                                            --5.80%--osq_lock
>                |                                |
>                |                                 --0.52%--do_dentry_open
>                |
>                 --0.63%--__prepare_exit_to_usermode
>                           |
>                            --0.58%--task_work_run
>
>     99.91%     0.01%  fops_overhead  [kernel.kallsyms]  [k] do_syscall_64
>     98.72%     0.00%  fops_overhead  [unknown]          [k] 0x7379732f73656369
>     98.72%     0.00%  fops_overhead  libc-2.31.so       [.] __GI___libc_open
>     98.69%     0.01%  fops_overhead  [kernel.kallsyms]  [k] __x64_sys_openat
>     98.67%     0.03%  fops_overhead  [kernel.kallsyms]  [k] do_sys_openat2
>     98.57%     0.00%  fops_overhead  [kernel.kallsyms]  [k] do_filp_open
>     98.57%     0.01%  fops_overhead  [kernel.kallsyms]  [k] path_openat
>     94.52%     1.30%  fops_overhead  [kernel.kallsyms]  [k] __mutex_lock.constprop.0
>     92.24%    92.22%  fops_overhead  [kernel.kallsyms]  [k] osq_lock
>     81.83%     0.03%  fops_overhead  [kernel.kallsyms]  [k] link_path_walk.part.0
>     58.32%     0.24%  fops_overhead  [kernel.kallsyms]  [k] inode_permission.part.0
>     57.97%     0.00%  fops_overhead  [kernel.kallsyms]  [k] kernfs_iop_permission
>     38.71%     0.03%  fops_overhead  [kernel.kallsyms]  [k] lookup_fast
>     38.26%     0.04%  fops_overhead  [kernel.kallsyms]  [k] kernfs_dop_revalidate
>     29.47%     0.03%  fops_overhead  [kernel.kallsyms]  [k] walk_component
>      6.17%     0.03%  fops_overhead  [kernel.kallsyms]  [k] may_open
>      1.22%     0.00%  fops_overhead  [unknown]          [k] 0x5541d68949564100
>      1.22%     0.00%  fops_overhead  libc-2.31.so       [.] __libc_start_main
>      1.06%     1.05%  fops_overhead  [kernel.kallsyms]  [k] mutex_unlock
>      0.88%     0.79%  fops_overhead  [kernel.kallsyms]  [k] mutex_lock
>      0.68%     0.01%  fops_overhead  libc-2.31.so       [.] __close
>      0.63%     0.05%  fops_overhead  [kernel.kallsyms]  [k] __prepare_exit_to_usermode
>      0.58%     0.00%  fops_overhead  [kernel.kallsyms]  [k] task_work_run
>      0.58%     0.10%  fops_overhead  [kernel.kallsyms]  [k] dput
>      0.56%     0.55%  fops_overhead  [kernel.kallsyms]  [k] mutex_spin_on_owner
>      0.54%     0.00%  fops_overhead  libc-2.31.so       [.] read
>      0.52%     0.12%  fops_overhead  [kernel.kallsyms]  [k] do_dentry_open
>      0.50%     0.00%  fops_overhead  [kernel.kallsyms]  [k] ksys_read
>      0.50%     0.03%  fops_overhead  [kernel.kallsyms]  [k] vfs_read
>      0.46%     0.05%  fops_overhead  [kernel.kallsyms]  [k] __fput
>      0.45%     0.45%  fops_overhead  [kernel.kallsyms]  [k] lockref_put_return
>      0.43%     0.43%  fops_overhead  [kernel.kallsyms]  [k] osq_unlock
>      0.41%     0.08%  fops_overhead  [kernel.kallsyms]  [k] step_into
>      0.41%     0.08%  fops_overhead  [kernel.kallsyms]  [k] __d_lookup
>      0.37%     0.35%  fops_overhead  [kernel.kallsyms]  [k] _raw_spin_lock
>      0.35%     0.03%  fops_overhead  [kernel.kallsyms]  [k] seq_read
>      0.28%     0.01%  fops_overhead  [kernel.kallsyms]  [k] kernfs_fop_open
>      0.27%     0.03%  fops_overhead  [kernel.kallsyms]  [k] kernfs_fop_release
>      0.16%     0.01%  fops_overhead  [kernel.kallsyms]  [k] kernfs_put_open_node
>      0.16%     0.00%  fops_overhead  [kernel.kallsyms]  [k] terminate_walk
>      0.12%     0.01%  fops_overhead  [kernel.kallsyms]  [k] __alloc_file
>      0.12%     0.00%  fops_overhead  [kernel.kallsyms]  [k] alloc_empty_file
>      0.12%     0.01%  fops_overhead  [kernel.kallsyms]  [k] unlazy_walk
>      0.12%     0.05%  fops_overhead  [kernel.kallsyms]  [k] _cond_resched
>      0.12%     0.07%  fops_overhead  [kernel.kallsyms]  [k] call_rcu
>      0.10%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __legitimize_path
>      0.09%     0.05%  fops_overhead  [kernel.kallsyms]  [k] sysfs_kf_seq_show
>      0.09%     0.09%  fops_overhead  [kernel.kallsyms]  [k] generic_permission
>      0.09%     0.07%  fops_overhead  [kernel.kallsyms]  [k] rcu_all_qs
>      0.09%     0.01%  fops_overhead  [kernel.kallsyms]  [k] security_file_open
>      0.08%     0.00%  fops_overhead  [kernel.kallsyms]  [k] security_file_alloc
>      0.08%     0.08%  fops_overhead  [kernel.kallsyms]  [k] lockref_get_not_dead
>      0.08%     0.03%  fops_overhead  [kernel.kallsyms]  [k] kmem_cache_alloc
>      0.08%     0.08%  fops_overhead  [kernel.kallsyms]  [k] apparmor_file_open
>      0.07%     0.05%  fops_overhead  [kernel.kallsyms]  [k] kfree
>      0.05%     0.05%  fops_overhead  [kernel.kallsyms]  [k] kernfs_fop_read
>      0.05%     0.05%  fops_overhead  [kernel.kallsyms]  [k] set_nlink
>      0.05%     0.01%  fops_overhead  [kernel.kallsyms]  [k] kernfs_seq_start
>      0.05%     0.03%  fops_overhead  [kernel.kallsyms]  [k] path_init
>      0.05%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __x64_sys_close
>      0.05%     0.00%  fops_overhead  [kernel.kallsyms]  [k] filp_close
>      0.05%     0.05%  fops_overhead  [kernel.kallsyms]  [k] syscall_return_via_sysret
>      0.05%     0.03%  fops_overhead  [kernel.kallsyms]  [k] __kmalloc_node
>      0.05%     0.05%  fops_overhead  [kernel.kallsyms]  [k] rcu_segcblist_enqueue
>      0.04%     0.04%  fops_overhead  [kernel.kallsyms]  [k] vfs_open
>      0.04%     0.04%  fops_overhead  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
>      0.04%     0.03%  fops_overhead  [kernel.kallsyms]  [k] sprintf
>      0.04%     0.00%  fops_overhead  [kernel.kallsyms]  [k] dev_attr_show
>      0.04%     0.00%  fops_overhead  [kernel.kallsyms]  [k] die_id_show
>      0.04%     0.04%  fops_overhead  [kernel.kallsyms]  [k] kmem_cache_free
>      0.04%     0.04%  fops_overhead  [kernel.kallsyms]  [k] fsnotify_parent
>      0.04%     0.04%  fops_overhead  [kernel.kallsyms]  [k] security_inode_permission
>      0.04%     0.01%  fops_overhead  [kernel.kallsyms]  [k] __check_object_size
>      0.04%     0.04%  fops_overhead  [kernel.kallsyms]  [k] apparmor_file_alloc_security
>      0.04%     0.00%  fops_overhead  [kernel.kallsyms]  [k] seq_release
>      0.04%     0.04%  fops_overhead  [kernel.kallsyms]  [k] memset_erms
>      0.04%     0.04%  fops_overhead  [kernel.kallsyms]  [k] kernfs_get_active
>      0.03%     0.00%  fops_overhead  [kernel.kallsyms]  [k] try_to_wake_up
>      0.03%     0.00%  fops_overhead  [kernel.kallsyms]  [k] vsnprintf
>      0.03%     0.01%  fops_overhead  [kernel.kallsyms]  [k] mntput_no_expire
>      0.03%     0.03%  fops_overhead  [kernel.kallsyms]  [k] lockref_get
>      0.03%     0.03%  fops_overhead  [kernel.kallsyms]  [k] kernfs_put_active
>      0.03%     0.03%  fops_overhead  [kernel.kallsyms]  [k] fsnotify
>      0.03%     0.03%  fops_overhead  [kernel.kallsyms]  [k] locks_remove_posix
>      0.03%     0.00%  fops_overhead  [kernel.kallsyms]  [k] security_file_permission
>      0.03%     0.03%  fops_overhead  [kernel.kallsyms]  [k] rw_verify_area
>      0.03%     0.03%  fops_overhead  [kernel.kallsyms]  [k] set_root
>      0.03%     0.00%  fops_overhead  [kernel.kallsyms]  [k] nd_jump_root
>      0.03%     0.01%  fops_overhead  [kernel.kallsyms]  [k] wake_up_q
>      0.03%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __mutex_unlock_slowpath.constprop.0
>      0.03%     0.00%  fops_overhead  [kernel.kallsyms]  [k] getname_flags.part.0
>      0.03%     0.03%  fops_overhead  [kernel.kallsyms]  [k] task_work_add
>      0.03%     0.00%  fops_overhead  [kernel.kallsyms]  [k] fput_many
>      0.03%     0.03%  fops_overhead  [kernel.kallsyms]  [k] __legitimize_mnt
>      0.03%     0.01%  fops_overhead  [kernel.kallsyms]  [k] kernfs_seq_stop
>      0.02%     0.02%  fops_overhead  [nfs]              [k] nfs_do_access
>      0.02%     0.00%  fops_overhead  ld-2.31.so         [.] _dl_map_object
>      0.02%     0.00%  fops_overhead  ld-2.31.so         [.] open_path
>      0.02%     0.00%  fops_overhead  ld-2.31.so         [.] __GI___open64_nocancel
>      0.02%     0.00%  fops_overhead  [nfs]              [k] nfs_permission
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] kernfs_seq_next
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] available_idle_cpu
>      0.01%     0.00%  fops_overhead  [unknown]          [k] 0x3931206e69207364
>      0.01%     0.00%  fops_overhead  libc-2.31.so       [.] __GI___libc_write
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] ksys_write
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] vfs_write
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] tty_write
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] n_tty_write
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] pty_write
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] queue_work_on
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __queue_work
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] select_task_rq_fair
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] native_queued_spin_lock_slowpath
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] slab_free_freelist_hook
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] __list_del_entry_valid
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] memcg_kmem_put_cache
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __syscall_return_slowpath
>      0.01%     0.01%  fops_overhead  libc-2.31.so       [.] _dl_addr
>      0.01%     0.00%  fops_overhead  [unknown]          [.] 0x756e696c2d34365f
>      0.01%     0.00%  fops_overhead  [unknown]          [.] 0x00007f4b1ca1e000
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __x86_indirect_thunk_rax
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] __virt_addr_valid
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] locks_remove_file
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] memcpy_erms
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] update_rq_clock
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] entry_SYSCALL_64
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] __check_heap_object
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] apparmor_file_free_security
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] security_file_free
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] __d_lookup_rcu
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] mntput
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] get_unused_fd_flags
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] alloc_slab_page
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __slab_alloc
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] ___slab_alloc
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] allocate_slab
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __alloc_fd
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] legitimize_root
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] strncpy_from_user
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] kernfs_refresh_inode
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] build_open_flags
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] strcmp
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] memcg_kmem_get_cache
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] asm_sysvec_apic_timer_interrupt
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] sysvec_apic_timer_interrupt
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] asm_call_sysvec_on_stack
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __sysvec_apic_timer_interrupt
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] hrtimer_interrupt
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __hrtimer_run_queues
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] tick_sched_timer
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] tick_sched_handle
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] update_process_times
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] scheduler_tick
>      0.01%     0.01%  fops_overhead  [kernel.kallsyms]  [k] perf_iterate_ctx
>      0.01%     0.00%  fops_overhead  [unknown]          [k] 0x00007fd34e3a0627
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __x64_sys_execve
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] do_execve
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] __do_execve_file
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] load_elf_binary
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] elf_map
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] vm_mmap_pgoff
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] do_mmap
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] mmap_region
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] perf_event_mmap
>      0.01%     0.00%  fops_overhead  [kernel.kallsyms]  [k] perf_iterate_sb
>      0.00%     0.00%  perf_5.8       [unknown]          [k] 0x00007fd34e3a0627
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] entry_SYSCALL_64_after_hwframe
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] perf_event_exec
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] do_syscall_64
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] __x64_sys_execve
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] do_execve
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] __do_execve_file
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] load_elf_binary
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] begin_new_exec
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] native_write_msr
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] __intel_pmu_enable_all.constprop.0
>      0.00%     0.00%  perf_5.8       [kernel.kallsyms]  [k] acpi_os_read_memory
> #
> # (Tip: To count events in every 1000 msec: perf stat -I 1000)
> #
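[Brice's benchmark code itself was not posted in the thread. A minimal Python sketch of the measurement he describes (repeated open+read+close cycles on one sysfs file, optionally run from several processes at once) might look like this; the sysfs path is the one mentioned above, and the process count and lack of CPU binding are simplifications of his setup:]

```python
import os
import time
from multiprocessing import Process, Queue

def bench_open_read_close(path, iters=1000):
    """Time `iters` open+read+close cycles on `path`; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(iters):
        fd = os.open(path, os.O_RDONLY)
        os.read(fd, 256)  # sysfs attributes are tiny; one read suffices
        os.close(fd)
    return time.perf_counter() - start

def _worker(queue, path, iters):
    queue.put(bench_open_read_close(path, iters))

def bench_parallel(path, nproc, iters=1000):
    """Run the same loop in `nproc` processes simultaneously; return each
    process's elapsed time (unlike the original test, no CPU binding)."""
    results = Queue()
    procs = [Process(target=_worker, args=(results, path, iters))
             for _ in range(nproc)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return [results.get() for _ in procs]

# Example (on a Linux machine):
#   bench_open_read_close("/sys/devices/system/cpu/cpu15/topology/die_id")
#   bench_parallel("/sys/devices/system/cpu/cpu15/topology/die_id", nproc=40)
```

Comparing the single-process time against the per-process times from the parallel run reproduces the kind of contention measurement discussed above.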
Hi Brice,

I wrote a benchmark to do open+read+close on
/sys/devices/system/cpu/cpu0/topology/die_id:
https://github.com/foxhlchen/sysfs_benchmark/blob/main/main.c

+    3.39%     3.37%  a.out  [kernel.kallsyms]  [k] mutex_unlock
+    2.76%     2.74%  a.out  [kernel.kallsyms]  [k] mutex_lock
+    0.92%     0.42%  a.out  [kernel.kallsyms]  [k] __mutex_lock.constprop.0
     0.38%     0.37%  a.out  [kernel.kallsyms]  [k] mutex_spin_on_owner
     0.05%     0.05%  a.out  [kernel.kallsyms]  [k] __mutex_init
     0.01%     0.01%  a.out  [kernel.kallsyms]  [k] __mutex_lock_slowpath
     0.01%     0.00%  a.out  [kernel.kallsyms]  [k] __mutex_unlock_slowpath.constprop.0

But I failed to reproduce your result. If it is possible, would you mind
providing your benchmark code? :)

thanks,
fox
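[For reference: the per-type cpulist and cpumap files in the proposal at the top of the thread use the kernel's standard CPU-list and hex-mask formats. A small Python sketch of consuming them from userspace — the /sys/devices/system/cpu/types path is from the unmerged proposal, so it is shown only in a comment:]

```python
def parse_cpulist(text):
    """Parse a kernel cpulist string such as '0-3' or '0-3,8' into a set of CPUs."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

def parse_cpumap(text):
    """Parse a kernel cpumap hex mask such as '0f' or 'ff,ffffffff' into a set
    of CPU numbers (bit i set means CPU i is present)."""
    mask = int(text.strip().replace(",", ""), 16)
    return {bit for bit in range(mask.bit_length()) if mask >> bit & 1}

# With the proposed interface, one read per CPU *type* would suffice, e.g.:
#   parse_cpulist(open("/sys/devices/system/cpu/types/intel_atom_0/cpulist").read())
# For the example output in the thread, cpulist "0-3" and cpumap "0f" both
# decode to CPUs {0, 1, 2, 3}, and cpumap "10" decodes to CPU {4}.
```

This is the whole point Brice makes about masks: one file read per type replaces one read per CPU.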