Re: Programmatically cache line

David Chisnall Thu, 04 Jan 2018 02:04:52 -0800

On 3 Jan 2018, at 22:12, Nathan Whitehorn <nwhiteh...@freebsd.org> wrote:
> 
> On 01/03/18 13:37, Ed Schouten wrote:
>> 2018-01-01 11:36 GMT+01:00 Konstantin Belousov <kostik...@gmail.com>:
>>>>>> On x86, the CPUID instruction leaf 0x1 returns the information in
>>>>>> the %ebx register.
>>>>> Hm, weird. Why don't we extend sysctl to include this info?
>>> For the same reason we do not provide a sysctl to add two integers.
>> I strongly agree with Kostik on this one. Why add stuff to the kernel,
>> if userspace is already capable of extracting this? Adding that stuff
>> to sysctl has the downside that it will effectively introduce yet
>> another FreeBSDism, whereas something generic already exists.
>> 
> 
> Well, kind of. The userspace version is platform-dependent and not always 
> available: for example, on PPC, you can't do this from userland and we 
> provide a sysctl machdep.cacheline_size to userland. It would be nice to have 
> an MI API.
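To make the "platform-dependent" point concrete, here is a minimal sketch of what each program currently has to carry. It assumes GCC or Clang (for <cpuid.h> and __get_cpuid) on x86, and FreeBSD's sysctlbyname(3) with the machdep.cacheline_size sysctl (assumed here to be an int) on PowerPC. The wrapper name cacheline_size() is hypothetical, not an existing MI API. On x86 it decodes CPUID leaf 0x1: bits 15:8 of %ebx give the CLFLUSH line size in 8-byte units.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>          /* __get_cpuid (GCC/Clang) */
#elif defined(__FreeBSD__) && defined(__powerpc__)
#include <sys/types.h>
#include <sys/sysctl.h>     /* sysctlbyname */
#endif

/* Hypothetical per-platform wrapper: returns the cache line size in bytes,
   or -1 if this platform has no known way to obtain it. */
static int cacheline_size(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return -1;                       /* leaf 0x1 not supported */
    /* CPUID.01H:EBX[15:8] = CLFLUSH line size in units of 8 bytes. */
    return (int)((ebx >> 8) & 0xff) * 8;
#elif defined(__FreeBSD__) && defined(__powerpc__)
    int size = 0;
    size_t len = sizeof(size);
    /* Userland cannot query the hardware directly here; ask the kernel. */
    if (sysctlbyname("machdep.cacheline_size", &size, &len, NULL, 0) != 0)
        return -1;
    return size;
#else
    return -1;                           /* yet another port to write */
#endif
}
```

Every new target needs another #elif branch, which is the cost an MI interface would remove.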


On ARMv8, similarly, the kernel sometimes needs to advertise a size that is 
wrong for the core you are currently running on.  A few big.LITTLE systems 
have 64-byte cache lines on one cluster and 32-byte cache lines on the other.  
If you query the size from userspace while running on the 64-byte cluster, 
then issue the zero-cache-line instruction after being migrated to the 
32-byte cluster, each instruction clears only 32 of the 64 bytes you step 
over, so you clear only half of the range.  Linux works around this by 
trapping and emulating the instruction that queries the cache line size and 
always reporting the smallest line size in the system.  ARM tells people not 
to build systems like this, but that doesn’t always stop them.  Trapping and 
emulating is much slower than simply providing the information in a shared 
page, in the ELF auxiliary vector, or even (often) via a system call.
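A small simulation may help show the arithmetic of the failure and of the Linux workaround. This is a sketch under simplifying assumptions: simulated_zero_line() is a hypothetical stand-in for the real zero-cache-line instruction, the buffer is assumed to be line-aligned, and nothing here touches actual hardware. clear_range() steps by the line size that was queried earlier, while each "instruction" clears only the line size of the core the thread now runs on; safe_line_size() models reporting the smallest line size in the system.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a zero-cache-line instruction: clears one line
   of the size the *current* core actually has (capped at the buffer end). */
static void simulated_zero_line(unsigned char *p, size_t actual_line,
                                size_t remaining)
{
    memset(p, 0, actual_line < remaining ? actual_line : remaining);
}

/* Clears len bytes, stepping by queried_line (the size read before a
   migration) while each zeroing step clears actual_line bytes.
   Returns how many bytes actually ended up zero. */
static size_t clear_range(unsigned char *buf, size_t len,
                          size_t queried_line, size_t actual_line)
{
    memset(buf, 0xff, len);                     /* start fully non-zero */
    for (size_t off = 0; off < len; off += queried_line)
        simulated_zero_line(buf + off, actual_line, len - off);

    size_t zeroed = 0;
    for (size_t i = 0; i < len; i++)
        zeroed += (buf[i] == 0);
    return zeroed;
}

/* Models the workaround: advertise the smallest line size of all clusters,
   which is safe to step by on every core. */
static size_t safe_line_size(const size_t *sizes, size_t n)
{
    size_t min = sizes[0];
    for (size_t i = 1; i < n; i++)
        if (sizes[i] < min)
            min = sizes[i];
    return min;
}
```

With a 256-byte buffer, clear_range(buf, 256, 64, 32) zeroes only 128 bytes. Stepping by safe_line_size() over {64, 32}, which is 32, clears all 256 bytes on either cluster, at the cost of issuing twice as many instructions on the 64-byte cores.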

To give another example, Linux provides a very cheap way for a userspace 
process to ask which core it’s running on.  Some more recent high-performance 
mallocs use this to add a second layer of caching for free blocks: a per-core 
cache behind the per-thread cache.  Unlike the per-thread cache, the per-core 
cache does need a lock, but that lock is very unlikely to be contended.  It is 
contended only if a thread is migrated between checking its core and taking 
the lock (and so acquires the wrong CPU’s lock), or if a thread is preempted 
in the middle of the very brief fill operation.  The author of the SuperMalloc 
paper tried doing this with CPUID and found that it was slow enough to almost 
entirely offset the benefit of the extra layer of caching.
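A minimal sketch of such a per-core layer follows, assuming Linux with glibc, where sched_getcpu(3) is typically served without a full system call (via the vDSO or rseq, depending on the glibc version). NSLOTS, SLOT_CAP and the function names are hypothetical, and a real allocator would size the slot array from the CPU count and refill from the per-thread cache. Note that a migration between sched_getcpu() and the lock only means the thread uses another core's slot, which stays correct because of the lock.

```c
#define _GNU_SOURCE                 /* for sched_getcpu() in glibc */
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

#define NSLOTS   64                 /* hypothetical: per-core slots */
#define SLOT_CAP 32                 /* hypothetical: blocks per slot */

struct core_cache {
    pthread_mutex_t lock;           /* rarely contended, see above */
    void *blocks[SLOT_CAP];
    size_t count;
};

static struct core_cache caches[NSLOTS];

static void core_caches_init(void)
{
    for (size_t i = 0; i < NSLOTS; i++) {
        pthread_mutex_init(&caches[i].lock, NULL);
        caches[i].count = 0;
    }
}

/* Picks the slot for the core we are (probably) running on. */
static struct core_cache *current_core_cache(void)
{
    int cpu = sched_getcpu();       /* -1 on failure */
    if (cpu < 0)
        cpu = 0;
    return &caches[(unsigned)cpu % NSLOTS];
}

/* Stashes a freed block in this core's cache; returns 0 if the slot is full. */
static int core_cache_put(void *block)
{
    struct core_cache *c = current_core_cache();
    int ok = 0;
    pthread_mutex_lock(&c->lock);
    if (c->count < SLOT_CAP) {
        c->blocks[c->count++] = block;
        ok = 1;
    }
    pthread_mutex_unlock(&c->lock);
    return ok;
}

/* Takes a block from this core's cache, or NULL if it is empty. */
static void *core_cache_get(void)
{
    struct core_cache *c = current_core_cache();
    void *b = NULL;
    pthread_mutex_lock(&c->lock);
    if (c->count > 0)
        b = c->blocks[--c->count];
    pthread_mutex_unlock(&c->lock);
    return b;
}
```

Replacing sched_getcpu() with an expensive query (such as CPUID, which can trap under virtualization) puts that cost on every allocation and free that reaches this layer, which is the overhead the SuperMalloc measurement describes.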

Just because userspace can get at the information directly from the hardware 
doesn’t mean that this is the most efficient or best way for userspace to get 
at it.

Oh, and some of these things are useful in portable code, so having to write 
some assembly for every target to get information that the kernel already knows 
is wasteful.
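For comparison, reading kernel-supplied values from the ELF auxiliary vector needs no assembly at all. This sketch assumes glibc's getauxval(3) on Linux and FreeBSD's elf_aux_info(3) (with int-sized values for the entries used here). AT_DCACHEBSIZE is only populated on some architectures (PowerPC, for example), so the sketch treats a zero result as "not advertised". The function names are hypothetical.

```c
#if defined(__linux__)
#include <elf.h>            /* AT_* constants */
#include <sys/auxv.h>       /* getauxval() */
#elif defined(__FreeBSD__)
#include <sys/auxv.h>       /* elf_aux_info() */
#endif
#include <unistd.h>         /* sysconf() fallback */

/* Page size as handed to the process by the kernel at exec time. */
static long auxv_page_size(void)
{
#if defined(__linux__)
    return (long)getauxval(AT_PAGESZ);
#elif defined(__FreeBSD__)
    int sz = 0;
    return elf_aux_info(AT_PAGESZ, &sz, sizeof(sz)) == 0 ? sz : 0;
#else
    return sysconf(_SC_PAGESIZE);
#endif
}

/* Data cache block size, if this kernel/architecture advertises it;
   returns 0 when the entry is absent. */
static long auxv_dcache_block_size(void)
{
#if defined(__linux__) && defined(AT_DCACHEBSIZE)
    return (long)getauxval(AT_DCACHEBSIZE);    /* 0 if not present */
#elif defined(__FreeBSD__) && defined(AT_DCACHEBSIZE)
    int sz = 0;
    return elf_aux_info(AT_DCACHEBSIZE, &sz, sizeof(sz)) == 0 ? sz : 0;
#else
    return 0;
#endif
}
```

The same two calls work on every architecture the kernel supports; only the set of entries the kernel chooses to fill in varies.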

David

_______________________________________________
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
