Related to change 18278 [1], I was wondering if there is really a benefit in dealing with 128-byte cachelines like we do today. Compiling VPP with the cacheline size set to 128 will basically just add 64 bytes of unused space at the end of each cacheline, so vlib_buffer_t, for example, will grow from 128 bytes to 256 bytes, but we will still need to prefetch 2 cachelines like we do by default.
What will happen if we just leave it at 64?

1. Sometimes (and not very frequently) we will issue 2 prefetch instructions for the same cacheline, but I hope the hardware is smart enough to just ignore the second one.
2. We may face false sharing issues if the first 64 bytes are touched by one thread and the other 64 bytes are touched by another one.

The second one sounds to me like a real problem, but it can be solved by aligning all per-thread data structures to 2 x cacheline size. Actually, if I remember correctly, even on x86 some of the hardware prefetchers deal with blocks of 2 cachelines.

So unless I missed something, my proposal here is: instead of maintaining special 128-byte images for some ARM64 machines, let’s just align all per-thread data structures to 128 bytes and have just one ARM image.

Thoughts?

--
Damjan

[1] https://gerrit.fd.io/r/#/c/18278/