On 10/26/22 17:44, Morten Brørup wrote:
Add __rte_cache_aligned to the objs array.
It makes no difference in the general case, but if get/put operations
always handle 32 objects, it reduces the number of memory (or last-level
cache) accesses from five to four 64 B cache lines per get/put
operation.
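As a rough illustration of what aligning the objs array means for the layout, here is a minimal sketch in plain C11. The struct names, the MAX_OBJS capacity and the use of _Alignas in place of DPDK's __rte_cache_aligned macro are assumptions made for this example; it is not the actual rte_mempool_cache definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64    /* assumed cache line size in bytes */
#define MAX_OBJS   1024  /* hypothetical capacity, not the DPDK constant */

/* Hypothetical stand-in: the object array directly follows the header. */
struct cache_unaligned {
	uint32_t size;
	uint32_t flushthresh;
	uint32_t len;
	void *objs[MAX_OBJS];  /* shares cache line0 with the header fields */
};

/* Same fields, but the object array is pushed to a cache line boundary. */
struct cache_aligned {
	uint32_t size;
	uint32_t flushthresh;
	uint32_t len;
	_Alignas(CACHE_LINE) void *objs[MAX_OBJS];  /* begins at cache line1 */
};

/* Compile-time check that the aligned variant starts objs on a boundary. */
static_assert(offsetof(struct cache_aligned, objs) % CACHE_LINE == 0,
	      "objs must start on a cache line boundary");
```

In the unaligned variant the first objects share line0 with len; in the aligned variant the rest of line0 is padding and every burst whose size is a multiple of the cache line starts on a line boundary.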
For readability reasons, an example using 16 objects follows:
Currently, with 16 objects (128 B), we access 3 cache lines:
      ┌────────┐
      │len     │
cache │********│---
line0 │********│ ^
      │********│ |
      ├────────┤ | 16 objects
      │********│ | 128B
cache │********│ |
line1 │********│ |
      │********│ |
      ├────────┤ |
      │********│_v_
cache │        │
line2 │        │
      │        │
      └────────┘
With the alignment, it is also 3 cache lines:
      ┌────────┐
      │len     │
cache │        │
line0 │        │
      │        │
      ├────────┤---
      │********│ ^
cache │********│ |
line1 │********│ |
      │********│ |
      ├────────┤ | 16 objects
      │********│ | 128B
cache │********│ |
line2 │********│ |
      │********│ v
      └────────┘---
However, accessing the objects at the bottom of the mempool cache is a
special case, where cache line0 is also used for objects.
Consider the next burst (and any following bursts):
Current:
      ┌────────┐
      │len     │
cache │        │
line0 │        │
      │        │
      ├────────┤
      │        │
cache │        │
line1 │        │
      │        │
      ├────────┤
      │        │
cache │********│---
line2 │********│ ^
      │********│ |
      ├────────┤ | 16 objects
      │********│ | 128B
cache │********│ |
line3 │********│ |
      │********│ |
      ├────────┤ |
      │********│_v_
cache │        │
line4 │        │
      │        │
      └────────┘
4 cache lines touched, incl. line0 for len.
With the proposed alignment:
      ┌────────┐
      │len     │
cache │        │
line0 │        │
      │        │
      ├────────┤
      │        │
cache │        │
line1 │        │
      │        │
      ├────────┤
      │        │
cache │        │
line2 │        │
      │        │
      ├────────┤
      │********│---
cache │********│ ^
line3 │********│ |
      │********│ | 16 objects
      ├────────┤ | 128B
      │********│ |
cache │********│ |
line4 │********│ |
      │********│_v_
      └────────┘
Only 3 cache lines touched, incl. line0 for len.
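The counts in the diagrams can be reproduced with a little arithmetic. The sketch below is only an illustration: the helper names, the 64 B cache line, the 8 B object pointer and the byte offsets of the objs array (16 for the current layout, 64 for the aligned one, as drawn above) are assumptions made for this example and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE 64  /* assumed cache line size in bytes */
#define OBJ_SIZE   8   /* assumed object pointer size in bytes (64-bit) */

/* Number of cache lines covered by nbytes starting at byte offset start. */
static size_t lines_spanned(size_t start, size_t nbytes)
{
	assert(nbytes > 0);
	return (start + nbytes - 1) / CACHE_LINE - start / CACHE_LINE + 1;
}

/*
 * Cache lines touched by burst number `burst` (0-based) of `n` objects
 * when the object array starts at byte offset `objs_off`. Line0, which
 * holds len, is added unless the burst already overlaps it.
 */
static size_t lines_touched(size_t objs_off, size_t burst, size_t n)
{
	size_t start = objs_off + burst * n * OBJ_SIZE;
	size_t lines = lines_spanned(start, n * OBJ_SIZE);

	return start < CACHE_LINE ? lines : lines + 1;
}
```

Under these assumptions, lines_touched(16, 1, 16) evaluates to 4 and lines_touched(64, 1, 16) to 3, matching the two diagrams above. For 32-object bursts, lines_spanned(272, 256) is 5 and lines_spanned(320, 256) is 4, which corresponds to the "five to four" figure when only the lines holding objects are counted.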
Credits go to Olivier Matz for the nice ASCII graphics.
Signed-off-by: Morten Brørup <m...@smartsharesystems.com>
Reviewed-by: Andrew Rybchenko <andrew.rybche...@oktetlabs.ru>