+1

Florin

> On Jul 14, 2021, at 10:47 AM, Damjan Marion via lists.fd.io 
> <dmarion=me....@lists.fd.io> wrote:
> 
> 
> I spent a bit of time looking at this and came up with what I think is a 
> reasonable solution.
> 
> First, the 128-byte cacheline is not dead; the recently announced Marvell 
> Octeon 10 has a 128-byte cacheline.
> 
> In the current code, the cacheline size defines both the amount of data a 
> prefetch instruction prefetches and the alignment of data structures needed 
> to avoid false sharing.
> 
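> To make the two roles concrete, here is a minimal sketch (mine, not VPP 
> code; the struct, field and function names are made up for illustration) of 
> how the single constant serves both purposes today:
> 
> '''
> /* Role 1: alignment. Pad each per-thread counter to its own
>    cacheline so two threads never write to the same line. */
> typedef struct
> {
>   u64 packets;
>   u64 bytes;
> } __attribute__ ((aligned (CLIB_CACHE_LINE_BYTES))) thread_counter_t;
> 
> /* Role 2: prefetch stride. The same constant decides how many
>    bytes one prefetch instruction is assumed to cover. */
> static_always_inline void
> prefetch_example (void *p)
> {
>   CLIB_PREFETCH (p, CLIB_CACHE_LINE_BYTES, LOAD);
> }
> '''
> 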
> So I think ideally we should have the following (a sketch of these defaults 
> follows the list):
> 
> - on x86:
>  - number of bytes a prefetch instruction prefetches set to 64
>  - data structures should be aligned to 64 bytes
>  - because there is an adjacent-cacheline prefetcher on x86, it may be worth
>    investigating whether aligning to 128 brings some value
> 
> - on AArch64:
>  - number of bytes a prefetch instruction prefetches set to 64 or 128, based
>    on the multiarch variant running
>  - data structures should be aligned to 128 bytes, as that value prevents
>    false sharing on both 64- and 128-byte cacheline systems
> 
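> As a sketch, those defaults could look something like this (the guard macros 
> and exact values are illustrative only, borrowing the CLIB_CACHE_PREFETCH_BYTES 
> name introduced below):
> 
> '''
> #if defined(__x86_64__)
> #define CLIB_CACHE_LINE_BYTES     64  /* alignment / false sharing */
> #define CLIB_CACHE_PREFETCH_BYTES 64  /* prefetch stride */
> #elif defined(__aarch64__)
> #define CLIB_CACHE_LINE_BYTES     128 /* safe for 64B and 128B systems */
> #ifndef CLIB_CACHE_PREFETCH_BYTES     /* overridden per multiarch variant */
> #define CLIB_CACHE_PREFETCH_BYTES 64
> #endif
> #endif
> '''
> 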
> The main problem is the abuse of the CLIB_PREFETCH() macro in our codebase.
> The original idea was good: somebody wanted to provide a macro which 
> transparently emits 1-4 prefetch instructions based on data size, recognising 
> that there may be systems with different cacheline sizes.
> 
> Like:
>  CLIB_PREFETCH (p, sizeof (ip6_header_t), LOAD);
> 
> But in reality, most of the time we have:
>  CLIB_PREFETCH (p, CLIB_CACHE_LINE_BYTES, LOAD);
> 
> where it is assumed that the cacheline size is 64; that just wastes resources 
> on a system with a 128-byte cacheline.
> 
> Also, most places in our codebase are perfectly fine with whatever the 
> cacheline size is, so I'm thinking about the following:
> 
> 1. set CLIB_CACHE_LINE_BYTES to 64 on x86 and 128 on ARM; that will make sure 
> false sharing does not happen
> 
> 2. introduce CLIB_CACHE_PREFETCH_BYTES, which can be set to a different value 
> for each multiarch variant (64 for N1, 128 for ThunderX2)
> 
> 3. modify the CLIB_PREFETCH macro to use CLIB_CACHE_PREFETCH_BYTES to emit the 
> proper number of prefetch instructions for cases where the data size is 
> specified (sketch below)
> 
> 4. go through the codebase and replace all `CLIB_PREFETCH (p, 
> CLIB_CACHE_LINE_BYTES, LOAD);` with `clib_prefetch_load (p);`.
>   There may be exceptions, but those lines typically mean: "I want to prefetch 
> a few (<=64) bytes at this address and I really don't care what the cache line 
> size is."
> 
> 5. analyse the remaining few cases where CLIB_PREFETCH() is used with a size 
> specified in terms of CLIB_CACHE_LINE_BYTES.
> 
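> As a rough sketch of points 3 and 4 (illustrative only, not the actual 
> patches):
> 
> '''
> /* point 3: stride prefetches by CLIB_CACHE_PREFETCH_BYTES, which the
>    multiarch variant sets, instead of by CLIB_CACHE_LINE_BYTES */
> #define _CLIB_PREFETCH(n,size,type)                               \
>   if ((size) > (n)*CLIB_CACHE_PREFETCH_BYTES)                     \
>     __builtin_prefetch (_addr + (n)*CLIB_CACHE_PREFETCH_BYTES,    \
>                         CLIB_PREFETCH_##type, /* locality */ 3);
> 
> /* point 4: single-line prefetch helper, no size argument at all */
> static_always_inline void
> clib_prefetch_load (void *p)
> {
>   __builtin_prefetch (p, /* read */ 0, /* locality */ 3);
> }
> '''
> 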
> Thoughts?
> 
> — 
> Damjan
> 
>> On 06.07.2021., at 03:48, Lijian Zhang <lijian.zh...@arm.com> wrote:
>> 
>> Thanks Damjan for your comments. Some replies inline.
>> <snip>
>> Hi Lijian,
>> 
>> It will be good to know if the 128-byte cacheline is something ARM platforms 
>> will be using in the future or whether it is just historical.
>> [Lijian] Existing ThunderX1 and OcteonTX2 CPUs have 128-byte cache lines. To 
>> my knowledge, there may be more CPUs with 128-byte cache lines in the future.
>> 
>> The cacheline size problem is not just about prefetching; an even bigger 
>> issue is false sharing, so we need to address both.
>> [Lijian] Yes, there may be a false-sharing issue when running a VPP image 
>> built with the 64B definition on 128B cache-line CPUs. We will do some 
>> scalability testing for that case and check the multi-core performance.
>> 
>> Probably the best solution is to have two VPP images, one for 128- and one 
>> for 64-byte cacheline size.
>> [Lijian] For a natively built image, that's fine. But I'm not sure it's 
>> possible for cloud binaries installed via "apt-get install".
>> 
>> Going across the whole codebase and replacing prefetch macros is something 
>> we should definitely avoid.
>> [Lijian] I understand your concern about a large-scope replacement. My 
>> concern is that when CLIB_PREFETCH() is used to prefetch packet content into 
>> cache, as in the example below, the cache-line size (CLIB_CACHE_LINE_BYTES) 
>> always seems to be assumed to be 64 bytes:
>> CLIB_PREFETCH (p2->data, 3 * CLIB_CACHE_LINE_BYTES, LOAD);
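>> A quick trace of that call (my arithmetic, based on the macro definitions in 
>> [3] below) shows the problem when the image is built with the 128B definition:
>> '''
>> /* size = 3 * 128 = 384:
>>    n=0: 384 > 0   -> prefetch p2->data + 0
>>    n=1: 384 > 128 -> prefetch p2->data + 128
>>    n=2: 384 > 256 -> prefetch p2->data + 256
>>    n=3: 384 > 384 false -> stop
>>    On a 64B cache-line CPU these touch 64B lines 0, 2 and 4,
>>    while the author, assuming 64B lines, meant lines 0, 1 and 2. */
>> '''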
>> 
>> — 
>> Damjan
>> 
>> 
>> On 05.07.2021., at 07:28, Lijian Zhang <lijian.zh...@arm.com> wrote:
>> 
>> Hi Damjan,
>> I committed several patches to address some issues around cache-line 
>> definitions in VPP.
>> 
>> Patch [1.1] resolves the build error [2] on 64-byte cache-line Arm CPUs, 
>> e.g., ThunderX2 and Neoverse N1, caused by the commit 
>> (https://gerrit.fd.io/r/c/vpp/+/32996, build: remove unused files and 
>> sections).
>> It also supports building the Arm generic image (with "make build-release") 
>> with a 128-byte cache-line definition, and building a native image with a 
>> 64-byte cache-line definition on some Arm CPUs, e.g., ThunderX2 and 
>> Neoverse N1 (with "make build-release TARGET_PLATFORM=native").
>> 
>> Patch [1.5] changes the default cache-line definition in the Arm generic 
>> image from 128 bytes to 64 bytes.
>> Setting the cache-line definition to 128 bytes for the Arm generic image is 
>> required for ThunderX1 (128-byte physical cache line), which is also the 
>> build machine in the FD.io lab. I'm thinking of setting a 64-byte cache-line 
>> definition in VPP for the Arm image, which will affect ThunderX1 and 
>> OcteonTX2 CPUs, so it requires confirmation from Marvell.
>> 
>> Arm architecture CPUs have either 128-byte or 64-byte physical cache lines. 
>> So whether the cache-line definition in the VPP source code is 128 bytes or 
>> 64 bytes, the prefetch functions in a generic image will not work properly 
>> on all Arm CPUs. Patches [1.2], [1.3] and [1.4] resolve this issue.
>> 
>> For example, consider running the Arm generic image (cache-line size defined 
>> as 128B in the Makefile for all Arm architectures) on 64B cache-line CPUs, 
>> e.g., Neoverse N1, Ampere Altra, ThunderX2.
>> 
>> [3] shows the prefetch macro definitions in VPP. Using CLIB_PREFETCH(), you 
>> can prefetch data that resides in multiple cache lines.
>> [4] shows some usage examples of the prefetch macros in VPP. When running 
>> the Arm generic image (128B cache-line definition) on 64B cache-line CPUs 
>> (N1SDP, for example), 4.2, 4.3 and 4.4 have issues.
>>      • For 4.2, the input for the size parameter is 68. On N1SDP with a 64B 
>> cache-line size, two prefetch instructions should be executed, but because 
>> 68 is less than CLIB_CACHE_LINE_BYTES (defined as 128 bytes in VPP), only 
>> the first prefetch instruction is executed.
>>      • For 4.3, if sizeof (ip0[0]) equals 68 or any other value larger than 
>> 64B, there is the same issue as in 4.2.
>>      • For 4.4, the code is trying to prefetch the first 128B of the packet 
>> content. It assumes CLIB_CACHE_LINE_BYTES is always 64B. In the Arm generic 
>> image, the input for the size parameter is 256B, which executes prefetches 
>> on unexpected cache lines (expected prefetches on 64B-0 and 64B-1, but 
>> actually on 64B-0 and 64B-2; see the trace below).
>> Packet content: [64B-0][64B-1][64B-2][64B-3]
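>> 
>> For reference, my expansion of 4.4 (based on the macros in [3]) when the 
>> image is built with the 128B definition:
>> '''
>> /* CLIB_PREFETCH (p2->data, 2*CLIB_CACHE_LINE_BYTES, LOAD)
>>    with CLIB_CACHE_LINE_BYTES = 128 -> size = 256:
>>    n=0: 256 > 0   -> prefetch p2->data + 0    (64B-0)
>>    n=1: 256 > 128 -> prefetch p2->data + 128  (64B-2)
>>    n=2: 256 > 256 false -> 64B-1 is never prefetched */
>> '''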
>> 
>> Our proposal is to introduce a macro CLIB_N_CACHELINE_BYTES via the VPP 
>> multi-arch feature (see patch [1.2]) to reflect the runtime CPU cache-line 
>> size in the Arm generic image, so that the prefetch instructions are emitted 
>> correctly (sketch below).
>> Then for 4.4, we will need to change the size parameter from 
>> 2*CLIB_CACHE_LINE_BYTES to 128B, to reflect the actual intention.
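>> 
>> As an illustrative sketch of patch [1.2] (the variant guard names here are 
>> hypothetical; see the patch itself for the real definitions):
>> 
>> '''
>> /* Each multiarch variant targets CPUs whose cache-line size is
>>    known at build time, so the variant can carry it as a constant. */
>> #if defined(CLIB_MARCH_VARIANT_NEOVERSEN1)
>> #define CLIB_N_CACHELINE_BYTES 64                     /* 64B lines */
>> #elif defined(CLIB_MARCH_VARIANT_THUNDERX2)
>> #define CLIB_N_CACHELINE_BYTES 64                     /* 64B lines */
>> #else
>> #define CLIB_N_CACHELINE_BYTES CLIB_CACHE_LINE_BYTES  /* fallback */
>> #endif
>> '''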
>> 
>> Some additional macros [1.3] can be added to let users prefetch based on a 
>> number of cache lines, besides a number of bytes (sketch below).
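>> 
>> For example, something along these lines (hypothetical name, in the spirit 
>> of [1.3]):
>> 
>> '''
>> /* Prefetch n consecutive cache lines starting at address p,
>>    using the runtime line size rather than the build-time one. */
>> #define CLIB_PREFETCH_LINES(p, n, type)                           \
>>   do                                                              \
>>     {                                                             \
>>       int _i;                                                     \
>>       for (_i = 0; _i < (n); _i++)                                \
>>         __builtin_prefetch ((u8 *) (p) +                          \
>>                             _i * CLIB_N_CACHELINE_BYTES,          \
>>                             CLIB_PREFETCH_##type,                 \
>>                             /* locality */ 3);                    \
>>     }                                                             \
>>   while (0)
>> '''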
>> 
>> Could you please share your thoughts on the issue and the proposal?
>> 
>> [1]. Patches
>> [1.1] build: support 128B/64B cache line size in Arm image, 
>> https://gerrit.fd.io/r/c/vpp/+/32968/2
>> [1.2] vppinfra: refactor prefetch macro, 
>> https://gerrit.fd.io/r/c/vpp/+/32969/3
>> [1.3] vppinfra: fix functions to prefetch single line, 
>> https://gerrit.fd.io/r/c/vpp/+/32970/2
>> [1.4] misc: correct prefetch macro usage, 
>> https://gerrit.fd.io/r/c/vpp/+/32971/3
>> [1.5] build: set 64B cache line size in Arm image, 
>> https://gerrit.fd.io/r/c/vpp/+/32972/2
>> 
>> [2]. Error message
>> src/plugins/dpdk/device/init.c:1916:3: error: static_assert failed due to 
>> requirement '128 == 1 << 6' "DPDK RTE CACHE LINE SIZE does not match with 
>> 1<<CLIB_LOG2_CACHE_LINE_BYTES"
>>  STATIC_ASSERT (RTE_CACHE_LINE_SIZE == 1 << CLIB_LOG2_CACHE_LINE_BYTES,
>>  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> /home/lijian/tasks/plsremove/src/vppinfra/error_bootstrap.h:111:34: note: 
>> expanded from macro 'STATIC_ASSERT'
>> #define STATIC_ASSERT(truth,...) _Static_assert(truth, __VA_ARGS__)
>>                                 ^              ~~~~~
>> 1 error generated.
>> 
>> [3]. Prefetch macro definitions in VPP source code.
>> '''
>> #define _CLIB_PREFETCH(n,size,type)                              \
>>   if ((size) > (n)*CLIB_CACHE_LINE_BYTES)                        \
>>     __builtin_prefetch (_addr + (n)*CLIB_CACHE_LINE_BYTES,       \
>>                         CLIB_PREFETCH_##type,                    \
>>                         /* locality */ 3);
>> 
>> #define CLIB_PREFETCH(addr,size,type)                 \
>> do {                                                  \
>>   void * _addr = (addr);                              \
>>                                                       \
>>   ASSERT ((size) <= 4*CLIB_CACHE_LINE_BYTES);         \
>>   _CLIB_PREFETCH (0, size, type);                     \
>>   _CLIB_PREFETCH (1, size, type);                     \
>>   _CLIB_PREFETCH (2, size, type);                     \
>>   _CLIB_PREFETCH (3, size, type);                     \
>> } while (0)
>> '''
>> 
>> [4]. Prefetch macro usage examples.
>> 4.1 CLIB_PREFETCH (p2->pre_data, 32, STORE);
>> 4.2 CLIB_PREFETCH (p2->pre_data, 68, STORE);
>> 4.3 CLIB_PREFETCH (b[4]->data, sizeof (ip0[0]), LOAD);
>> 4.4 CLIB_PREFETCH (p2->data, 2*CLIB_CACHE_LINE_BYTES, LOAD);
>> 
>> Thanks.
> 
> 
> 
