[vpp-dev] Prefetches improvement for VPP Arm generic image

Lijian Zhang Sun, 04 Jul 2021 22:29:14 -0700

Hi Damjan,
I committed several patches to address some issues around cache-line 
definitions in VPP.


Patch [1.1] is to resolve the build error [2] on 64Byte cache line Arm CPUs, 
e.g., ThunderX2, NeoverseN1, caused by the commit 
(https://gerrit.fd.io/r/c/vpp/+/32996, build: remove unused files and sections).
It also supports building Arm generic image (with command of "make 
build-release") with 128Byte cache line definition, and building native image 
with 64Byte cache line definition on some Arm CPUs, e.g., ThunderX2, NeoverseN1 
(with command of "make build-release TARGET_PLATFORM=native").

Patch [1.5] is to set the default cache line definition in Arm generic image 
from 128Byte to 64Byte.
Setting cache line definition to 128Byte for Arm generic image is required for 
ThunderX1 (with 128Byte physical cache line), which is also the build machine 
in FD.io lab. I'm thinking for setting 64Byte cache line definition in VPP for 
Arm image, which will affect ThunderX1 and OcteonTX2 CPUs. So it requires the 
confirmation by Marvell.

Arm architecture CPUs have 128Byte or 64Byte physical designs. So no matter the 
cache line definition is 128Byte or 64Byte in VPP source code, the prefetch 
functions in generic image will not work properly on all Arm CPUs. Patches 
[1.2] [1.3] [1.4] are to resolve the issue.

For example when running Arm generic image (cache-line-size is defined as 128B 
in Makefile for all Arm architectures) on 64Byte cache-line-size CPUs, e.g., 
Neoverse-N1, Ampere altra, ThunderX2.

[3] shows the prefetch macro definitions in VPP. Using CLIB_PREFETCH(), you can 
prefetch data resides in multiple cache lines.
[4] shows some usage examples of the prefetch macros in VPP. When running Arm 
generic image (128B cache-line-size definition) on 64B cache-line CPUs (N1SDP 
for example), 4.2, 4.3 and 4.4 have issues.

  *   For 4.2, the input for size parameter is 68. On N1SDP with 64B 
cache-line-size, there should be two prefetch instructions executed, but due to 
68 is less than CLIB_CACHE_LINE_BYTES (128Byte definition in VPP), only the 
first prefetch instruction is executed.
  *   For 4.3, if sizeof (ip0[0]) equals 68 or any other values larger than 
64B, there will be the same issue as 4.2.
  *   For 4.4, the code  is trying to prefetch the first 128B of packet 
content. It assumes  CLIB_CACHE_LINE_BYTES is 64B always. In Arm generic image, 
the input for size parameter is 256B, which will execute prefetches on 
unexpected cache-lines (expected prefetches on 64B-0 and 64B-1, but actually on 
B64-0 and B64-2) .
Packet content: [64B-0][64B-1][64B-2][64B-3]

Our proposal is introduce a macro CLIB_N_CACHELINE_BYTES via VPP multi-arch 
feature (check patch [1.2]), to reflect the runtime CPU cache-line-size in Arm 
generic image, so that the prefetch instructions can be executed correctly.
Then for 4.4, we will need to modify the parameter for size, from 
2*CLIB_CACHE_LINE_BYTES to 128B, to reflect the actual intention.

Some additional macros [1.3] can be added for users to do prefetch based on 
number of cache-lines, besides number of bytes.

Could you please suggest on the issue and proposal?

[1]. Patches
[1.1] build: support 128B/64B cache line size in Arm image, 
https://gerrit.fd.io/r/c/vpp/+/32968/2
[1.2] vppinfra: refactor prefetch macro, https://gerrit.fd.io/r/c/vpp/+/32969/3
[1.3] vppinfra: fix functions to prefetch single line, 
https://gerrit.fd.io/r/c/vpp/+/32970/2
[1.4] misc: correct prefetch macro usage, https://gerrit.fd.io/r/c/vpp/+/32971/3
[1.5] build: set 64B cache line size in Arm image, 
https://gerrit.fd.io/r/c/vpp/+/32972/2

[2]. Error message
src/plugins/dpdk/device/init.c:1916:3: error: static_assert failed due to 
requirement '128 == 1 << 6' "DPDK RTE CACHE LINE SIZE does not match with 
1<<CLIB_LOG2_CACHE_LINE_BYTES"
  STATIC_ASSERT (RTE_CACHE_LINE_SIZE == 1 << CLIB_LOG2_CACHE_LINE_BYTES,
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/lijian/tasks/plsremove/src/vppinfra/error_bootstrap.h:111:34: note: 
expanded from macro 'STATIC_ASSERT'
#define STATIC_ASSERT(truth,...) _Static_assert(truth, __VA_ARGS__)
                                 ^              ~~~~~
1 error generated.

[3]. Prefetch macro definitions in VPP source code.
'''
#define _CLIB_PREFETCH(n,size,type)                                             
   \
  if ((size) > (n)*CLIB_CACHE_LINE_BYTES)                                       
  \
    __builtin_prefetch (_addr + (n)*CLIB_CACHE_LINE_BYTES,       \
                                           CLIB_PREFETCH_##type,                
                          \
                                           /* locality */ 3);

#define CLIB_PREFETCH(addr,size,type)               \
do {                                                                            
   \
  void * _addr = (addr);                               \
                                                                                
      \
  ASSERT ((size) <= 4*CLIB_CACHE_LINE_BYTES); \
  _CLIB_PREFETCH (0, size, type);                            \
  _CLIB_PREFETCH (1, size, type);                            \
  _CLIB_PREFETCH (2, size, type);                            \
  _CLIB_PREFETCH (3, size, type);                            \
} while (0)
'''

[4]
4.1 CLIB_PREFETCH (p2->pre_data, 32, STORE);
4.2 CLIB_PREFETCH (p2->pre_data, 68, STORE);
4.3 CLIB_PREFETCH (b[4]->data, sizeof (ip0[0]), LOAD);
4.4 CLIB_PREFETCH (p2->data, 2*CLIB_CACHE_LINE_BYTES, LOAD);

Thanks.

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#19699): https://lists.fd.io/g/vpp-dev/message/19699
Mute This Topic: https://lists.fd.io/mt/83991823/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-

[vpp-dev] Prefetches improvement for VPP Arm generic image

Reply via email to