Hi,
First of all, sorry for responding late to this mail chain. Please see my answers inline, marked [Nitin].

Thanks,
Nitin

________________________________
From: Damjan Marion <dmar...@me.com>
Sent: Monday, March 18, 2019 4:48 PM
To: Honnappa Nagarahalli
Cc: vpp-dev; Nitin Saxena
Subject: [EXT] Re: [vpp-dev] 128 byte cache line support

External Email
________________________________

On 15 Mar 2019, at 04:52, Honnappa Nagarahalli <honnappa.nagaraha...@arm.com> wrote:

Related to change 18278 [1], I was wondering whether there is really a benefit to handling 128-byte cachelines the way we do today. Compiling VPP with the cacheline size set to 128 basically just adds 64 bytes of unused space at the end of each cacheline, so vlib_buffer_t, for example, grows from 128 bytes to 256 bytes, yet we still need to prefetch two cachelines, just as we do by default.

[Nitin]: This is the existing model. For forwarding, mainly the first vlib cacheline is used. We are utilizing the existing hole (in the first vlib cacheline) by putting packet-parsing info there (size == 64B). This has many benefits; one of them is avoiding the software checks in ipv4-input-no-chksum(), which gives us a benefit of ~20 cycles on our platform, so I do not want to lose that gain.

What will happen if we just leave it at 64?

[Nitin]: That would create L1D holes on 128B targets, right? Unutilized holes are not acceptable, since they waste L1D space and thereby hurt performance. On the contrary, we want to pack structures from 2 x 64B into a single 128B cacheline to reduce the number of pending prefetches in the core pipeline. VPP heavily prefetches the LOAD/STORE variants of 64B lines, and our effort is to reduce them for our target.

[Honnappa] Currently, ThunderX1 and OcteonTX have 128B cachelines. What I have heard from Marvell folks is that the 64B cacheline setting in DPDK does not work. I have not gone into the details of what exactly does not work; maybe Nitin can elaborate.

I'm curious to hear the details…

1. Sometimes (though not very frequently) we will issue two prefetch instructions for the same cacheline, but I hope the hardware is smart enough to simply ignore the second one.

2. We may face false-sharing issues if the first 64 bytes are touched by one thread and the other 64 bytes by another.

The second one sounds like a real problem to me, but it can be solved by aligning all per-thread data structures to 2 x the cacheline size.

[Honnappa] Sorry, I don't understand you here. Even if the data structure is aligned on 128B (2 x 64B), two contiguous 64B blocks of data would still be on a single cacheline.

I meant that we can align all per-thread data structures to 128 bytes, even on systems that have a 64-byte cacheline. If I remember correctly, even on x86 some of the hardware prefetchers deal with blocks of two cachelines.

So unless I have missed something, my proposal is: instead of maintaining special 128-byte images for some ARM64 machines, let's just align all per-thread data structures to 128 bytes and have a single ARM image.

[Honnappa] When we run VPP compiled with a 128B cacheline size on platforms with a 64B cacheline size, there is a performance degradation.

Yes, sure; what I'm suggesting here is exactly how to address that performance degradation.

[Nitin]: Is this proposal for Intel as well? If yes, then I am fine with it, but I think it will decrease performance on 64B architectures with the existing code.

Hence the proposal: make sure the distro packages run on all platforms, while one can still get the best performance by compiling for a particular target. Thoughts?

--
Damjan

[1] https://gerrit.fd.io/r/#/c/18278/

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#12532): https://lists.fd.io/g/vpp-dev/message/12532
Mute This Topic: https://lists.fd.io/mt/30426937/675477
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub
-=-=-=-=-=-=-=-=-=-=-=-