> From: Bruce Richardson [mailto:bruce.richard...@intel.com] > Sent: Thursday, 9 May 2024 11.30 > > On Thu, May 09, 2024 at 09:43:16AM +0200, Morten Brørup wrote: > > > From: Stephen Hemminger [mailto:step...@networkplumber.org] > > > Sent: Wednesday, 8 May 2024 22.54 > > > > > > On Wed, 8 May 2024 20:48:06 +0100 > > > Ferruh Yigit <ferruh.yi...@amd.com> wrote: > > > > > > > > > > > > > The idea of load tearing is crazy talk of integral types. It > would > > > break so many things. > > > > > It is the kind of stupid compiler thing that would send Linus on > a > > > rant and get > > > > > the GCC compiler writers in trouble. > > > > > > > > > > The DPDK has always favored performance over strict safety guard > > > rails everywhere. > > > > > Switching to making every statistic an atomic operation is not > in > > > the spirit of > > > > > what is required. There is no strict guarantee necessary here. > > > > > > > > > > > > > I kind of agree with Stephen. > > > > > > > > Thanks Mattias, Morten & Stephen, it was informative discussion. > But > > > for > > > > *SW drivers* stats update and reset is not core functionality and > I > > > > think we can be OK to get hit on corner cases, instead of > > > > over-engineering or making code more complex. > > > > > > > > > I forgot the case of 64 bit values on 32 bit platforms! > > > Mostly because haven't cared about 32 bit for years... > > > > > > The Linux kernel uses some wrappers to handle this. > > > On 64 bit platforms they become noop. > > > On 32 bit platform, they are protected by a seqlock and updates are > > > wrapped by the sequence count. > > > > > > If we go this way, then doing similar Noop on 64 bit and atomic or > > > seqlock > > > on 32 bit should be done, but in common helper. > > > > > > Looking inside FreeBSD, it looks like that has changed over the > years as > > > well. > > > > > > if_inc_counter > > > counter_u64_add > > > atomic_add_64 > > > But the counters are always per-cpu in this case. So although it > does > > > use > > > locked operation, will always be uncontended. > > > > > > > > > PS: Does DPDK still actually support 32 bit on x86? Can it be > dropped > > > this cycle? > > > > We cannot drop 32 bit architecture support altogether. > > > > But, unlike the Linux kernel, DPDK doesn't need to support ancient 32 > bit architectures. > > If the few 32 bit architectures supported by DPDK provide non-tearing > 64 bit loads/stores, we don't need locks (in the fast path) for 64 bit > counters. > > > > In addition to 32 bit x86, DPDK supports ARMv7-A (a 32 bit > architecture) and 32 bit ARMv8. > > I don't think DPDK support any other 32 bit architectures. > > > > > > As Mattias mentioned, 32 bit x86 can use xmm registers to provide 64 > bit non-tearing load/store. > > > > Testing this a little in godbolt, I see gcc using xmm registers on 32- > bit > when updating 64-bit counters, but clang doesn't seem to do so, but > instead > does 2 stores when writing back the 64 value. (I tried with both > volatile > and non-volatile 64-bit values, just to see if volatile would encourage > clang to do a single store). > > GCC: https://godbolt.org/z/9eqKfT3hz > Clang: https://godbolt.org/z/PT5EqKn4c
Interesting. I guess this can be fixed by manually implementing what GCC does. I'm more concerned about finding a high-performance (in the fast path) 64 bit counter solution for 32 bit ARM.