On Thu, May 09, 2024 at 09:43:16AM +0200, Morten Brørup wrote: > > From: Stephen Hemminger [mailto:step...@networkplumber.org] > > Sent: Wednesday, 8 May 2024 22.54 > > > > On Wed, 8 May 2024 20:48:06 +0100 > > Ferruh Yigit <ferruh.yi...@amd.com> wrote: > > > > > > > > > > The idea of load tearing is crazy talk of integral types. It would > > break so many things. > > > > It is the kind of stupid compiler thing that would send Linus on a > > rant and get > > > > the GCC compiler writers in trouble. > > > > > > > > The DPDK has always favored performance over strict safety guard > > rails everywhere. > > > > Switching to making every statistic an atomic operation is not in > > the spirit of > > > > what is required. There is no strict guarantee necessary here. > > > > > > > > > > I kind of agree with Stephen. > > > > > > Thanks Mattias, Morten & Stephen, it was informative discussion. But > > for > > > *SW drivers* stats update and reset is not core functionality and I > > > think we can be OK to get hit on corner cases, instead of > > > over-engineering or making code more complex. > > > > > > I forgot the case of 64 bit values on 32 bit platforms! > > Mostly because haven't cared about 32 bit for years... > > > > The Linux kernel uses some wrappers to handle this. > > On 64 bit platforms they become noop. > > On 32 bit platform, they are protected by a seqlock and updates are > > wrapped by the sequence count. > > > > If we go this way, then doing similar Noop on 64 bit and atomic or > > seqlock > > on 32 bit should be done, but in common helper. > > > > Looking inside FreeBSD, it looks like that has changed over the years as > > well. > > > > if_inc_counter > > counter_u64_add > > atomic_add_64 > > But the counters are always per-cpu in this case. So although it does > > use > > locked operation, will always be uncontended. > > > > > > PS: Does DPDK still actually support 32 bit on x86? Can it be dropped > > this cycle? > > We cannot drop 32 bit architecture support altogether. > > But, unlike the Linux kernel, DPDK doesn't need to support ancient 32 bit > architectures. > If the few 32 bit architectures supported by DPDK provide non-tearing 64 bit > loads/stores, we don't need locks (in the fast path) for 64 bit counters. > > In addition to 32 bit x86, DPDK supports ARMv7-A (a 32 bit architecture) and > 32 bit ARMv8. > I don't think DPDK support any other 32 bit architectures. > > > As Mattias mentioned, 32 bit x86 can use xmm registers to provide 64 bit > non-tearing load/store. >
Testing this a little in godbolt, I see gcc using xmm registers on 32-bit when updating 64-bit counters, but clang doesn't seem to do so, but instead does 2 stores when writing back the 64 value. (I tried with both volatile and non-volatile 64-bit values, just to see if volatile would encourage clang to do a single store). GCC: https://godbolt.org/z/9eqKfT3hz Clang: https://godbolt.org/z/PT5EqKn4c /Bruce