On 5/8/2024 4:23 PM, Stephen Hemminger wrote: > On Wed, 8 May 2024 09:19:02 +0200 > Mattias Rönnblom <hof...@lysator.liu.se> wrote: > >> On 2024-05-04 00:00, Stephen Hemminger wrote: >>> On Fri, 3 May 2024 16:45:47 +0100 >>> Ferruh Yigit <ferruh.yi...@amd.com> wrote: >>> >>>> For stats reset, use an offset instead of zeroing out actual stats values, >>>> get_stats() displays diff between stats and offset. >>>> This way stats only updated in datapath and offset only updated in stats >>>> reset function. This makes stats reset function more reliable. >>>> >>>> As stats only written by single thread, we can remove 'volatile' qualifier >>>> which should improve the performance in datapath. >>>> >>>> While updating around, 'igb_stats' parameter renamed as 'stats'. >>>> >>>> Signed-off-by: Ferruh Yigit <ferruh.yi...@amd.com> >>>> --- >>>> Cc: Mattias Rönnblom <mattias.ronnb...@ericsson.com> >>>> Cc: Stephen Hemminger <step...@networkplumber.org> >>>> Cc: Morten Brørup <m...@smartsharesystems.com> >>>> >>>> This update triggered by mail list discussion [1]. >>>> >>>> [1] >>>> https://inbox.dpdk.org/dev/3b2cf48e-2293-4226-b6cd-5f4dd3969...@lysator.liu.se/ >>>> >>> >>> >>> NAK >>> >>> I did not hear a good argument why atomic or volatile was necessary in the >>> first place. >>> Why? >>> >> >> On the reader side, loads should be atomic. >> On the writer side, stores should be atomic. >> >> Updates (stores) should actually occur in a timely manner. The complete >> read-modify-write cycle need not be atomic, since we only have a single >> writer. All this for the per-lcore counter case. >> >> If load or store tearing occurs, the counter values may occasionally >> take totally bogus values. I think that should be avoided. Especially >> since it will likely come at a very reasonable cost. >> >> From what it seems to me, load or store tearing may well occur. GCC may >> generate two 32-bit stores for a program-level 64-bit store on 32-bit >> x86. If you have constant and immediate-data store instructions, >> constant writes may also be end up teared. The kernel documentation has >> some example of this. Add LTO, it's not necessarily going to be all that >> clear what is storing-a-constant and what is not. >> >> Maybe you care a little less if statistics are occasionally broken, or >> some transient, inconsistent state, but generally they should work, and >> they should never have some totally bogus values. So, statistics aren't >> snow flakes, mostly just business as usual. >> >> We can't both have a culture that promotes C11-style parallel >> programming, or, at the extreme, push the C11 APIs as-is, and the say >> "and btw you don't have to care about the standard when it comes to >> statistics". >> >> We could adopt the Linux kernel's rules, programming model, and APIs >> (ignoring legal issues). That would be very old school, maybe somewhat >> over-engineered for our purpose, include a fair amount of inline >> assembler, and also and may well depend on GCC or GCC-like compilers, >> just like what I believe the kernel does. >> >> We could use something in-between, heavily inspired by C11 but still >> with an opportunity to work around compiler issues, library issues, and >> extend the API for our use case. >> >> I agree we shouldn't have to mark statistics _Atomic, or RTE_ATOMIC(), >> rte_atomic64_t, or rte_sometimes_atomic_and_sometimes_not64_t. Just >> keeping the usual C integer types seems like a better option to me. >> >>> Why is this driver special (a snowflake) compared to all the other drivers >>> doing software >>> statistics (tap, virtio, xdp, ring, memif, netvsc, vmware)? >> >> If a broken piece of code has been copied around, one place is going to >> be the first to be fixed. > > > I dislike when any driver does something completely different than valid > precedent. > No other driver in DPDK, Vpp, FreeBSD, Linux (and probably Windows) uses > atomic for > updating statistics. We even got performance benefit at MS from removing > atomic > increment of staistics in internal layers. > > The idea of load tearing is crazy talk of integral types. It would break so > many things. > It is the kind of stupid compiler thing that would send Linus on a rant and > get > the GCC compiler writers in trouble. > > The DPDK has always favored performance over strict safety guard rails > everywhere. > Switching to making every statistic an atomic operation is not in the spirit > of > what is required. There is no strict guarantee necessary here. >
I kind of agree with Stephen. Thanks Mattias, Morten & Stephen, it was informative discussion. But for *SW drivers* stats update and reset is not core functionality and I think we can be OK to get hit on corner cases, instead of over-engineering or making code more complex. I am for putting priority as following (from high to low): - Datapath performance - Stats get accuracy - Stats reset accuracy With the restriction that stat reset requires forwarding to stop, we can even drop 'offset' logic. And I am not sure if it is a real requirement that stats reset should be supported during forwarding, although I can see it is convenient. If we get this requirement in the future, we can focus on a solution. As action, I am planning to send a new version of this RFC that only removes the 'volatile' qualifier. In next step we can remove atomic updates and volatile stat counters from more SW drivers. Thanks, ferruh