> -----Original Message----- > From: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com> > Sent: Friday, March 20, 2020 6:32 PM > To: Van Haaren, Harry <harry.van.haa...@intel.com>; Phil Yang > <phil.y...@arm.com>; tho...@monjalon.net; Ananyev, Konstantin > <konstantin.anan...@intel.com>; step...@networkplumber.org; > maxime.coque...@redhat.com; dev@dpdk.org > Cc: david.march...@redhat.com; jer...@marvell.com; hemant.agra...@nxp.com; > Gavin Hu <gavin...@arm.com>; Ruifeng Wang <ruifeng.w...@arm.com>; Joyce Kong > <joyce.k...@arm.com>; Carrillo, Erik G <erik.g.carri...@intel.com>; nd > <n...@arm.com>; Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>; nd > <n...@arm.com> > Subject: RE: [PATCH v3 00/12] generic rte atomic APIs deprecate proposal > > + Erik as there are similar changes to timer library > > <snip> > > > > > > > Subject: [PATCH v3 00/12] generic rte atomic APIs deprecate proposal > > > > > > > > DPDK provides generic rte_atomic APIs to do several atomic operations. > > > > These APIs are using the deprecated __sync built-ins and enforce > > > > full memory barriers on aarch64. However, full barriers are not > > > > necessary in many use cases. In order to address such use cases, C > > > > language offers > > > > C11 atomic APIs. The C11 atomic APIs provide finer memory barrier > > > > control by making use of the memory ordering parameter provided by > > > > the > > > user. > > > > Various patches submitted in the past [2] and the patches in this > > > > series indicate significant performance gains on multiple aarch64 > > > > CPUs and no performance loss on x86. > > > > > > > > But the existing rte_atomic API implementations cannot be changed as > > > > the APIs do not take the memory ordering parameter. The only choice > > > > available is replacing the usage of the rte_atomic APIs with C11 > > > > atomic APIs. In order to make this change, the following steps are > > proposed: > > > > > > > > [1] deprecate rte_atomic APIs so that future patches do not use > > > > rte_atomic APIs (a script is added to flag the usages). > > > > [2] refactor the code that uses rte_atomic APIs to use c11 atomic > APIs. > > > > > > On [1] above, I feel deprecating DPDKs atomic functions and failing > > > checkpatch is a bit sudden. Perhaps noting that in a future release > > > (20.11?) DPDK will move to a > > > C11 based atomics model is a more gradual step to achieving the goal, > > > and at that point add a checkpatch warning for additions of rte_atomic*? > > We have been working on changing existing usages of rte_atomic APIs in > DPDK > > to use C11 atomics. Usually, the x.11 releases have significant amount of > > changes (not sure how many would use rte_atomic APIs). I would prefer that > > in 20.11 no additional code is added using rte_atomics APIs. However, I am > > open to suggestions on the exact time frame. > > Once we decide on the release, I think it makes sense to add a 'warning' > in the > > checkpatch to indicate the deprecation timeline and add an 'error' after > the > > release.
The above sounds reasonable - mainly let's not block any code that exists or is being developed today using rte_atomic_* APIs from making a release. > > > More on [2] in context below. > > > > > > The above is my point-of-view, of course I'd like more people from the > > > DPDK community to provide their input too. > > > > > > > > > > This patchset contains: > > > > 1) the checkpatch script changes to flag rte_atomic API usage in > patches. > > > > 2) changes to programmer guide describing writing efficient code for > > > aarch64. > > > > 3) changes to various libraries to make use of c11 atomic APIs. > > > > > > > > We are planning to replicate this idea across all the other > > > > libraries, drivers, examples, test applications. In the next phase, > > > > we will add changes to the mbuf, the EAL interrupts and the event > > > > timer adapter > > > libraries. > > > > > > About ~6/12 patches of this C11 set are targeting the Service Cores > > > area of DPDK. I have some concerns over increased complexity of C11 > > > implementation vs the (already complex) rte_atomic implementation today. > > I agree that it C11 changes are complex, especially if one is starting out > to > > understand what these APIs provide. From my experience, once few > > underlying concepts are understood, reviewing or making changes do not > take > > too much time. > > > > > I see other patchsets enabling C11 across other DPDK components, so > > > maybe we should also discuss C11 enabling in a wider context that just > > service cores? > > Yes, agree. We are in the process of making changes to other areas as > well. > > > > > > > > I don't think it fair to expect all developers to be well versed in > > > C11 atomic semantics like understanding the complex interactions > > > between the various > > > C11 RELEASE, AQUIRE barriers requires. > > C11 has been around from sometime now. To my surprise, OVS already uses > > C11 APIs extensively. VPP has been accepting C11 related changes from past > > couple of years. Having said that, I agree in general that not everyone is > > well versed. Fair point - like so many things, once familiar with it, it becomes easy :) > > > As maintainer of Service Cores I'm hesitant to accept the large-scale > > > refactor > > Right now, the patches are split into multiple commits. If required I can > host a > > call to go over simple C11 API usages (sufficient to cover the usage in > service > > core) and the changes in this patch. If you find that particular areas > need more > > understanding I can work on providing additional information such as > memory > > order ladder diagrams. Please let me know what you think. Thanks for the offer - I will need to do my due diligence on reiview before taking up any of your or other C11 folks time. > When I started working with C11 APIs, I had referred to the following blogs. > https://preshing.com/20120913/acquire-and-release-semantics/ > https://preshing.com/20130702/the-happens-before-relation/ > https://preshing.com/20130823/the-synchronizes-with-relation/ > > These will be helpful to understand the changes. Thanks, indeed good articles. I found the following slide deck particularly informative due to the fantastic diagrams (eg, slide 23): https://mariadb.org/wp-content/uploads/2017/11/2017-11-Memory-barriers.pdf That said, finding a nice diagram and understanding the implications of actually using it is different! I hope to properly review the service-cores patches next week. > > > of atomic-implementation, as it could lead to racey bugs that are > > > likely extremely difficult to track down. (The recent race-on-exit has > > > proven the difficulty in reproducing, and that's with an atomics model > > > I'm quite familiar with). > > > > > > Let me be very clear: I don't wish to block a C11 atomic > > > implementation, but I'd like to discuss how we (DPDK community) can > > > best port-to and maintain a complex multi-threaded service library > > > with best-in-class performance for the workload. > > > > > > To put some discussions/solutions on the table: > > > - Shared Maintainership of a component? > > > Split in functionality and C11 atomics implementation > > > Obviously there would be collaboration required in such a case. > > > - Maybe shared maintainership is too much? > > > A gentlemans/womans agreement of "helping out" with C11 atomics > > > debug is enough? > > I think shared maintainer ship could be too much as there are many > changes. > > But, I and other engineers from Arm (I would include Arm ecosystem as > well) > > can definitely help out on debug and reviews involving C11 APIs. (I see > > Thomas's reply on this topic). Thanks for the offer - as above, ball on my side, I'll go review. <snip>