On 03/04/15 20:54, Stephen Hemminger wrote:
> On Wed, 04 Mar 2015 09:57:24 +0200
> Vlad Zolotarov <vladz at cloudius-systems.com> wrote:
>
>>
>> On 03/04/15 02:33, Stephen Hemminger wrote:
>>> On Tue, 3 Mar 2015 21:48:43 +0200
>>> Vlad Zolotarov <vladz at cloudius-systems.com> wrote:
>>>
>>>> + next_desc:
>>>> + /*
>>>> +  * The code in this whole file uses the volatile pointer to
>>>> +  * ensure the read ordering of the status and the rest of the
>>>> +  * descriptor fields (on the compiler level only!!!). This is so
>>>> +  * UGLY - why not to just use the compiler barrier instead? DPDK
>>>> +  * even has the rte_compiler_barrier() for that.
>>>> +  *
>>>> +  * But most importantly this is just wrong because this doesn't
>>>> +  * ensure memory ordering in a general case at all. For
>>>> +  * instance, DPDK is supposed to work on Power CPUs where
>>>> +  * compiler barrier may just not be enough!
>>>> +  *
>>>> +  * I tried to write only this function properly to have a
>>>> +  * starting point (as a part of an LRO/RSC series) but the
>>>> +  * compiler cursed at me when I tried to cast away the
>>>> +  * "volatile" from rx_ring (yes, it's volatile too!!!). So, I'm
>>>> +  * keeping it the way it is for now.
>>>> +  *
>>>> +  * The code in this file is broken in so many other places and
>>>> +  * will just not work on a big endian CPU anyway therefore the
>>>> +  * lines below will have to be revisited together with the rest
>>>> +  * of the ixgbe PMD.
>>>> +  *
>>>> +  * TODO:
>>>> +  *    - Get rid of "volatile" crap and let the compiler do its
>>>> +  *      job.
>>>> +  *    - Use the proper memory barrier (rte_rmb()) to ensure the
>>>> +  *      memory ordering below.
>>> This comment screams "this is broken".
>>> Why not get proper architecture independent barriers in DPDK first.
>> This series is orthogonal to the issue above. I just couldn't stand to
>> mention this ugliness when I noticed it on the way.
>> Note that although this is obviously not the right way to write this
>> kind of code it is still not a bug and most likely the performance
>> implications are minimal here.
>> The only overhead is that there may be read "too much" data from the
>> descriptor that we may not actually need. The descriptor is 16 bytes so
>> this doesn't seem to be a critical issue.
>>
>> So, fixing the above issue may wait, especially since the same s..t may
>> be found in other Intel PMDs (see i40e for example). Fixing this issue
>> should be a matter of a massive cleanup series that cover all the
>> relevant PMDs. Of course we may start with ixgbe but even in this single
>> PMD there are at least 3 non-LRO related functions that have to be
>> fixed, so IMHO even fixing ONLY ixgbe should be a matter of a separate
>> series.
> In userspace-rcu and kernel there is a simple macro that would make this
> kind of code more sane.
>
> What about adding:
>
> #define rte_access_once(x) (*(volatile typeof(x) *)&(x))
>
> Then doing
>     rxdp = rte_access_once(rx_ring + idx);
This workaround doesn't address the issue described above. It just hides
it inside a macro, which is even uglier.

The main reason I haven't fixed this issue in (at least) the function
I've added is that hw->rx_ring (the HW ring) is defined as volatile, and
this fact is relied upon all over the file. Dropping the "volatile"
qualifier should be the first thing to do, but then all such places have
to be fixed too.
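
For illustration only, here is a rough sketch of the "non-volatile ring
plus explicit read barrier" pattern discussed above. It is not the ixgbe
code: the descriptor layout, the DESC_DONE bit and all demo_* names are
made up, and demo_rmb() uses a C11 acquire fence as a stand-in for
rte_rmb() so that the snippet builds without DPDK. In a real PMD the
DPDK barrier would be the one to use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 4u	/* hypothetical ring size */
#define DESC_DONE 0x1u	/* hypothetical "descriptor done" status bit */

/* Hypothetical 16-byte descriptor; NOT the real ixgbe layout. */
struct demo_rx_desc {
	uint64_t addr;
	uint32_t status;
	uint16_t length;
	uint16_t vlan;
};

/* Assumed stand-in for rte_rmb(): orders earlier loads before later ones. */
static inline void demo_rmb(void)
{
	atomic_thread_fence(memory_order_acquire);
}

/* One explicit "read exactly once" load of a single field. */
static inline uint32_t demo_read_once_u32(const uint32_t *p)
{
	return *(const volatile uint32_t *)p;
}

/*
 * Poll one descriptor of a ring that is NOT declared volatile.
 * Returns 1 and stores the length if the descriptor is done, 0 otherwise.
 */
static int demo_rx_poll(const struct demo_rx_desc *ring, unsigned int idx,
			uint16_t *length)
{
	const struct demo_rx_desc *rxdp;

	assert(idx < RING_SIZE);
	rxdp = &ring[idx];

	/* Read the status first, and only once. */
	if (!(demo_read_once_u32(&rxdp->status) & DESC_DONE))
		return 0;

	/* Do not let the reads below be ordered before the status read. */
	demo_rmb();

	*length = rxdp->length;
	return 1;
}
```

The point of the sketch is that the ordering requirement is stated in
exactly one place (the barrier after the status check) instead of being
implied by a volatile qualifier on the whole ring.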