On Wed, 04 Mar 2015 09:57:24 +0200 Vlad Zolotarov <vladz at cloudius-systems.com> wrote:
>
> On 03/04/15 02:33, Stephen Hemminger wrote:
> > On Tue, 3 Mar 2015 21:48:43 +0200
> > Vlad Zolotarov <vladz at cloudius-systems.com> wrote:
> >
> >> + next_desc:
> >> +	/*
> >> +	 * The code in this whole file uses the volatile pointer to
> >> +	 * ensure the read ordering of the status and the rest of the
> >> +	 * descriptor fields (on the compiler level only!!!). This is so
> >> +	 * UGLY - why not to just use the compiler barrier instead? DPDK
> >> +	 * even has the rte_compiler_barrier() for that.
> >> +	 *
> >> +	 * But most importantly this is just wrong because this doesn't
> >> +	 * ensure memory ordering in a general case at all. For
> >> +	 * instance, DPDK is supposed to work on Power CPUs where
> >> +	 * compiler barrier may just not be enough!
> >> +	 *
> >> +	 * I tried to write only this function properly to have a
> >> +	 * starting point (as a part of an LRO/RSC series) but the
> >> +	 * compiler cursed at me when I tried to cast away the
> >> +	 * "volatile" from rx_ring (yes, it's volatile too!!!). So, I'm
> >> +	 * keeping it the way it is for now.
> >> +	 *
> >> +	 * The code in this file is broken in so many other places and
> >> +	 * will just not work on a big endian CPU anyway therefore the
> >> +	 * lines below will have to be revisited together with the rest
> >> +	 * of the ixgbe PMD.
> >> +	 *
> >> +	 * TODO:
> >> +	 * - Get rid of "volatile" crap and let the compiler do its
> >> +	 *   job.
> >> +	 * - Use the proper memory barrier (rte_rmb()) to ensure the
> >> +	 *   memory ordering below.
> > This comment screams "this is broken".
> > Why not get proper architecture independent barriers in DPDK first.
>
> This series is orthogonal to the issue above. I just couldn't stand to
> mention this ugliness when I noticed it on the way.
> Note that although this is obviously not the right way to write this
> kind of code it is still not a bug and most likely the performance
> implications are minimal here.
> The only overhead is that there may be read "too much" data from the
> descriptor that we may not actually need. The descriptor is 16 bytes so
> this doesn't seem to be a critical issue.
>
> So, fixing the above issue may wait, especially since the same s..t may
> be found in other Intel PMDs (see i40e for example). Fixing this issue
> should be a matter of a massive cleanup series that cover all the
> relevant PMDs. Of course we may start with ixgbe but even in this single
> PMD there are at least 3 non-LRO related functions that have to be
> fixed, so IMHO even fixing ONLY ixgbe should be a matter of a separate
> series.

In userspace-rcu and kernel there is a simple macro that would make
this kind of code more sane.

What about adding:
	#define rte_access_once(x) (*(volatile typeof(x) *)&(x))

Then doing
	rxdp = rte_access_once(rx_ring + idx);
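
To make the suggestion concrete, here is a minimal sketch of how such a macro could be used on a descriptor ring. Everything prefixed with demo_ or DEMO_ is hypothetical and only for illustration: the descriptor layout is not the real ixgbe one, and the barrier is a plain GCC-style compiler barrier standing in for rte_compiler_barrier(). The macro takes the address of its argument, so it has to be applied to an lvalue; the sketch therefore applies it to the status field rather than to a pointer expression. __typeof__ is used instead of typeof only so the sketch also builds in strict ISO C modes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as the macro proposed above; the argument must be an lvalue. */
#define rte_access_once(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for rte_compiler_barrier(): stops the compiler
 * from moving memory accesses across this point. It emits no CPU
 * instruction, so it is not a hardware read barrier. */
#define demo_compiler_barrier() __asm__ volatile("" ::: "memory")

/* Hypothetical 16-byte descriptor, NOT the real ixgbe layout. */
struct demo_rx_desc {
	uint64_t pkt_addr;
	uint32_t length;
	uint32_t status;
};

#define DEMO_DD 0x1u /* hypothetical "descriptor done" bit */

/* Return the packet length if the descriptor at idx is done, else -1. */
int demo_poll(struct demo_rx_desc *ring, unsigned int idx)
{
	struct demo_rx_desc *rxdp;
	uint32_t status;

	assert(ring != NULL);
	rxdp = ring + idx;

	/* Exactly one load of the status word; the ring itself does not
	 * need to be declared volatile. */
	status = rte_access_once(rxdp->status);
	if (!(status & DEMO_DD))
		return -1;

	/* Keep the compiler from hoisting the loads below above the
	 * status check. On weakly ordered CPUs this is where a real
	 * read barrier such as rte_rmb() would be needed instead. */
	demo_compiler_barrier();

	return (int)rxdp->length;
}
```

With this shape only the status load is forced through a volatile access; the remaining descriptor fields are ordinary loads that the compiler is free to optimize once the barrier has been passed.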