On Tue, Feb 17, 2015 at 04:00:56PM +0000, Bruce Richardson wrote: > On Tue, Feb 17, 2015 at 10:49:24AM -0500, Neil Horman wrote: > > On Tue, Feb 17, 2015 at 01:50:58PM +0000, Bruce Richardson wrote: > > > On Tue, Feb 17, 2015 at 02:28:02PM +0100, Olivier MATZ wrote: > > > > Hi Bruce, > > > > > > > > On 02/17/2015 01:25 PM, Bruce Richardson wrote: > > > > >On Mon, Feb 16, 2015 at 06:34:37PM +0100, Thomas Monjalon wrote: > > > > >>2015-02-16 15:16, Bruce Richardson: > > > > >>>In this specific instance, given that the application does little > > > > >>>else, there > > > > >>>is no real advantage to using the callbacks - it's just to have a > > > > >>>simple example > > > > >>>of how they can be used. > > > > >>> > > > > >>>Where callbacks are really designed to be useful, is for extending > > > > >>>or augmenting > > > > >>>hardware capabilities. Taking the example of sequence numbers - to > > > > >>>use the most > > > > >>>trivial example - an application could be written to take advantage > > > > >>>of sequence > > > > >>>numbers written to packets by the hardware which received them. > > > > >>>However, if such > > > > >>>an application was to be used with a NIC which does not provide > > > > >>>sequence numbering > > > > >>>capability, for example, anything using ixgbe driver, the > > > > >>>application writer has > > > > >>>two choices - either modify his application code to check each > > > > >>>packet for > > > > >>>a sequence number in the data path, and add it there post-rx, or > > > > >>>alternatively, > > > > >>>to check the NIC capabilities at initialization time, and add a > > > > >>>callback there > > > > >>>at initialization, if the hardware does not support it. In the > > > > >>>latter case, > > > > >>>the main packet processing body of the application can be written as > > > > >>>though > > > > >>>hardware always has sequence numbering capability, safe in the > > > > >>>knowledge that > > > > >>>any hardware not supporting it will be back-filled by a software > > > > >>>fallback at > > > > >>>initialization-time. > > > > >>> > > > > >>>By the same token, we could also look to extend hardware > > > > >>>capabilities. For > > > > >>>different filtering or hashing capabilities, there can be limits in > > > > >>>hardware > > > > >>>which are far less than what we need to use in software. Again, > > > > >>>callbacks will > > > > >>>allow the data path to be written in a way that is oblivious to the > > > > >>>underlying > > > > >>>hardware limits, because software will transparently fill in the > > > > >>>gaps. > > > > >>> > > > > >>>Hope this makes the use case clear. > > > > >> > > > > >>After thinking more about these callbacks, I realize these callbacks > > > > >>won't > > > > >>help, as Olivier said. > > > > >> > > > > >>With callback, > > > > >>1/ application checks device capability > > > > >>2/ application provides hardware emulation as DPDK callback > > > > >>3/ application forgets previous steps > > > > >>4/ application calls DPDK Rx > > > > >>5/ DPDK calls callback (without calling optimization) > > > > >> > > > > >>Without callback, > > > > >>1/ application checks device capability > > > > >>2/ application provides hardware emulation as internal function > > > > >>3/ application set an internal device-flag to enable this function > > > > >>4/ application calls DPDK Rx > > > > >>5/ application calls the hardware emulation if flag is set > > > > >> > > > > >>So the only difference is to keep persistent the device information in > > > > >>the application instead of storing it as a function pointer in the > > > > >>DPDK struct. > > > > >>You can also be faster with this approach: at initialization time, > > > > >>you can check that your NIC supports the feature and use a specific > > > > >>mainloop that adds or not the sequence number without any runtime > > > > >>test. > > > > > > > > > >That is assuming that all NICs are equal on your system. It's also > > > > >assuming > > > > >that you only have a single point in your application where you call > > > > >RX or > > > > >TX burst. In the case where you have a couple of different NICs on the > > > > >system, > > > > >or where you want to write an application to take advantage of > > > > >capabilities of > > > > >different NICs, the ability to resolve all these difference at > > > > >initialization > > > > >time is useful. The main packet handling code can be written with just > > > > >the > > > > >processing of packets in mind, rather than having to have a set of > > > > >branches > > > > >after each RX burst call, or before each TX burst call, to "smooth > > > > >out" the > > > > >different NIC capabilities. > > > > > > > > > >As for the option of maintaining different main loops for different > > > > >NICs with > > > > >different capabilities - that sounds like a maintenance nightmare to > > > > >me, due to duplicated code! Callbacks is a far cleaner solution than > > > > >that IMHO. > > > > > > > > Why not just provide a function like this: > > > > > > > > rte_do_unsupported_stuff_by_software(m[], m_count, wanted_features, > > > > dev_feature_flags) > > > > > > > > This function can be called (or not) from the application mainloop. > > > > You don't need to maintain several mainloops (for each device) as > > > > the specific work will be done depending on the given flags. And the > > > > applications that do not require these features (most applications?) > > > > are not penalized at all. > > > > > > Have you measured the performance hit due to this proposed change? In my > > > tests > > > it's very, very small, even for the fastest vectorized path. If > > > performance is > > > a real concern, I'm happy enough to have this as a compile-time option so > > > that > > > those who can't take the small performance hit can avoid it. > > > > > How can you assert performance metrics on a patch like this? The point of > > the > > change is to allow a callback to an application defined function, the > > contents > > of which are effectively arbitrary. Not saying that its the wrong thing to > > do, > > but you can't really claim performance is not impacted, because the details > > of > > whats executed is outside your purview. > > Neil > > > I think the performance hit being referenced is a hit due to the patch itself > without any callbacks being in use. (That was certainly my assumption in > replying) > I figured it was, but thats still something of a misnomer. Of course this change on its own is negligible in its performance impact. By itself, the impact is that of a branch that is unlikely to be taken, which is to say almost zero. But thats not an actionable number because the only time that performance is attainable if the user doesn't use it. Since you're posing a patch that makes application registered callbacks in a very fast path, I think its important to state very clearly that these callbacks will have a significant performance impact that individual applications will have to measure and be cogniscent of. Neil
> /Bruce >