On Tue, Feb 17, 2015 at 04:15:09PM +0000, Bruce Richardson wrote:
> On Tue, Feb 17, 2015 at 11:08:10AM -0500, Neil Horman wrote:
> > On Tue, Feb 17, 2015 at 04:00:56PM +0000, Bruce Richardson wrote:
> > > On Tue, Feb 17, 2015 at 10:49:24AM -0500, Neil Horman wrote:
> > > > On Tue, Feb 17, 2015 at 01:50:58PM +0000, Bruce Richardson wrote:
> > > > > On Tue, Feb 17, 2015 at 02:28:02PM +0100, Olivier MATZ wrote:
> > > > > > Hi Bruce,
> > > > > >
> > > > > > On 02/17/2015 01:25 PM, Bruce Richardson wrote:
> > > > > > >On Mon, Feb 16, 2015 at 06:34:37PM +0100, Thomas Monjalon wrote:
> > > > > > >>2015-02-16 15:16, Bruce Richardson:
> > > > > > >>>In this specific instance, given that the application does
> > > > > > >>>little else, there
> > > > > > >>>is no real advantage to using the callbacks - it's just to have
> > > > > > >>>a simple example
> > > > > > >>>of how they can be used.
> > > > > > >>>
> > > > > > >>>Where callbacks are really designed to be useful, is for
> > > > > > >>>extending or augmenting
> > > > > > >>>hardware capabilities. Taking the example of sequence numbers -
> > > > > > >>>to use the most
> > > > > > >>>trivial example - an application could be written to take
> > > > > > >>>advantage of sequence
> > > > > > >>>numbers written to packets by the hardware which received them.
> > > > > > >>>However, if such
> > > > > > >>>an application was to be used with a NIC which does not provide
> > > > > > >>>sequence numbering
> > > > > > >>>capability, for example, anything using ixgbe driver, the
> > > > > > >>>application writer has
> > > > > > >>>two choices - either modify his application code to check each
> > > > > > >>>packet for
> > > > > > >>>a sequence number in the data path, and add it there post-rx, or
> > > > > > >>>alternatively,
> > > > > > >>>to check the NIC capabilities at initialization time, and add a
> > > > > > >>>callback there
> > > > > > >>>at initialization, if the hardware does not support it. In the
> > > > > > >>>latter case,
> > > > > > >>>the main packet processing body of the application can be
> > > > > > >>>written as though
> > > > > > >>>hardware always has sequence numbering capability, safe in the
> > > > > > >>>knowledge that
> > > > > > >>>any hardware not supporting it will be back-filled by a software
> > > > > > >>>fallback at
> > > > > > >>>initialization-time.
> > > > > > >>>
> > > > > > >>>By the same token, we could also look to extend hardware
> > > > > > >>>capabilities. For
> > > > > > >>>different filtering or hashing capabilities, there can be limits
> > > > > > >>>in hardware
> > > > > > >>>which are far less than what we need to use in software. Again,
> > > > > > >>>callbacks will
> > > > > > >>>allow the data path to be written in a way that is oblivious to
> > > > > > >>>the underlying
> > > > > > >>>hardware limits, because software will transparently fill in the
> > > > > > >>>gaps.
> > > > > > >>>
> > > > > > >>>Hope this makes the use case clear.
> > > > > > >>
> > > > > > >>After thinking more about these callbacks, I realize these
> > > > > > >>callbacks won't
> > > > > > >>help, as Olivier said.
> > > > > > >>
> > > > > > >>With callback,
> > > > > > >>1/ application checks device capability
> > > > > > >>2/ application provides hardware emulation as DPDK callback
> > > > > > >>3/ application forgets previous steps
> > > > > > >>4/ application calls DPDK Rx
> > > > > > >>5/ DPDK calls callback (without calling optimization)
> > > > > > >>
> > > > > > >>Without callback,
> > > > > > >>1/ application checks device capability
> > > > > > >>2/ application provides hardware emulation as internal function
> > > > > > >>3/ application set an internal device-flag to enable this function
> > > > > > >>4/ application calls DPDK Rx
> > > > > > >>5/ application calls the hardware emulation if flag is set
> > > > > > >>
> > > > > > >>So the only difference is to keep persistent the device
> > > > > > >>information in
> > > > > > >>the application instead of storing it as a function pointer in the
> > > > > > >>DPDK struct.
> > > > > > >>You can also be faster with this approach: at initialization time,
> > > > > > >>you can check that your NIC supports the feature and use a
> > > > > > >>specific
> > > > > > >>mainloop that adds or not the sequence number without any runtime
> > > > > > >>test.
> > > > > > >
> > > > > > >That is assuming that all NICs are equal on your system. It's also
> > > > > > >assuming
> > > > > > >that you only have a single point in your application where you
> > > > > > >call RX or
> > > > > > >TX burst. In the case where you have a couple of different NICs on
> > > > > > >the system,
> > > > > > >or where you want to write an application to take advantage of
> > > > > > >capabilities of
> > > > > > >different NICs, the ability to resolve all these difference at
> > > > > > >initialization
> > > > > > >time is useful. The main packet handling code can be written with
> > > > > > >just the
> > > > > > >processing of packets in mind, rather than having to have a set of
> > > > > > >branches
> > > > > > >after each RX burst call, or before each TX burst call, to "smooth
> > > > > > >out" the
> > > > > > >different NIC capabilities.
> > > > > > >
> > > > > > >As for the option of maintaining different main loops for
> > > > > > >different NICs with
> > > > > > >different capabilities - that sounds like a maintenance nightmare
> > > > > > >to
> > > > > > >me, due to duplicated code! Callbacks is a far cleaner solution
> > > > > > >than that IMHO.
> > > > > >
> > > > > > Why not just provide a function like this:
> > > > > >
> > > > > > rte_do_unsupported_stuff_by_software(m[], m_count,
> > > > > > wanted_features,
> > > > > > dev_feature_flags)
> > > > > >
> > > > > > This function can be called (or not) from the application mainloop.
> > > > > > You don't need to maintain several mainloops (for each device) as
> > > > > > the specific work will be done depending on the given flags. And the
> > > > > > applications that do not require these features (most applications?)
> > > > > > are not penalized at all.
> > > > >
> > > > > Have you measured the performance hit due to this proposed change? In
> > > > > my tests
> > > > > it's very, very small, even for the fastest vectorized path. If
> > > > > performance is
> > > > > a real concern, I'm happy enough to have this as a compile-time
> > > > > option so that
> > > > > those who can't take the small performance hit can avoid it.
> > > > >
> > > > How can you assert performance metrics on a patch like this? The point
> > > > of the
> > > > change is to allow a callback to an application defined function, the
> > > > contents
> > > > of which are effectively arbitrary. Not saying that its the wrong
> > > > thing to do,
> > > > but you can't really claim performance is not impacted, because the
> > > > details of
> > > > whats executed is outside your purview.
> > > > Neil
> > > >
> > > I think the performance hit being referenced is a hit due to the patch
> > > itself
> > > without any callbacks being in use. (That was certainly my assumption in
> > > replying)
> > >
> > I figured it was, but thats still something of a misnomer. Of course this
> > change on its own is negligible in its performance impact. By itself, the
> > impact is that of a branch that is unlikely to be taken, which is to say
> > almost
> > zero. But thats not an actionable number because the only time that
> > performance
> > is attainable if the user doesn't use it. Since you're posing a patch that
> > makes application registered callbacks in a very fast path, I think its
> > important to state very clearly that these callbacks will have a significant
> > performance impact that individual applications will have to measure and be
> > cogniscent of.
> > Neil
> >
> Yes, agreed.
> But if the app were to directly implement the same functionality directly
> rather
> than via callbacks, the performance would be about the same (sometimes better,
> sometimes worse, I suspect, depending on how it's done).
>
No argument, but doing so makes it clearly apparent to the application developer
that they are adding cycles to a hot path. That becomes much more obfuscated
when you register callbacks, and so it is imperitive to not make ambiguous
claims like "the performance impact is zero".
Neil