On Fri, Jul 9, 2021 at 2:44 PM Bruce Richardson
<bruce.richard...@intel.com> wrote:
>
> On Fri, Jul 09, 2021 at 12:05:40AM +0530, Jerin Jacob wrote:
> > On Thu, Jul 8, 2021 at 8:41 AM fengchengwen <fengcheng...@huawei.com> wrote:
> > >
> >
> > > >>>
> > > >>> It's just more conditionals and branches all through the code. Inside 
> > > >>> the
> > > >>> user application, the user has to check whether to set the flag or 
> > > >>> not (or
> > > >>> special-case the last transaction outside the loop), and within the 
> > > >>> driver,
> > > >>> there has to be a branch whether or not to call the doorbell 
> > > >>> function. The
> > > >>> code on both sides is far simpler and more readable if the doorbell
> > > >>> function is exactly that - a separate function.
> > > >>
> > > >> I disagree. The reason is:
> > > >>
> > > >> We will have two classes of applications
> > > >>
> > > >> a) do dma copy request as and when it has data(I think, this is the
> > > >> prime use case), for those,
> > > >> I think, it is considerable overhead to have two function invocation
> > > >> per transfer i.e
> > > >> rte_dma_copy() and rte_dma_perform()
> > > >>
> > > >> b) do dma copy when the data is reached to a logical state,  like copy
> > > >> IP frame from Ethernet packets or so,
> > > >> In that case, the application will have  a LOGIC to detect when to
> > > >> perform it so on the end of
> > > >> that rte_dma_copy() flag can be updated to fire the doorbell.
> > > >>
> > > >> IMO, We are comparing against a branch(flag is already in register) vs
> > > >> a set of instructions for
> > > >> 1) function pointer overhead
> > > >> 2) Need to use the channel context again back in another function.
> > > >>
> > > >> IMO, a single branch is most optimal from performance PoV.
> > > >>
> > > > Ok, let's try it and see how it goes.
> > >
> > > Test result show:
> > > 1) For Kunpeng platform (ARMv8) could benefit very little with doorbell 
> > > in flags
> > > 2) For Xeon E5-2690 v2 (X86) could benefit with separate function
> > > 3) Both platform could benefit with doorbell in flags if burst < 5
> > >
> > > There is a performance gain in small bursts (<5). Given the extensive use 
> > > of bursts
> >  in DPDK applications and users are accustomed to the concept, I do
> > not recommend
> > > using the 'doorbell' in flags.
> >
> > There is NO concept change between one option vs other option. Just
> > argument differnet.
> > Also, _perform() scheme not used anywhere in DPDK. I
> >
> > Regarding performance, I have added dummy instructions to simulate the real 
> > work
> > load[1], now burst also has some gain in both x86 and arm64[3]
> >
> > I have modified your application[2] to dpdk test application to use
> > cpu isolation etc.
> > So this is gain in flag scheme ad code is checked in to Github[2[
> >
> <snip>
>
> The benchmark numbers all seem very close between the two schemes. On my
> team we pretty much have test ioat & idxd drivers ported internally to the
> last dmadev draft library, and have sample apps handling traffic using
> those. I'll therefore attempt to get these numbers with real traffic on
> real drivers to just double check that it's the same as these
> microbenchmarks.

Thanks.

>
> Assuming that perf is the same, how to resolve this? Some thoughts:
> * As I understand it, the main objection to the separate doorbell function
>   is the use of 8-bytes in fastpath slot. Therefore I will also attempt to
>   benchmark having the doorbell function not on the same cacheline and check
>   perf impact, if any.

Probably we can remove rte_dmadev_fill_sg() variant and keep sg only for copy
to save 8B.

> * If we don't have a impact to perf by having the doorbell function inside
>   the regular "ops" rather than on fastpath cacheline, there is no reason
>   we can't implement both schemes. The user can then choose themselves
>   whether to doorbell using a flag on last item, or to doorbell explicitly
>   using function call.

Yes. I think, we can keep both.

>
> Of the two schemes, and assuming they are equal, I do have a preference for
> the separate function one, primarily from a code readability point of view.
> Other than that, I have no strong opinions.
>
> /Bruce

Reply via email to