On Fri, Jul 9, 2021 at 2:44 PM Bruce Richardson <bruce.richard...@intel.com> wrote:
>
> On Fri, Jul 09, 2021 at 12:05:40AM +0530, Jerin Jacob wrote:
> > On Thu, Jul 8, 2021 at 8:41 AM fengchengwen <fengcheng...@huawei.com> wrote:
> > >
> > > >>>
> > > >>> It's just more conditionals and branches all through the code. Inside the
> > > >>> user application, the user has to check whether to set the flag or not (or
> > > >>> special-case the last transaction outside the loop), and within the driver,
> > > >>> there has to be a branch whether or not to call the doorbell function. The
> > > >>> code on both sides is far simpler and more readable if the doorbell
> > > >>> function is exactly that - a separate function.
> > > >>
> > > >> I disagree. The reason is:
> > > >>
> > > >> We will have two classes of applications
> > > >>
> > > >> a) do dma copy request as and when it has data(I think, this is the
> > > >> prime use case), for those,
> > > >> I think, it is considerable overhead to have two function invocation
> > > >> per transfer i.e
> > > >> rte_dma_copy() and rte_dma_perform()
> > > >>
> > > >> b) do dma copy when the data is reached to a logical state, like copy
> > > >> IP frame from Ethernet packets or so,
> > > >> In that case, the application will have a LOGIC to detect when to
> > > >> perform it so on the end of
> > > >> that rte_dma_copy() flag can be updated to fire the doorbell.
> > > >>
> > > >> IMO, We are comparing against a branch(flag is already in register) vs
> > > >> a set of instructions for
> > > >> 1) function pointer overhead
> > > >> 2) Need to use the channel context again back in another function.
> > > >>
> > > >> IMO, a single branch is most optimal from performance PoV.
> > > >>
> > > > Ok, let's try it and see how it goes.
> > >
> > > Test results show:
> > > 1) For Kunpeng platform (ARMv8) could benefit very little with doorbell
> > >    in flags
> > > 2) For Xeon E5-2690 v2 (X86) could benefit with separate function
> > > 3) Both platforms could benefit with doorbell in flags if burst < 5
> > >
> > > There is a performance gain in small bursts (<5). Given the extensive use
> > > of bursts in DPDK applications and users are accustomed to the concept,
> > > I do not recommend using the 'doorbell' in flags.
> >
> > There is NO concept change between one option vs other option. Just
> > argument different.
> > Also, _perform() scheme not used anywhere in DPDK. I
> >
> > Regarding performance, I have added dummy instructions to simulate the real
> > work load[1], now burst also has some gain in both x86 and arm64[3]
> >
> > I have modified your application[2] to dpdk test application to use
> > cpu isolation etc.
> > So this is gain in flag scheme and code is checked in to Github[2]
> >
> <snip>
>
> The benchmark numbers all seem very close between the two schemes. On my
> team we pretty much have test ioat & idxd drivers ported internally to the
> last dmadev draft library, and have sample apps handling traffic using
> those. I'll therefore attempt to get these numbers with real traffic on
> real drivers to just double check that it's the same as these
> microbenchmarks.
Thanks.

>
> Assuming that perf is the same, how to resolve this? Some thoughts:
> * As I understand it, the main objection to the separate doorbell function
>   is the use of 8-bytes in fastpath slot. Therefore I will also attempt to
>   benchmark having the doorbell function not on the same cacheline and check
>   perf impact, if any.

Probably we can remove rte_dmadev_fill_sg() variant and keep sg only
for copy to save 8B.

> * If we don't have an impact to perf by having the doorbell function inside
>   the regular "ops" rather than on fastpath cacheline, there is no reason
>   we can't implement both schemes. The user can then choose themselves
>   whether to doorbell using a flag on last item, or to doorbell explicitly
>   using function call.

Yes. I think, we can keep both.

>
> Of the two schemes, and assuming they are equal, I do have a preference for
> the separate function one, primarily from a code readability point of view.
> Other than that, I have no strong opinions.
>
> /Bruce
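
For readers following the thread, here is a minimal sketch of the two
doorbell schemes being compared, and of how both could coexist as suggested
above. It is not the dmadev API: every name in it (mock_dma_chan,
mock_dma_copy, mock_dma_perform, MOCK_DMA_FLAG_DOORBELL) is hypothetical,
and the "device" is just a set of counters, so it says nothing about
performance on real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag: ring the doorbell as part of this enqueue. */
#define MOCK_DMA_FLAG_DOORBELL (1u << 0)

/* Mock channel context; a stand-in for a real driver's per-queue state. */
struct mock_dma_chan {
    uint32_t enqueued;   /* descriptors written to the ring */
    uint32_t submitted;  /* descriptors the "hardware" has been told about */
    uint32_t doorbells;  /* number of doorbell writes issued */
};

/* Separate-function scheme: an explicit doorbell call
 * (the role rte_dma_perform() plays in the discussion). */
static inline void mock_dma_perform(struct mock_dma_chan *c)
{
    assert(c != NULL);
    c->submitted = c->enqueued;
    c->doorbells++;
}

/* Enqueue one copy. With the flag set, the doorbell is rung here,
 * at the cost of one branch on a value already in a register. */
static inline int mock_dma_copy(struct mock_dma_chan *c, const void *src,
                                void *dst, uint32_t len, uint32_t flags)
{
    assert(c != NULL);
    (void)src; (void)dst; (void)len;  /* a real driver writes a descriptor */
    c->enqueued++;
    if (flags & MOCK_DMA_FLAG_DOORBELL)
        mock_dma_perform(c);
    return 0;
}

/* Application side, flag scheme: special-case the last item of the burst. */
void burst_with_flag(struct mock_dma_chan *c, uint32_t n)
{
    for (uint32_t i = 0; i < n; i++)
        mock_dma_copy(c, NULL, NULL, 64,
                      (i == n - 1) ? MOCK_DMA_FLAG_DOORBELL : 0);
}

/* Application side, separate-function scheme: enqueue all, then doorbell. */
void burst_with_call(struct mock_dma_chan *c, uint32_t n)
{
    for (uint32_t i = 0; i < n; i++)
        mock_dma_copy(c, NULL, NULL, 64, 0);
    mock_dma_perform(c);
}
```

Both helpers leave the channel in the same state after a non-empty burst,
with one doorbell per burst. The difference is only where the decision
lives: in a per-item flag check inside the loop, or in one extra call after
it.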