On 10/11/2016 08:04 AM, Yuanhan Liu wrote:
> On Mon, Oct 10, 2016 at 04:54:39PM +0200, Maxime Coquelin wrote:
>>
>>
>> On 10/10/2016 04:42 PM, Yuanhan Liu wrote:
>>> On Mon, Oct 10, 2016 at 02:40:44PM +0200, Maxime Coquelin wrote:
>>>>>>> At that time, a packet always used 2 descs. Since indirect desc is
>>>>>>> enabled (by default) now, that assumption no longer holds. What's
>>>>>>> worse, it might even slow things down a bit. That may also be
>>>>>>> part of the reason why performance is slightly worse than before.
>>>>>>>
>>>>>>> --yliu
>>>>>>
>>>>>> I'm not sure I get what you are saying.
>>>>>>
>>>>>>> commit 1d41d77cf81c448c1b09e1e859bfd300e2054a98
>>>>>>> Author: Yuanhan Liu <yuanhan.liu at linux.intel.com>
>>>>>>> Date: Mon May 2 17:46:17 2016 -0700
>>>>>>>
>>>>>>> vhost: optimize dequeue for small packets
>>>>>>>
>>>>>>> A virtio driver normally uses at least 2 desc buffers for Tx: the
>>>>>>> first for storing the header, and the others for storing the data.
>>>>>>>
>>>>>>> Therefore, we could fetch the first data desc buf before the main
>>>>>>> loop, and do the copy first before the check of "are we done yet?".
>>>>>>> This could save one check for small packets that just have one data
>>>>>>> desc buffer and need one mbuf to store it.
>>>>>>>
>>>>>>> Signed-off-by: Yuanhan Liu <yuanhan.liu at linux.intel.com>
>>>>>>> Acked-by: Huawei Xie <huawei.xie at intel.com>
>>>>>>> Tested-by: Rich Lane <rich.lane at bigswitch.com>
>>>>>>
>>>>>> This fast-paths the 2-descriptors format but it's not active
>>>>>> for indirect descriptors. Is this what you mean?
>>>>>
>>>>> Yes. It's also not active when ANY_LAYOUT is actually turned on.
>>>>>> Should be a simple matter to apply this optimization for indirect.
>>>>>
>>>>> Might be.
>>>>
>>>> If I understand the code correctly, indirect descs also benefit from this
>>>> optimization, or am I missing something?
>>>
>>> Aha..., you are right!
>>
>> The interesting thing is that the patch I sent on Thursday that removes
>> header access when no offload has been negotiated[0] seems to reduce
>> almost to zero the performance drop seen with indirect descriptors enabled.
>
> Didn't follow that.
>
>> I see this with 64-byte packets using testpmd on both ends.
>>
>> When I did the patch, I would have expected the same gain with both
>> modes, whereas I measured +1% for direct and +4% for indirect.
>
> IIRC, I did a test before (removing that offload code), and the
> performance was basically the same before and after. Well, there
> might be some small difference, say 1% as you said. But the result has
> never been steady.
>
> Anyway, I think your patch is good to have: I just didn't see v2.
I waited to gather some comments/feedback before sending the v2.
I'll send it today or tomorrow.

Thanks,
Maxime
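The dequeue optimization described in the quoted commit message (fetch the first data descriptor before the main loop, do the copy, and only then check "are we done yet?") can be illustrated with a minimal C sketch. It is not the DPDK vhost code: `struct desc`, `DESC_F_NEXT`, `NET_HDR_LEN`, `dequeue_copy()` and the `demo_*` helpers are hypothetical simplifications. The sketch assumes a 12-byte header (the size of `struct virtio_net_hdr_mrg_rxbuf`) that is never split across descriptors, and it copies into a flat buffer rather than into mbufs. Because `dequeue_copy()` only receives a descriptor table and a head index, the same logic runs whether that table is the ring itself or an indirect table, which mirrors the point made in the thread that indirect descriptors also benefit from the fast path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins, not the real virtio/DPDK definitions. */
#define DESC_F_NEXT 1u   /* plays the role of VRING_DESC_F_NEXT */
#define NET_HDR_LEN 12u  /* assumed header size, never split across descs */

struct desc {
    const uint8_t *addr;
    uint32_t len;
    uint16_t flags;
    uint16_t next;
};

/*
 * Copy one packet's payload (header skipped) from the chain starting at
 * tbl[head] into out. tbl may be the ring's table or an indirect table.
 * Returns the number of payload bytes copied.
 */
static size_t dequeue_copy(const struct desc *tbl, uint16_t head,
                           uint8_t *out, size_t cap)
{
    const struct desc *d = &tbl[head];
    size_t off = NET_HDR_LEN;
    size_t copied = 0;

    /* Common Tx layout: the header sits alone in the first descriptor,
     * so fetch the first data descriptor before the main loop. */
    if (d->len == NET_HDR_LEN && (d->flags & DESC_F_NEXT)) {
        d = &tbl[d->next];
        off = 0;
    }

    for (;;) {
        /* Copy first... */
        size_t n = d->len > off ? d->len - off : 0;
        if (n > cap - copied)
            n = cap - copied;
        if (n > 0)
            memcpy(out + copied, d->addr + off, n);
        copied += n;

        /* ...then ask "are we done yet?". A small packet with a single
         * data descriptor leaves the loop after one pass. */
        if (!(d->flags & DESC_F_NEXT) || copied == cap)
            break;
        d = &tbl[d->next];
        off = 0;
    }
    return copied;
}

/* Two-descriptor layout [header][data]; could equally be an indirect table. */
static size_t demo_two_descs(uint8_t *out, size_t cap)
{
    static const uint8_t hdr[NET_HDR_LEN] = {0};
    static const uint8_t data[] = "hello";
    const struct desc tbl[2] = {
        { hdr, NET_HDR_LEN, DESC_F_NEXT, 1 },
        { data, 5, 0, 0 },
    };
    return dequeue_copy(tbl, 0, out, cap);
}

/* ANY_LAYOUT-style: header and data share one descriptor (no fast path). */
static size_t demo_one_desc(uint8_t *out, size_t cap)
{
    static uint8_t buf[NET_HDR_LEN + 5];
    memcpy(buf + NET_HDR_LEN, "world", 5);
    const struct desc tbl[1] = { { buf, sizeof(buf), 0, 0 } };
    return dequeue_copy(tbl, 0, out, cap);
}
```

In the two-descriptor case the first loop pass copies the whole payload and the flags check ends the loop, so a small packet pays for a single "are we done yet?" test. In the single-descriptor case the prefetch branch is skipped and the copy simply starts at offset `NET_HDR_LEN`, which matches the remark in the thread that the fast path is not active when ANY_LAYOUT is in use.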