> -----Original Message-----
> From: Jianbo Liu [mailto:jianbo.liu at linaro.org]
> Sent: Monday, September 26, 2016 1:39 PM
> To: Wang, Zhihong <zhihong.wang at intel.com>
> Cc: Thomas Monjalon <thomas.monjalon at 6wind.com>; dev at dpdk.org; Yuanhan
> Liu <yuanhan.liu at linux.intel.com>; Maxime Coquelin
> <maxime.coquelin at redhat.com>
> Subject: Re: [dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue
> 
> On 26 September 2016 at 13:25, Wang, Zhihong <zhihong.wang at intel.com>
> wrote:
> >
> >
> >> -----Original Message-----
> >> From: Jianbo Liu [mailto:jianbo.liu at linaro.org]
> >> Sent: Monday, September 26, 2016 1:13 PM
> >> To: Wang, Zhihong <zhihong.wang at intel.com>
> >> Cc: Thomas Monjalon <thomas.monjalon at 6wind.com>; dev at dpdk.org;
> Yuanhan
> >> Liu <yuanhan.liu at linux.intel.com>; Maxime Coquelin
> >> <maxime.coquelin at redhat.com>
> >> Subject: Re: [dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue
> >>
> >> On 25 September 2016 at 13:41, Wang, Zhihong <zhihong.wang at intel.com>
> >> wrote:
> >> >
> >> >
> >> >> -----Original Message-----
> >> >> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> >> >> Sent: Friday, September 23, 2016 9:41 PM
> >> >> To: Jianbo Liu <jianbo.liu at linaro.org>
> >> >> Cc: dev at dpdk.org; Wang, Zhihong <zhihong.wang at intel.com>; Yuanhan
> Liu
> >> >> <yuanhan.liu at linux.intel.com>; Maxime Coquelin
> >> >> <maxime.coquelin at redhat.com>
> >> ....
> >> > This patch does help on ARM for small packets like 64B-sized ones,
> >> > which actually shows the similarity between x86 and ARM in terms
> >> > of the caching optimization in this patch.
> >> >
> >> > My estimation is based on:
> >> >
> >> >  1. The last patch is for mrg_rxbuf=on, and since you said it helps
> >> >     perf, we can ignore it for now while we discuss mrg_rxbuf=off
> >> >
> >> >  2. Vhost enqueue perf =
> >> >     Ring overhead + Virtio header overhead + Data memcpy overhead
> >> >
> >> >  3. This patch helps small-packet traffic, which means it helps
> >> >     ring + virtio header operations
> >> >
> >> >  4. So, when you say perf drops when packet size is larger than
> >> >     512B, this is most likely caused by memcpy on ARM not working
> >> >     well with this patch
> >> >
> >> > I'm not saying glibc's memcpy is not good enough; it's just that
> >> > this is a rather special use case. And since we see that specialized
> >> > memcpy + this patch gives significantly better performance than
> >> > other combinations on x86, we suggest hand-crafting a specialized
> >> > memcpy for it.
> >> >
> >> > Of course on ARM this is still just my speculation, and we need to
> >> > either prove it or find the actual root cause.
> >> >
> >> > It would be **REALLY HELPFUL** if you could help test this patch on
> >> > ARM for mrg_rxbuf=on cases to see if this patch in fact helps ARM
> >> > at all, since mrg_rxbuf=on is the more widely used case.
> >> >
> >> Actually it's worse than mrg_rxbuf=off.
> >
> > I mean compare the perf of original vs. original + patch with
> > mrg_rxbuf turned on. Is there any perf improvement?
> >
> Yes, orig + patch + on is better than orig + on, but orig + patch + on
> is worse than orig + patch + off.


Hi Jianbo,

That's the way it is with virtio: if you compare against the current
enqueue code, the mrg on perf is even slower.

We should compare:

 1. mrg on: orig vs. orig + patch

 2. mrg off: orig vs. orig + patch
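
For reference, the mrg_rxbuf toggle is a guest-side virtio-net device
property, so the two comparisons only differ in the QEMU device flag.
A rough sketch of the setup (the socket path, ids, and the elided
options are placeholders, not from this thread):

```shell
# Hypothetical QEMU/vhost-user command-line fragment; "..." stands for
# the rest of the VM configuration (memory backing, image, etc.).

# Comparison 1: mrg on (run orig, then orig + patch)
qemu-system-x86_64 ... \
    -chardev socket,id=char0,path=/tmp/vhost-user0 \
    -netdev type=vhost-user,id=net0,chardev=char0 \
    -device virtio-net-pci,netdev=net0,mrg_rxbuf=on

# Comparison 2: mrg off (same two vhost builds, only the flag changes)
#   -device virtio-net-pci,netdev=net0,mrg_rxbuf=off
```

This keeps everything constant except the vhost build under test, so
each on/off pair isolates the effect of the patch.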

There are more memory touches in the frontend that bring down the
performance when mrg is on.

Finally, even though mrg on is slower, it's still the mainstream use case
as far as I know.
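
On the specialized-memcpy point raised earlier: a minimal sketch of the
idea, assuming copies in vhost enqueue are usually small and can be
handled in cache-line-sized chunks. The function name is hypothetical
and this is not the actual patch code; the real x86 version uses vector
loads/stores, while this only illustrates specializing for the common
small-copy case.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only: fast-path packets of
 * 64B or less, then copy in 64B (cache-line sized) chunks with a
 * plain memcpy tail. The fixed-size memcpy calls let the compiler
 * inline and vectorize them. */
static inline void
vhost_memcpy_sketch(void *dst, const void *src, size_t len)
{
	uint8_t *d = dst;
	const uint8_t *s = src;

	/* Common case in this discussion: 64B packets, one step. */
	if (len <= 64) {
		memcpy(d, s, len);
		return;
	}
	while (len >= 64) {
		memcpy(d, s, 64);
		d += 64;
		s += 64;
		len -= 64;
	}
	if (len)
		memcpy(d, s, len);
}
```

Whether something like this beats glibc's memcpy on ARM is exactly the
open question above; it would need the same kind of measurement as on
x86.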


Thanks
Zhihong
