> -----Original Message-----
> From: Jianbo Liu [mailto:jianbo.liu at linaro.org]
> Sent: Monday, September 26, 2016 1:13 PM
> To: Wang, Zhihong <zhihong.wang at intel.com>
> Cc: Thomas Monjalon <thomas.monjalon at 6wind.com>; dev at dpdk.org; Yuanhan
> Liu <yuanhan.liu at linux.intel.com>; Maxime Coquelin
> <maxime.coquelin at redhat.com>
> Subject: Re: [dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue
>
> On 25 September 2016 at 13:41, Wang, Zhihong <zhihong.wang at intel.com>
> wrote:
> >
> >
> >> -----Original Message-----
> >> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> >> Sent: Friday, September 23, 2016 9:41 PM
> >> To: Jianbo Liu <jianbo.liu at linaro.org>
> >> Cc: dev at dpdk.org; Wang, Zhihong <zhihong.wang at intel.com>; Yuanhan Liu
> >> <yuanhan.liu at linux.intel.com>; Maxime Coquelin
> >> <maxime.coquelin at redhat.com>
> ....
> > This patch does help on ARM for small packets like 64B-sized ones,
> > which actually proves the similarity between x86 and ARM in terms
> > of the caching optimization in this patch.
> >
> > My estimation is based on:
> >
> > 1. The last patch is for mrg_rxbuf=on, and since you said it helps
> >    perf, we can ignore it for now when we discuss mrg_rxbuf=off
> >
> > 2. Vhost enqueue perf =
> >    ring overhead + virtio header overhead + data memcpy overhead
> >
> > 3. This patch helps small-packet traffic, which means it helps
> >    the ring + virtio header operations
> >
> > 4. So, when you say perf drops for packet sizes larger than 512B,
> >    this is most likely caused by memcpy on ARM not working well
> >    with this patch
> >
> > I'm not saying glibc's memcpy is not good enough; it's just that
> > this is a rather special use case. And since we see that specialized
> > memcpy + this patch gives significantly better performance than other
> > combinations on x86, we suggest hand-crafting a specialized memcpy
> > for it.
> >
> > Of course on ARM this is still just my speculation, and we need to
> > either prove it or find the actual root cause.
> >
> > It would be **REALLY HELPFUL** if you could help test this patch on
> > ARM for the mrg_rxbuf=on cases, to see if this patch helps ARM at
> > all, since mrg_rxbuf=on is the more widely used case.
> >
> Actually it's worse than mrg_rxbuf=off.
I meant comparing the perf of the original code vs. original + patch with mrg_rxbuf turned on. Is there any perf improvement?