Hi Thomas,

On 23 September 2016 at 21:41, Thomas Monjalon <thomas.monjalon at 6wind.com> wrote:
> 2016-09-23 18:41, Jianbo Liu:
>> On 23 September 2016 at 10:56, Wang, Zhihong <zhihong.wang at intel.com> wrote:
>> .....
>> > This is expected because the 2nd patch is just a baseline and all
>> > optimization patches are organized in the rest of this patch set.
>> >
>> > I think you can do bottleneck analysis on ARM to see what's slowing
>> > down the perf, there might be some micro-arch complications there,
>> > most likely in memcpy.
>> >
>> > Do you use glibc's memcpy? I suggest hand-crafting it on your own.
>> >
>> > Could you publish the mrg_rxbuf=on data also? Since it's more widely
>> > used in terms of spec integrity.
>> >
>> I don't think it will be helpful for you, considering the differences
>> between x86 and arm.
>> So please move on with this patchset...
>
> Jianbo,
> I don't understand.
> You said that the 2nd patch is a regression:
>  - volatile uint16_t last_used_idx;
>  + uint16_t last_used_idx;

No, I meant "vhost: rewrite enqueue".
> And the overall series leads to a performance regression
> for packets > 512 B, right?
> But we don't know whether you have tested the v6 or not.

Yes, I tested v6, and found a performance regression for sizes >= 512 B.

> Zhihong talked about some improvements possible in rte_memcpy.
> ARM64 is using libc memcpy in rte_memcpy.
>
> Now you seem to give up.
> Does it mean you accept having a regression in 16.11 release?
> Are you working on rte_memcpy?

This patchset actually improves performance according to Zhihong's results on the x86 platform, and I also see an improvement, at least with small-size packets, on ARM.
I don't want to give up, but I need more time to find out the reason for the regression.
I think rte_memcpy is definitely one of the ways to improve performance, but is it really the reason for the regression?
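
To make the rte_memcpy direction concrete: a hand-crafted path for ARM64 could look roughly like the sketch below, using NEON intrinsics instead of falling back to libc memcpy. This is only an illustration of the idea, not actual DPDK code; the function name neon_memcpy is made up, and the block sizes are arbitrary and would need tuning/benchmarking per micro-architecture.

    /*
     * Hypothetical sketch only, not the DPDK implementation: copy 32 bytes
     * per iteration with two 16-byte NEON loads/stores, and let libc memcpy
     * handle whatever tail remains.
     */
    #include <stdint.h>
    #include <string.h>
    #include <arm_neon.h>

    static inline void *
    neon_memcpy(void *dst, const void *src, size_t n)
    {
        uint8_t *d = (uint8_t *)dst;
        const uint8_t *s = (const uint8_t *)src;

        while (n >= 32) {
            uint8x16_t v0 = vld1q_u8(s);
            uint8x16_t v1 = vld1q_u8(s + 16);
            vst1q_u8(d, v0);
            vst1q_u8(d + 16, v1);
            s += 32;
            d += 32;
            n -= 32;
        }

        if (n)
            memcpy(d, s, n);

        return dst;
    }

Whether something like this actually beats the libc memcpy for the packet sizes that regress is exactly what I still need to measure before drawing any conclusion.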