2016-09-23 18:41, Jianbo Liu:
> On 23 September 2016 at 10:56, Wang, Zhihong <zhihong.wang at intel.com> wrote:
> .....
> > This is expected because the 2nd patch is just a baseline and all
> > optimization patches are organized in the rest of this patch set.
> >
> > I think you can do bottleneck analysis on ARM to see what's slowing down the
> > perf, there might be some micro-arch complications there, mostly likely in
> > memcpy.
> >
> > Do you use glibc's memcpy? I suggest to hand-crafted it on your own.
> >
> > Could you publish the mrg_rxbuf=on data also? Since it's more widely used
> > in terms of spec integrity.
> >
> I don't think it will be helpful for you, considering the differences
> between x86 and arm.
> So please move on with this patchset...

Jianbo, I don't understand.
You said that the 2nd patch is a regression:
-	volatile uint16_t last_used_idx;
+	uint16_t last_used_idx;

And the overall series leads to a performance regression
for packets > 512 B, right?
But we don't know whether you have tested the v6 or not.

Zhihong talked about some improvements possible in rte_memcpy.
ARM64 is using libc memcpy in rte_memcpy.

Now you seem to give up.
Does it mean you accept having a regression in the 16.11 release?
Are you working on rte_memcpy?