Hi Peter,

On Thu, Apr 7, 2016 at 5:15 PM, Peter Maydell <peter.mayd...@linaro.org> wrote:
> On 7 April 2016 at 11:56, Vijay Kilari <vijay.kil...@gmail.com> wrote:
>> On Thu, Apr 7, 2016 at 3:41 PM, Peter Maydell <peter.mayd...@linaro.org>
>> wrote:
>>> On 7 April 2016 at 10:58, <vija...@caviumnetworks.com> wrote:
>>>> From: Vijaya Kumar K <vijaya.ku...@caviumnetworks.com>
>>>>
>>>> utils cannot read target cpu information to
>>>> fetch cpu information to implement cpu specific
>>>> features or erratas. For this parse /proc/cpuinfo
>>>> and fetch cpu information.
>>>>
>>>> For now this helper only fetches cpu information
>>>> for arm architectures.
>>>
>>> As I understand it /proc/cpuinfo is intended only for
>>> humans to read. Please don't write code to parse it;
>>> find a different way to get this information instead
>>> if you really need it.
>
>> Also unlike x86 there is no cpuid.h where we can get cpu identification
>> information for arm64.
>
> I'm told there are kernel patches in progress to get this sort
> of information in a maintainable way to userspace, which are
> currently somewhat stalled due to lack of anybody who wants to
> consume it. If you have a use case then you should probably
> flag it up with the kernel devs.
Can you please give references to those patches/discussion?

>
> That said, I think we should probably hold off on this
> discussion until we have clearer benchmarking info that
> demonstrates that doing these prefetches really does make
> a significant difference. I would much prefer to have a

The ThunderX pass2 board does not have hardware prefetch, so explicit
software prefetch instructions are required on this platform.

Here are the benchmarking results with and without prefetch, measured
while migrating an idle VM with 4 VCPUs and 8 GB of RAM. Without
prefetch, the total migration time is 8.2 seconds; with prefetch, it
is 2.7 seconds (about 3x faster).

Without prefetch:
------------------------
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off x-postcopy-ram: off
Migration status: completed
total time: 8217 milliseconds
downtime: 86 milliseconds
setup: 4 milliseconds
transferred ram: 212624 kbytes
throughput: 212.08 mbps
remaining ram: 0 kbytes
total ram: 8520128 kbytes
duplicate: 2085805 pages
skipped: 0 pages
normal: 48478 pages
normal bytes: 193912 kbytes
dirty sync count: 3

With prefetch:
--------------------
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off x-postcopy-ram: off
Migration status: completed
total time: 2744 milliseconds
downtime: 48 milliseconds
setup: 5 milliseconds
transferred ram: 213526 kbytes
throughput: 637.76 mbps
remaining ram: 0 kbytes
total ram: 8520128 kbytes
duplicate: 2085014 pages
skipped: 0 pages
normal: 48705 pages
normal bytes: 194820 kbytes
dirty sync count: 3

> single aarch64 routine that works for everybody, rather
> than a thunderx-only special case.

I have since found that the existing generic function
buffer_find_nonzero_offset_inner() can be made to work with NEON, so
there is no need to create the special function
buffer_find_nonzero_offset_neon() for arm64 in this patch series.
However, prefetch code still needs to be added for performance reasons.
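For illustration, here is a minimal sketch in C of the kind of loop under discussion: scan a buffer one chunk at a time for non-zero data, issuing a software prefetch a few chunks ahead of the current position. The function name find_nonzero_offset_prefetch, the chunk size and the prefetch distance are hypothetical choices, not taken from QEMU or from this patch series; it is not buffer_find_nonzero_offset_inner(), and it ORs plain words together rather than using NEON. __builtin_prefetch is a GCC/Clang builtin, which on AArch64 is normally emitted as a PRFM instruction.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tuning values, for illustration only. */
#define WORDS_PER_CHUNK 8        /* 8 x 8-byte words = 64 bytes on LP64 (e.g. aarch64 Linux) */
#define PREFETCH_CHUNKS_AHEAD 4  /* how many chunks ahead of the scan to prefetch */

/*
 * Return the byte offset of the first chunk that contains a non-zero word,
 * or len if the whole buffer is zero.  buf must be aligned for unsigned long
 * and len must be a multiple of the chunk size in bytes.
 */
static size_t find_nonzero_offset_prefetch(const void *buf, size_t len)
{
    const unsigned long *p = buf;
    size_t nwords = len / sizeof(unsigned long);
    size_t ahead = WORDS_PER_CHUNK * PREFETCH_CHUNKS_AHEAD;

    assert(len % (WORDS_PER_CHUNK * sizeof(unsigned long)) == 0);

    for (size_t i = 0; i < nwords; i += WORDS_PER_CHUNK) {
        /* Prefetch for read (rw = 0) with high temporal locality (3),
         * only while the target address is still inside the buffer. */
        if (i + ahead < nwords) {
            __builtin_prefetch(&p[i + ahead], 0, 3);
        }

        /* OR the whole chunk together; any set bit means non-zero data. */
        unsigned long acc = 0;
        for (size_t j = 0; j < WORDS_PER_CHUNK; j++) {
            acc |= p[i + j];
        }
        if (acc != 0) {
            return i * sizeof(unsigned long);
        }
    }
    return len;
}
```

The prefetch distance is the main knob here. It would need to be tuned per core by benchmarking, as with the info migrate runs above; on cores that already have a hardware prefetcher, the explicit hint may make little difference.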