buffer_find_nonzero_offset() is a hot function during live migration. It currently uses SSE2 instructions for optimization. On platforms that support AVX2, using AVX2 instructions instead improves the performance of zero page checking by about 30% compared to SSE2. This makes live migration faster: test results show that for an idle guest with 8GB of RAM, this patch shortens the total live migration time by about 6%.
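As a rough illustration of the idea, the sketch below checks whether a buffer is all zero and picks an AVX2 variant when the CPU supports it. It is not the code from this series: the names is_zero_baseline(), is_zero_avx2(), select_is_zero() and buffer_is_zero_sketch() are hypothetical, the check returns a boolean rather than an offset, the baseline is plain word-at-a-time C rather than SSE2, and a lazily initialized function pointer stands in for the ifunc attribute so that the example also builds on toolchains without ifunc support.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical baseline (not QEMU's API): returns 1 if all len bytes are 0. */
int is_zero_baseline(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    size_t i = 0;

    assert(buf != NULL || len == 0);

    /* Test one 64-bit word per iteration; memcpy avoids unaligned loads. */
    for (; i + sizeof(uint64_t) <= len; i += sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, p + i, sizeof(w));
        if (w != 0) {
            return 0;
        }
    }
    /* Test the remaining tail bytes one at a time. */
    for (; i < len; i++) {
        if (p[i] != 0) {
            return 0;
        }
    }
    return 1;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_AVX2_SKETCH 1

/* Hypothetical AVX2 variant: compares 32 bytes against zero per iteration.
 * The target attribute lets this one function use AVX2 without -mavx2. */
static int __attribute__((target("avx2")))
is_zero_avx2(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    const __m256i zero = _mm256_setzero_si256();
    size_t i = 0;

    for (; i + 32 <= len; i += 32) {
        __m256i v = _mm256_loadu_si256((const __m256i *)(p + i));
        /* movemask yields -1 only when all 32 bytes compared equal to 0. */
        if (_mm256_movemask_epi8(_mm256_cmpeq_epi8(v, zero)) != -1) {
            return 0;
        }
    }
    /* Fewer than 32 bytes left: fall back to the baseline for the tail. */
    return is_zero_baseline(p + i, len - i);
}
#endif

typedef int (*is_zero_fn)(const void *, size_t);

/* Choose an implementation from what the running CPU reports. */
static is_zero_fn select_is_zero(void)
{
#ifdef HAVE_AVX2_SKETCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) {
        return is_zero_avx2;
    }
#endif
    return is_zero_baseline;
}

/* Entry point: resolves the implementation on first use, then reuses it.
 * Assumption: the first call happens before any concurrent callers exist. */
int buffer_is_zero_sketch(const void *buf, size_t len)
{
    static is_zero_fn impl;

    if (!impl) {
        impl = select_is_zero();
    }
    return impl(buf, len);
}
```

A caller would simply use buffer_is_zero_sketch(page, page_size) for each guest page. With ifunc, the equivalent of select_is_zero() runs as a resolver when the dynamic loader binds the symbol, so callers pay no per-call check and there is no first-use initialization to worry about.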

This patch uses the ifunc mechanism to select the proper function at run time: on platforms that support AVX2, the AVX2 implementation is executed; otherwise, the original implementation is executed. With this patch, the same QEMU binary runs on platforms with or without AVX2 support. Compilers that do not support AVX2 or the ifunc attribute can still build the source code successfully.

v5 -> v6 changes:
  * Restrict the optimization to GCC 4.9+ to prevent compile failures in
    some cases (Paolo's suggestion)

v4 -> v5 changes:
  * Enhance the ifunc attribute detection (Paolo's suggestion)

v3 -> v4 changes:
  * Use the GCC #pragma to make things simple (Paolo's suggestion)
  * Put avx2 related code in cutils.c (Richard's suggestion)
  * Change configure to detect the ifunc and avx2 attributes together

v2 -> v3 changes:
  * Detect the ifunc attribute support (Paolo's suggestion)
  * Use the ifunc attribute instead of the inline asm (Richard's suggestion)
  * Change configure (Juan's suggestion)

Liang Li (2):
  configure: detect ifunc and avx2 attribute
  cutils: add avx2 instruction optimization

 configure             |  21 +++++++++
 include/qemu-common.h |   8 +---
 util/cutils.c         | 124 ++++++++++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 142 insertions(+), 11 deletions(-)

-- 
1.9.1