On Thu, Aug 25, 2016 at 12:07 PM, Vijay Kilari <vijay.kil...@gmail.com> wrote: > Hi Richard, > > Migration fails on arm64 with these patches. > On the destination VM, follow errors are appearing. > > qemu-system-aarch64: VQ 0 size 0x400 Guest index 0x0 inconsistent with > Host index 0x1937: delta 0xe6c9 > qemu-system-aarch64: error while loading state for instance 0x0 of > device 'virtio-mmio@000000000a003e00/virtio-net' > qemu-system-aarch64: load of migration failed: Operation not permitted > qemu-system-aarch64: network script /etc/qemu-ifdown failed with status 256
With below changes, migration is working fine on arm64. diff --git a/util/cutils.c b/util/cutils.c index 30fac02..9bbf31f 100644 --- a/util/cutils.c +++ b/util/cutils.c @@ -170,6 +170,7 @@ static bool __attribute__((noinline)) \ NAME(const void *buf, size_t len) \ { \ const void *end = buf + len; \ + const VECTYPE zero = (VECTYPE){0}; \ do { \ const VECTYPE *p = buf; \ VECTYPE t; \ @@ -185,7 +186,7 @@ NAME(const void *buf, size_t len) \ } else { \ link_error(); \ } \ - if (unlikely(!ZERO(t))) { \ + if (unlikely(!ZERO(t, zero))) { \ return false; \ } \ buf += SIZE; \ @@ -227,7 +228,7 @@ buffer_zero_base(const void *buf, size_t len) return true; } -#define IDENT_ZERO(X) (X) +#define IDENT_ZERO(X1, X2) (X1 == X2) ACCEL_BUFFER_ZERO(buffer_zero_int, 4*sizeof(long), long, IDENT_ZERO) static bool select_accel_int(const void *buf, size_t len) @@ -511,7 +512,9 @@ static bool select_accel_fn(const void *buf, size_t len) #elif defined(__aarch64__) #include "arm_neon.h" -#define DO_ZERO(X) (vgetq_lane_u64((X), 0) | vgetq_lane_u64((X), 1)) +#define DO_ZERO(X1, X2) \ + ((vgetq_lane_u64(X1, 0) == vgetq_lane_u64(X2, 0)) && \ + (vgetq_lane_u64(X1, 1) == vgetq_lane_u64(X2, 1))) ACCEL_BUFFER_ZERO(buffer_zero_neon_64, 64, uint64x2_t, DO_ZERO) ACCEL_BUFFER_ZERO(buffer_zero_neon_128, 128, uint64x2_t, DO_ZERO) @@ -526,7 +529,7 @@ static void __attribute__((constructor)) init_buffer_zero_accel(void) since the later is not available to userspace. This seems to work in practice for existing implementations. */ asm("mrs %0, dczid_el0" : "=r"(t)); - if ((t & 15) * 16 >= 128) { + if (pow(2, (t & 0xf)) * 4 >= 128) { buffer_zero_line_mask = 128 - 1; buffer_zero_accel = buffer_zero_neon_128; } else { > > Regards > Vijay > > > On Wed, Aug 24, 2016 at 9:47 AM, Richard Henderson <r...@twiddle.net> wrote: >> Patches 1-3 remove the use of ifunc from the implementation. >> >> Patch 5 adjusts the x86 implementation a bit more to take >> advantage of ptest (in sse4.1) and unaligned accesses (in avx1). >> >> Patches 2 and 6 are the result of my conversation with Vijaya >> Kumar with respect to ThunderX. >> >> Patch 7 is the result of seeing some really really horrible code >> produced for ppc64le (gcc 4.9 and mainline). >> >> This has had limited testing. What I don't know is the best way >> to benchmark this -- the only way I know to trigger this is via >> the console, by hand, which doesn't make for reasonable timing. >> >> >> r~ >> >> >> Richard Henderson (7): >> cutils: Remove SPLAT macro >> cutils: Export only buffer_is_zero >> cutils: Rearrange buffer_is_zero acceleration >> cutils: Add generic prefetch >> cutils: Rewrite x86 buffer zero checking >> cutils: Rewrite aarch64 buffer zero checking >> cutils: Rewrite ppc buffer zero checking >> >> configure | 21 +- >> include/qemu/cutils.h | 2 - >> migration/ram.c | 2 +- >> migration/rdma.c | 5 +- >> util/cutils.c | 526 >> +++++++++++++++++++++++++++++++++----------------- >> 5 files changed, 352 insertions(+), 204 deletions(-) >> >> -- >> 2.7.4 >>