On 04/02/2016 08:48, Gonglei (Arei) wrote:
>  11.44%  qemu-kvm            [.] memory_region_find
>   6.31%  qemu-kvm            [.] qemu_get_ram_ptr
>   4.61%  libpthread-2.19.so  [.] __pthread_mutex_unlock_usercnt
>   3.54%  qemu-kvm            [.] qemu_ram_addr_from_host
>   2.80%  libpthread-2.19.so  [.] pthread_mutex_lock
>   2.55%  qemu-kvm            [.] object_unref
>   2.49%  libc-2.19.so        [.] malloc
>   2.47%  libc-2.19.so        [.] _int_malloc
>   2.34%  libc-2.19.so        [.] _int_free
>   2.18%  qemu-kvm            [.] object_ref
>   2.18%  qemu-kvm            [.] address_space_translate
>   2.03%  libc-2.19.so        [.] __memcpy_sse2_unaligned
>   1.76%  libc-2.19.so        [.] malloc_consolidate
>   1.56%  qemu-kvm            [.] addrrange_intersection
>   1.52%  qemu-kvm            [.] vring_pop
>   1.36%  qemu-kvm            [.] find_next_zero_bit
>   1.30%  [kernel]            [k] native_write_msr_safe
>   1.29%  qemu-kvm            [.] addrrange_intersects
>   1.21%  qemu-kvm            [.] vring_map
>   0.93%  qemu-kvm            [.] virtio_notify
>
> Do you have any thoughts to decrease the cpu overhead and get higher through
> output? Thanks!

Using chunks larger than 256 bytes will reduce the overhead in
memory_region_find and qemu_get_ram_ptr, since those lookups are paid
once per request rather than per byte. You could expect a further
10-12% improvement.

Paolo
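
To make the chunk-size argument concrete, here is a minimal sketch of the arithmetic. It is a toy model, not QEMU code: the helper names requests_needed and fixed_overhead and the per_request_cost parameter are hypothetical, and the only assumption is that each request pays a roughly constant cost for address translation and lookup (memory_region_find, qemu_get_ram_ptr and similar) regardless of its size.

```c
#include <assert.h>
#include <stdint.h>

/* Number of requests needed to move total_bytes in chunks of chunk_size
 * (rounded up, so a trailing partial chunk still counts as one request). */
static uint64_t requests_needed(uint64_t total_bytes, uint64_t chunk_size)
{
    assert(chunk_size > 0);
    return total_bytes / chunk_size + (total_bytes % chunk_size != 0);
}

/* Toy model (assumption): every request pays the same fixed cost,
 * in arbitrary units, no matter how many bytes it carries. */
static uint64_t fixed_overhead(uint64_t total_bytes, uint64_t chunk_size,
                               uint64_t per_request_cost)
{
    return requests_needed(total_bytes, chunk_size) * per_request_cost;
}
```

Under this model, moving 1 MiB in 256-byte chunks takes 4096 requests, while 4096-byte chunks take 256, so the fixed per-request work drops by a factor of 16. The real gain is smaller because only part of the profile is per-request cost, which is consistent with the 10-12% estimate above.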