> On 04/15/14 01:55, Michael R. Hines wrote: >> On 04/14/2014 05:19 PM, Laszlo Ersek wrote: >>> On 04/14/14 04:27, Amos Kong wrote: >>>> We already have a function buffer_is_zero() in util/cutils.c >>>> >>>> Signed-off-by: Amos Kong <ak...@redhat.com> >>>> --- >>>> arch_init.c | 9 ++------- >>>> 1 file changed, 2 insertions(+), 7 deletions(-) >>>> >>>> diff --git a/arch_init.c b/arch_init.c >>>> index 60c975d..342e5dc 100644 >>>> --- a/arch_init.c >>>> +++ b/arch_init.c >>>> @@ -152,11 +152,6 @@ int qemu_read_default_config_files(bool userconfig) >>>> return 0; >>>> } >>>> -static inline bool is_zero_range(uint8_t *p, uint64_t size) >>>> -{ >>>> - return buffer_find_nonzero_offset(p, size) == size; >>>> -} >>>> - >>>> /* struct contains XBZRLE cache and a static page >>>> used by the compression */ >>>> static struct { >>>> @@ -587,7 +582,7 @@ static int ram_save_block(QEMUFile *f, bool >>>> last_stage) >>>> acct_info.dup_pages++; >>>> } >>>> } >>>> - } else if (is_zero_range(p, TARGET_PAGE_SIZE)) { >>>> + } else if (buffer_is_zero(p, TARGET_PAGE_SIZE)) { >>>> acct_info.dup_pages++; >>>> bytes_sent = save_block_hdr(f, block, offset, cont, >>>> RAM_SAVE_FLAG_COMPRESS); >>>> @@ -983,7 +978,7 @@ static inline void >>>> *host_from_stream_offset(QEMUFile *f, >>>> */ >>>> void ram_handle_compressed(void *host, uint8_t ch, uint64_t size) >>>> { >>>> - if (ch != 0 || !is_zero_range(host, size)) { >>>> + if (ch != 0 || !buffer_is_zero(host, size)) { >>>> memset(host, ch, size); >>>> } >>>> } >>>> >>> This seems to be correct, functionally -- apparently buffer_is_zero() >>> has laxer size requirements than buffer_find_nonzero_offset(). However, >>> I think the latter might be faster. >>> >>> For ram_save_block() I guess the difference is negligible. But >>> ram_handle_compressed() is also called from "migration-rdma.c", where I >>> can't even guess if a little bit of slowdown would count. >>> >>> I'm OK with the patch if Michael (CC'd) is. >>> >>> Thanks >>> Laszlo >>> >> >> Thanks for the CC. >> >> Actually, it looks like buffer_is_zero() is calling >> buffer_find_nonzero_offset() >> as a "first try" anyway > > I have no idea how I managed to miss that. > >> - which is the same thing RDMA is doing. So, all >> calls to ram_handle_compressed() should hit the branch target there in >> buffer_is_zero() for can_use_buffer_find_nonzero_offset() and automatically >> drop into the vectorized-optimization to search for zeros, so there >> shouldn't >> be any change in performance. The same should apply for TCP migration >> as well - it's working on page-granularity, which is always aligned to >> 32 or 64 bits. >> >> Paolo? I see that some of the block-migration code and the qemu-img code >> is also >> calling buffer_is_zero() - are you guys depending on the performance of any >> buffer_is_zero() calls to use the vector-optimized version like >> migration does? > > The patch doesn't change buffer_is_zero() internally, so callers of > buffer_is_zero() are unaffected. The patch turns two indirect callers of > buffer_find_nonzero_offset() into two "differently indirect" callers of > the same (which I now see thanks to your explanation). Hence, > > Before: > ram_save_block -> is_zero_range -> buffer_find_nonzero_offset > ram_handle_compressed -> is_zero_range -> buffer_find_nonzero_offset Because of *static inline *is_zero_range, it may be like that: ram_save_block -> buffer_find_nonzero_offset ram_handle_compressed -> buffer_find_nonzero_offset But this affects little in my testing.
Best regards ChenLiang > After: > ram_save_block -> buffer_is_zero -> buffer_find_nonzero_offset > ram_handle_compressed -> buffer_is_zero -> buffer_find_nonzero_offset > > Reviewed-by: Laszlo Ersek <ler...@redhat.com> > > Thanks! > Laszlo