"you.chen" <you.c...@intel.com> wrote: > Add config and logics to use qatzip for page compression, in order to > support qatzip compression better, we collect multipe pages together > to do qatzip compression for best performance. > And we use compile option CONFIG_QATZIP to determine whether should qatzip > related code be compiled or not. > > Co-developed-by: dennis.wu <dennis...@intel.com> > Signed-off-by: you.chen <you.c...@intel.com>
two questions: a - why are you using this? I guess that because it is faster, but you are not telling. b - why are you using it with compression thread instead of multifd compression? probably just another method. [...] > MigrationState *s; > @@ -4451,6 +4470,8 @@ static Property migration_properties[] = { > DEFINE_PROP_UINT8("x-compress-threads", MigrationState, > parameters.compress_threads, > DEFAULT_MIGRATE_COMPRESS_THREAD_COUNT), > + DEFINE_PROP_BOOL("x-compress-with-qat", MigrationState, > + parameters.compress_with_qat, false), > DEFINE_PROP_BOOL("x-compress-wait-thread", MigrationState, > parameters.compress_wait_thread, true), > DEFINE_PROP_UINT8("x-decompress-threads", MigrationState, > @@ -4580,6 +4601,7 @@ static void migration_instance_init(Object *obj) > params->has_compress_level = true; > params->has_compress_threads = true; > params->has_compress_wait_thread = true; > + params->has_compress_with_qat = true; > params->has_decompress_threads = true; > params->has_throttle_trigger_threshold = true; > params->has_cpu_throttle_initial = true; If you use it as a parameter, this bits are ok, but I still think that it will work better with a multifd compression method. > -#define IO_BUF_SIZE 32768 > +#ifdef CONFIG_QATZIP > +#define IO_BUF_SIZE (512 * KiB) > +#else > +#define IO_BUF_SIZE (32 * KiB) > +#endif 1st part that it is already weird O:-) I mean, we are supposed to not have bigger buffers. > +uint8_t *qemu_get_pos(QEMUFile *f) > +{ > + return f->buf + f->buf_index; > +} > + I know I shouldn't ask, but why do you need this. /me looks later in the patch. > --- a/migration/ram.c > +++ b/migration/ram.c > @@ -117,6 +117,20 @@ static void __attribute__((constructor)) > init_cpu_flag(void) > > XBZRLECacheStats xbzrle_counters; > > +/* define the max page number to compress together */ > +#define MULTI_PAGE_NUM 64 You get 64 pages to compress with just using multifd, nothing else needed. > +#define COMP_BUF_SIZE (TARGET_PAGE_SIZE * MULTI_PAGE_NUM * 2) > +#define DECOMP_BUF_SIZE (TARGET_PAGE_SIZE * MULTI_PAGE_NUM) > + > +typedef struct MultiPageAddr { > + /* real pages that will compress together */ > + uint64_t pages; > + /* the last index of the addr*/ > + uint64_t last_idx; > + /* each address might contain contineous pages*/ > + uint64_t addr[MULTI_PAGE_NUM]; > +} MultiPageAddr; > + > /* used by the search for pages to send */ > struct PageSearchStatus { > /* The migration channel used for a specific host page */ > @@ -127,6 +141,12 @@ struct PageSearchStatus { > RAMBlock *block; > /* Current page to search from */ > unsigned long page; > + /* > + * multi page search from current page > + * for compress together with qatzip > + * stream APIs > + */ > + MultiPageAddr mpa; > /* Set once we wrap around */ > bool complete_round; > /* Whether we're sending a host page */ > @@ -506,6 +526,15 @@ struct CompressParam { > /* internally used fields */ > z_stream stream; > uint8_t *originbuf; > + > +#ifdef CONFIG_QATZIP > + /*multi page address for compression*/ > + MultiPageAddr mpa; > + QzSession_T qzsess; > + uint8_t *decompbuf; > + uint8_t *compbuf; > + /* QzStream_T qzstream; */ > +#endif > }; > typedef struct CompressParam CompressParam; > > @@ -518,6 +547,15 @@ struct DecompressParam { > uint8_t *compbuf; > int len; > z_stream stream; > + RAMBlock *block; > + > +#ifdef CONFIG_QATZIP Aside. I would expect to detect at configure time if QATZIP is available. > + /* decompress multi pages with qzlib*/ > + QzSession_T qzsess; > + /* QzStream_T qzstream; */ > + uint8_t *decompbuf; /* buffer after decompress */ The comment is wrong. You uses this variable on do_compress_ram_page, that has nothing to do with decompression. > + MultiPageAddr mpa; /* destination */ > +#endif > }; I am not going looking after this. We already have a mechanism to look for several pages. See multifd_queue_page(). We have to options here: - multifd_queue_page() is better, you need to generalize it and wrote this on top of that. - yours is better. Then you need to write multifd one on top of that. What I don't want is two different mechanisms to collect the next 64 dirty pages. I will even consider good that you _always_ search for multiple pages, let say 64 that are the ones that we have now in both multifd and your approach and you do a for loop in case that it is normal precopy. Another note, I am not a big far of: #ifdef CONFIG_QATZIP if (migrate_compress_with_qat()) { do_compress_ram_page_multiple(param->file, ¶m->qzsess, param->decompbuf, param->compbuf, block, ¶m->mpa); } else { #endif zero_page = do_compress_ram_page(param->file, ¶m->stream, block, offset, param->originbuf); param->zero_page = zero_page; #ifdef CONFIG_QATZIP } #endif Two ifdefs and an if mixed, ouch. Suggestions: - Just create a function that calls one or another. - You add another function call to migration_ops, and you add the proper function there during initialization. Another nitbit: - You are dropping the zero page optimization, you are just compressing it. - See how it is done on my zero pages on multifd on the list. a - we only sent the page address for a zero page (hard to win sending zero bytes for a page) b - I don't know how QzSession_T works, but I guess that it is going to be a loosy equivalent of how zlib works. Notice that compression code does: do_compress_ram_page() -> qemu_put_compression_data() -> ... static int qemu_compress_data(z_stream *stream, uint8_t *dest, size_t dest_len, const uint8_t *source, size_t source_len) { int err; err = deflateReset(stream); if (err != Z_OK) { return -1; } stream->avail_in = source_len; stream->next_in = (uint8_t *)source; stream->avail_out = dest_len; stream->next_out = dest; err = deflate(stream, Z_FINISH); if (err != Z_STREAM_END) { return -1; } return stream->next_out - dest; } I can't see the equivalent of deflateReset on your code (remember that I don't understand qat). Rememeber that the number of threads for compression and decompression can be different, so it is not necessary that you have the same dicctionaries on destination on the thread that you are using. That is one of the reasons why multifd-zlib is much, much better at compression that compression threads. a- it uses 64pages at a time b- as recv thread is paired with the same send thread, we don't need to reset the dictionaries, and that makes things much, much better. I thought it was a bug the 1st time that I saw that it compressed 64 pages to less than 64bytes O:-) Later, Juan.