Paolo Bonzini <pbonz...@redhat.com> wrote:
> On 17/03/2017 20:36, Dr. David Alan Gilbert wrote:
>> * Paolo Bonzini (pbonz...@redhat.com) wrote:
>>> On 17/03/2017 14:02, Dr. David Alan Gilbert wrote:
>>>>>>          case RAM_SAVE_FLAG_MULTIFD_PAGE:
>>>>>>              fd_num = qemu_get_be16(f);
>>>>>> -            if (fd_num != 0) {
>>>>>> -                /* this is yet an unused variable, changed later */
>>>>>> -                fd_num = fd_num;
>>>>>> -            }
>>>>>> +            multifd_recv_page(host, fd_num);
>>>>>>              qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
>>>>>>              break;
>>>>> I still believe this design is a mistake.
>>>> Is it a use of a separate FD carrying all of the flags/addresses that
>>>> you object to?
>>>
>>> Yes, it introduces a serialization point unnecessarily, and I don't
>>> believe the rationale that Juan offered was strong enough.
>>>
>>> This is certainly true on the receive side, but serialization is not
>>> even necessary on the send side.
>>
>> Is there an easy way to benchmark it (without writing both) to figure
>> out if sending (word) (page) on one fd is less efficient than sending
>> two fd's with the pages and words separate?
>
> I think it shouldn't be hard to write a version which keeps the central
> distributor but puts the metadata in the auxiliary fds too.
That is not difficult to do (famous last words).  I will try to test both
approaches for the next version, thanks.

> But I think what matters is not efficiency, but rather being more
> forward-proof.  Besides liberty of changing implementation, Juan's
> current code simply has no commands in auxiliary file descriptors, which
> can be very limiting.
>
> Paolo
>
>>> Multiple threads can efficiently split
>>> the work among themselves and visit the dirty bitmap without a central
>>> distributor.
>>
>> I mostly agree; I kind of fancy the idea of having one per NUMA node;
>> but a central distributor might be a good idea anyway in the cases
>> where you find the heavy-writer all happens to be in the same area.
>>
>>>
>>> I need to study the code more to understand another issue.  Say you have
>>> a page that is sent to two different threads in two different
>>> iterations, like
>>>
>>>     thread 1
>>>       iteration 1: pages 3, 7
>>>     thread 2
>>>       iteration 1: page 3
>>>       iteration 2: page 7
>>>
>>> Does the code ensure that all threads wait at the end of an iteration?
>>> Otherwise, thread 2 could process page 7 from iteration 2 before or
>>> while thread 1 processes the same page from iteration 1.
>>
>> I think there's a sync at the end of each iteration on Juan's current code
>> that stops that.

This can't happen by design.  We sync all threads at the end of each
iteration (a rough sketch of that kind of sync is below).

Later, Juan.
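For illustration only, here is a minimal, self-contained sketch of such an
end-of-iteration synchronization: one semaphore pair per sender thread, with
the migration thread posting a sync request to every thread and then waiting
for each acknowledgement before starting the next pass over the dirty bitmap.
None of the names below (SendThread, iteration_sync, and so on) come from the
actual patches; they are invented for the example.

    /* Minimal sketch, NOT the actual patch code: sender threads acknowledge
     * a per-iteration sync point before the next pass may start. */
    #include <pthread.h>
    #include <semaphore.h>
    #include <stdbool.h>
    #include <stdio.h>

    #define NUM_SEND_THREADS 4

    typedef struct SendThread {
        pthread_t thread;
        sem_t work;          /* posted by the migration thread (here: sync only) */
        sem_t done;          /* posted by the sender when it reaches the sync */
        bool sync_requested;
    } SendThread;

    static SendThread send_threads[NUM_SEND_THREADS];

    static void *sender_thread(void *opaque)
    {
        SendThread *t = opaque;

        for (;;) {
            sem_wait(&t->work);
            if (t->sync_requested) {
                /* everything queued to this thread before the sync request
                 * has been handled by now, so acknowledge the barrier */
                t->sync_requested = false;
                sem_post(&t->done);
                continue;
            }
            /* ... in the real code, take the next queued page and write it
             * to this thread's fd here ... */
        }
        return NULL;
    }

    /* Called by the migration thread at the end of every iteration. */
    static void iteration_sync(void)
    {
        for (int i = 0; i < NUM_SEND_THREADS; i++) {
            send_threads[i].sync_requested = true;
            sem_post(&send_threads[i].work);
        }
        for (int i = 0; i < NUM_SEND_THREADS; i++) {
            sem_wait(&send_threads[i].done);
        }
    }

    int main(void)
    {
        for (int i = 0; i < NUM_SEND_THREADS; i++) {
            sem_init(&send_threads[i].work, 0, 0);
            sem_init(&send_threads[i].done, 0, 0);
            send_threads[i].sync_requested = false;
            pthread_create(&send_threads[i].thread, NULL,
                           sender_thread, &send_threads[i]);
        }

        /* ... one pass over the dirty bitmap, queueing pages, would go here ... */
        iteration_sync();
        printf("all sender threads have drained this iteration\n");
        return 0;
    }

The real series performs this handshake through the migration machinery
rather than a standalone helper like this, so treat the above purely as an
illustration of the barrier that keeps a page from iteration N+1 from
overtaking the same page still queued from iteration N, not as a description
of the actual mechanism.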