On Wed, Apr 20, 2016 at 04:44:28PM +0200, Juan Quintela wrote:
> Hi
>
> This patch series is "an" initial implementation of multiple fd
> migration.  This is to get something out for others to comment on; it
> is not finished at all.
>
> So far:
>
> - we create threads for each new fd
>
> - only for tcp, of course; the rest of the transports are out of luck.
>   I need to integrate this with Daniel's channel changes.
>
> - I *think* the locking is right, at least I don't get more random
>   lockups (and yes, it was not trivial).  And yes, I think that the
>   compression code locking is not completely correct.  I think it
>   would be much, much better to do the compression code on top of
>   this (it will avoid a lot of copies), but I need to finish this
>   first.
>
> - In the last patch, I add a BIG hack to try to know what the real
>   bandwidth is.
>
>
> Preliminary testing so far:
>
> - quite good, the latency is much better.  I think I found the
>   problem behind the random high latencies, but more testing is
>   needed.
>
> - under load, I think our bandwidth calculations are *not* completely
>   correct (this is the way to spell it to be allowed for a family
>   audience).
>
>
> ToDo list:
>
> - bandwidth calculation: I am going to send another mail with my ToDo
>   list for migration, see there.
>
> - stats: we need better stats, by thread, etc.
>
> - synchronize fewer times with the worker threads.  Right now we
>   synchronize for each page; there are two obvious optimizations:
>   * send a list of pages each time we wake up an fd
>   * if we have to send a HUGE page, don't split it; just send the
>     whole page in one send() and read it with a single recv() on the
>     destination.  My understanding is that this would make
>     Transparent Huge Page support trivial.
>
> - measure things under bigger loads
>
> Comments, please?
Nice to see this take shape.  There's something that looks suspicious
from a quick look at the patches:

- imagine that the same page gets transmitted twice, first on one
  socket, then on another
- it's possible that the second update is received and handled on the
  destination before the first one

Note: you do make sure that only a single thread sends data for a
given page at a time, but that does not seem to affect the order in
which the data is received.  In that case, I suspect the first one
will overwrite the page with stale data.

A simple fix would be to change

	static int multifd_send_page(uint8_t *address)

to calculate the fd based on the address, e.g.

	((long)address / PAGE_SIZE) % thread_count

Or split memory between threads in some other way.  (A rough sketch
of the address-based variant is at the end of this mail.)

HTH

> Later, Juan.
>
> Juan Quintela (13):
>   migration: create Migration Incoming State at init time
>   migration: Pass TCP args in an struct
>   migration: [HACK] Don't create decompression threads if not enabled
>   migration: Add multifd capability
>   migration: Create x-multifd-threads parameter
>   migration: create multifd migration threads
>   migration: Start of multiple fd work
>   migration: create ram_multifd_page
>   migration: Create thread infrastructure for multifd send side
>   migration: Send the fd number which we are going to use for this page
>   migration: Create thread infrastructure for multifd recv side
>   migration: Test new fd infrastructure
>   migration: [HACK]Transfer pages over new channels
>
>  hmp.c                         |  10 ++
>  include/migration/migration.h |  13 ++
>  migration/migration.c         | 100 ++++++++----
>  migration/ram.c               | 350 +++++++++++++++++++++++++++++++++++++++++-
>  migration/savevm.c            |   3 +-
>  migration/tcp.c               |  76 ++++++++-
>  qapi-schema.json              |  29 +++-
>  7 files changed, 540 insertions(+), 41 deletions(-)
>
> --
> 2.5.5
>
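For concreteness, here's a minimal sketch of the address-based scheme
I have in mind.  The names multifd_page_channel(), PAGE_SIZE and
thread_count are stand-ins of mine, not the series' actual
identifiers; the point is only that the channel is a pure function of
the page address, so all updates of a given page go out on the same
socket and TCP's in-order delivery keeps them ordered:

	#include <stdint.h>

	/* Stand-ins -- the real code would use its own page size
	 * and configured thread count. */
	#define PAGE_SIZE    4096
	static int thread_count = 4;

	/*
	 * Hypothetical helper: pick the outgoing channel as a pure
	 * function of the page address.  Two updates of the same
	 * page then always travel on the same socket, so the
	 * destination can never apply the second update before the
	 * first.
	 */
	static int multifd_page_channel(uint8_t *address)
	{
	    unsigned long page = (unsigned long)address / PAGE_SIZE;

	    return page % thread_count;
	}

The cost is that a busy channel can stall pages that hash to it even
while other channels sit idle, so splitting memory some other way
(e.g. contiguous chunks per thread) might balance better; any scheme
works as long as a given page always maps to the same fd.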