On Mon, May 30, 2022 at 08:07:35PM +0300, Avihai Horon wrote: > +/* Returns 1 if end-of-stream is reached, 0 if more data and -1 if error */ > +static int vfio_save_block(QEMUFile *f, VFIOMigration *migration) > +{ > + ssize_t data_size; > + > + data_size = read(migration->data_fd, migration->data_buffer, > + migration->data_buffer_size); > + if (data_size < 0) { > + return -1; > + } > + if (data_size == 0) { > + return 1; > + } > + > + qemu_put_be64(f, VFIO_MIG_FLAG_DEV_DATA_STATE); > + qemu_put_be64(f, data_size); > + qemu_put_buffer_async(f, migration->data_buffer, data_size, false); > + qemu_fflush(f); > + bytes_transferred += data_size; > + > + trace_vfio_save_block(migration->vbasedev->name, data_size); > + > + return qemu_file_get_error(f); > +}
We looked at this from an eye to "how much data is transfered" per callback. The above function is the basic data mover, and 'migration->data_buffer_size' is set to 1MB at the moment. So, we product up to 1MB VFIO_MIG_FLAG_DEV_DATA_STATE sections. This series does not include the precopy support, but that will include a precopy 'save_live_iterate' function like this: static int vfio_save_iterate(QEMUFile *f, void *opaque) { VFIODevice *vbasedev = opaque; VFIOMigration *migration = vbasedev->migration; int ret; ret = vfio_save_block(f, migration); if (ret < 0) { return ret; } if (ret == 1) { return 1; } qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE); return 0; } Thus, during precopy this will never do more than 1MB per callback. > +static int vfio_save_complete_precopy(QEMUFile *f, void *opaque) > +{ > + VFIODevice *vbasedev = opaque; > + enum vfio_device_mig_state recover_state; > + int ret; > + > + /* We reach here with device state STOP or STOP_COPY only */ > + recover_state = VFIO_DEVICE_STATE_STOP; > + ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP_COPY, > + recover_state); > + if (ret) { > + return ret; > + } > + > + do { > + ret = vfio_save_block(f, vbasedev->migration); > + if (ret < 0) { > + return ret; > + } > + } while (!ret); This seems to be the main problem where we chain together 1MB blocks until the entire completed precopy data is completed. The above is hooked to 'save_live_complete_precopy' So, if we want to break the above up into some 'save_iterate' like function, do you have some advice how to do it? The above do/while must happen after the VFIO_DEVICE_STATE_STOP_COPY. For mlx5 the above loop will often be ~10MB's for small VMs and 100MB's for big VMs (big meaning making extensive use of RDMA functionality), and this will not change with pre-copy support or not. Is it still a problem? For other devices, like a GPU, I would imagine pre-copy support is implemented and this will be a smaller post-precopy residual. Jason