On 1/11/2021 1:04 PM, Keqian Zhu wrote:
For now the switch of vfio dirty page tracking is integrated into
the vfio_save_handler, it causes some problems [1].
Sorry, I missed the [1] mail; somehow it didn't land in my inbox.
The object of dirty tracking is guest memory, but the object of
the vfio_save_handler is device state. This mixed logic produces
unnecessary coupling and conflicts:
1. Coupling: Their saving granularity is different (per-VM vs. per-device).
vfio will enable dirty_page_tracking for each device, while doing it
once is actually enough.
That's correct, enabling dirty page tracking once is enough. But
log_start and log_stop get called on address space update transactions,
region_add() or region_del(), and at that point migration may not be active.
We don't want to allocate bitmap memory in the kernel for the lifetime of
the VM without knowing whether migration will happen or not. The
vfio_iommu_type1 module should allocate bitmap memory only while migration
is active.
Paolo's suggestion to use the log_global_start and log_global_stop
callbacks seems correct here. But at that point the vfio device state is
not yet changed to _SAVING, as you had identified in [1]. Maybe we can
start tracking the bitmap in the iommu_type1 module while the device is
not yet _SAVING, but querying the dirty bitmap while the device is not yet
in the _SAVING|_RUNNING state doesn't seem like an optimal solution.
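For reference, a minimal standalone sketch of what such global callbacks could look like: dirty tracking is switched once per container, not per device, and the real callbacks take only a MemoryListener pointer, with no section or flags arguments. QEMU types are stubbed here so this compiles on its own, and the VFIO_IOMMU_DIRTY_PAGES ioctl is replaced by a recorded flag; this is an illustration of the shape, not the final implementation.

```c
/* Sketch of the log_global_start/log_global_stop approach. QEMU's
 * MemoryListener and VFIOContainer are stubbed out; the ioctl on
 * container->fd is replaced by a recorded boolean state. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

typedef struct MemoryListener MemoryListener;
struct MemoryListener {
    void (*log_global_start)(MemoryListener *listener);
    void (*log_global_stop)(MemoryListener *listener);
};

typedef struct VFIOContainer {
    MemoryListener listener;
    bool dirty_tracking;    /* stands in for the kernel-side state */
} VFIOContainer;

/* Stand-in for the VFIO_IOMMU_DIRTY_PAGES ioctl. */
static void vfio_set_dirty_page_tracking(VFIOContainer *container, bool start)
{
    container->dirty_tracking = start;
}

static void vfio_listener_log_global_start(MemoryListener *listener)
{
    VFIOContainer *container = container_of(listener, VFIOContainer, listener);

    vfio_set_dirty_page_tracking(container, true);
}

static void vfio_listener_log_global_stop(MemoryListener *listener)
{
    VFIOContainer *container = container_of(listener, VFIOContainer, listener);

    vfio_set_dirty_page_tracking(container, false);
}
```

The point of the global hooks is that they fire when dirty logging as a whole starts or stops, rather than on every address space transaction, which avoids allocating kernel bitmap memory outside of migration.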
Pasting your question from [1] here:
> Before start dirty tracking, we will check and ensure that the device
> is at _SAVING state and return error otherwise. But the question is
> that what is the rationale? Why does the VFIO_IOMMU_DIRTY_PAGES
> ioctl have something to do with the device state?
Let's walk through the types of devices we are supporting:
1. mdev devices without IOMMU backed device
The vendor driver pins pages as and when required during runtime. We can
say that the vendor driver is smart, i.e. it identifies the pages to pin.
We are good here.
2. mdev device with IOMMU backed device
This is similar to vfio-pci, a directly assigned device, where all pages
are pinned at VM bootup. The vendor driver is not smart, so a bitmap query
will always report all pages dirty. If --auto-converge is not set, the VM
gets stuck indefinitely in the pre-copy phase. This is known to us.
3. mdev device with IOMMU backed device with smart vendor driver
In this case as well, all pages are pinned at VM bootup, but the vendor
driver is smart enough to identify the pages and pin them explicitly.
Pages can be pinned at any time, i.e. during normal VM runtime, on setting
the _SAVING flag (entering the pre-copy phase), or while in the iterative
pre-copy phase. There is no restriction based on these phases for calling
vfio_pin_pages(). A vendor driver can start pinning pages based on its
device state when the _SAVING flag is set. In that case, if the dirty
bitmap is queried before that, it will report all of sysmem as dirty,
causing an unnecessary copy of sysmem.
As an optimal solution, I think it's better to query the bitmap only after
all vfio devices are in the pre-copy phase, i.e. after the _SAVING flag is
set.
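A compilable sketch of that "all devices must be _SAVING before a bitmap query" check, modeled on vfio_devices_all_saving() from the patch below. The device-state bits mirror the uapi values of the v1 migration interface; the device list is simplified to a plain array instead of QEMU's group/device lists:

```c
/* Check that every device has entered the pre-copy phase (_SAVING set)
 * before the dirty bitmap is queried. State bits match the v1 uapi. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define VFIO_DEVICE_STATE_RUNNING (1 << 0)
#define VFIO_DEVICE_STATE_SAVING  (1 << 1)

typedef struct {
    unsigned int device_state;
} VFIODeviceStub;

/* True only if every device has the _SAVING bit set; only then is a
 * dirty bitmap query meaningful, since a vendor driver may start its
 * explicit pinning when _SAVING is set. */
static bool all_devices_saving(const VFIODeviceStub *devs, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (!(devs[i].device_state & VFIO_DEVICE_STATE_SAVING)) {
            return false;
        }
    }
    return true;
}
```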
2. Conflicts: The ram_save_setup() traverses all memory_listeners
to execute their log_start() and log_sync() hooks to get the
first-round dirty bitmap, which is used by the bulk stage of
ram saving. However, it can't get the dirty bitmap from vfio, as
@savevm_ram_handlers is registered before @vfio_save_handler.
Right, but it can get the dirty bitmap from the vfio device in its
iterative callback:
ram_save_pending ->
migration_bitmap_sync_precopy() .. ->
vfio_listerner_log_sync
Thanks,
Kirti
Moving the switch of vfio dirty_page_tracking into vfio_memory_listener
can solve the above problems. Besides, do not require devices to be in the
_SAVING state for vfio_sync_dirty_bitmap().
[1] https://www.spinics.net/lists/kvm/msg229967.html
Reported-by: Zenghui Yu <yuzeng...@huawei.com>
Signed-off-by: Keqian Zhu <zhukeqi...@huawei.com>
---
hw/vfio/common.c | 53 +++++++++++++++++++++++++++++++++++++--------
hw/vfio/migration.c | 35 ------------------------------
2 files changed, 44 insertions(+), 44 deletions(-)
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 6ff1daa763..9128cd7ee1 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -311,7 +311,7 @@ bool vfio_mig_active(void)
return true;
}
-static bool vfio_devices_all_saving(VFIOContainer *container)
+static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
{
VFIOGroup *group;
VFIODevice *vbasedev;
@@ -329,13 +329,8 @@ static bool vfio_devices_all_saving(VFIOContainer *container)
return false;
}
- if (migration->device_state & VFIO_DEVICE_STATE_SAVING) {
- if ((vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF)
- && (migration->device_state & VFIO_DEVICE_STATE_RUNNING)) {
- return false;
- }
- continue;
- } else {
+ if ((vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF)
+ && (migration->device_state & VFIO_DEVICE_STATE_RUNNING)) {
return false;
}
}
@@ -987,6 +982,44 @@ static void vfio_listener_region_del(MemoryListener *listener,
}
}
+static void vfio_set_dirty_page_tracking(VFIOContainer *container, bool start)
+{
+ int ret;
+ struct vfio_iommu_type1_dirty_bitmap dirty = {
+ .argsz = sizeof(dirty),
+ };
+
+ if (start) {
+ dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_START;
+ } else {
+ dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP;
+ }
+
+ ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, &dirty);
+ if (ret) {
+ error_report("Failed to set dirty tracking flag 0x%x errno: %d",
+ dirty.flags, errno);
+ }
+}
+
+static void vfio_listener_log_start(MemoryListener *listener,
+ MemoryRegionSection *section,
+ int old, int new)
+{
+ VFIOContainer *container = container_of(listener, VFIOContainer, listener);
+
+ vfio_set_dirty_page_tracking(container, true);
+}
+
+static void vfio_listener_log_stop(MemoryListener *listener,
+ MemoryRegionSection *section,
+ int old, int new)
+{
+ VFIOContainer *container = container_of(listener, VFIOContainer, listener);
+
+ vfio_set_dirty_page_tracking(container, false);
+}
+
static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
uint64_t size, ram_addr_t ram_addr)
{
@@ -1128,7 +1161,7 @@ static void vfio_listerner_log_sync(MemoryListener *listener,
return;
}
- if (vfio_devices_all_saving(container)) {
+ if (vfio_devices_all_dirty_tracking(container)) {
vfio_sync_dirty_bitmap(container, section);
}
}
@@ -1136,6 +1169,8 @@ static void vfio_listerner_log_sync(MemoryListener *listener,
static const MemoryListener vfio_memory_listener = {
.region_add = vfio_listener_region_add,
.region_del = vfio_listener_region_del,
+ .log_start = vfio_listener_log_start,
+ .log_stop = vfio_listener_log_stop,
.log_sync = vfio_listerner_log_sync,
};
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 00daa50ed8..c0f646823a 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -395,40 +395,10 @@ static int vfio_load_device_config_state(QEMUFile *f, void *opaque)
return qemu_file_get_error(f);
}
-static int vfio_set_dirty_page_tracking(VFIODevice *vbasedev, bool start)
-{
- int ret;
- VFIOMigration *migration = vbasedev->migration;
- VFIOContainer *container = vbasedev->group->container;
- struct vfio_iommu_type1_dirty_bitmap dirty = {
- .argsz = sizeof(dirty),
- };
-
- if (start) {
- if (migration->device_state & VFIO_DEVICE_STATE_SAVING) {
- dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_START;
- } else {
- return -EINVAL;
- }
- } else {
- dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP;
- }
-
- ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, &dirty);
- if (ret) {
- error_report("Failed to set dirty tracking flag 0x%x errno: %d",
- dirty.flags, errno);
- return -errno;
- }
- return ret;
-}
-
static void vfio_migration_cleanup(VFIODevice *vbasedev)
{
VFIOMigration *migration = vbasedev->migration;
- vfio_set_dirty_page_tracking(vbasedev, false);
-
if (migration->region.mmaps) {
vfio_region_unmap(&migration->region);
}
@@ -469,11 +439,6 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
return ret;
}
- ret = vfio_set_dirty_page_tracking(vbasedev, true);
- if (ret) {
- return ret;
- }
-
qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
ret = qemu_file_get_error(f);