State what are the requeriments to get migration working between qemu
versions.  And once there explain how one is supposed to implement a
new feature/default value and not break migration.

Signed-off-by: Juan Quintela <quint...@redhat.com>

---

Hi

I will really appreciate reviews:

- I don't speak natively .rst format, so let me what I have done
  wrong.

- English is not my native language either (no points if had guessed
  that).

- This is stuff is obvious to me, so let me when I have assumed
  things, things that need to be claried, explained better, etc.

Thanks, Juan.
---
 docs/devel/migration.rst | 212 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 212 insertions(+)

diff --git a/docs/devel/migration.rst b/docs/devel/migration.rst
index 6f65c23b47..daa510da42 100644
--- a/docs/devel/migration.rst
+++ b/docs/devel/migration.rst
@@ -142,6 +142,218 @@ General advice for device developers
   may be different on the destination.  This can result in the
   device state being loaded into the wrong device.
 
+How backwards compatibility work
+--------------------------------
+
+When we do migration, we have to qemus: source and target qemu.  There
+are two cases, they are the same version or they are a different
+version.  The easy case is when they are the same version.  The
+difficult one is when they are different versions.
+
+There are two things that are different, but they have very similar
+names and sometimes get confused:
+- qemu version
+- machine version
+
+Let's start with a practical example, we start with:
+
+- qemu-system-x86_64 (v5.2), from now one qemu-5.2.
+- qemu-system-x86_64 (v5.1), from now one qemu-5.1.
+
+Related to this are the "latest" machine types defined on each of
+them:
+
+- pc-q35-5.2 (newer one in qemu-5.2) from now on pc-5.2
+- pc-q35-5.1 (newer one qemu-5.1) from now on pc-5.1
+
+First of all, migration is only supposed to work if you use the same
+machine type in both source and destination. The qemu configuration
+needs to be the same also on source and destination.  But that is not
+relevant for this section.
+
+I am going to list the number of combinations that we can have.  Let's
+start with the trivial ones, qemu is the same on source and
+destination:
+
+1 - qemu-5.2 -M pc-5.2  -> migrates to -> qemu-5.2 -M pc-5.2
+
+  This is the latest qemu with the latest machine type.
+  This have to work, and if it don't work it is a bug.
+
+2 - qemu-5.1 -M pc-5.1  -> migrates to -> qemu-5.1 -M pc-5.1
+
+  Exactly the same case than the previous one, but for 5.1.
+  Nothing to see here either.
+
+This are the easiest ones, we will not talk more about them in this
+section.
+
+Now we start with the more interesting cases.  Let start with the
+same qemu but not the same machine type.
+
+3 - qemu-5.2 -M pc-5.1  -> migrates to -> qemu-5.2 -M pc-5.1
+
+  It needs to use the definition of pc-5.1 and the devices as they
+  were configured on 5.1, but this should be easy in the sense that
+  both sides are the same qemu and both sides have exactly the same
+  idea of what the pc-5.1 machine is.
+
+4 - qemu-5.1 -M pc-5.2  -> migrates to -> qemu-5.1 -M pc-5.2
+
+  This combination is not possible as the qemu-5.1 don't understand
+  pc-5.2 machine type.  So nothing to worry here.
+
+Now it comes the interesting ones, when both qemus are different.
+Notice also that the machine type needs to be pc-5.1, because we have
+the limitation than qemu-5.1 don't know pc-5.2.  So the possible cases
+are:
+
+5 - qemu-5.2 -M pc-5.1  -> migrates to -> qemu-5.1 -M pc-5.1
+
+  This migration is known as newer to older.  We need to make sure
+  when we are developing 5.2 we need to take care about not to break
+  migration to qemu-5.1.  Notice that we can't make updates to
+  qemu-5.1 to understand whatever qemu-5.2 decides to change, so it is
+  in qemu-5.2 side to make the relevant changes.
+
+6 - qemu-5.1 -M pc-5.1  -> migrates to -> qemu-5.2 -M pc-5.1
+
+  This migration is known as older to newer.  We need to make sure
+  than we are able to receive migrations from qemu-5.1. The problem is
+  similar to the previous one.
+
+If qemu-5.1 and qemu-5.2 were the same, there will not be any
+compatibility problems.  But the reason that we create qemu-5.2 is to
+get new features, devices, defaults, etc.
+
+If we get a device that get a new feature, or change a default value,
+we have a problem when we try to migrate between different qemu
+versions.
+
+So we need a way to tell qemu-5.2 than when we are using machine type
+pc-5.1, it needs to **not** use the feature, to be able to migrate to
+read qemu-5.1.
+
+And the equivalent part when migrating from qemu-5.1 to qemu-5.2.
+qemu-5.2 have to expect that it is not going to get data for the new
+feature, because qemu-5.1 don't know about it.
+
+How do we tell qemu about this device feature changes?  In
+hw/core/machine.c:hw_compat_X_Y arrays.
+
+If we change a default value, we need to put back the old value on
+that array.  And the device, during initialization needs to look at
+that array to see what value it needs to get for that feature.  And
+what are we going to put on that array, the value of a property.
+
+To create a property for a device, we need to use one of the
+DEFINE_PROP_*() macros. See include/hw/qdev-properties.h to find the
+macros that exist.  With it, we set the default value for that
+property, and that is what it is going to get in the latest released
+version.  But if we want a different value for a previous version, we
+can change that in the hw_compat_X_Y arrays.
+
+hw_compat_X_Y is an array of registers that have the format:
+
+- name_device
+- name_property
+- value
+
+Let's see a practical example.
+
+In qemu-5.2 virtio-blk-device got multi queue support.  This is a
+change that is not backward compatible.  In qemu-5.1 it has one
+queue. In qemu-5.2 it has the same number of queues than the number of
+cpus in the system.
+
+When we are doing migration, if we migrate from a device that has 4
+queues to a device that have only one queue, we don't know where to
+put the extra information for the other 3 queues, and we fail
+migration.
+
+Similar problem when we migrate from qemu-5.1 that has only one queue
+to qemu-5.2, we only sent information for one queue, but destination
+has 4, and we have 3 queues that are not properly initialized and
+anything can happen.
+
+So, how can we address this problem.  Easy, just convince qemu-5.2
+that when it is running pc-5.1, it needs to set the number of queues
+for virtio-blk-devices to 1.
+
+That way we fix the cases 5 and 6.
+
+5 - qemu-5.2 -M pc-5.1  -> migrates to -> qemu-5.1 -M pc-5.1
+
+    qemu-5.2 -M pc-5.1 sets number of queues to be 1.
+    qemu-5.1 -M pc-5.1 expects number of queues to be 1.
+
+    correct.  migration works.
+
+6 - qemu-5.1 -M pc-5.1  -> migrates to -> qemu-5.2 -M pc-5.1
+
+    qemu-5.1 -M pc-5.1 sets number of queues to be 1.
+    qemu-5.2 -M pc-5.1 expects number of queues to be 1.
+
+    correct.  migration works.
+
+And now the other interesting case, case 3.  In this case we have:
+
+3 - qemu-5.2 -M pc-5.1  -> migrates to -> qemu-5.2 -M pc-5.1
+
+    Here we have the same qemu in both sides.  So it don't matter a
+    lot if we have setup the number of queues to 1 or not, because
+    they are the same.
+
+    WRONG!
+
+    Think what happens if we do one of this double migrations:
+
+    A -> migrates -> B -> migrates -> C
+
+    where:
+
+    A: qemu-5.1 -M pc-5.1
+    B: qemu-5.2 -M pc-5.1
+    C: qemu-5.2 -M pc-5.1
+
+    migration A -> B is case 6, so number of queues needs to be 1.
+
+    migration B -> C is case 3, so we don't care.  But actually we
+    care because we haven't started the guest in qemu-5.2, it came
+    migrated from qemu-5.1.  So to be in the safe place, we need to
+    always use number of queues 1 when we are using pc-5.1.
+
+Now, how was this done in reality?  The following commit shows how it
+was done.
+
+commit 9445e1e15e66c19e42bea942ba810db28052cd05
+Author: Stefan Hajnoczi <stefa...@redhat.com>
+Date:   Tue Aug 18 15:33:47 2020 +0100
+
+    virtio-blk-pci: default num_queues to -smp N
+
+The relevant parts for migration are:
+
+@@ -1281,7 +1284,8 @@ static Property virtio_blk_properties[] = {
+ #endif
+     DEFINE_PROP_BIT("request-merging", VirtIOBlock, conf.request_merging, 0,
+                     true),
+-    DEFINE_PROP_UINT16("num-queues", VirtIOBlock, conf.num_queues, 1),
++    DEFINE_PROP_UINT16("num-queues", VirtIOBlock, conf.num_queues,
++                       VIRTIO_BLK_AUTO_NUM_QUEUES),
+     DEFINE_PROP_UINT16("queue-size", VirtIOBlock, conf.queue_size, 256),
+
+It changes the default value of num_queues.  But it fishes it for old
+machine types to have the right value:
+
+@@ -31,6 +31,7 @@
+ GlobalProperty hw_compat_5_1[] = {
+     ...
++    { "virtio-blk-device", "num-queues", "1"},
+     ...
+ };
+
+
 VMState
 -------
 
-- 
2.40.1


Reply via email to