On 05/29/2012 02:52 PM, Paolo Bonzini wrote:
Does the drive-mirror coroutine send the writes to the target in the
same order as they are sent to the source? I assume so.
No, it doesn't. It's asynchronous; for continuous replication, the
target knows that it has a consistent view whenever it sees a flush on
the NBD stream. Flushing the target is needed anyway before writing the
dirty bitmap, so the target might as well exploit those flushes to get
information about the state of the source.
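A minimal sketch of the target-side idea described above, assuming a plain in-memory disk image. The class and method names are hypothetical; this is not QEMU or NBD server code. It only models two things: writes inside one flush interval carry no ordering guarantee, and each flush gives the target a point it can treat as a consistent view.

```python
class FlushPointTarget:
    """Toy mirror target (hypothetical, not QEMU code).

    Writes between two flushes may arrive in any order; a flush marks a
    point at which the target treats its contents as a consistent view.
    """

    def __init__(self, size: int) -> None:
        self.disk = bytearray(size)
        self.flushes_seen = 0
        self.last_consistent = bytes(size)

    def write(self, offset: int, data: bytes) -> None:
        # No ordering is assumed among writes inside one flush interval.
        self.disk[offset:offset + len(data)] = data

    def flush(self) -> bytes:
        # Record the current contents as the latest consistent view.
        self.flushes_seen += 1
        self.last_consistent = bytes(self.disk)
        return self.last_consistent


t = FlushPointTarget(8)
t.write(4, b"bb")   # arrives first even if the source issued it second
t.write(0, b"aa")
print(t.flush())
```

Writes that arrive after the last flush do not change `last_consistent` until the next flush is seen.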
The target _must_ flush to disk when it receives a flush command, no
matter how close they are. It _may_ choose to snapshot the disk, for
example establishing one new snapshot every 5 seconds.
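A sketch of that policy, assuming the durable flush and the snapshot operation are supplied as callbacks (both hypothetical). The 5-second interval is just the example figure from the text. Every flush is honoured; snapshots are throttled.

```python
import time
from typing import Callable, Optional


class SnapshottingTarget:
    """Hypothetical target policy: always flush, snapshot at most once per interval."""

    def __init__(self, fsync: Callable[[], None], take_snapshot: Callable[[], None],
                 interval: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fsync = fsync
        self.take_snapshot = take_snapshot
        self.interval = interval
        self.clock = clock
        self.last_snapshot: Optional[float] = None

    def on_flush(self) -> None:
        # Mandatory: never skip or coalesce a flush, however close together they are.
        self.fsync()
        # Optional: a flush is a consistency point, so it is a safe moment to snapshot.
        now = self.clock()
        if self.last_snapshot is None or now - self.last_snapshot >= self.interval:
            self.take_snapshot()
            self.last_snapshot = now
```

Injecting the clock keeps the policy testable. With flushes at t=0, 1 and 5 seconds, all three reach the disk, but only the ones at 0 and 5 produce a snapshot.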
Interesting. So it works quite differently than I had assumed. Some
follow-up questions, I hope you don't mind...
* I assume a flush roughly corresponds to an fsync() in the guest OS?
Or is it more frequent than that?
* Writes will not be re-ordered over a flush boundary, right?
A synchronous implementation is not forbidden by the spec (by design),
but at the moment it's a bit more complex to implement because, as you
mention, it requires buffering the I/O data on the host.
So if I understand correctly, you'd only be keeping a list of
(len, offset) tuples without any data, and drive-mirror then reads the
data from the disk image? If that is the case, how do you handle a flush?
Does a flush need to wait for drive-mirror to drain the entire outgoing
queue to the target before it can complete? If not, how do you prevent writes
that happen after a flush from overwriting data that still has to be sent
to the target, when the target hasn't reached the flush point yet?
If so, that could have a significant performance impact on the guest.
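To make the race in the question concrete, here is a toy model of the design as described in the question: only (offset, len) extents are queued, and data is read from the source when it is sent. Everything here is hypothetical and single-threaded; it is not QEMU code and does not claim to be what drive-mirror actually does.

```python
from collections import deque


class ExtentOnlyMirror:
    """Toy model (hypothetical, not QEMU code): the source queues (offset, len)
    extents without data, and the mirror reads the *current* source contents
    when it sends each extent."""

    def __init__(self, size: int) -> None:
        self.source = bytearray(size)
        self.queue = deque()   # (offset, length) extents, or None for a flush
        self.wire = []         # what is sent to the target, in order

    def guest_write(self, offset: int, data: bytes) -> None:
        self.source[offset:offset + len(data)] = data
        self.queue.append((offset, len(data)))

    def guest_flush(self) -> None:
        # Non-draining variant: the flush completes at once and only
        # leaves a marker in the queue.
        self.queue.append(None)

    def drain(self) -> None:
        while self.queue:
            item = self.queue.popleft()
            if item is None:
                self.wire.append(("flush",))
            else:
                off, length = item
                # The data is read at send time, not at guest_write time.
                self.wire.append(("write", off, bytes(self.source[off:off + length])))


m = ExtentOnlyMirror(1)
m.guest_write(0, b"A")
m.guest_flush()
m.guest_write(0, b"B")   # issued after the flush
m.drain()
print(m.wire)
```

The first write on the wire carries b"B", which the guest only wrote after the flush, so the target's state at that flush point is not the one the guest flushed. The draining alternative avoids this by making `guest_flush` wait for the queue to empty, which is where the performance concern comes from.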
After the copy phase is done, in order to avoid race conditions, the
bitmap should be reset and mirroring should start directly and
atomically. Is that currently handled by your design?
Yes, this is already all correct.
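A single-threaded sketch of one way to get the property asked about. If the dirty bitmap is active before the bulk copy starts, the copy phase and the mirroring phase are the same loop, so no separate switch-over can race with guest writes. The chunk size, the names and the in-memory images are assumptions for illustration. This is not QEMU's implementation, and a real one also has to handle concurrent I/O.

```python
class BitmapMirror:
    """Hypothetical sketch: a dirty bitmap enabled before the bulk copy."""

    def __init__(self, source: bytearray, target: bytearray, chunk: int) -> None:
        self.source, self.target, self.chunk = source, target, chunk
        nchunks = (len(source) + chunk - 1) // chunk
        # Full copy: start with every chunk marked dirty.
        self.dirty = set(range(nchunks))

    def guest_write(self, offset: int, data: bytes) -> None:
        self.source[offset:offset + len(data)] = data
        first = offset // self.chunk
        last = (offset + len(data) - 1) // self.chunk
        self.dirty.update(range(first, last + 1))

    def copy_one(self) -> bool:
        """Copy one dirty chunk; return False once the bitmap is clean."""
        if not self.dirty:
            return False
        idx = min(self.dirty)
        # Clear the bit before copying, so a write that lands while this
        # chunk is in flight would mark it dirty again.
        self.dirty.discard(idx)
        start = idx * self.chunk
        end = min(start + self.chunk, len(self.source))
        self.target[start:end] = self.source[start:end]
        return True


src = bytearray(b"abcdefgh")
dst = bytearray(8)
m = BitmapMirror(src, dst, 4)
m.copy_one()             # bulk copy of chunk 0
m.guest_write(1, b"X")   # re-dirties chunk 0
while m.copy_one():
    pass
print(bytes(dst))
```

After the bitmap drains, the same `guest_write`/`copy_one` pair simply keeps running as the mirroring phase.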
OK, I think I was confused by your description of "drive-mirror" in the
wiki. It says that it starts mirroring, but it also copies the source
to the target before it does that. That part is clear from the
description of the "sync" option, though.
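To make the two phases concrete, here is a sketch that builds a QMP `drive-mirror` request. It assumes the `device`, `target` and `sync` arguments from QEMU's QMP reference, with `sync` being "full" (copy the whole disk, then mirror), "top" (copy only the topmost image, then mirror) or "none" (mirror new writes only). Optional arguments are omitted, and the device name and target path are hypothetical placeholders.

```python
import json

VALID_SYNC_MODES = ("full", "top", "none")


def drive_mirror_request(device: str, target: str, sync: str = "full") -> str:
    """Serialize a QMP drive-mirror request (sketch; optional arguments omitted)."""
    if sync not in VALID_SYNC_MODES:
        raise ValueError(f"unknown sync mode: {sync!r}")
    return json.dumps({
        "execute": "drive-mirror",
        "arguments": {"device": device, "target": target, "sync": sync},
    })


# Hypothetical device name and target path.
print(drive_mirror_request("ide0-hd0", "/tmp/mirror.qcow2"))
```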
Thanks,
Geert