cc'ing in a couple of the COLOers.

* Michael R. Hines (mrhi...@linux.vnet.ibm.com) wrote:
> On 08/13/2014 10:03 PM, Walid Nouri wrote:
> >
> > While looking for ideas on approaches to replicating block devices I
> > have read the paper about the Remus implementation. I think MC can
> > take a similar approach for local disks.
>
> I agree.
>
> > Here are the main facts as I have understood them:
> >
> > Local disk content is viewed as internal state of the primary and the
> > secondary. The paper describes that, to preserve the primary's disk
> > semantics and to allow the primary to run speculatively, all disk
> > state changes are written directly to the primary's disk and, in
> > parallel, sent asynchronously to the secondary. The secondary keeps
> > the pending write requests in two disk buffers: a speculation buffer
> > and a write-out buffer.
> >
> > After receiving the next checkpoint, the secondary copies the
> > speculation buffer to the write-out buffer, commits the checkpoint
> > and applies the write-out buffer to its local disk.
> >
> > When the primary fails, the secondary must wait until the write-out
> > buffer has been completely written to disk before changing its
> > execution mode to run as primary. In this case (failure of the
> > primary) the secondary discards the pending disk writes in its
> > speculation buffer. This protocol keeps the disk state consistent
> > with the last checkpoint.
> >
> > Remus uses the Xen-specific blktap driver. As far as I know this
> > can't be used with QEMU (KVM).
> >
> > I must see how drive-mirror can be used for this kind of protocol.
>
> That's all correct. Theoretically, we would do exactly the same thing:
> drive-mirror on the source would write immediately to disk but follow
> the same commit semantics on the destination as Xen.
>
> > I have taken a look at COLO.
> >
> > IMHO there are two points. Custom changes to the TCP stack are a
> > no-go for proprietary operating systems like Windows; it makes COLO
> > application agnostic but not operating-system agnostic. The other
> > point is that with I/O-intensive workloads COLO will tend to behave
> > like MC. This is my point of view, but I didn't invest much time in
> > understanding everything in detail.
>
> Actually, if I remember correctly, the TCP stack is only modified at
> the hypervisor level - they are intercepting and translating TCP
> sequence numbers "in-flight" to detect divergence of the source and
> destination - which is not a big problem if the implementation is
> well-done.
The 2013 paper says: 'COLO modifies the guest OS's TCP/IP stack in order
to make the behavior more deterministic', but it does say that an
alternative might be to have a 'comparison function that operates
transparently over re-assembled TCP streams'.

> My hope in the future was that the two approaches could be used in a
> "Hybrid" manner - actually MC has much more of a performance hit for
> I/O than COLO does because of its buffering requirements.
>
> On the other hand, MC would perform better in a memory-intensive or
> CPU-intensive situation - so maybe QEMU could "switch" between the two
> mechanisms at different points in time when the resource bottleneck
> changes.

If the primary were to rate-limit the number of resynchronisations (and
send the secondary a message as soon as it knew a resync was needed)
that would get some of the way; but then the only difference from
microcheckpointing at that point is the secondary doing a wasteful copy
and sending its packets across, and it seems it should be easy to
disable those if it knew that a resync was going to happen (rough sketch
below).

Dave

> - Michael
>
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
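
Purely to illustrate the rate-limiting idea above, here is a rough C
sketch; it is not taken from COLO, MC or QEMU, and every function name
and constant in it is invented. The primary only allows a resync every
so often; as soon as it decides one is needed it tells the secondary,
which can then stop the wasteful copy and stop sending its packets
across for comparison until the next checkpoint.

/* Rough sketch of rate-limited resync on the primary; all names are
 * invented, none of this is COLO/MC/QEMU code. */
#include <stdio.h>
#include <time.h>

#define MIN_RESYNC_INTERVAL 1.0   /* seconds between resyncs (made-up limit) */

/* Stand-ins for the real machinery. */
static void buffer_primary_output(void)           { puts("buffer primary output (MC-style)"); }
static void notify_secondary_resync_pending(void) { puts("tell secondary: resync pending, stop copy/compare"); }
static void start_checkpoint_and_resync(void)     { puts("checkpoint + resync"); }

static time_t last_resync;

/* Called on the primary whenever output comparison reports divergence. */
static void on_output_divergence(void)
{
    time_t now = time(NULL);

    if (last_resync && difftime(now, last_resync) < MIN_RESYNC_INTERVAL) {
        /* Rate limit hit: keep running and buffer the primary's output,
         * i.e. temporarily behave like microcheckpointing. */
        buffer_primary_output();
        return;
    }

    last_resync = now;
    /* Tell the secondary immediately, so it can skip the wasteful copy
     * and stop sending its own packets across for comparison. */
    notify_secondary_resync_pending();
    start_checkpoint_and_resync();
}

int main(void)
{
    on_output_divergence();   /* first divergence: resync allowed   */
    on_output_divergence();   /* within the limit: buffer, like MC  */
    return 0;
}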
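
Going back to the Remus-style disk replication Walid described earlier
in the thread, a minimal sketch of the secondary-side buffering might
look like the following (again, invented names and data structures, not
QEMU or blktap code): writes streamed from the primary are only staged;
a committed checkpoint promotes them to the write-out buffer and flushes
them to the local disk; on primary failure the speculative writes are
dropped and the write-out buffer must be drained before failover.

/* Rough sketch of the two-buffer scheme on the secondary. */
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

struct disk_write {
    off_t              offset;   /* where on the replica disk */
    size_t             len;
    void              *data;
    struct disk_write *next;
};

struct write_list {
    struct disk_write *head, *tail;   /* kept in arrival order */
};

struct secondary_disk {
    struct write_list spec;      /* writes since the last checkpoint   */
    struct write_list writeout;  /* writes of the committed checkpoint */
    int               fd;        /* local replica block device or file */
};

/* The primary streams each write asynchronously; only stage it here. */
static void stage_write(struct secondary_disk *d, struct disk_write *w)
{
    w->next = NULL;
    if (d->spec.tail)
        d->spec.tail->next = w;
    else
        d->spec.head = w;
    d->spec.tail = w;
}

/* Apply a buffer to the local disk in arrival order (frees omitted). */
static void flush_list(int fd, struct write_list *l)
{
    struct disk_write *w;

    for (w = l->head; w; w = w->next)
        pwrite(fd, w->data, w->len, w->offset);
    fsync(fd);
    l->head = l->tail = NULL;
}

/* Checkpoint received and committed: promote the speculation buffer to
 * the write-out buffer and apply it to the local disk. */
static void on_checkpoint_commit(struct secondary_disk *d)
{
    d->writeout = d->spec;
    d->spec.head = d->spec.tail = NULL;
    flush_list(d->fd, &d->writeout);
}

/* Primary failed: drop speculative writes, drain the committed
 * write-out buffer, and only then let the secondary take over. */
static void on_primary_failure(struct secondary_disk *d)
{
    d->spec.head = d->spec.tail = NULL;   /* not committed, discard      */
    flush_list(d->fd, &d->writeout);      /* must finish before failover */
}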