On 2015/7/30 19:56, Dr. David Alan Gilbert wrote:
> * Jason Wang (jasow...@redhat.com) wrote:
>>
>>
>> On 07/30/2015 04:03 PM, Dr. David Alan Gilbert wrote:
>>> * Dong, Eddie (eddie.d...@intel.com) wrote:
>>>>>> A question here: the packet comparison may be very tricky. For example,
>>>>>> some protocols use random data to generate an unpredictable ID or
>>>>>> something else. One example is ipv6_select_ident() in Linux. So COLO
>>>>>> needs a mechanism to make sure the PVM and SVM generate the same random
>>>>>> data?
>>>>> Good question; connections that depend on random data are a big problem
>>>>> for COLO. At present, they trigger checkpoint processing because the
>>>>> random data differ.
>>>>> I don't think any mechanism can ensure that two different machines
>>>>> generate the same random data. If you have any ideas, please tell us :)
>>>>>
>>>>> Frequent checkpoints can handle this scenario, but may cause poor
>>>>> performance. :(
>>>>>
>>>> The assumption is that, after a VM checkpoint, the SVM and PVM have
>>>> identical internal state, so the pattern used to generate random data
>>>> has a high probability of producing identical data for a short time, at
>>>> least...
>>> They do diverge pretty quickly though; I have simple examples which
>>> reliably cause a checkpoint because of simple randomness in applications.
>>>
>>> Dave
>>>
>>
>> And it will become even worse if hwrng is used in the guest.
>
> Yes; it seems quite application dependent. (On IPv4) an ssh connection,
> once established, tends to work well without triggering checkpoints,
> and static web pages also work well. Examples of things that do cause
> more checkpoints are displaying guest statistics (e.g. running top
> in that ssh session), which is timing dependent, and dynamically generated
> web pages that include a unique ID (Bugzilla's password reset link on
> its front page was a fun one); I think establishing new encrypted
> connections also causes the same randomness.
>
> However, it's worth remembering that COLO is trying to reduce the
> number of checkpoints compared to a simple checkpointing world
> which would be aiming to do a checkpoint ~100 times a second,
> and for compute bound workloads, or ones that don't expose
> the randomness that much, it can get checkpoints of a few seconds
> in length which greatly reduces the overhead.
>

Yes, that's true. We could provide two different modes for different
scenarios, perhaps named:
1) frequent checkpoint mode, for scenarios with many connections and
   randomness;
2) non-frequent checkpoint mode, for other scenarios.
But that's our next plan; we are still thinking about it.

Regards,
-Gonglei