* Jason Wang (jasow...@redhat.com) wrote:
>
>
> On 11/10/2015 05:41 PM, Dr. David Alan Gilbert wrote:
> > * Jason Wang (jasow...@redhat.com) wrote:
> >>
> >> On 11/10/2015 01:26 PM, Tkid wrote:
> >>> Hi, all
> >>>
> >>> We are planning to reimplement colo proxy in userspace (here, inside
> >>> qemu) to cache and compare network packets. This module is one of the
> >>> important components of the COLO project and is still at an early
> >>> stage, so any comments and feedback are warmly welcomed; thanks in
> >>> advance.
> >>>
> >>> ## Background
> >>> COLO FT/HA (COarse-grain LOck-stepping Virtual Machines for Non-stop
> >>> Service) is a high-availability solution. The Primary VM (PVM) and
> >>> the Secondary VM (SVM) run in parallel: they receive the same
> >>> requests from the client and generate responses in parallel too. If
> >>> the response packets from the PVM and the SVM are identical, they
> >>> are released immediately; otherwise a VM checkpoint is conducted on
> >>> demand.
> >>> Paper:
> >>> http://www.socc2013.org/home/program/a3-dong.pdf?attredirects=0
> >>> COLO on Xen:
> >>> http://wiki.xen.org/wiki/COLO_-_Coarse_Grain_Lock_Stepping
> >>> COLO on Qemu/KVM:
> >>> http://wiki.qemu.org/Features/COLO
> >>>
> >>> Driven by the need to capture the response packets from the PVM and
> >>> the SVM and to find out whether they are identical, we introduce a
> >>> new module to qemu networking called colo-proxy.
> >>>
> >>> This document describes the design of the colo-proxy module.
> >>>
> >>> ## Glossary
> >>> PVM - Primary VM, which provides services to clients.
> >>> SVM - Secondary VM, a hot standby and replica of the PVM.
> >>> PN  - Primary Node, the host the PVM runs on.
> >>> SN  - Secondary Node, the host the SVM runs on.
> >>>
> >>> ## Our Idea ##
> >>>
> >>> COLO-Proxy
> >>> COLO-Proxy is part of COLO. It is built on the qemu net filter
> >>> infrastructure, as a net filter plugin. Its job is to keep the SVM's
> >>> connections to clients consistent with the PVM's, and to compare the
> >>> PVM's packets with the SVM's; if they differ, it notifies COLO to do
> >>> a checkpoint.
> >>>
> >>> == Workflow ==
> >>>
> >>> +--+                                      +--+
> >>> |PN|                                      |SN|
> >>> +-----------------------+                 +-----------------------+
> >>> | +-------------------+ |                 | +-------------------+ |
> >>> | |                   | |                 | |                   | |
> >>> | |        PVM        | |                 | |        SVM        | |
> >>> | |                   | |                 | |                   | |
> >>> | +--+-^--------------+ |                 | +-------------^----++ |
> >>> |    | |                |                 |               |    |  |
> >>> |    | | +------------+ |                 | +-----------+ |    |  |
> >>> |    | | |    COLO    | |    (socket)     | |   COLO    | |    |  |
> >>> |    | | | CheckPoint +---------------------> CheckPoint| |    |  |
> >>> |    | | |            | |       (6)       | |           | |    |  |
> >>> |    | | +----^-------+ |                 | +-----------+ |    |  |
> >>> |    | | (5)  |         |                 |               |    |  |
> >>> |    | |      |         |                 |               |    |  |
> >>> | +--v-+------+-------+ | Forward(socket) | +-------------+----v+ |
> >>> | |COLO Proxy        +-------+(1)+--------->seq&ack adjust(2)  | |
> >>> | |     +-----+-----+ | |                 | |                   | |
> >>> | |     | Compare(4) <-------+(3)+-----------+   COLO Proxy      | |
> >>> | |     +-----------+ | |                 | |                   | |
> >>> | +-------------------+ | Forward(socket) | +-------------------+ |
> >>> ++Qemu+-----------------+                 ++Qemu+-----------------+
> >>>          | ^
> >>>          | |
> >>>          | |
> >>> +--------v-+--------+
> >>> |                   |
> >>> |      Client       |
> >>> |                   |
> >>> +-------------------+
> >>>
> >>> (1) When the PN receives client packets, the PN COLO-Proxy copies
> >>>     them and forwards the copies to the SN COLO-Proxy.
> >>> (2) The SN COLO-Proxy records the initial seq of the PVM's packets
> >>>     and adjusts the ack of the client's packets to match, then sends
> >>>     the adjusted packets to the SVM (see the sketch after this list).
> >>> (3) The SN qemu COLO-Proxy receives the SVM's packets and forwards
> >>>     them to the PN qemu COLO-Proxy.
> >>> (4) The PN qemu COLO-Proxy enqueues the SVM's packets and the PVM's
> >>>     packets, then compares the PVM's packet data with the SVM's. If
> >>>     they differ, the compare module notifies the COLO CheckPoint
> >>>     module to do a checkpoint; either way it then sends the PVM's
> >>>     packets to the client and drops the SVM's packets.
> >>> (5) Notify the COLO-Checkpoint module that a checkpoint is needed.
> >>> (6) Do the COLO checkpoint.
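> >>>
> >>> To make the seq & ack adjustment in step (2) concrete, here is a
> >>> rough sketch of the rewrite we have in mind. It is an illustration
> >>> only: the struct and function names are made up for this document,
> >>> and the TCP checksum fixup is omitted. The SN records, per
> >>> connection, the initial sequence numbers chosen by the PVM and by
> >>> the SVM, and shifts packets between the two sequence spaces:
> >>>
> >>> #include <stdint.h>
> >>> #include <arpa/inet.h>
> >>>
> >>> /* Just the TCP header fields the rewrite touches. */
> >>> struct tcp_hdr {
> >>>     uint16_t sport;
> >>>     uint16_t dport;
> >>>     uint32_t seq;       /* network byte order */
> >>>     uint32_t ack;       /* network byte order */
> >>>     /* flags, window, checksum, ... */
> >>> };
> >>>
> >>> struct colo_conn {
> >>>     uint32_t pvm_isn;   /* initial seq the PVM chose, seen in (1) */
> >>>     uint32_t svm_isn;   /* initial seq the SVM chose */
> >>> };
> >>>
> >>> /* Client -> SVM: the client acks the PVM's sequence space, so
> >>>  * shift the ack into the SVM's space before delivery to the SVM.
> >>>  * Unsigned arithmetic gives the right modulo-2^32 behaviour. */
> >>> static void adjust_client_ack(const struct colo_conn *c,
> >>>                               struct tcp_hdr *th)
> >>> {
> >>>     uint32_t ack = ntohl(th->ack);
> >>>
> >>>     th->ack = htonl(ack - c->pvm_isn + c->svm_isn);
> >>>     /* recompute or incrementally fix up the TCP checksum here */
> >>> }
> >>>
> >>> /* SVM -> PN: shift the SVM's seq back into the PVM's sequence
> >>>  * space so the two output streams line up for the compare in (4). */
> >>> static void adjust_svm_seq(const struct colo_conn *c,
> >>>                            struct tcp_hdr *th)
> >>> {
> >>>     uint32_t seq = ntohl(th->seq);
> >>>
> >>>     th->seq = htonl(seq - c->svm_isn + c->pvm_isn);
> >>> }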
> >>>
> >>> ### QEMU space TCP/IP stack (based on SLIRP) ###
> >>> We need a QEMU userspace TCP/IP stack to help us analyse packets.
> >>> After looking into QEMU, we found that SLIRP
> >>>
> >>> http://wiki.qemu.org/Documentation/Networking#User_Networking_.28SLIRP.29
> >>>
> >>> is a good choice for us. SLIRP provides a full TCP/IP stack within
> >>> QEMU; it can help us handle the packets written to/read from the
> >>> backend (tap) device, which are just link-layer (L2) frames.
> >>>
> >>> ### Packet enqueue and compare ###
> >>> Together with the QEMU userspace TCP/IP stack, we enqueue all
> >>> packets sent by the PVM and the SVM on the primary QEMU, and then
> >>> compare the packet payloads for each connection. Two rough sketches
> >>> follow: one of the minimal frame parsing this needs, and one of the
> >>> per-connection compare.
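> >>>
> >>> At a minimum, the analysis has to classify each L2 frame into a
> >>> connection. A self-contained sketch of that step is below; this is
> >>> not SLIRP's actual API, just the standard Ethernet/IPv4/TCP header
> >>> layouts, and the names are made up for this document:
> >>>
> >>> #include <stddef.h>
> >>> #include <stdint.h>
> >>> #include <string.h>
> >>>
> >>> struct conn_key {
> >>>     uint32_t src_ip, dst_ip;        /* network byte order */
> >>>     uint16_t src_port, dst_port;    /* network byte order */
> >>> };
> >>>
> >>> /* Fill *key from a TCP/IPv4 frame; return -1 for anything else. */
> >>> static int parse_frame(const uint8_t *buf, size_t len,
> >>>                        struct conn_key *key)
> >>> {
> >>>     if (len < 14 + 20 + 20) {       /* eth + min ip + min tcp */
> >>>         return -1;
> >>>     }
> >>>     if (((buf[12] << 8) | buf[13]) != 0x0800) { /* IPv4 only here */
> >>>         return -1;
> >>>     }
> >>>     const uint8_t *ip = buf + 14;
> >>>     size_t ihl = (ip[0] & 0x0f) * 4;  /* IP header length, bytes */
> >>>     if ((ip[0] >> 4) != 4 || ihl < 20 || ip[9] != 6 /* TCP */ ||
> >>>         len < 14 + ihl + 20) {
> >>>         return -1;
> >>>     }
> >>>     const uint8_t *tcp = ip + ihl;
> >>>     memcpy(&key->src_ip,   ip + 12, 4);
> >>>     memcpy(&key->dst_ip,   ip + 16, 4);
> >>>     memcpy(&key->src_port, tcp,     2);
> >>>     memcpy(&key->dst_port, tcp + 2, 2);
> >>>     return 0;
> >>> }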
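> >>>
> >>> And a rough sketch of the compare step for one connection. Again,
> >>> the types and names are made up for this document; real code must
> >>> also cope with the two VMs segmenting the same byte stream at
> >>> different packet boundaries:
> >>>
> >>> #include <stdbool.h>
> >>> #include <stdint.h>
> >>> #include <string.h>
> >>>
> >>> struct pkt {
> >>>     const uint8_t *payload; /* TCP payload, as parsed by the stack */
> >>>     uint32_t len;
> >>>     uint32_t seq;           /* in the PVM's sequence space, see (2) */
> >>>     struct pkt *next;
> >>> };
> >>>
> >>> /* One entry per connection, looked up by struct conn_key above. */
> >>> struct conn_queues {
> >>>     struct pkt *pvm_q;      /* packets sent by the PVM, FIFO */
> >>>     struct pkt *svm_q;      /* packets sent by the SVM, FIFO */
> >>> };
> >>>
> >>> /* Compare the queue heads; return true if a checkpoint is needed. */
> >>> static bool colo_compare_one(struct conn_queues *q)
> >>> {
> >>>     struct pkt *p = q->pvm_q;
> >>>     struct pkt *s = q->svm_q;
> >>>
> >>>     if (!p || !s) {
> >>>         return false;       /* nothing to compare yet */
> >>>     }
> >>>     if (p->seq != s->seq || p->len != s->len ||
> >>>         memcmp(p->payload, s->payload, p->len) != 0) {
> >>>         return true;        /* diverged: trigger steps (5) and (6) */
> >>>     }
> >>>     /* identical: release the PVM packet to the client, drop the
> >>>      * SVM copy, and advance both queues */
> >>>     q->pvm_q = p->next;
> >>>     q->svm_q = s->next;
> >>>     return false;
> >>> }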
> >>
> >> Hi:
> >>
> >> Just have the following questions in my mind (some have been raised
> >> in the previous rounds of discussion without a conclusion):
> >>
> >> - What's the plan for the management layer? The setup seems
> >>   complicated, so we can't simply depend on the user to do each step.
> >>   (And for security reasons, qemu is usually run as an unprivileged
> >>   user.)
> > It's certainly easier than the current COLO code that relies on a very
> > complex set of bridges, extra network interfaces and kernel modules.
> > UMU (cc'd) have been working on a libvirt patch set that starts COLO
> > up, although one bit that's very messy is the current kernel-based
> > network comparison code.
>
> Ok.
>
> >> - What's the plan for vhost? Userspace networking in qemu is rather
> >>   slow; most users will choose vhost.
> >> - What if an application generates packets based on a hwrng device?
> >>   This will always produce different packets.
> > Yes, there are cases where this happens - COLO's worst case is similar
> > to simple checkpointing (because it has a limit on the smallest
> > checkpoint), but its best case is much better: on a compute-heavy
> > load it ends up taking a checkpoint very rarely.
> > Actually the big problem is where randomness occurs in unexpected
> > places, e.g. where things like Perl's hash randomisation mean that
> > the two hosts produce the same data in different orders.
>
> Not familiar with this, but unlike the hwrng, if the random data is
> computed by software, then after a synchronization it still has the
> possibility of producing the same result for a while.

The hwrng, variation in the TSC, or anything else that feeds the entropy
pool can cause the divergence.

> >> - Not sure SLIRP is perfectly matched for this task. As has been
> >>   raised, another method is to decouple the packet comparing from
> >>   qemu; in this way, lots of open source userspace stacks could be
> >>   used.
> >> - Haven't read the code of the packet comparing, but if it needs to
> >>   keep track of the state of each connection, it could easily be
> >>   DoSed from the guest.
> > The guest can only break its own networking; so shooting itself in
> > the foot is no big deal.
> >
> > Dave
>
> The question is, for the packet comparing: if the number of connections
> in the guest exceeds the maximum number of connections it can track,
> what will it do?

The slow-case choice is to just force simple checkpointing mode.
However, I don't think this case is any different from the kernel's
stateful connection tracking used in iptables for firewalling.

Dave

> >
> >> Thanks
> > --
> > Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
>
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK