On 01/22/2016 11:28 AM, Wen Congyang wrote:
> On 01/22/2016 11:15 AM, Jason Wang wrote:
>>
>> On 01/20/2016 06:30 PM, Wen Congyang wrote:
>>> On 01/20/2016 06:19 PM, Jason Wang wrote:
>>>>>
>>>>> On 01/20/2016 06:01 PM, Wen Congyang wrote:
>>>>>>> On 01/20/2016 02:54 PM, Jason Wang wrote:
>>>>>>>>> On 01/20/2016 11:29 AM, Zhang Chen wrote:
>>>>>>>>>>>>> Sure.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Two main comments/suggestions:
>>>>>>>>>>>>>
>>>>>>>>>>>>> - TCP analysis is missing in the current version. Maybe you could
>>>>>>>>>>>>>   point me to a git tree (or another version of the RFC) for a better
>>>>>>>>>>>>>   understanding of the design. (Just a skeleton for TCP should be
>>>>>>>>>>>>>   sufficient to discuss.)
>>>>>>>>>>>>> - I prefer to make the code as reusable as possible, so it's better to
>>>>>>>>>>>>>   split/decouple the reusable parts from the code. A vague idea is:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1) Decouple the packet comparing from the netfilter. You've achieved
>>>>>>>>>>>>>    99% of this, since the work has been done in a thread. Just let the
>>>>>>>>>>>>>    thread poll sockets directly; then the comparing code can be reused
>>>>>>>>>>>>>    by other kinds of dataplane.
>>>>>>>>>>>>> 2) Implement the traffic mirror/redirector as a filter.
>>>>>>>>>>>>> 3) Implement TCP seq rewriting as a filter.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Then, on the primary node, you need just a traffic mirror, which does:
>>>>>>>>>>>>> - mirror ingress traffic to the secondary node
>>>>>>>>>>>>> - mirror egress traffic to the packet comparing thread
>>>>>>>>>>>>>
>>>>>>>>>>>>> And on the secondary node, you need two filters:
>>>>>>>>>>>>> - A TCP seq rewriter, which adjusts the TCP sequence number.
>>>>>>>>>>>>> - A traffic redirector, which redirects packets from a socket as
>>>>>>>>>>>>>   ingress traffic, and redirects egress traffic to a socket that can
>>>>>>>>>>>>>   be polled by the remote packet comparing thread.
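Item 3) and the secondary-side "TCP seq rewriter" can be illustrated with a short C sketch. It assumes the rewriter already knows, per connection, the offset between the primary's and the secondary's initial sequence numbers (how that offset is learned is not covered here). Adding the offset maps the seq of a packet sent by the secondary guest into the primary's sequence space; passing the negated offset maps the ack of a packet the secondary guest receives back into its own space. The TCP checksum is patched incrementally (RFC 1624, eqn. 3) rather than recomputed; the real checksum also covers a pseudo-header and the payload, but the incremental update stays valid because only one 32-bit field changes. All function and macro names are hypothetical, not QEMU APIs.

```c
#include <arpa/inet.h>  /* ntohl, htonl, ntohs, htons */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offsets of fields inside a TCP header (RFC 793). */
#define TCP_SEQ_OFF   4
#define TCP_ACK_OFF   8
#define TCP_CSUM_OFF 16

/* Internet checksum (RFC 1071) over buf; returns the complemented sum.
 * Included only to check that the incremental update is correct. */
static uint16_t csum16(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (len & 1)
        sum += (uint32_t)buf[len - 1] << 8;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* RFC 1624 eqn. 3: HC' = ~(~HC + ~m + m'), applied to both 16-bit
 * halves of a 32-bit field that changes from old_val to new_val. */
static uint16_t csum_update32(uint16_t csum, uint32_t old_val, uint32_t new_val)
{
    uint32_t sum = (uint16_t)~csum;

    sum += (uint16_t)~(old_val >> 16);
    sum += (uint16_t)~(old_val & 0xffff);
    sum += new_val >> 16;
    sum += new_val & 0xffff;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Add delta (mod 2^32) to the seq or ack field of the TCP header at tcp
 * and patch the checksum in place. Pass (uint32_t)0 - offset to subtract. */
static void tcp_rewrite_field(uint8_t *tcp, size_t off, uint32_t delta)
{
    uint32_t field_be, old_val, new_val;
    uint16_t csum_be;

    assert(off == TCP_SEQ_OFF || off == TCP_ACK_OFF);
    memcpy(&field_be, tcp + off, sizeof(field_be));
    memcpy(&csum_be, tcp + TCP_CSUM_OFF, sizeof(csum_be));

    old_val = ntohl(field_be);
    new_val = old_val + delta;   /* unsigned wrap-around matches TCP seq space */

    field_be = htonl(new_val);
    csum_be = htons(csum_update32(ntohs(csum_be), old_val, new_val));
    memcpy(tcp + off, &field_be, sizeof(field_be));
    memcpy(tcp + TCP_CSUM_OFF, &csum_be, sizeof(csum_be));
}
```

Patching only the changed field keeps the per-packet cost constant, which matters because every packet of a rewritten connection passes through the filter.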
>>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>> zhangchen
>>>>>>>>>>> Hi, Jason.
>>>>>>>>>>> We considered your suggestion to split/decouple the reusable parts
>>>>>>>>>>> from the code. Because filter plugins are traversed one by one, in
>>>>>>>>>>> order, we will split colo-proxy into three filters on each side.
>>>>>>>>>>>
>>>>>>>>>>> But in this plan, both primary and secondary have a socket server,
>>>>>>>>>>> so startup is a problem.
>>>>>>>>> I believe this issue could be solved by reusing the socket chardev.
>>>>>>>>>
>>>>>>>>>>> Primary qemu
>>>>>>>>>>>   guest
>>>>>>>>>>>   netfilter (filter execute order: left to right)
>>>>>>>>>>>     mirror client (tx)   redirect server (rx) --> compare (rx)
>>>>>>>>>>>   tap (guest receive: tap --> guest; guest send: guest --> tap)
>>>>>>>>>>>
>>>>>>>>>>> Secondary qemu
>>>>>>>>>>>   guest
>>>>>>>>>>>   netfilter (filter execute order: left to right)
>>>>>>>>>>>     mirror server (tx)   TCP rewriter (all: adjust ack / adjust seq)   redirect client (rx)
>>>>>>>>>>>
>>>>>>>>>>> Between the nodes:
>>>>>>>>>>>   primary mirror client     --> secondary mirror server
>>>>>>>>>>>   secondary redirect client --> primary redirect server
>>>>>>>>>>>
>>>>>>>>>>> NOTE: filter direction is rx/tx/all
>>>>>>>>>>>   rx: receive packets sent to the netdev
>>>>>>>>>>>   tx: receive packets sent by the netdev
>>>>>>>>>
>>>>>>>>> I'd still like to decouple the comparer from netfilter. It has two
>>>>>>>>> obvious advantages:
>>>>>>>>>
>>>>>>>>> - it can be reused by other dataplanes (e.g. vhost)
>>>>>>>>> - the secondary redirector could redirect rx directly to the comparer
>>>>>>>>>   on the primary node, which simplifies the design.
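Decoupling the comparer from netfilter can be sketched in C as a comparer that only sees byte buffers tagged as primary or secondary, so whatever feeds it (a thread polling socket chardevs, or another dataplane such as vhost) stays outside the comparison logic. It follows the compare step in Zhang Chen's plan later in this thread: wait until both sides have a packet; if they are identical, release the primary and drop the secondary; otherwise release the primary and request a checkpoint. The comparer only returns a verdict and the caller acts on it. Byte-for-byte comparison and a single outstanding packet per side are simplifications for illustration (Wen later mentions comparing application-level data); all type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Which node a packet came from. */
typedef enum { SIDE_PRIMARY = 0, SIDE_SECONDARY = 1 } side;

/* What the caller (the dataplane) should do next. */
typedef enum {
    COMPARE_PENDING,     /* still waiting for the other side's packet */
    COMPARE_RELEASE,     /* identical: send primary, drop secondary */
    COMPARE_CHECKPOINT   /* different: send primary and checkpoint */
} verdict;

/* The comparer sees only bytes: no netfilter, netdev or chardev types. */
typedef struct {
    const unsigned char *data;
    size_t len;
} packet;

typedef struct {
    packet pending[2];   /* indexed by side; caller keeps the buffers alive */
    int have[2];
} comparer;

static verdict comparer_feed(comparer *c, side s, const packet *p)
{
    const packet *pri, *sec;
    verdict v;

    assert(!c->have[s]);           /* sketch: one outstanding packet per side */
    c->pending[s] = *p;
    c->have[s] = 1;
    if (!c->have[SIDE_PRIMARY] || !c->have[SIDE_SECONDARY])
        return COMPARE_PENDING;

    pri = &c->pending[SIDE_PRIMARY];
    sec = &c->pending[SIDE_SECONDARY];
    v = (pri->len == sec->len && memcmp(pri->data, sec->data, pri->len) == 0)
            ? COMPARE_RELEASE : COMPARE_CHECKPOINT;
    c->have[SIDE_PRIMARY] = c->have[SIDE_SECONDARY] = 0;
    return v;
}
```

A thread polling the primary and secondary traffic sockets would call comparer_feed() for each packet it reads and act on the verdict; since nothing in the comparer refers to a filter or netdev, a different dataplane could drive the same code.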
>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> guest recv packet route
>>>>>>>>>>>
>>>>>>>>>>> primary
>>>>>>>>>>>   tap --> mirror client filter
>>>>>>>>>>>     The mirror client sends the packet to the guest and, at the same
>>>>>>>>>>>     time, copies and forwards it to the secondary mirror server.
>>>>>>>>>>>
>>>>>>>>>>> secondary
>>>>>>>>>>>   mirror server filter --> TCP rewriter
>>>>>>>>>>>     If the received packet is a TCP packet, we adjust the ack and
>>>>>>>>>>>     update the TCP checksum, then send it to the secondary guest.
>>>>>>>>>>>     Otherwise we send it directly to the guest.
>>>>>>>>>>>
>>>>>>>>>>> guest send packet route
>>>>>>>>>>>
>>>>>>>>>>> primary
>>>>>>>>>>>   guest --> redirect server filter
>>>>>>>>>>>     The redirect server filter receives the primary guest's packet
>>>>>>>>>>>     but does nothing with it; it just passes it to the next filter.
>>>>>>>>>>>
>>>>>>>>>>>   redirect server filter --> compare filter
>>>>>>>>>>>     The compare filter receives the primary guest's packet, then
>>>>>>>>>>>     waits for the secondary's redirected packet to compare against
>>>>>>>>>>>     it. If the packets are the same, it sends the primary packet and
>>>>>>>>>>>     clears the secondary packet; otherwise it sends the primary
>>>>>>>>>>>     packet and does a checkpoint.
>>>>>>>>>>>
>>>>>>>>>>> secondary
>>>>>>>>>>>   guest --> TCP rewriter filter
>>>>>>>>>>>     If the packet is a TCP packet, we adjust the seq and update the
>>>>>>>>>>>     TCP checksum, then send it to the redirect client filter.
>>>>>>>>>>>     Otherwise we send it directly to the redirect client filter.
>>>>>>>>>>>
>>>>>>>>>>>   redirect client filter --> redirect server filter
>>>>>>>>>>>     Forwards the packet to the primary.
>>>>>>>>>>>
>>>>>>>>>>> In the failover scenario (primary is down), the TCP rewriter will
>>>>>>>>>>> keep servicing the TCP connections that were established after the
>>>>>>>>>>> last checkpoint.
>>>>>>>>>>>
>>>>>>>>>>> How about this plan?
>>>>>>>>> Sounds good.
>>>>>>>>>
>>>>>>>>> And there's indeed no need to distinguish between client and server if
>>>>>>>>> we reuse the socket chardev.
>>>>>>>>> E.g.:
>>>>>>>>>
>>>>>>>>> On the primary node:
>>>>>>>>>
>>>>>>>>> ...
>>>>>>>>> -chardev socket,id=comparer0,host=ip_primary,port=X,server,nowait
>>>>>>>>> -chardev socket,id=comparer1,host=ip_primary,port=Y,server,nowait
>>>>>>>>> -chardev socket,id=mirrorer0,host=ip_primary,port=Z,server,nowait
>>>>>>>>> -netdev tap,id=hn0
>>>>>>>>> -traffic-mirrorer netdev=hn0,id=t0,indev=comparer0,outdev=mirrorer0
>>>>>>>>> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1
>>>>>>> Why does the mirrorer have an indev?
>>>>>
>>>>> As I said in previous mails, I would like to decouple packet comparing
>>>>> from netfilter. You've already done most of this, since the comparing is
>>>>> done in an independent thread. So the indev here is used to mirror the
>>>>> packets sent by the guest to the packet comparing thread.
>>>>>
>>>>>>> I think we can use traffic-redirector to do it.
>>>>>>> The command line is:
>>>>>>> -netdev tap,id=hn0
>>>>>>> -object traffic-mirrorer,id=f0,netdev=hn0,queue=tx,outdev=mirrorer0
>>>>>>> -object traffic-redirector,id=f1,netdev=hn0,queue=rx,outdev=comparer0
>>>>>>> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1,netdev=hn0
>>>>>>> In the comparer thread, we can use qemu_net_queue_send_iov() to send
>>>>>>> out the packet.
>>>>>>>
>>>>>>> Also, we can merge the socket chardevs comparer1 and mirrorer0.
>>>>> It depends on whether or not packet comparing is done in a net filter
>>>>> (which I'd prefer not).
>>> I mean that packet comparing is done in a thread, not in a net filter.
>>> The flow of a packet sent from the guest:
>>> 1. traffic-redirector: we redirect the packet to comparer0; the next
>>>    filter will never see it.
>>> 2. comparing thread: read it from the socket chardev comparer0.
>>> 3. call qemu_net_queue_send_iov() to send it back to the netdev.
>> Ok, looks like I missed something.
>>
>> My suggestion tries its best to keep the packet comparing from being tied
>> to a filter or netdev.
>> But your suggestion still needs it to be coupled with a netdev. Are there
>> any advantages to doing this (or is there a reason that the packet must be
>> sent to the netdev after comparing?). If not, why not just
> Yes, the packet must be sent to the netdev after comparing. If the primary
> packet and the secondary packet are the same (contain the same
> application-level data), we drop the secondary packet and send the
> primary packet to the netdev. Otherwise, we sync the state.
And drop the primary packet here too?

>
>> mirror (duplicate the packet and forward it to a chardev, and pass the
>> original packet to the next filter or netdev)? And doing
> We cannot send the packet to the netdev before comparing. We need to keep
> the connection alive after failover.
>
> Thanks
> Wen Congyang
>
>> qemu_net_queue_send_iov() to a netdev in another thread may need some
>> synchronization with the iothread.
>>
>>> Thanks
>>> Wen Congyang
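Both Wen's flow (step 2: the comparing thread reads the packet from the socket chardev comparer0) and Jason's mirror (forward a copy to a chardev) push whole packets through a socket chardev, which is a byte stream, and the thread does not say how packet boundaries are preserved. A minimal C sketch, assuming a 4-byte big-endian length prefix per packet: the decoder keeps state between calls because a single read from the socket may return part of a frame, or the end of one frame plus the start of the next. The wire format, the 64 KiB limit and every name here are illustrative assumptions, not part of the proposal or of QEMU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed wire format: 4-byte big-endian length, then the packet bytes. */
#define FRAME_HDR_LEN 4
#define FRAME_MAX_PKT 65536

/* Write one frame into out, which must hold FRAME_HDR_LEN + len bytes.
 * Returns the number of bytes written. */
static size_t frame_encode(const uint8_t *pkt, uint32_t len, uint8_t *out)
{
    out[0] = (uint8_t)(len >> 24);
    out[1] = (uint8_t)(len >> 16);
    out[2] = (uint8_t)(len >> 8);
    out[3] = (uint8_t)len;
    memcpy(out + FRAME_HDR_LEN, pkt, len);
    return FRAME_HDR_LEN + (size_t)len;
}

typedef struct {
    uint8_t buf[FRAME_HDR_LEN + FRAME_MAX_PKT];
    size_t have;   /* bytes of the current frame buffered so far */
    int done;      /* 1 once a whole frame has been handed out */
} frame_decoder;

/* Feed up to n bytes read from the stream; *consumed says how many were
 * used. Returns 1 when a full packet is ready: it sits at
 * d->buf + FRAME_HDR_LEN with length *pkt_len until the next call.
 * Unconsumed bytes belong to the next frame and must be fed again. */
static int frame_feed(frame_decoder *d, const uint8_t *data, size_t n,
                      size_t *consumed, uint32_t *pkt_len)
{
    size_t used = 0;

    if (d->done) {                 /* previous packet was delivered */
        d->have = 0;
        d->done = 0;
    }
    while (d->have < FRAME_HDR_LEN && used < n)   /* length header */
        d->buf[d->have++] = data[used++];

    if (d->have >= FRAME_HDR_LEN) {
        uint32_t len = ((uint32_t)d->buf[0] << 24) | ((uint32_t)d->buf[1] << 16) |
                       ((uint32_t)d->buf[2] << 8) | d->buf[3];
        size_t want, take;

        assert(len <= FRAME_MAX_PKT);  /* real code should drop the link instead */
        want = FRAME_HDR_LEN + (size_t)len - d->have;
        take = (n - used < want) ? n - used : want;
        memcpy(d->buf + d->have, data + used, take);
        d->have += take;
        used += take;
        if (d->have == FRAME_HDR_LEN + (size_t)len) {
            *pkt_len = len;
            d->done = 1;
        }
    }
    *consumed = used;
    return d->done;
}
```

The comparing thread would loop: read from the chardev socket, call frame_feed() until every byte of that read is consumed, and hand each completed packet on for comparison.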