On 01/22/2016 11:28 AM, Wen Congyang wrote:
> On 01/22/2016 11:15 AM, Jason Wang wrote:
>>
>> On 01/20/2016 06:30 PM, Wen Congyang wrote:
>>> On 01/20/2016 06:19 PM, Jason Wang wrote:
>>>>>
>>>>> On 01/20/2016 06:01 PM, Wen Congyang wrote:
>>>>>>> On 01/20/2016 02:54 PM, Jason Wang wrote:
>>>>>>>>> On 01/20/2016 11:29 AM, Zhang Chen wrote:
>>>>>>>>>>>>> Sure.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Two main comments/suggestions:
>>>>>>>>>>>>>
>>>>>>>>>>>>> - TCP analysis is missing from the current version. Could you point me
>>>>>>>>>>>>> to a git tree (or another version of the RFC) so I can better
>>>>>>>>>>>>> understand the design? (Just a skeleton for TCP should be sufficient
>>>>>>>>>>>>> for discussion.)
>>>>>>>>>>>>> - I prefer to make the code as reusable as possible, so it's better to
>>>>>>>>>>>>> split/decouple the reusable parts from the rest of the code. A vague
>>>>>>>>>>>>> idea is:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1) Decouple the packet comparing from the netfilter. You've achieved
>>>>>>>>>>>>> 99% of this already, since the work is done in a thread. Just let the
>>>>>>>>>>>>> thread poll sockets directly; then the comparing can be reused by
>>>>>>>>>>>>> other kinds of dataplane.
>>>>>>>>>>>>> 2) Implement traffic mirror/redirector as filter.
>>>>>>>>>>>>> 3) Implement TCP seq rewriting as a filter.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Then, on the primary node, you need just a traffic mirror, which:
>>>>>>>>>>>>> - mirrors ingress traffic to the secondary node
>>>>>>>>>>>>> - mirrors egress traffic to the packet comparing thread
>>>>>>>>>>>>>
>>>>>>>>>>>>> And on the secondary node, you need two filters:
>>>>>>>>>>>>> - A TCP seq rewriter which adjusts the TCP sequence number.
>>>>>>>>>>>>> - A traffic redirector which redirects packets from a socket as
>>>>>>>>>>>>> ingress traffic, and redirects egress traffic to the socket, which
>>>>>>>>>>>>> could be polled by the remote packet comparing thread.
>>>>>>>>>>>>>   Thoughts?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>> zhangchen
>>>>>>>>>>> Hi, Jason.
>>>>>>>>>>> We have considered your suggestion to split/decouple
>>>>>>>>>>> the reusable parts from the code.
>>>>>>>>>>> Because filter plugins are traversed one by one, in order,
>>>>>>>>>>> we will split colo-proxy into three filters on each side.
>>>>>>>>>>>
>>>>>>>>>>> But in this plan, the primary and secondary both have a socket
>>>>>>>>>>> server, so startup is a problem.
>>>>>>>>> I believe this issue could be solved by reusing the socket chardev.
>>>>>>>>>
>>>>>>>>>>>  Primary qemu                                                      Secondary qemu
>>>>>>>>>>> +----------------------------------------------------------+      +-----------------------------------------------------------+
>>>>>>>>>>> | +-----------------------------------------------------+  |       | +------------------------------------------------------+ |
>>>>>>>>>>> | |                                                     |  |       | |                                                      | |
>>>>>>>>>>> | |                        guest                        |  |       | |                        guest                         | |
>>>>>>>>>>> | |                                                     |  |       | |                                                      | |
>>>>>>>>>>> | +-----------^--------------+--------------------------+  |       | +---------------------+--------+-----------------------+ |
>>>>>>>>>>> |             |              |                             |      |                        ^        |                         |
>>>>>>>>>>> |             |              |                             |      |                        |        |                         |
>>>>>>>>>>> |             +-------------------------------------------------+ |                        |        |                         |
>>>>>>>>>>> |  netfilter  |              |                             |    |  |  netfilter            |        |                         |
>>>>>>>>>>> | +-----------------------------------------------------+  |    |  | +------------------------------------------------------+ |
>>>>>>>>>>> | |           |              |     filter execute order |  |    |  | |                     |        |  filter execute order | |
>>>>>>>>>>> | |           |              |    +-------------------> |  |    |  | |                     |        | +-------------------> | |
>>>>>>>>>>> | |           |              |                          |  |    |  | |                     |        |   TCP                 | |
>>>>>>>>>>> | | +---------+-+     +------v-----+    +----+ +-----+  |  |    |  | | +-----------+   +---+----+---v+rewriter+  +--------+ | |
>>>>>>>>>>> | | |           |     |            |    |            |  |  |    |  | | |           |   |        |             |  |        | | |
>>>>>>>>>>> | | |  mirror   |     |  redirect  +---->  compare   |  |  |   +--------> mirror   +---> adjust |   adjust    +-->redirect| | |
>>>>>>>>>>> | | |  client   |     |  server    |    |            |  |  |       | | |  server   |   | ack    |   seq       |  |client  | | |
>>>>>>>>>>> | | |           |     |            |    |            |  |  |       | | |           |   |        |             |  |        | | |
>>>>>>>>>>> | | +----^------+     +----^-------+    +-----+------+  |  |       | | +-----------+   +--------+-------------+  +----+---+ | |
>>>>>>>>>>> | |      |     tx          |      rx          |     rx  |  |       | |            tx                        all       |  rx | |
>>>>>>>>>>> | +-----------------------------------------------------+  |       | +------------------------------------------------------+ |
>>>>>>>>>>> |        |                +-------------------------------------------------------------------------------------------+      |
>>>>>>>>>>> |        |                                    |            |      |                                                           |
>>>>>>>>>>> +----------------------------------------------------------+      +-----------------------------------------------------------+
>>>>>>>>>>>          |                                    |
>>>>>>>>>>>          |guest receive                       |guest send
>>>>>>>>>>>          |                                    |
>>>>>>>>>>> +--------+------------------------------------v------------+
>>>>>>>>>>> |                                                          |
>>>>>>>>>>> |                                                          |
>>>>>>>>>>> |                         tap                             |                              NOTE: filter direction is rx/tx/all
>>>>>>>>>>> |                                                         |                              rx: receive packets sent to the netdev
>>>>>>>>>>> |                                                         |                              tx: receive packets sent by the netdev
>>>>>>>>>>> +----------------------------------------------------------+
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>> I would still like to decouple the comparer from the netfilter. It has
>>>>>>>>> two obvious advantages:
>>>>>>>>>
>>>>>>>>> - it can be reused by other dataplanes (e.g. vhost)
>>>>>>>>> - the secondary redirector could redirect rx to the comparer on the
>>>>>>>>> primary node directly, which simplifies the design.
>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> guest recv packet route
>>>>>>>>>>>
>>>>>>>>>>> primary
>>>>>>>>>>> tap --> mirror client filter
>>>>>>>>>>> The mirror client sends the packet to the guest and, at the
>>>>>>>>>>> same time, copies and forwards it to the secondary
>>>>>>>>>>> mirror server.
>>>>>>>>>>>
>>>>>>>>>>> secondary
>>>>>>>>>>> mirror server filter --> TCP rewriter
>>>>>>>>>>> If the received packet is a TCP packet, we adjust the ack
>>>>>>>>>>> and update the TCP checksum, then send it to the secondary
>>>>>>>>>>> guest. Otherwise we send it directly to the guest.
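The "adjust ack and update TCP checksum" step above can be sketched as follows. This is only a minimal Python model, not QEMU code: the function names and the `offset` parameter are hypothetical, and how the offset between the primary and secondary sequence spaces is obtained is not stated in the thread. The sketch assumes a raw 20-byte TCP header and updates the checksum incrementally in the style of RFC 1624, so the pseudo-header and payload do not have to be summed again.

```python
import struct

def ones_complement_sum16(data: bytes) -> int:
    """Fold the 16-bit big-endian words of data into a ones' complement sum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total

def checksum16(data: bytes) -> int:
    """Internet checksum of data (checksum field assumed to be zero)."""
    return ~ones_complement_sum16(data) & 0xFFFF

def adjust_ack(tcp_header: bytes, offset: int) -> bytes:
    """Add offset to the ack field (bytes 8..11) and patch the checksum
    (bytes 16..17) incrementally: HC' = ~(~HC + ~m + m') per 16-bit word."""
    hdr = bytearray(tcp_header)
    (old_ack,) = struct.unpack_from("!I", hdr, 8)
    (old_csum,) = struct.unpack_from("!H", hdr, 16)
    new_ack = (old_ack + offset) & 0xFFFFFFFF  # ack numbers wrap at 2**32

    total = ~old_csum & 0xFFFF
    for old_word, new_word in ((old_ack >> 16, new_ack >> 16),
                               (old_ack & 0xFFFF, new_ack & 0xFFFF)):
        total += (~old_word & 0xFFFF) + new_word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)

    struct.pack_into("!I", hdr, 8, new_ack)
    struct.pack_into("!H", hdr, 16, ~total & 0xFFFF)
    return bytes(hdr)
```

The same arithmetic would apply to the seq field (bytes 4..7) on the send path; only the field position changes.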
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> guest send packet route
>>>>>>>>>>>
>>>>>>>>>>> primary
>>>>>>>>>>> guest --> redirect server filter
>>>>>>>>>>> The redirect server filter receives the primary guest's packet
>>>>>>>>>>> but does nothing with it; it just passes it to the next filter.
>>>>>>>>>>>
>>>>>>>>>>> redirect server filter --> compare filter
>>>>>>>>>>> The compare filter receives the primary guest's packet, then
>>>>>>>>>>> waits for the secondary's redirected packet to compare against it.
>>>>>>>>>>> If the packets are the same, send the primary packet and discard
>>>>>>>>>>> the secondary packet; otherwise send the primary packet and do a
>>>>>>>>>>> checkpoint.
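The compare step above can be modelled roughly like this. It is a sketch under stated assumptions, not the colo-proxy implementation: the `Comparer` class and its `send` and `checkpoint` callbacks are hypothetical names, packets are paired strictly in arrival order, and "same" is taken to mean byte-for-byte equality, whereas the real comparison is described elsewhere in the thread as being about application-level data.

```python
from collections import deque

class Comparer:
    """Pairs primary and secondary packets and applies the rule above."""

    def __init__(self, send, checkpoint):
        self.primary = deque()        # packets from the primary guest
        self.secondary = deque()      # packets redirected from the secondary
        self.send = send              # hypothetical: release a packet to the wire
        self.checkpoint = checkpoint  # hypothetical: request a checkpoint

    def on_primary(self, pkt: bytes) -> None:
        self.primary.append(pkt)
        self._drain()

    def on_secondary(self, pkt: bytes) -> None:
        self.secondary.append(pkt)
        self._drain()

    def _drain(self) -> None:
        # A primary packet is held back until its secondary counterpart arrives.
        while self.primary and self.secondary:
            p = self.primary.popleft()
            s = self.secondary.popleft()  # the secondary copy is never sent
            self.send(p)                  # the primary packet goes out either way
            if p != s:
                self.checkpoint()         # outputs diverged: resynchronise
```

For example, `Comparer(sent.append, do_checkpoint)` with `sent = []` would collect released packets in `sent` and call `do_checkpoint` once per mismatching pair.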
>>>>>>>>>>>
>>>>>>>>>>> secondary
>>>>>>>>>>> guest --> TCP rewriter filter
>>>>>>>>>>> If the packet is a TCP packet, we adjust the seq
>>>>>>>>>>> and update the TCP checksum, then send it to the
>>>>>>>>>>> redirect client filter. Otherwise we send it directly to the
>>>>>>>>>>> redirect client filter.
>>>>>>>>>>>
>>>>>>>>>>> redirect client filter --> redirect server filter
>>>>>>>>>>> Forward the packet to the primary.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> In the failover scenario (primary is down), the TCP rewriter will keep
>>>>>>>>>>> servicing the TCP connections that were established after the last
>>>>>>>>>>> checkpoint.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> How about this plan?
>>>>>>>>> Sounds good.
>>>>>>>>>
>>>>>>>>> And by reusing the socket chardev, there's indeed no need to
>>>>>>>>> distinguish client from server. E.g.:
>>>>>>>>>
>>>>>>>>> In primary node:
>>>>>>>>>
>>>>>>>>> ...
>>>>>>>>> -chardev socket,id=comparer0,host=ip_primary,port=X,server,nowait
>>>>>>>>> -chardev socket,id=comparer1,host=ip_primary,port=Y,server,nowait
>>>>>>>>> -chardev socket,id=mirrorer0,host=ip_primary,port=Z,server,nowait
>>>>>>>>> -netdev tap,id=hn0
>>>>>>>>> -traffic-mirrorer netdev=hn0,id=t0,indev=comparer0,outdev=mirrorer0
>>>>>>>>> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1
>>>>>>> Why does the mirrorer have an indev?
>>>>>
>>>>> As I said in the previous mails, I would like to decouple packet
>>>>> comparing from the netfilter. You've already done most of this, since the
>>>>> comparing is done in an independent thread. So the indev here is to
>>>>> mirror the packets sent by the guest to the packet comparing thread.
>>>>>
>>>>>>> I think we can use traffic-redirector to do it.
>>>>>>> The command line is:
>>>>>>> -netdev tap,id=hn0
>>>>>>> -object traffic-mirrorer,id=f0,netdev=hn0,queue=tx,outdev=mirrorer0
>>>>>>> -object traffic-redirector,id=f1,netdev=hn0,queue=rx,outdev=comparer0
>>>>>>> -colo-comparer primary_traffic=comparer0,secondary_traffic=comparer1,netdev=hn0
>>>>>>> In the comparer thread, we can use qemu_net_queue_send_iov() to send
>>>>>>> out the packet.
>>>>>>>
>>>>>>> Also, we can merge the socketdev comparer1 and mirrorer0.
>>>>> It depends on whether or not packet comparing is done in a net filter
>>>>> (which I would prefer it not be).
>>> I mean that packet comparing is done in a thread, not in a net filter.
>>> The flow of a packet sent from the guest:
>>> 1. traffic-redirector: we redirect the packet to comparer0, so the next
>>>    filter will never see it.
>>> 2. comparing thread: reads it from the socket chardev comparer0.
>>> 3. comparing thread: calls qemu_net_queue_send_iov() to send it back to
>>>    the netdev.
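The three-step flow above can be pictured with a small Python model. It is purely illustrative and assumes a lot: the `comparer0` queue stands in for the socket chardev of the same name, the `netdev_inbox` queue is a hypothetical stand-in for handing the packet back to the netdev (the role `qemu_net_queue_send_iov()` plays in the proposal), and the comparison itself is left out. Using thread-safe queues at both ends is one way to picture the hand-off between the comparing thread and the thread that owns the netdev.

```python
import queue
import threading

def comparing_thread(comparer0: queue.Queue, netdev_inbox: queue.Queue) -> None:
    """Step 2 and 3: read redirected packets, then hand them back to the netdev."""
    while True:
        pkt = comparer0.get()
        if pkt is None:          # sentinel: no more packets
            break
        # The comparison against the secondary's packet would happen here.
        netdev_inbox.put(pkt)    # stands in for sending back to the netdev

def run_flow(guest_packets):
    """Drive the model with a list of packets sent by the guest."""
    comparer0 = queue.Queue()
    netdev_inbox = queue.Queue()
    worker = threading.Thread(target=comparing_thread,
                              args=(comparer0, netdev_inbox))
    worker.start()
    for pkt in guest_packets:
        # Step 1: the redirector diverts the packet; no later filter sees it.
        comparer0.put(pkt)
    comparer0.put(None)
    worker.join()

    delivered = []
    while not netdev_inbox.empty():
        delivered.append(netdev_inbox.get())
    return delivered
```

Calling `run_flow([b"p1", b"p2"])` returns the packets in the order the guest sent them, since a single comparing thread drains a FIFO queue.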
>> Ok, looks like I missed something.
>>
>> My suggestion tries its best to keep the packet comparing from being tied
>> to a filter or netdev. But your suggestion still needs it to be coupled
>> with a netdev. Are there any advantages to doing this (or is there a reason
>> that the packet must be sent to the netdev after comparing)? If not, why not just
> Yes, the packet must be sent to the netdev after comparing. If the
> primary packet and the secondary packet are the same (contain the same
> application-level data), we will drop the secondary packet and send the
> primary packet to the netdev. Otherwise, we will sync the state.

And drop the primary packet here as well?

>
>> mirror (duplicate the packet and forward it to a chardev, and pass the
>> original packet to the next filter or netdev)? And calling
> We cannot send the packet to the netdev before comparing. We need to keep
> the connection alive after failover.
>
> Thanks
> Wen Congyang
>
>> qemu_net_queue_send_iov() on a netdev from another thread may need some
>> synchronization with the iothread.
>>
>>> Thanks
>>> Wen Congyang
>>>
>>
>>
>> .
>>
>
>

