Quoting Jesse Gross <je...@nicira.com>:

On Tue, Sep 29, 2015 at 10:50 PM,  <dwil...@us.ibm.com> wrote:
Hi-

I have been conducting scaling tests with OVS and Docker. My tests revealed
that the latency of ARP packets can become very large, resulting in many ARP
retransmissions and timeouts. I found the source of the poor latency to be
the handling of ARP packets in ovs_vport_find_upcall_portid(). Each
packet is hashed in ovs_vport_find_upcall_portid() by calling
skb_get_hash(). This hash is used to select the netlink socket on which to
send the packet to userspace. However, skb_get_hash() does not support ARP
packets and returns 0 (invalid hash) for every ARP. As a result, a
single ovs-vswitchd handler thread processes every ARP packet, which severely
increases the average ARP latency. I am proposing a change to
ovs_vport_find_upcall_portid() that spreads the ARP packets evenly among
all the handler threads (patch to follow). Please let me know if you have
suggestions/comments.
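
To make the socket-selection problem concrete, here is a rough user-space
sketch of the idea. It is not the kernel code and not the patch mentioned
above: struct upcall_ports, pick_upcall_portid() and the round-robin
fallback are all hypothetical names and choices made for illustration. A
non-zero hash selects a handler socket as usual; a zero ("no valid") hash
falls back to a rotating cursor instead of always landing on one socket.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space model of a vport's upcall sockets.
 * This is an illustration only, not the kernel's data structure. */
struct upcall_ports {
    uint32_t n_ids;       /* number of handler sockets */
    const uint32_t *ids;  /* netlink port ID of each handler */
    uint32_t next;        /* round-robin cursor for unhashed packets */
};

/* Pick the netlink port ID for one packet.
 * hash != 0: the flow hash chooses the handler, so packets of one flow
 *            keep going to the same handler.
 * hash == 0: no valid hash (the ARP case described above); rotate
 *            through the handlers so they share the load. */
static uint32_t pick_upcall_portid(struct upcall_ports *p, uint32_t hash)
{
    uint32_t idx;

    assert(p->n_ids > 0);
    if (hash != 0) {
        idx = hash % p->n_ids;
    } else {
        idx = p->next % p->n_ids;
        p->next++;
    }
    return p->ids[idx];
}
```

With four handlers, successive zero-hash packets go to handlers 0, 1, 2, 3
and then wrap around. The cost of this particular fallback is that ARPs
from the same sender are no longer guaranteed to reach the same handler.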

This is definitely an interesting analysis but I'm a little surprised
at the basic scenario. First, I guess it seems to me that the L2
domain is too large if there are this many ARPs.

I can imagine running a couple of thousand Docker containers, so I think this is a reasonably sized test.

On a related issue, I am looking into the memory consumed by the netlink sockets; OVS on Linux can create many of these sockets. Do you have any thoughts as to why the current model was picked?


The speed also
generally seems slower than I would expect but in any case I don't
disagree that it is better to spread the load among all the cores.

On the patch itself, can't we just make skb_get_hash() be able to
decode ARP? It seems like that is cleaner and more generic.

My first thought was to make a change in skb_get_hash(). However, the comment in __skb_get_hash() states that the hash is generated from the 4-tuple (addresses and ports). ARPs have no ports, so a return value of 0 looked correct.

/*
  * __skb_get_hash: calculate a flow hash based on src/dst addresses
  * and src/dst port numbers.  Sets hash in skb to non-zero hash value
  * on success, zero indicates no valid hash.  Also, sets l4_hash in skb
  * if hash is a canonical 4-tuple hash over transport ports.
  */

What do you think?
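
For comparison, the alternative of hashing ARP itself could look roughly
like the sketch below. This is only an assumption about what such a hash
might cover (the sender and target IPv4 addresses from the ARP payload);
arp_flow_hash() is a hypothetical helper, not a kernel function, and FNV-1a
is used here just because it is short, not because the kernel uses it. The
result is forced to be non-zero because zero is the "no valid hash" value.

```c
#include <stdint.h>

/* Hypothetical helper: hash an ARP packet by its sender and target
 * IPv4 addresses using 32-bit FNV-1a. Illustration only. */
static uint32_t arp_flow_hash(uint32_t sender_ip, uint32_t target_ip)
{
    const uint32_t words[2] = { sender_ip, target_ip };
    uint32_t h = 2166136261u;          /* FNV-1a offset basis */
    int i, b;

    for (i = 0; i < 2; i++) {
        for (b = 0; b < 4; b++) {
            h ^= (words[i] >> (8 * b)) & 0xffu;  /* mix in one byte */
            h *= 16777619u;                      /* FNV-1a prime */
        }
    }
    /* Zero means "no valid hash", so never return it. */
    return h ? h : 1;
}
```

A hash like this keeps all ARPs between the same pair of addresses on one
handler while still spreading different pairs across handlers.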

