Quoting Jesse Gross <je...@nicira.com>:
On Thu, Oct 1, 2015 at 4:19 PM, <dwil...@us.ibm.com> wrote:
Quoting Jesse Gross <je...@nicira.com>:
On Tue, Sep 29, 2015 at 10:50 PM, <dwil...@us.ibm.com> wrote:
Hi-
I have been conducting scaling tests with OVS and Docker. My tests revealed
that the latency of ARP packets can become very large, resulting in many ARP
retransmissions and timeouts. I traced the poor latency to the handling of
ARP packets in ovs_vport_find_upcall_portid(). Each packet is hashed in
ovs_vport_find_upcall_portid() by calling skb_get_hash(), and the hash is
used to select the netlink socket on which to send the packet to userspace.
However, skb_get_hash() does not support ARP packets and returns 0 (invalid
hash) for every ARP. As a result, a single ovs-vswitchd handler thread
processes every ARP packet, severely impacting the average latency of ARPs.
I am proposing a change to ovs_vport_find_upcall_portid() that spreads the
ARP packets evenly among all the handler threads (patch to follow). Please
let me know if you have suggestions/comments.
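To make the effect concrete, here is a minimal user-space sketch, not the
kernel code itself: pick_handler() and the plain modulo reduction are
simplifications of how the skb hash is reduced to an index into the per-port
portid array. With a constant hash of 0 every ARP lands on the same handler,
while any per-packet hash would spread them out.

#include <stdint.h>
#include <stdio.h>

/* Simplified model of selecting an upcall handler from a packet hash.
 * The kernel reduces skb_get_hash() to an index into the per-port array
 * of netlink portids; plain modulo is used here only for illustration. */
static unsigned int pick_handler(uint32_t skb_hash, unsigned int n_handlers)
{
        return skb_hash % n_handlers;
}

int main(void)
{
        unsigned int n_handlers = 8;
        uint32_t example_arp_hashes[] = { 0x1a2b3c4d, 0x9e8f7a6b,
                                          0x12345678, 0xdeadbeef };
        int i;

        /* Today: skb_get_hash() returns 0 for ARP, so every ARP hits handler 0. */
        for (i = 0; i < 4; i++)
                printf("ARP %d -> handler %u\n", i, pick_handler(0, n_handlers));

        /* With any per-packet hash (e.g. over the ARP sender/target fields),
         * the same packets would be spread across all handlers. */
        for (i = 0; i < 4; i++)
                printf("ARP %d (hashed) -> handler %u\n",
                       i, pick_handler(example_arp_hashes[i], n_handlers));
        return 0;
}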
This is definitely an interesting analysis but I'm a little surprised
at the basic scenario. First, I guess it seems to me that the L2
domain is too large if there are this many ARPs.
I can imagine running a couple of thousand Docker containers, so I think
this is a reasonably sized test.
Having thousands of nodes (regardless of whether they are containers
or VMs) on a single L2 segment is really not a good idea. I would
expect them to be segmented into smaller groups with L3 boundaries in
the middle.
Something I am not clear on: creating smaller L2 segments would reduce the
impact of a broadcast packet (fewer ports to flood the packet to), but would
it affect the performance of the upcall datapath? Comparing a single
512-port switch to two 256-port switches on a single host, won't they have
the same number of ports, queues, threads, and upcalls?
On a related issue, I am looking into the memory consumed by the netlink
sockets; OVS on Linux can create many of these sockets. Do you have any
thoughts on why the current model was picked?
Independent queues are the easiest way to provide lockless access to
incoming packets on different cores and, in some cases, to give higher
priority to certain types of packets.
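Roughly, the per-vport upcall state looks like the sketch below (field names
are simplified and assumed, not the exact upstream struct): each handler
thread owns one netlink socket, and the packet hash picks which portid, and
therefore which thread, receives the upcall, so handlers never contend on a
shared queue.

#include <stdint.h>

/* Rough sketch, names simplified: per-vport upcall dispatch state.
 * One netlink portid per handler thread; the flow hash of the packet
 * selects the slot, so each handler reads only from its own socket
 * and no lock is shared on the upcall receive path. */
struct upcall_portids_sketch {
        unsigned int n_ids;    /* number of handler threads/sockets */
        uint32_t     ids[];    /* netlink portid for each handler */
};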
The speed also generally seems slower than I would expect, but in any case I
don't disagree that it is better to spread the load among all the cores.
On the patch itself, can't we just make skb_get_hash() be able to
decode ARP? It seems like that is cleaner and more generic.
My first thought was to make a change in skb_get_hash(). However, the
comment on __skb_get_hash() states that the hash is generated from the
4-tuple (addresses and ports). ARPs have no ports, so a return value of 0
looked correct.
/*
* __skb_get_hash: calculate a flow hash based on src/dst addresses
* and src/dst port numbers. Sets hash in skb to non-zero hash value
* on success, zero indicates no valid hash. Also, sets l4_hash in skb
* if hash is a canonical 4-tuple hash over transport ports.
*/
What do you think?
I don't think that this is really a strict definition. In particular,
IP packets that aren't TCP or UDP will still return a hash based on
the IP addresses.
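As an illustration of what "decoding ARP" could look like (this is only a
sketch under that assumption, not the upstream flow dissector), the sender
and target protocol addresses could be folded into the flow keys the same
way a non-TCP/UDP IP packet is hashed on its addresses alone:

#include <stdint.h>

/* Illustration only: hash an ARP packet on its sender/target IPv4
 * addresses, analogous to hashing a non-TCP/UDP IP packet on its
 * addresses alone. mix32() is a stand-in for the kernel's jhash. */
struct arp_keys_sketch {
        uint32_t sip;   /* sender protocol address */
        uint32_t tip;   /* target protocol address */
        uint16_t op;    /* ARP opcode */
};

static uint32_t mix32(uint32_t a, uint32_t b, uint32_t c)
{
        a ^= b * 2654435761u;
        a ^= c * 2246822519u;
        a ^= a >> 15;
        return a ? a : 1;   /* 0 means "no valid hash", so avoid it */
}

static uint32_t arp_flow_hash_sketch(const struct arp_keys_sketch *k)
{
        return mix32(k->sip, k->tip, k->op);
}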
However, I believe that you are looking at an old version of this
function. Any changes would need to be made to the upstream Linux
tree, not purely in OVS.