On Fri, 3 Aug 2018 16:01:08 -0700 Ben Pfaff <b...@ovn.org> wrote: > I think that a simple mechanism for fairness is fine. The direction > of extensibility that makes me anxious is how to decide what matters > for fairness. So far, we've talked about per-vport fairness. That > works pretty well for packets coming in from virtual interfaces where > each vport represents a separate VM.
Yes, right, that's the case where we have significant issues currently. > It does not work well if the traffic filling your queues all comes > from a single physical port because some source of traffic is sending > traffic at a high rate. In that case, you'll do a lot better if you > do fairness based on the source 5-tuple. But if you're doing network > virtualization, then the outer source 5-tuples won't necessarily vary > much and you'd be better off looking at the VNI and maybe some Geneve > TLV options and maybe the inner 5-tuple... Sure, I see what you mean now. That looks entirely doable if we abstract the round-robin bucket selection out of the current patch. > I would be very pleased if we could integrate a simple mechanism for > fairness, based for now on some simple criteria like the source port, > but thinking ahead to how we could later make it gracefully extensible > to consider more general and possibly customizable criteria. We could change the patch so that instead of just using the vport for round-robin queue insertion, we generalise that and use "buckets" instead of vports, and have a set of possible functions that are called instead of using port_no directly in ovs_dp_upcall_queue_roundrobin(), making this configurable via netlink, per datapath. We could implement selection based on source port or a hash on the source 5-tuple, and the relevant bits of ovs_dp_upcall_queue_roundrobin() would look like this: static int ovs_dp_upcall_queue_roundrobin(struct datapath *dp, struct dp_upcall_info *upcall) { [...] list_for_each_entry(pos, head, list) { int bucket = dp->rr_select(pos); /* Count per-bucket upcalls. */ if (dp->upcalls.count[bucket] == U8_MAX) { err = -ENOSPC; goto out_clear; } dp->upcalls.count[bucket]++; if (bucket == upcall->bucket) { /* Another upcall for the same bucket: move insertion * point here, keep looking for insertion condition to * be still met further on. */ find_next = true; here = pos; continue; } count = dp->upcalls.count[bucket]; if (find_next && dp->upcalls.count[bucket] >= count) { /* Insertion condition met: no need to look further, * unless another upcall for the same port occurs later. */ find_next = false; here = pos; } } [...] } and implementations for dp->rr_select() would look like: int rr_select_vport(struct dp_upcall_info *upcall) { return upcall->port_no; } int rr_select_srcport(struct dp_upcall_info *upcall) { /* look up source port from upcall->skb... */ } And we could then easily extend this to use BPF with maps one day. This is for clarity by the way, but I guess we should avoid indirect calls in the final implementation. What do you think? -- Stefano