Hi Ben,

On Tue, 31 Jul 2018 15:06:57 -0700 Ben Pfaff <b...@ovn.org> wrote:
> This is an awkward problem to try to solve with sockets because of the
> nature of sockets, which are strictly first-in first-out. What you
> really want is something closer to the algorithm that we use in
> ovs-vswitchd to send packets to an OpenFlow controller. When the
> channel becomes congested, then for each packet to be sent to the
> controller, OVS appends it to a queue associated with its input port.
> (This could be done on a more granular basis than just port.) If the
> maximum amount of queued packets is reached, then OVS discards a packet
> from the longest queue. When space becomes available in the channel,
> OVS round-robins through the queues to send a packet. This achieves
> pretty good fairness but it can't be done with sockets because you can't
> drop a packet that is already queued to one.

Thanks for your feedback. What you describe is, though, functionally
equivalent to what this patch does, minus the per-port queueing limit.

However, instead of keeping one explicit queue per port and fetching
packets from all the queues in round-robin fashion, we implemented this
with a single queue, choosing insertion points on enqueue in such a way
that the result is equivalent. This avoids the significant overhead of
maintaining one queue per port (we can have up to 2^16 ports) and of
cycling over all of them.

Let's say we have two ports, A and B, and three upcalls are sent for
each port. With one queue per port, as you described, we end up with:

 .---------------- - - -
 | A1 | A2 | A3 |
 '---------------- - - -

 .---------------- - - -
 | B1 | B2 | B3 |
 '---------------- - - -

and upcalls are then sent in this order: A1, B1, A2, B2, A3, B3.

What we do here with a single queue is to insert the upcalls directly
in that order:

 .------------------------------- - - -
 | A1 | B1 | A2 | B2 | A3 | B3 |
 '------------------------------- - - -

and dequeue from the head.
About the per-port queueing limit: we currently have a global one
(UPCALL_QUEUE_MAX_LEN), while the per-port limit is simply given by an
implementation constraint in our case:

	if (dp->upcalls.count[pos->port_no] == U8_MAX - 1) {
		err = -ENOSPC;
		goto out_clear;
	}

but we can easily swap that U8_MAX - 1 with another macro, or with a
configurable value, if there's any value in doing that.

> My current thought is that any fairness scheme we implement directly in
> the kernel is going to need to evolve over time. Maybe we could do
> something flexible with BPF and maps, instead of hard-coding it.

Honestly, I fail to see what else we might want to do here, other than
adding a simple mechanism for fairness to solve the specific issue at
hand. Flexibility would probably come at a higher cost. We could easily
make the limits configurable if needed. Do you have anything else in
mind?

--
Stefano