Hi Ben,

On Wed, May 13, 2015 at 1:00 AM, Ben Pfaff <b...@nicira.com> wrote:
> On Tue, May 12, 2015 at 10:21:19AM +0200, Duarte Nunes wrote:
>> We've observed that for high flow setup rates, the flow table becomes a
>> bottleneck since it only allows a single writer (all CRUD flow operations
>> take the ovs_mutex).
>>
>> A solution for this is to shard the flow table. A hash function is used
>> to know which table to query on packet ingress and on recirculation, as
>> well as to know on which table to perform flow CRUD operations. Some bits
>> of the flow ID are used to identify the shard. The number of shards is
>> configured upon datapath creation.
>
> First, I'm surprised to see high flow setup rates since the introduction
> of megaflows.  Generally, megaflows greatly reduce the flow setup rate.
True, megaflows help a lot, but not in all cases. In our case, we can't
wildcard L4 fields if a connection requires NAT or conntrack; we do those at
the edge, outside the kernel, so we actually want the trip to userspace for
each flow. The trade-off is that we can distribute connection state from
there and have zero extra hops.

> Second, is it possible to use a better data structure?  Perhaps one
> could, for example, use a mutex per hash chain, instead of a single
> mutex, or per-CPU data structures.  Ideally, if the data structures were
> improved, then one would not need to change the datapath interface at
> all.

We can look at other options, of course. IIUC, per-CPU data structures sound
like they'd require table lookups to spill over to other CPUs after a miss,
which sounds bad and makes me think I didn't understand correctly. Can you
elaborate?

A mutex per hash chain would work; the number of buckets is high enough to
make contention rare. It means:

* A mutex per hash table bucket in the flow table.
* A mutex for the flow mask list, grabbed when a flow operation needs to
  modify it.
* Grabbing the bucket mutex instead of ovs_mutex in flow CRUD ops.
* Two operations that need to grab all per-bucket locks in a flow table,
  plus the mask list mutex:
  * Flow table flush.
  * Datapath deletion.

While doable, this sounds to us both more complicated and less scalable than
simply sharding the flow tables in a way that's known to userspace. The
interface changes would be minimal:

* Datapaths get an optional number-of-shards attribute.
* Packets that OVS punts to userspace get a new attribute: the hash. If the
  hash is part of the flow ID, this becomes unnecessary.

...and we can forget about passing hash functions around or making them known
to userspace.

Guillermo
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
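For concreteness, here is a minimal userspace sketch of the per-bucket-mutex
idea from the thread above. All names, the bucket count, and the struct
layout are made up for illustration; this is not the actual OVS datapath
code, just the locking pattern under discussion (per-bucket locks for CRUD,
all locks plus the mask-list lock for flush/deletion):

```c
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define N_BUCKETS 1024          /* high enough to make contention rare */

struct flow {
    uint32_t hash;              /* precomputed flow-key hash */
    struct flow *next;          /* hash-chain link */
};

struct flow_table {
    struct flow *buckets[N_BUCKETS];
    pthread_mutex_t locks[N_BUCKETS];   /* one mutex per bucket */
    pthread_mutex_t mask_lock;          /* protects the (elided) mask list */
};

static void table_init(struct flow_table *t)
{
    memset(t->buckets, 0, sizeof t->buckets);
    for (int i = 0; i < N_BUCKETS; i++) {
        pthread_mutex_init(&t->locks[i], NULL);
    }
    pthread_mutex_init(&t->mask_lock, NULL);
}

/* A flow CRUD op grabs only its bucket's mutex, never a table-wide one,
 * so writers on different buckets proceed in parallel. */
static void table_insert(struct flow_table *t, struct flow *f)
{
    uint32_t b = f->hash % N_BUCKETS;
    pthread_mutex_lock(&t->locks[b]);
    f->next = t->buckets[b];
    t->buckets[b] = f;
    pthread_mutex_unlock(&t->locks[b]);
}

/* Flush (and, by extension, datapath deletion) must take every
 * per-bucket lock plus the mask-list mutex. */
static void table_flush(struct flow_table *t)
{
    pthread_mutex_lock(&t->mask_lock);
    for (int i = 0; i < N_BUCKETS; i++) {
        pthread_mutex_lock(&t->locks[i]);
        t->buckets[i] = NULL;           /* freeing the chain elided */
        pthread_mutex_unlock(&t->locks[i]);
    }
    pthread_mutex_unlock(&t->mask_lock);
}
```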
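Likewise, a sketch of how bits of the flow ID could pick the shard, so that
kernel and userspace agree on the mapping without passing hash functions
around. SHARD_BITS and the choice of which bits to use are hypothetical:

```c
#include <stdint.h>

#define SHARD_BITS 4                    /* fixed at datapath creation */
#define N_SHARDS   (1u << SHARD_BITS)

/* Take the top SHARD_BITS of the 32-bit flow ID as the shard index.
 * Every CRUD op and every lookup after an upcall uses the same mapping,
 * so no hash function has to be known to userspace. */
static inline uint32_t flow_id_shard(uint32_t flow_id)
{
    return flow_id >> (32 - SHARD_BITS);
}
```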