Hi Ben,

On Wed, May 13, 2015 at 1:00 AM, Ben Pfaff <b...@nicira.com> wrote:
> On Tue, May 12, 2015 at 10:21:19AM +0200, Duarte Nunes wrote:
>> We've observed that for high flow setup rates, the flow table becomes a
>> bottleneck since it only allows a single writer (all CRUD flow operations
>> take the ovs_mutex).
>>
>> A solution for this is to shard the flow table. A hash function is used to
>> know which table to query on packet ingress and on recirculate as well as
>> to know on which table to CRUD a flow. Some bits off the flow ID are used
>> to identify the shard. The number of shards is configured upon datapath
>> creation.
>
> First, I'm surprised to see high flow setup rates since the introduction
> of megaflows.  Generally, megaflows greatly reduce the flow setup rate.

True, megaflows help a lot, but not in all cases. In our case, we
can't wildcard L4 fields when a connection requires NAT or conntrack:
we do those at the edge, outside the kernel, so we actually want the
trip to userspace for each flow. The trade-off is that we can
distribute connection state from there and add zero extra hops.

> Second, is it possible to use a better data structure?  Perhaps one
> could, for example, use a mutex per hash chain, instead of a single
> mutex, or per-CPU data structures.  Ideally, if the data structures were
> improved, then one would not need to change the datapath interface at
> all.

We can look at other options, of course.

IIUC, per-CPU data structures sound like they'd require table lookups
to spill over to other CPUs after a miss, which sounds bad and makes
me think I didn't understand correctly. Can you elaborate?

A mutex per hash chain would work; the number of buckets is high
enough to make contention rare. It would mean:

* A mutex per hash table bucket in the flow table.
* A mutex for the flow mask list, grabbed whenever a flow operation
needs to modify it.
* Grab the bucket mutex instead of ovs_mutex in flow CRUD ops.
* Two operations need to grab all per-bucket locks in a flow table and
the mask list mutex:
    * Flow table flush.
    * Datapath deletion.

While doable, this sounds to us both more complicated and less
scalable than simply sharding the flow table in a way that's known to
userspace. The interface changes would be minimal:

* Datapaths get an optional number-of-shards attribute.
* Packets that OVS punts to userspace get a new attribute: the hash.
If the hash is part of the flow ID, this becomes unnecessary.

...and we can forget about passing hash functions around or making
them known to userspace.


Guillermo
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
