On Tue, Feb 2, 2016 at 4:41 PM, Han Zhou <zhou...@gmail.com> wrote:
>
> On Tue, Feb 2, 2016 at 2:01 PM, Russell Bryant <russ...@ovn.org> wrote:
> >
> > On 02/02/2016 04:49 PM, Ben Pfaff wrote:
> > > On Thu, Jan 21, 2016 at 03:21:00PM -0500, Russell Bryant wrote:
> > >> Previously, ovn-controller translated logical flows into OpenFlow flows
> > >> for *every* logical datapath.  This patch makes it so we skip doing so
> > >> for the egress pipeline if the datapath is a logical switch with no
> > >> logical ports bound locally.  In that case, the flows have no effect.
> > >>
> > >> This was the code path taking the most time in a large scale OVN
> > >> environment and was an easy optimization to make based on the existing
> > >> local_datapaths info.
> > >>
> > >> In this environment, while idling, ovn-controller was taking up about
> > >> 20% CPU with this patch, while other nodes were in the 40-70% range.
> > >>
> > >> Reported-at: https://bugs.launchpad.net/networking-ovn/+bug/1536003
> > >> Signed-off-by: Russell Bryant <russ...@ovn.org>
> > >> Tested-by: Matt Mulsow <mamul...@us.ibm.com>
> > >> ---
> > >>
> > >> As discussed in the OVN IRC meeting today, this is one patch I came up
> > >> with while trying to analyze the performance in a large scale OVN
> > >> test setup.  It made a big impact for not much code.  I know Ben had some
> > >> suggestions for how to clean this up, so I'm just submitting as RFC for now.
> > >
> > > I think this is fine for now; we can optimize more later.
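[Editor's note: the check described in the patch above can be sketched as below. This is purely illustrative, not the actual ovn-controller code; `struct lflow`, its fields, and the `has_local_port` callback are hypothetical stand-ins for the real logical-flow table and the `local_datapaths` lookup.]

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the optimization: skip translating egress-pipeline logical
 * flows for any logical datapath with no port bound on this chassis,
 * since those flows can have no local effect.  All names hypothetical. */

enum pipeline { P_INGRESS, P_EGRESS };

struct lflow {
    enum pipeline pipeline;  /* which pipeline the flow belongs to */
    int datapath_key;        /* identifies the logical datapath */
};

/* 'has_local_port' stands in for a lookup in the local_datapaths set
 * that ovn-controller already maintains. */
bool
lflow_should_translate(const struct lflow *lf,
                       bool (*has_local_port)(int datapath_key))
{
    if (lf->pipeline == P_EGRESS && !has_local_port(lf->datapath_key)) {
        return false;    /* no local port: flow would have no effect */
    }
    return true;
}
```

Ingress flows are still translated unconditionally in this sketch; only the egress pipeline is filtered, matching the scope of the patch.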
> > >
> > > However I get a compile error against current master, perhaps it has
> > > moved on since you posted the patch:
> > >
> > > ../ovn/controller/ovn-controller.c: In function ‘main’:
> > > ../ovn/controller/ovn-controller.c:300:54: error: ‘local_datapaths’ undeclared (first use in this function)
> > >              lflow_run(&ctx, &flow_table, &ct_zones, &local_datapaths);
> > >                                                       ^
> > > ../ovn/controller/ovn-controller.c:300:54: note: each undeclared identifier is reported only once for each function it appears in
> >
> > Yeah, I actually wrote this on top of this patch series:
> >
> > https://patchwork.ozlabs.org/bundle/russellb/localnet/
> >
> > mainly because the scale test environment had those patches applied.
> >
> > > I think that the way to optimize this, in the end, is to use a kind of
> > > "flood fill" algorithm:
> > >
> > >     1. Initialize set S to the logical datapaths that have a port
> > >        located on the hypervisor.
> > >
> > >     2. For each patch port P in a logical datapath in S, add the logical
> > >        datapath of the remote end of P to S.
> > >
> > > Extra credit if there's a way to infer (or specify) that a logical
> > > datapath is "terminal", that is, that a packet that comes into it from a
> > > different logical datapath will never come back out.
> >
> > Right, because #2 needs to happen on every datapath added as a result of
> > #2, as well.
>
> Awesome!  This would help a lot for small logical router use cases,
> especially for bridge mode (localnet) where there are no logical routers.
> However, choosing this direction also means that the point below from the
> TODO file will have to be abandoned or re-thought:
> ===
> *** Reducing redundant data and code within ovsdb-server.
>
> Currently, ovsdb-server separately composes database update information to
> send to each of its clients.  This is fine for a small number of clients,
> but it wastes time and memory when hundreds of clients all want the same
> updates (as will be the case in OVN).
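[Editor's note: since step 2 must be re-applied to every datapath it adds until the set stops growing, the algorithm is a transitive closure, which is exactly Russell's point about #2. A minimal sketch under assumed, hypothetical structures — OVN's real implementation would walk patch ports via the southbound database, not a fixed array of links:]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DP 16

/* A patch port joining two logical datapaths, by index. */
struct patch_link {
    int from, to;
};

/* Step 1's result arrives in 'local': datapaths with a port on this
 * hypervisor.  Step 2, applied until a fixed point: follow patch links
 * from every datapath in the set, adding the peer datapath.  A
 * "terminal" datapath is added to the set but never expanded, modeling
 * Ben's extra-credit idea that packets entering it never come back out. */
void
flood_fill(bool local[MAX_DP], const bool terminal[MAX_DP],
           const struct patch_link *links, size_t n_links)
{
    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t i = 0; i < n_links; i++) {
            int a = links[i].from, b = links[i].to;
            /* Patch ports are bidirectional, so expand both ways. */
            if (local[a] && !terminal[a] && !local[b]) {
                local[b] = true;
                changed = true;
            }
            if (local[b] && !terminal[b] && !local[a]) {
                local[a] = true;
                changed = true;
            }
        }
    }
}
```

The loop terminates because each pass either adds at least one datapath to a bounded set or makes no change.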
> (This is somewhat opposed to the
> idea of letting a client monitor only some rows in a table, since that
> would increase the diversity among clients.)
> ===

ovsdb-server has a per-monitor cache that is supposed to store the same
updates in JSON form.  Is this cache not effective in your tests?
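[Editor's note: the sharing idea from the TODO excerpt — compose an update once and let every client whose monitor requests the same data reuse it — can be sketched with a simple refcounted object. This is purely illustrative; `cached_update` and its functions are made-up names, not ovsdb-server's actual data structures:]

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared update: the serialized JSON is composed once and
 * reference-counted, so N clients with identical monitors cost one
 * composition instead of N. */
struct cached_update {
    char *json;          /* serialized update, composed once */
    unsigned refcount;   /* number of clients still holding it */
};

struct cached_update *
cached_update_create(const char *json)
{
    struct cached_update *u = malloc(sizeof *u);
    size_t n = strlen(json) + 1;
    u->json = malloc(n);
    memcpy(u->json, json, n);
    u->refcount = 1;
    return u;
}

/* A second client with an identical monitor shares the same object. */
struct cached_update *
cached_update_ref(struct cached_update *u)
{
    u->refcount++;
    return u;
}

void
cached_update_unref(struct cached_update *u)
{
    if (!--u->refcount) {
        free(u->json);
        free(u);
    }
}
```

This also illustrates the tension the TODO mentions: per-row monitor conditions make clients' requests diverge, shrinking the population that can share any one cached update.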
> Any thoughts on this?  It means we need to make ovsdb better at handling
> large numbers of clients.  A tree-structured cluster with sharding may be
> considered.
>
> We just connected around 1800 chassis simulated by sandbox, which ended up
> at 80% CPU of ovsdb-server on the node running the nb/sb ovsdb, without
> even creating any lswitch/lports.
>
> @Andy may be interested in this.

Would it be possible to separate the nb and sb ovsdb-servers so we can get
a rough idea of where the performance bottlenecks are?  They are supposed
to be separate in a scaled environment anyway.

> >
> > I'll incorporate some comments about this in the non-RFC submission.  As
> > I mentioned above, it's based on the localnet fixes from above, so I'll
> > probably wait until those are merged.
>
> Yeah, the localnet fixes are pending, and please also review this related
> one, which is based on the same patch set:
> http://patchwork.ozlabs.org/patch/576145/ (on top of
> https://patchwork.ozlabs.org/bundle/russellb/localnet/)
>
> --
> Best regards,
> Han

_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev