And sorry for the email spam, but here's a test I did earlier with the RCU patch when I first finished it. I sent the email to Alex, but unfortunately not to the rest of the list. It may give some insight into the case where this patch is most helpful.
----

So I performance tested this, and in most cases performance was about the same. However, I saw a significant performance increase while doing the following test:

- Establish 100 netperf TCP_CRR connections to the sink (as you do in your start script; a rough sketch of this setup appears at the end of this mail).
- Sleep for 20 seconds (to let the connections get established and OVS get started).
- Run the following commands in an indefinite loop:

    ovs-vsctl add-br br0 -- add-br br1
    ovs-vsctl set bridge br1 datapath-type=dummy \
        other-config:hwaddr=aa:55:aa:56:00:00 -- \
        add-port br1 p11 -- set Interface p11 type=patch \
        options:peer=p00 -- \
        add-port br0 p00 -- set Interface p00 type=patch \
        options:peer=p11
    ovs-vsctl set Interface p00 bfd:enable=true -- \
        set Interface p11 bfd:enable=true
    sleep 1
    ovs-vsctl del-br br0 -- del-br br1

Notice that I don't chain the commands together. If I did, the new config would be batched into a single message to the ofproto-dpif-xlate layer, meaning only one acquisition of the global xlate_rwlock on master, so there would be no significant performance hit. When I don't chain the commands (i.e. I use 3 separate ovs-vsctl invocations), master sends 3 separate messages between ofproto and ofproto-dpif-xlate, meaning 3 acquisitions of the global xlate_rwlock. That delay can add up, and this is where we see the real improvement from RCU. (The chained form is sketched at the end of this mail for comparison.)

Some numbers from about 10000 interim results of the netperf processes (in trans/s):

RCU:
    Mean: 84.591932
    Median: 83.405000

Master:
    Mean: 78.528627
    Median: 70.550000

It's not huge, but if we added more ovs-vsctl commands, I'd imagine we'd see more improvement. I'm not sure whether this is a valid use case, but these are my findings so far.

Ryan Wilson
Member of Technical Staff
wr...@vmware.com
3401 Hillview Avenue, Palo Alto, CA
650.427.1511 Office
916.588.7783 Mobile

On May 19, 2014, at 9:56 PM, Ryan Wilson <wr...@vmware.com> wrote:

> Sorry Gurucharan, totally forgot to answer your question!
>
> After interspersing these tests with random calls to reload the kernel
> module, it doesn't appear to affect the times in any significant way.
>
> Ryan Wilson
> Member of Technical Staff
> wr...@vmware.com
> 3401 Hillview Avenue, Palo Alto, CA
> 650.427.1511 Office
> 916.588.7783 Mobile
>
> On May 19, 2014, at 9:53 PM, Ryan Wilson <wr...@vmware.com> wrote:
>
>> So I did an experiment where I added and then deleted 500 and 1000 ports,
>> with and without this patch, on machines with 8 GB and 62 GB of memory.
>> Weirdly enough, adding / deleting ports with the RCU patch turned out to
>> actually be faster than without. My only explanation here is that taking
>> the global xlate lock is expensive and / or 500 ports wasn't enough to
>> induce memory pressure.
>>
>> Here are the numbers for the 500 port case on an 8 GB memory machine:
>> With RCU patch:
>>     Adding ports: real 1m15.850s
>>     Deleting ports: real 1m21.830s
>>
>> Without RCU patch:
>>     Adding ports: real 1m28.357s
>>     Deleting ports: real 1m33.277s
>>
>> Ryan Wilson
>> Member of Technical Staff
>> wr...@vmware.com
>> 3401 Hillview Avenue, Palo Alto, CA
>> 650.427.1511 Office
>> 916.588.7783 Mobile
>>
>> On May 19, 2014, at 8:56 AM, Ben Pfaff <b...@nicira.com> wrote:
>>
>>> On Fri, May 16, 2014 at 06:59:02AM -0700, Ryan Wilson wrote:
>>>> Before, a global read-write lock protected the ofproto-dpif /
>>>> ofproto-dpif-xlate interface. Handler and revalidator threads had to
>>>> wait while configuration was being changed.
>>>> This patch implements RCU locking, which allows handlers and
>>>> revalidators to operate while configuration is being updated.
>>>>
>>>> Signed-off-by: Ryan Wilson <wr...@nicira.com>
>>>> Acked-by: Alex Wang <al...@nicira.com>
>>>
>>> One side effect of this change that I am a bit concerned about is
>>> performance of configuration changes. In particular, it looks like
>>> removing a port requires copying the entire configuration and that
>>> removing N ports requires copying the entire configuration N times. Can
>>> you try a few experiments with configurations that have many ports,
>>> maybe 500 or 1000, and see how long it takes to remove several of them?
>>> _______________________________________________
>>> dev mailing list
>>> dev@openvswitch.org
>>> http://openvswitch.org/mailman/listinfo/dev
>>
>
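For reference, the netperf load generation looked roughly like the sketch below. This is only an approximation, not the actual start script: the sink address, run length, and reporting interval shown here are placeholders.

    # Rough sketch (not the actual start script): start 100 concurrent
    # TCP_CRR instances against the sink, each printing interim results
    # in trans/s via demo mode (-D).  SINK_IP, the run length (-l), and
    # the reporting interval are placeholders.
    for i in $(seq 1 100); do
        netperf -H SINK_IP -t TCP_CRR -l 3600 -D 1 > netperf.$i.log &
    done
    wait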
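And for comparison, here is a sketch of the chained form: the three separate invocations above combined into a single ovs-vsctl command. In that case the new config is batched into one message to ofproto-dpif-xlate, so master takes the global xlate_rwlock only once and there is no significant performance hit.

    # One invocation, one transaction: bridges, patch ports, and BFD
    # settings are all applied together.
    ovs-vsctl add-br br0 -- add-br br1 -- \
        set bridge br1 datapath-type=dummy \
        other-config:hwaddr=aa:55:aa:56:00:00 -- \
        add-port br1 p11 -- set Interface p11 type=patch \
        options:peer=p00 -- \
        add-port br0 p00 -- set Interface p00 type=patch \
        options:peer=p11 -- \
        set Interface p00 bfd:enable=true -- \
        set Interface p11 bfd:enable=true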