Per Alex's request, I ran a 10K internal port creation test (adding ports in batches of 500 per ovs-vsctl invocation) on my 8 GB memory machine. Again, RCU was slightly faster; the timings follow the sketch below.
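By batches I mean one ovs-vsctl invocation per 500 ports rather than one invocation per port. Roughly like this (a sketch only; the bridge and port names are illustrative, not the exact commands from the test):

    # Illustrative only: add 10,000 internal ports, 500 per ovs-vsctl call (bash).
    ovs-vsctl add-br br0
    for start in $(seq 1 500 10000); do
        cmd=""
        for i in $(seq "$start" $((start + 499))); do
            cmd="$cmd -- add-port br0 p$i -- set Interface p$i type=internal"
        done
        # One invocation carries all 500 add-port/set commands.
        ovs-vsctl $cmd
    done

Wrapping the whole loop in time gives "real" figures of the form quoted below.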
master:   real 3m28.301s
with RCU: real 3m21.489s

Also, the reason I don't simply batch the creation of all ports together with the -- separator in ovs-vsctl is that doing so would put all 1000 ports into a single message to the xlate module, so there would be only 1 copy of the configuration in memory. When port creation is not batched, there are 1000 separate messages to the xlate module, and therefore 1000 copies of the configuration in memory, which should stress memory usage. I ran a test in my previous email that details this behavior more specifically, but I know I've sent a lot of emails, so here's the gist of it.

Ryan

From: Ryan Wilson <wr...@vmware.com>
Date: Monday, May 19, 2014 9:59 PM
To: Ben Pfaff <b...@nicira.com>
Cc: Ryan Wilson <wr...@nicira.com>, "dev@openvswitch.org" <dev@openvswitch.org>
Subject: Re: [ovs-dev] [PATCH v4] ofproto-dpif-xlate: Implement RCU locking in ofproto-dpif-xlate.

And sorry for the email spam, but here's a test I did earlier with the RCU patch when I first finished it. I sent the email to Alex, but unfortunately not to the rest of the list. This may give some insight into the case where this patch is the most helpful.

----

So I performance tested this, and in most cases the performance was about the same. However, I saw a significant performance increase with the following test:

- Establish 100 netperf TCP_CRR connections to a sink (as you do in your start script).
- Sleep for 20 seconds (to let connections get established and OVS get started).
- Run the following commands in an indefinite loop:

    ovs-vsctl add-br br0 -- add-br br1
    ovs-vsctl set bridge br1 datapath-type=dummy \
        other-config:hwaddr=aa:55:aa:56:00:00 -- \
        add-port br1 p11 -- set Interface p11 type=patch \
            options:peer=p00 -- \
        add-port br0 p00 -- set Interface p00 type=patch \
            options:peer=p11
    ovs-vsctl set Interface p00 bfd:enable=true -- \
        set Interface p11 bfd:enable=true
    sleep 1
    ovs-vsctl del-br br0 -- del-br br1

Notice how I don't chain the commands together. If I did, the new config would be batched into 1 message to the ofproto-dpif-xlate layer, meaning only 1 acquisition of the global xlate_rwlock worth of delay in master, so with batched commands there is no significant performance hit. However, when I don't chain the commands (i.e. I use 3 separate ovs-vsctl commands), master sends 3 separate messages between ofproto and ofproto-dpif-xlate, meaning 3 acquisitions of the global xlate_rwlock. That can add up to a lot of delay, and this is where we see the real improvement from RCU. (A fully chained variant is sketched further down for comparison.)

Some numbers over roughly 10000 interim results of the netperf processes (in trans/s):

RCU:    Mean: 84.591932  Median: 83.405000
Master: Mean: 78.528627  Median: 70.550000

It's not huge, but if we added more ovs-vsctl commands, I'd imagine we'd see more improvement. Not sure if this is a valid use case, but these are my findings so far.

Ryan Wilson

On May 19, 2014, at 9:56 PM, Ryan Wilson <wr...@vmware.com> wrote:

Sorry Gurucharan, totally forgot to answer your question! After interspersing these tests with random calls to reload the kernel module, it doesn't appear to affect the times in any significant way.
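For comparison with the unchained loop above, the fully chained variant would batch everything into one ovs-vsctl invocation, and therefore one message to ofproto-dpif-xlate. A sketch only, not a command taken from the original test:

    # Illustrative chained variant: one ovs-vsctl invocation, so ofproto pushes
    # the new configuration to ofproto-dpif-xlate only once.
    ovs-vsctl add-br br0 -- add-br br1 -- \
        set bridge br1 datapath-type=dummy \
            other-config:hwaddr=aa:55:aa:56:00:00 -- \
        add-port br1 p11 -- set Interface p11 type=patch options:peer=p00 -- \
        add-port br0 p00 -- set Interface p00 type=patch options:peer=p11 -- \
        set Interface p00 bfd:enable=true -- \
        set Interface p11 bfd:enable=true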
Ryan Wilson

On May 19, 2014, at 9:53 PM, Ryan Wilson <wr...@vmware.com> wrote:

So I did an experiment where I added 500 and 1000 ports and then deleted 500 and 1000 ports, with and without this patch, on machines with 8 GB and 62 GB of memory. Weirdly enough, adding and deleting ports with the RCU patch turned out to actually be faster than without it. My only explanation is that taking the global xlate lock is expensive and/or that 500 ports wasn't enough to induce memory pressure. Here are the numbers for the 500-port case on an 8 GB memory machine:

With RCU patch:
    Adding ports:   real 1m15.850s
    Deleting ports: real 1m21.830s

Without RCU patch:
    Adding ports:   real 1m28.357s
    Deleting ports: real 1m33.277s

Ryan Wilson

On May 19, 2014, at 8:56 AM, Ben Pfaff <b...@nicira.com> wrote:

On Fri, May 16, 2014 at 06:59:02AM -0700, Ryan Wilson wrote:

Before, a global read-write lock protected the ofproto-dpif / ofproto-dpif-xlate interface. Handler and revalidator threads had to wait while configuration was being changed. This patch implements RCU locking, which allows handlers and revalidators to operate while configuration is being updated.

Signed-off-by: Ryan Wilson <wr...@nicira.com>
Acked-by: Alex Wang <al...@nicira.com>

One side effect of this change that I am a bit concerned about is the performance of configuration changes. In particular, it looks like removing a port requires copying the entire configuration, and that removing N ports requires copying the entire configuration N times. Can you try a few experiments with configurations that have many ports, maybe 500 or 1000, and see how long it takes to remove several of them?
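A rough sketch of the kind of experiment being asked for here, i.e. timing how long it takes to remove ports from a configuration that already has many of them (illustrative only; the bridge and port names and the use of internal ports are assumptions, not commands from this thread):

    # Illustrative only (bash): create N internal ports, then time deleting them.
    N=500
    ovs-vsctl add-br br0
    for i in $(seq 1 "$N"); do
        ovs-vsctl add-port br0 "p$i" -- set Interface "p$i" type=internal
    done
    # time before a compound command is a bash feature.
    time for i in $(seq 1 "$N"); do
        ovs-vsctl del-port br0 "p$i"
    done

Timing the creation loop the same way would give "Adding ports" figures of the same form as those quoted above.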