Sorry for the email spam, but here's a test I did earlier with the RCU patch 
when I first finished it. I sent the email to Alex, but unfortunately not to 
the rest of the list. It may give some insight into the case where this patch 
is most helpful.

----

So I performance-tested this, and in most cases performance was about the 
same. However, I saw a significant performance increase with the following 
test:

- Establish 100 netperf TCP_CRR connections to the sink (as you do in your 
start script)
- Sleep for 20 seconds (to let the connections get established and OVS get 
started)
- Run the following commands in an indefinite loop (a rough sketch of the full 
run follows the command listing):

    ovs-vsctl add-br br0 -- add-br br1
    ovs-vsctl set bridge br1 datapath-type=dummy \
        other-config:hwaddr=aa:55:aa:56:00:00 -- \
        add-port br1 p11 -- set Interface p11 type=patch \
        options:peer=p00 -- \
        add-port br0 p00 -- set Interface p00 type=patch \
        options:peer=p11
    ovs-vsctl set Interface p00 bfd:enable=true -- \
        set Interface p11 bfd:enable=true
    sleep 1
    ovs-vsctl del-br br0 -- del-br br1
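
Put together, the whole run looks roughly like the sketch below. The netperf 
options and the $SINK host are placeholders rather than my exact command lines:

    # Start 100 TCP_CRR clients that print interim results (options illustrative).
    for i in $(seq 100); do
        netperf -H "$SINK" -t TCP_CRR -l 600 -D 1 &
    done
    sleep 20        # let connections establish and OVS settle
    while true; do
        ovs-vsctl add-br br0 -- add-br br1
        # ... the remaining ovs-vsctl / sleep commands listed above ...
        ovs-vsctl del-br br0 -- del-br br1
    done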

Notice that I don't chain the commands together. This is because if I do, the 
new config is batched into one message to the ofproto-dpif-xlate layer, 
meaning only one acquisition of the global xlate_rwlock's worth of delay on 
master. So if I batch the commands together, there is no significant 
performance hit.
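
For comparison, the chained form that avoids the slowdown is a single 
ovs-vsctl invocation, roughly:

    ovs-vsctl add-br br0 -- add-br br1 -- \
        set bridge br1 datapath-type=dummy \
            other-config:hwaddr=aa:55:aa:56:00:00 -- \
        add-port br1 p11 -- set Interface p11 type=patch options:peer=p00 -- \
        add-port br0 p00 -- set Interface p00 type=patch options:peer=p11 -- \
        set Interface p00 bfd:enable=true -- \
        set Interface p11 bfd:enable=true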

However, when I don't chain the commands together (i.e., I use 3 separate 
ovs-vsctl commands), master sends 3 separate messages between ofproto and 
ofproto-dpif-xlate, meaning 3 acquisitions of the global xlate_rwlock. That 
can add up to a lot of delay! Hence this is where we see the real improvement 
from RCU.

Some numbers from about 10,000 interim results of the netperf processes (in 
trans/s):

RCU:
Mean: 84.591932
Median: 83.405000

Master:
Mean: 78.528627
Median: 70.550000
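
(For reference, mean and median over the collected interim trans/s samples can 
be computed with a quick awk pass like the snippet below; trans.txt is a 
hypothetical file with one sample per line, not my actual data file.)

    awk '{ sum += $1 } END { printf "Mean: %f\n", sum / NR }' trans.txt
    # Median taken as the middle sample (lower middle for an even count).
    sort -n trans.txt | awk '{ v[NR] = $1 } END { printf "Median: %f\n", v[int((NR + 1) / 2)] }'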

It's not huge, but if we added more ovs-vsctl commands, I'd imagine we'd see 
more improvement. I'm not sure whether this is a valid use case, but these are 
my findings so far.

Ryan Wilson
Member of Technical Staff
wr...@vmware.com
3401 Hillview Avenue, Palo Alto, CA
650.427.1511 Office
916.588.7783 Mobile

On May 19, 2014, at 9:56 PM, Ryan Wilson <wr...@vmware.com> wrote:

> Sorry Gurucharan, totally forgot to answer your question!
> 
> After interspersing these tests with random calls to reload the kernel 
> module, the reloads don't appear to affect the timing in any significant way.
> 
> On May 19, 2014, at 9:53 PM, Ryan Wilson <wr...@vmware.com> wrote:
> 
>> So I did an experiment where I added and then deleted 500 and 1000 ports, 
>> with and without this patch, on machines with 8 GB and 62 GB of memory. 
>> Weirdly enough, adding / deleting ports with the RCU patch turned out to 
>> actually be faster than without it. My only explanation is that taking the 
>> global xlate lock is expensive and / or 500 ports wasn't enough to induce 
>> memory pressure.
>> 
>> Here are the numbers for the 500-port case on an 8 GB machine:
>> With RCU patch:
>> Adding ports: real   1m15.850s
>> Deleting ports: real 1m21.830s
>> 
>> Without RCU patch:
>> Adding ports: real   1m28.357s
>> Deleting ports: real 1m33.277s
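>> 
>> (A run like this can be timed with something along the lines of the sketch 
>> below; the bridge and port names are illustrative, not my exact script:)
>> 
>>     time for i in $(seq 500); do
>>         ovs-vsctl add-port br0 "p$i" -- set Interface "p$i" type=internal
>>     done
>>     time for i in $(seq 500); do
>>         ovs-vsctl del-port br0 "p$i"
>>     done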
>> 
>> On May 19, 2014, at 8:56 AM, Ben Pfaff <b...@nicira.com> wrote:
>> 
>>> On Fri, May 16, 2014 at 06:59:02AM -0700, Ryan Wilson wrote:
>>>> Before, a global read-write lock protected the ofproto-dpif /
>>>> ofproto-dpif-xlate interface. Handler and revalidator threads had to wait
>>>> while configuration was being changed. This patch implements RCU locking,
>>>> which allows handlers and revalidators to operate while configuration is
>>>> being updated.
>>>> 
>>>> Signed-off-by: Ryan Wilson <wr...@nicira.com>
>>>> Acked-by: Alex Wang <al...@nicira.com>
>>> 
>>> One side effect of this change that I am a bit concerned about is
>>> performance of configuration changes.  In particular, it looks like
>>> removing a port requires copying the entire configuration and that
>>> removing N ports requires copying the entire configuration N times.  Can
>>> you try a few experiments with configurations that have many ports,
>>> maybe 500 or 1000, and see how long it takes to remove several of them?
>> 
> 
