On Jan 24, 2013, at 19:41 , ext Jesse Gross wrote:

> On Thu, Jan 24, 2013 at 7:34 AM, Jarno Rajahalme
> <jarno.rajaha...@nsn.com> wrote:
>> 
>> On Jan 23, 2013, at 19:30 , ext Jesse Gross wrote:
>> 
>>> On Tue, Jan 22, 2013 at 9:48 PM, Jarno Rajahalme
>>> <jarno.rajaha...@nsn.com> wrote:
>>>> Add OVS_PACKET_ATTR_KEY_INFO to relieve userspace from re-computing
>>>> data already computed within the kernel datapath.  In the typical
>>>> case of an upcall with perfect key fitness between kernel and
>>>> userspace this eliminates flow_extract() and flow_hash() calls in
>>>> handle_miss_upcalls().
>>>> 
>>>> Additional bookkeeping within the kernel datapath is minimal.
>>>> Kernel flow insertion also saves one flow key hash computation.
>>>> 
>>>> Removed setting the packet's l7 pointer for ICMP packets, as this was
>>>> never used.
>>>> 
>>>> Signed-off-by: Jarno Rajahalme <jarno.rajaha...@nsn.com>
>>>> ---
>>>> 
>>>> This likely requires some discussion, but it took a while for me to
>>>> understand why each packet miss upcall would require flow_extract()
>>>> right after the flow key has been obtained from odp attributes.
>>> 
>>> Do you have any performance numbers to share?  Since this is an
>>> optimization it's important to understand if the benefit is worth the
>>> extra complexity.
>> 
>> Not yet, but would be happy to. Any hints on the best way of obtaining
>> meaningful numbers for something like this?
> 
> This is a flow setup optimization, so usually something like netperf
> TCP_CRR would be a good way to stress that.
> 
> However, Ben mentioned to me that he had tried eliminating the
> flow_extract() call from userspace in the past and the results were
> disappointing.

I ran a simple test with only one flow entry ("in_port=LOCAL actions=drop") and
only the local port configured. One process sends UDP packets with different
source/destination port combinations in a loop (a sketch of the generator loop
follows the port stats below), and OVS tries to keep up with the load. During
the test both processes run near 100% CPU utilization in a virtual machine on a
dual-core laptop. On each round 10,100,000 packets were generated:

OFPST_PORT reply (xid=0x2): 1 ports
  port LOCAL: rx pkts=10100006, bytes=464600468, drop=0, errs=0, frame=0, over=0, crc=0
              tx pkts=0, bytes=0, drop=0, errs=0, coll=0
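
For reference, the generator is essentially a loop over source/destination port
pairs like the one sketched below. This is only a rough illustration of the setup
described above; the destination address, port ranges, payload, and lack of error
handling are assumptions, not the actual test program (which sent 10,100,000
packets per round).

/*
 * Rough sketch of the kind of UDP generator used in the test: every new
 * (source port, destination port) pair is a new flow for OVS, so nearly
 * every packet causes a miss upcall.  The address, port ranges and
 * payload are illustrative assumptions only.
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    const char payload[] = "x";
    struct sockaddr_in src = { .sin_family = AF_INET };
    struct sockaddr_in dst = { .sin_family = AF_INET };
    int sport, dport;

    src.sin_addr.s_addr = htonl(INADDR_ANY);
    /* Assumed: an address reached via the bridge's LOCAL port. */
    inet_pton(AF_INET, "10.0.0.1", &dst.sin_addr);

    for (sport = 10000; sport < 10100; sport++) {
        int sock = socket(AF_INET, SOCK_DGRAM, 0);

        src.sin_port = htons(sport);
        bind(sock, (struct sockaddr *) &src, sizeof src);

        for (dport = 20000; dport < 21000; dport++) {
            dst.sin_port = htons(dport);
            /* Every new (sport, dport) pair is a new flow for OVS. */
            sendto(sock, payload, sizeof payload, 0,
                   (struct sockaddr *) &dst, sizeof dst);
        }
        close(sock);
    }
    return 0;
}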

With current master, 19.35% of the packets on average get processed by the flow:

Round 1:
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=29.124s, table=0, n_packets=1959794, n_bytes=90150548, idle_age=4, in_port=LOCAL actions=drop

Round 2:
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=63.534s, table=0, n_packets=1932785, n_bytes=88908158, idle_age=37, in_port=LOCAL actions=drop

Round 3:
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=33.394s, table=0, n_packets=1972389, n_bytes=90729894, idle_age=8, in_port=LOCAL actions=drop


With the proposed change, 20.2% of the packets on average get processed by the flow:

Round 4:
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=31.96s, table=0, n_packets=2042759, n_bytes=93966914, idle_age=4, in_port=LOCAL actions=drop

Round 5:
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=38.6s, table=0, n_packets=2040224, n_bytes=93850372, idle_age=8, in_port=LOCAL actions=drop

Round 6:
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=35.661s, table=0, n_packets=2039595, n_bytes=93821418, idle_age=3, in_port=LOCAL actions=drop
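
For clarity, the percentages quoted above are the average n_packets over the
three rounds divided by the 10,100,000 packets generated per round:

\[
\frac{1959794 + 1932785 + 1972389}{3 \times 10100000} \approx 19.36\%,
\qquad
\frac{2042759 + 2040224 + 2039595}{3 \times 10100000} \approx 20.21\%.
\]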


So there is a consistent benefit, but it is not very large. It seems that
flow_extract() and flow_hash() represent only a small portion of the CPU time
OVS spends on flow setup.
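
To make the discussion concrete, the userspace side of the proposed change boils
down to something like the sketch below in the miss handling path: when the
kernel supplied OVS_PACKET_ATTR_KEY_INFO and the key fit is perfect, reuse the
kernel-computed hash instead of calling flow_extract() and flow_hash() again.
This is only an illustration, not the actual patch; the attribute payload (a
bare 32-bit flow hash here), the struct, and the helper names are assumptions.

/*
 * Illustration only (not the actual patch): skip the userspace
 * flow_extract()/flow_hash() work when the kernel already reported the
 * flow hash via OVS_PACKET_ATTR_KEY_INFO and the key fit is perfect.
 * The struct layout and helper names below are assumptions.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct miss_upcall {
    bool have_key_info;        /* OVS_PACKET_ATTR_KEY_INFO was present. */
    bool key_fit_is_perfect;   /* Kernel and userspace agree on the flow key. */
    uint32_t kernel_flow_hash; /* Flow hash carried in the attribute. */
    const uint8_t *packet;
    size_t packet_len;
};

/* Stand-in for the hash userspace would otherwise compute itself
 * (flow_extract() + flow_hash()); a trivial FNV-1a over the packet. */
uint32_t
recompute_flow_hash(const uint8_t *packet, size_t len)
{
    uint32_t hash = 2166136261u;
    size_t i;

    for (i = 0; i < len; i++) {
        hash = (hash ^ packet[i]) * 16777619u;
    }
    return hash;
}

uint32_t
miss_upcall_flow_hash(const struct miss_upcall *upcall)
{
    if (upcall->have_key_info && upcall->key_fit_is_perfect) {
        /* Fast path: reuse the hash the kernel already computed. */
        return upcall->kernel_flow_hash;
    }
    /* Slow path: extract and hash the flow again, as userspace does today. */
    return recompute_flow_hash(upcall->packet, upcall->packet_len);
}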

  Jarno
