On Jan 24, 2013, at 19:41 , ext Jesse Gross wrote:

> On Thu, Jan 24, 2013 at 7:34 AM, Jarno Rajahalme
> <jarno.rajaha...@nsn.com> wrote:
>>
>> On Jan 23, 2013, at 19:30 , ext Jesse Gross wrote:
>>
>>> On Tue, Jan 22, 2013 at 9:48 PM, Jarno Rajahalme
>>> <jarno.rajaha...@nsn.com> wrote:
>>>> Add OVS_PACKET_ATTR_KEY_INFO to relieve userspace from re-computing
>>>> data already computed within the kernel datapath. In the typical
>>>> case of an upcall with perfect key fitness between kernel and
>>>> userspace this eliminates flow_extract() and flow_hash() calls in
>>>> handle_miss_upcalls().
>>>>
>>>> Additional bookkeeping within the kernel datapath is minimal.
>>>> Kernel flow insertion also saves one flow key hash computation.
>>>>
>>>> Removed setting the packet's l7 pointer for ICMP packets, as this was
>>>> never used.
>>>>
>>>> Signed-off-by: Jarno Rajahalme <jarno.rajaha...@nsn.com>
>>>> ---
>>>>
>>>> This likely requires some discussion, but it took a while for me to
>>>> understand why each packet miss upcall would require flow_extract()
>>>> right after the flow key has been obtained from odp attributes.
>>>
>>> Do you have any performance numbers to share? Since this is an
>>> optimization it's important to understand if the benefit is worth the
>>> extra complexity.
>>
>> Not yet, but would be happy to. Any hints toward the best way of
>> obtaining meaningful numbers for something like this?
>
> This is a flow setup optimization, so usually something like netperf
> TCP_CRR would be a good way to stress that.
>
> However, Ben mentioned to me that he had tried eliminating the
> flow_extract() call from userspace in the past and the results were
> disappointing.
I made a simple test where there is only one flow entry
("in_port=LOCAL actions=drop") and only the local port is configured.
One process sends UDP packets with different source/destination port
combinations in a loop, and OVS then tries to cope with the load.
During the test both processes run near 100% CPU utilization in a
virtual machine on a dual-core laptop. (A rough sketch of the setup
and the sender loop is appended at the end of this message.) On each
round 10100000 packets were generated:

OFPST_PORT reply (xid=0x2): 1 ports
  port LOCAL: rx pkts=10100006, bytes=464600468, drop=0, errs=0, frame=0, over=0, crc=0
           tx pkts=0, bytes=0, drop=0, errs=0, coll=0

With current master 19.35% of the packets on average get processed by
the flow (average n_packets over the three rounds, divided by the
10100000 packets generated per round):

Round 1:

NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=29.124s, table=0, n_packets=1959794, n_bytes=90150548, idle_age=4, in_port=LOCAL actions=drop

Round 2:

NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=63.534s, table=0, n_packets=1932785, n_bytes=88908158, idle_age=37, in_port=LOCAL actions=drop

Round 3:

NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=33.394s, table=0, n_packets=1972389, n_bytes=90729894, idle_age=8, in_port=LOCAL actions=drop

With the proposed change 20.2% of the packets on average get processed
by the flow:

Round 4:

NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=31.96s, table=0, n_packets=2042759, n_bytes=93966914, idle_age=4, in_port=LOCAL actions=drop

Round 5:

NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=38.6s, table=0, n_packets=2040224, n_bytes=93850372, idle_age=8, in_port=LOCAL actions=drop

Round 6:

NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=35.661s, table=0, n_packets=2039595, n_bytes=93821418, idle_age=3, in_port=LOCAL actions=drop

So there is a consistent benefit, but it is not very large: apparently
flow_extract() and flow_hash() account for only a small portion of the
CPU time OVS spends on flow setup.

  Jarno
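For reference, a rough sketch of the kind of setup and sender loop used
in a test like this. The bridge name, addresses, payload size, and port
ranges below are illustrative assumptions rather than the exact values
used: a bridge br0 with address 172.16.0.1/24, a static neighbor entry
for 172.16.0.2 (so the UDP packets actually leave via the bridge's
LOCAL port instead of stalling on ARP), and the single flow installed
with "ovs-ofctl add-flow br0 in_port=LOCAL,actions=drop". The sender is
then essentially:

/* Hypothetical UDP load generator: every packet carries a new
 * (source port, destination port) combination, so nearly every packet
 * misses the kernel flow table and causes an upcall. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    const char payload[4] = { 0 };  /* 4-byte payload -> 46 bytes per packet */
    struct sockaddr_in dst;
    long i;

    memset(&dst, 0, sizeof dst);
    dst.sin_family = AF_INET;
    inet_pton(AF_INET, "172.16.0.2", &dst.sin_addr);  /* assumed target */

    for (i = 0; i < 10100000; i++) {
        /* A new socket gets a new ephemeral source port, and the
         * destination port cycles, so the flow key changes (nearly)
         * every time. */
        int sock = socket(AF_INET, SOCK_DGRAM, 0);
        if (sock < 0) {
            return 1;
        }
        dst.sin_port = htons(1024 + i % 60000);
        sendto(sock, payload, sizeof payload, 0,
               (struct sockaddr *) &dst, sizeof dst);
        close(sock);
    }
    return 0;
}

Opening a new socket per packet is deliberately wasteful: the point of
a flow setup stress test is the per-packet churn in the port pair, not
raw packet rate. The 4-byte payload gives 46 bytes per packet, roughly
matching the byte counter in the port stats above.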