Re: [ovs-dev] [PATCH net-next 2/3] netlink: Convert netlink_lookup() to use RCU protected hash table

2014-08-04 Thread Eric Dumazet
etlink_seq_stop(). > -- Yes, two places use rht_dereference() instead of rht_dereference_rcu() [PATCH net-next] netlink: fix lockdep splats With netlink_lookup() conversion to RCU, we need to use appropriate rcu dereference in netlink_seq_socket_idx() & netlink_seq_next() Reported-by: Sasha L

Re: [ovs-dev] [PATCH net-next 2/9] libnl: nla_put_le64(): align on a 64-bit area

2016-04-22 Thread Eric Dumazet
On Fri, 2016-04-22 at 17:31 +0200, Nicolas Dichtel wrote: > nla_data() is now aligned on a 64-bit area. > > Signed-off-by: Nicolas Dichtel > --- > include/net/netlink.h | 8 +--- > include/net/nl802154.h| 6 ++ > net/ieee802154/nl802154.c | 13 - > 3 files changed,

Re: [ovs-dev] [RFC PATCH] openvswitch: use percpu flow stats

2016-08-19 Thread Eric Dumazet
On Fri, 2016-08-19 at 16:47 -0300, Thadeu Lima de Souza Cascardo wrote: > Instead of using flow stats per NUMA node, use it per CPU. When using > megaflows, the stats lock can be a bottleneck in scalability. ... > > flow_cache = kmem_cache_create("sw_flow", sizeof(struct sw_flow) > -

Re: [ovs-dev] [RFC PATCH] openvswitch: use percpu flow stats

2016-08-19 Thread Eric Dumazet
On Fri, 2016-08-19 at 18:09 -0700, David Miller wrote: > From: Eric Dumazet > Date: Fri, 19 Aug 2016 12:56:56 -0700 > > > On Fri, 2016-08-19 at 16:47 -0300, Thadeu Lima de Souza Cascardo wrote: > >> Instead of using flow stats per NUMA node, use it per CPU. When using

Re: [ovs-dev] [PATCH v2 2/2] openvswitch: use percpu flow stats

2016-09-15 Thread Eric Dumazet
On Thu, 2016-09-15 at 19:11 -0300, Thadeu Lima de Souza Cascardo wrote: > Instead of using flow stats per NUMA node, use it per CPU. When using > megaflows, the stats lock can be a bottleneck in scalability. > > On a E5-2690 12-core system, usual throughput went from ~4Mpps to > ~15Mpps when forwa
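The change described in this patch summary, keeping flow statistics per CPU instead of per NUMA node so that updates do not contend on a shared lock, can be sketched in plain user-space C. This is only an illustration under stated assumptions, not the openvswitch code: `NR_CPUS_SKETCH`, `struct flow_sketch`, `flow_stats_update()` and `flow_stats_get()` are hypothetical names, and the CPU number is passed in explicitly rather than taken from the running CPU.

```c
#include <stdint.h>

#define NR_CPUS_SKETCH 4 /* hypothetical CPU count for the illustration */

struct flow_stats_sketch {
    uint64_t packets;
    uint64_t bytes;
};

/* Hypothetical flow: one stats slot per CPU, so writers never share a slot. */
struct flow_sketch {
    struct flow_stats_sketch stats[NR_CPUS_SKETCH];
};

/* Fast path: touch only the slot that belongs to the given CPU. */
static void flow_stats_update(struct flow_sketch *f, unsigned int cpu,
                              uint64_t len)
{
    f->stats[cpu].packets++;
    f->stats[cpu].bytes += len;
}

/* Slow path: a reader sums every per-CPU slot to get the totals. */
static struct flow_stats_sketch flow_stats_get(const struct flow_sketch *f)
{
    struct flow_stats_sketch total = { 0, 0 };
    unsigned int cpu;

    for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
        total.packets += f->stats[cpu].packets;
        total.bytes += f->stats[cpu].bytes;
    }
    return total;
}
```

The trade-off in this sketch is that updates become cheap and uncontended while reads have to walk every slot, which suits counters that are written per packet and read rarely.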

Re: [ovs-dev] [RFC net-next 03/22] ipv4: support for fib route lwtunnel encap attributes

2015-07-10 Thread Eric Dumazet
On Fri, 2015-07-10 at 16:19 +0200, Thomas Graf wrote: > From: Roopa Prabhu > + if (oif) > + dev = __dev_get_by_index(net, oif); > + ret = lwtunnel_build_state(dev, encap_type, > +encap, &lwtstate); > + if (!ret) { > + lwtunnel_s

Re: [ovs-dev] [PATCH v4 net-next] MPLS: Use mpls_features to activate software MPLS GSO segmentation

2014-06-02 Thread Eric Dumazet
Hi Simon On Tue, 2014-06-03 at 11:38 +0900, Simon Horman wrote: > +/* If MPLS offload request, verify we are testing hardware MPLS features > + * instead of standard features for the netdev. > + */ > +#ifdef CONFIG_NET_MPLS_GSO > +static netdev_features_t net_mpls_features(struct sk_buff *skb, > +

Re: [ovs-dev] [PATCH net v3] ovs: limit ovs recursions in ovs_execute_actions to not corrupt stack

2016-01-15 Thread Eric Dumazet
On Fri, 2016-01-15 at 15:33 +0100, Hannes Frederic Sowa wrote: > It was seen that defective configurations of openvswitch could overwrite > the STACK_END_MAGIC and cause a hard crash of the kernel because of too > many recursions within ovs. ... > + > + preempt_disable(); > + level = __th
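The approach named in the patch title, bounding how deeply action execution may recurse so the stack cannot be overrun, can be sketched in user-space C as follows. This is a minimal illustration and not the patch itself: `EXEC_LEVEL_LIMIT`, `exec_level` and `execute_actions_sketch()` are hypothetical, the limit value is arbitrary, and a plain global stands in for the per-CPU counter that the quoted hunk appears to use.

```c
#define EXEC_LEVEL_LIMIT 4 /* hypothetical limit, not the kernel's value */

/* Single-threaded stand-in for a per-CPU recursion counter. */
static int exec_level;

/* Hypothetical executor: `nested` is how many further nested executions
 * the current action list would trigger. Returns 0 on success, or -1 when
 * the recursion limit would be exceeded.
 */
static int execute_actions_sketch(int nested)
{
    int ret = 0;

    if (++exec_level > EXEC_LEVEL_LIMIT) {
        ret = -1; /* refuse to recurse further instead of growing the stack */
        goto out;
    }

    if (nested > 0)
        ret = execute_actions_sketch(nested - 1);

out:
    exec_level--; /* always undo the increment, on success and on failure */
    return ret;
}
```

With this limit, three nested executions succeed and a fourth is rejected; the counter is back at zero after either outcome.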

Re: [ovs-dev] [PATCH 1/5] GRE: Add segmentation offload for GRE TAP device.

2013-01-10 Thread Eric Dumazet
On Mon, 2013-01-07 at 18:31 -0800, Pravin B Shelar wrote: > From: Pravin Shelar Do you have some perf numbers to share ? > Signed-off-by: Pravin B Shelar > --- > include/linux/skbuff.h | 12 ++ > include/net/gre.h |6 +++ > net/ipv4/af_inet.c |1 + > net/ipv4/gre.c

Re: [ovs-dev] [patch net-next] net: squash ->rx_handler and ->rx_handler_data into single rcu pointer

2013-03-30 Thread Eric Dumazet
ct netdevice by one pointer and reduces number of needed > rcu_dereference calls from 2 to 1. > Thats not true. > Note this also fixes the race bug pointed out by Steven Rostedt and > fixed by patch "[PATCH] net: add a synchronize_net() in > netdev_rx_handler_unregister()

Re: [ovs-dev] [patch net-next] net: squash ->rx_handler and ->rx_handler_data into single rcu pointer

2013-03-30 Thread Eric Dumazet
On Sat, 2013-03-30 at 18:13 +0100, Jiri Pirko wrote: > Well, not entirely true, depends on arch. > Are you really trying to obfuscate stack because of Alpha architecture ? Really, a bit of stability in this code is welcome. Lets fix existing bugs instead of possibly add new ones.

Re: [ovs-dev] [patch net-next] net: squash ->rx_handler and ->rx_handler_data into single rcu pointer

2013-03-30 Thread Eric Dumazet
On Sat, 2013-03-30 at 12:28 -0700, Eric Dumazet wrote: > On Sat, 2013-03-30 at 18:13 +0100, Jiri Pirko wrote: > > > Well, not entirely true, depends on arch. > > > > Are you really trying to obfuscate stack because of Alpha architecture ? By the way, only dev->

Re: [ovs-dev] [patch net-next] net: squash ->rx_handler and ->rx_handler_data into single rcu pointer

2013-03-30 Thread Eric Dumazet
On Sat, 2013-03-30 at 12:31 -0700, Eric Dumazet wrote: > By the way, only dev->rx_handler needs to be RCU protected. > > The patch send yesterday make the second rcu_dereference() (to get > rx_handler_data) totally irrelevant. I'll send patch when yesterday fix is

Re: [ovs-dev] [PATCH] net/openvswitch: replace memcmp() with specialized comparator

2013-04-26 Thread Eric Dumazet
On Fri, 2013-04-26 at 17:46 -0400, Peter Klausler wrote: > Tune flow table lookup in net/openvswitch, replacing a call to > the slow-but-safe memcmp() in lib/string.c with a key comparator > routine that presumes most comparisons will succeed. Besides > avoiding an early-exit test on each iteratio
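The idea in this patch summary, a key comparator that presumes most comparisons succeed and therefore skips the early-exit test on each iteration, can be sketched in user-space C. This is an illustration under assumptions, not the code from the patch: `flow_key_equal_sketch()` is a hypothetical name, and the sketch assumes both keys are `unsigned long` aligned and that the length is a multiple of `sizeof(unsigned long)`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical comparator: returns true when the two keys are identical.
 * Differences are accumulated with XOR and OR and tested once after the
 * loop, so there is no data-dependent branch per word.
 * Assumes a and b are unsigned long aligned and len is a multiple of
 * sizeof(unsigned long).
 */
static bool flow_key_equal_sketch(const void *a, const void *b, size_t len)
{
    const unsigned long *pa = a;
    const unsigned long *pb = b;
    unsigned long diff = 0;
    size_t i;

    for (i = 0; i < len / sizeof(unsigned long); i++)
        diff |= pa[i] ^ pb[i];

    return diff == 0;
}
```

Unlike `memcmp()`, this sketch reports only equality, not ordering, and it always reads the whole key; that is a reasonable trade when mismatches are rare, as the summary presumes.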

Re: [ovs-dev] [PATCH net-next v2] net: Loosen constraints for recalculating checksum in skb_segment()

2013-05-17 Thread Eric Dumazet
On Fri, 2013-05-17 at 15:49 +0900, Simon Horman wrote: > In the case where the ability to offload checksums changes > then it may be necessary for software checksumming of an skb to occur. > > An example of such a case is where a non-GRE packet is received but > is to be encapsulated and transmitt

Re: [ovs-dev] [PATCH net-next v3] MPLS: Add limited GSO support

2013-05-17 Thread Eric Dumazet
On Fri, 2013-05-17 at 15:50 +0900, Simon Horman wrote: > @@ -509,6 +511,8 @@ struct sk_buff { > __u32 reserved_tailroom; > }; > > + __be16 inner_protocol; > + /* 16/48 bit hole */ > sk_buff_data_t inner_transport_header; >

Re: [ovs-dev] [PATCH next-next v4 1/2] net: Use 16bits for inner_*_headers fields of struct skbuff

2013-05-22 Thread Eric Dumazet
64K MTU. Verify if the acking and collapsing resulted in a headroom exceeding what csum_start can cover and reallocate the headroom if so. A big thank you to Jim Foraker and the team at LLNL for helping out with the investiga

Re: [ovs-dev] [PATCH net-next 1/2] net: Export skb_zerocopy() to zerocopy from one skb to another

2013-05-24 Thread Eric Dumazet
On Fri, 2013-05-24 at 16:52 +0200, Thomas Graf wrote: > Make the skb zerocopy logic written for nfnetlink queue available for > use by other modules. > > Signed-off-by: Thomas Graf > --- > include/linux/skbuff.h | 2 ++ > net/core/skbuff.c| 46 +

Re: [ovs-dev] [PATCH net-next 2/2] openvswitch: Use zerocopy if applicable when performing the upcall

2013-05-24 Thread Eric Dumazet
On Fri, 2013-05-24 at 10:24 -0700, Jesse Gross wrote: > Does this have any impact on small packets? Those are usually the > common case (i.e. TCP SYN) and I think this is slightly less optimal > for those. No difference at all, small packets are copied anyway in skb->head

Re: [ovs-dev] [PATCH net-next 2/2] openvswitch: Use zerocopy if applicable when performing the upcall

2013-05-24 Thread Eric Dumazet
On Fri, 2013-05-24 at 14:23 -0700, Jesse Gross wrote: > On Fri, May 24, 2013 at 11:58 AM, Eric Dumazet wrote: > > On Fri, 2013-05-24 at 10:24 -0700, Jesse Gross wrote: > > > >> Does this have any impact on small packets? Those are usually the > >> common case (

Re: [ovs-dev] [PATCH net-next 2/2] openvswitch: Use zerocopy if applicable when performing the upcall

2013-05-24 Thread Eric Dumazet
On Fri, 2013-05-24 at 15:18 -0700, Jesse Gross wrote: > Offloads are supported. What I want to know is how they affect > performance with this change. Hmm, I do not understand why you are checksumming then. skb_copy_and_csum_dev() is a killer.

Re: [ovs-dev] [PATCH net-next 2/2] openvswitch: Use zerocopy if applicable when performing the upcall

2013-05-24 Thread Eric Dumazet
On Fri, 2013-05-24 at 17:08 -0700, Jesse Gross wrote: > What's the alternative? I guess Thomas was working on this ;) We addressed all this on nfnetlink lately. Presumably GSO stuff is a non issue for vSwitch upcalls. But UDP messages can be big, and their checksum might be already validated b

Re: [ovs-dev] [PATCH net-next 2/2] openvswitch: Use zerocopy if applicable when performing the upcall

2013-05-25 Thread Eric Dumazet
On Sat, 2013-05-25 at 08:02 +0100, Thomas Graf wrote: > I ran TCP_CRR to verify the SYN/ACK use case and I did not > observe a difference. If you have any specific test in mind > I will be glad to run that before posting the 2nd revision. I guess you should test with rx checksum disabled as well,

Re: [ovs-dev] [PATCH 1/3] skbuff: Update truesize in pskb_expand_head

2013-06-12 Thread Eric Dumazet
On Wed, 2013-06-12 at 19:05 +1000, Dave Wiltshire wrote: > Some call sites to pskb_expand_head subsequently update the skb truesize > and others don't (even with non-zero arguments). This is likely a memory > audit leak. Fixed this up by moving the memory accounting to the > skbuff.c file and remov

Re: [ovs-dev] [PATCH] loopback: set pkt_type to PACKET_HOST explicitly

2013-06-26 Thread Eric Dumazet
On Wed, 2013-06-26 at 16:34 +0900, Isaku Yamahata wrote: > Reset pkt_type to PACKET_HOST when loopback device receives packet > before calling eth_type_trans() > > ip-encapsulated packets can be handled by localhost. But skb->pkt_type > can be PACKET_OTHERHOST when packet comes into ip tunnel devi

Re: [ovs-dev] [PATCH net-next 00/21] treewide: Use consistent api style for address testing

2012-10-19 Thread Eric Dumazet
On Thu, 2012-10-18 at 20:55 -0700, Joe Perches wrote: > ethernet, ipv4, and ipv6 address testing uses 3 different api naming styles. > > ethernet uses:is__ether_addr > ipv4 uses:ipv4_is_ > ipv6 uses:ipv6_addr_ > > Standardize on the ipv6 style of _addr_ to reduce > the number of s

Re: [ovs-dev] [RFC v3] Add TCP encap_rcv hook

2012-04-12 Thread Eric Dumazet
On Thu, 2012-04-12 at 16:42 +0900, Simon Horman wrote: > This hook is based on a hook of the same name provided by UDP. It provides > a way for to receive packets that have a TCP header and treat them in some > alternate way. > > It is intended to be used by an implementation of the STT tunneling

[ovs-dev] [PATCH net-next] udp: intoduce udp_encap_needed static_key

2012-04-12 Thread Eric Dumazet
path does a single JMP . When static_key is enabled, JMP destination is patched to reach the real encap_type/encap_rcv logic, possibly adding cache misses. Signed-off-by: Eric Dumazet Cc: Simon Horman Cc: dev@openvswitch.org --- include/net/udp.h|1 + net/ipv4/udp.c | 12

Re: [ovs-dev] [PATCH net-next] udp: intoduce udp_encap_needed static_key

2012-04-12 Thread Eric Dumazet
On Thu, 2012-04-12 at 11:05 +0200, Eric Dumazet wrote: > If static_key is not yet enabled, the fast path does a single JMP . > > When static_key is enabled, JMP destination is patched to reach the real > encap_type/encap_rcv logic, possibly adding cache misses. Small note Simon, Th

Re: [ovs-dev] [GIT PULL v2] Open vSwitch

2011-11-23 Thread Eric Dumazet
Le mercredi 23 novembre 2011 à 15:54 +0800, Herbert Xu a écrit : > David Miller wrote: > > > > I would like to see some discussion wrt. Jamal's feedback, which is that > > a lot of the side-band functionality added by this code is either 1) already > > doable with packet scheduler actions or 2) s

Re: [ovs-dev] [GIT PULL v2] Open vSwitch

2011-11-23 Thread Eric Dumazet
Le mercredi 23 novembre 2011 à 07:47 -0500, jamal a écrit : > On Wed, 2011-11-23 at 09:12 +0100, Eric Dumazet wrote: > > > I had no time to look at OVS, but current tc model is not scalable, > > everything is performed under a queue lock. > > Maybe its time to redesi

Re: [ovs-dev] [GIT PULL v2] Open vSwitch

2011-11-23 Thread Eric Dumazet
Le mercredi 23 novembre 2011 à 08:36 -0500, jamal a écrit : > If you wanna do this right - I suggest you get a different domain name. > tc.org or something along those lines. > Start aggregating documentation that is validated to be working. There's > a lot of "opinions" out there instead of facts

Re: [ovs-dev] Open vSwitch Design

2011-11-24 Thread Eric Dumazet
Le jeudi 24 novembre 2011 à 21:20 -0800, Stephen Hemminger a écrit : > The problem is that there are two flow classifiers, one in OpenVswitch > in the kernel, and the other in the user space flow manager. I think the > issue is that the two have different code. We have kind of same duplication in

Re: [ovs-dev] Open vSwitch Design

2011-11-24 Thread Eric Dumazet
Le vendredi 25 novembre 2011 à 01:25 -0500, David Miller a écrit : > From: Eric Dumazet > Date: Fri, 25 Nov 2011 07:18:03 +0100 > > > Le jeudi 24 novembre 2011 à 21:20 -0800, Stephen Hemminger a écrit : > > > >> The problem is that there are two flow classifiers,

Re: [ovs-dev] Open vSwitch Design

2011-11-25 Thread Eric Dumazet
Le vendredi 25 novembre 2011 à 06:34 -0500, jamal a écrit : > Hrm. I forgot about the flow classifier - it may be what the openflow > folks need. It is more friendly for the well defined tuples than u32. > > But what do you mean "refactor"? I can already use this classifier > and attach actions to

[ovs-dev] [PATCH net-next 0/4] net: factorize flow dissector

2011-11-28 Thread Eric Dumazet
Le vendredi 25 novembre 2011 à 14:02 +0100, Eric Dumazet a écrit : > cls_flow is not complete, since it doesnt handle tunnels for example. > > It calls a 'partial flow classifier' to find each needed element, one by > one. > (adding tunnel decap would need to perform t

[ovs-dev] [PATCH net-next 1/4] net: introduce skb_flow_dissect()

2011-11-28 Thread Eric Dumazet
all. Signed-off-by: Eric Dumazet --- include/net/flow_keys.h | 15 net/core/Makefile |2 net/core/flow_dissector.c | 134 3 files changed, 150 insertions(+), 1 deletion(-) diff --git a/include/net/flow_keys.h b/include/net/flow_keys.h

[ovs-dev] [PATCH net-next 2/4] net: use skb_flow_dissect() in __skb_get_rxhash()

2011-11-28 Thread Eric Dumazet
No functional changes. This uses the code we factorized in skb_flow_dissect() Signed-off-by: Eric Dumazet --- net/core/dev.c | 125 +-- 1 file changed, 14 insertions(+), 111 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index 8afb244

[ovs-dev] [PATCH net-next 3/4] cls_flow: use skb_flow_dissect()

2011-11-28 Thread Eric Dumazet
Instead of using a custom flow dissector, use skb_flow_dissect() and benefit from tunnelling support. This lack of tunnelling support was mentioned by Dan Siemon. Signed-off-by: Eric Dumazet --- net/sched/cls_flow.c | 180 ++--- 1 file changed, 48

[ovs-dev] [PATCH net-next 4/4] sch_sfb: use skb_flow_dissect()

2011-11-28 Thread Eric Dumazet
Current SFB double hashing is not fulfilling SFB theory, if two flows share same rxhash value. Using skb_flow_dissect() permits to really have better hash dispersion, and get tunnelling support as well. Double hashing point was mentioned by Florian Westphal Signed-off-by: Eric Dumazet --- net

Re: [ovs-dev] Integration of Open vSwitch

2011-11-30 Thread Eric Dumazet
Le mercredi 30 novembre 2011 à 08:14 -0500, jamal a écrit : > On Wed, 2011-11-30 at 15:00 +0800, Herbert Xu wrote: > > > > The other factor I considered is scalability. The OVS code as is > > is not really friendly to SMP/NUMA scalability (but as Eric pointed, > > neither is the classifier/actio