[ovs-dev] Salut

2014-09-20 Thread Mlle Patricia Rosa



Je m'excuse pour le dérangement, je voudrais faire votre connaissance et
liée une amitié sincère avec vous, prière de me répondre. Je promets
d'être honnête et de garder une bonne relation avec vous.

baisers,
Mlle Patricia Rosa
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [patch net-next v2 6/9] switchdev: add basic support for flow matching and actions

2014-09-20 Thread Jiri Pirko
Sat, Sep 20, 2014 at 07:32:08AM CEST, f.faine...@gmail.com wrote:
>On 09/19/14 06:49, Jiri Pirko wrote:
>>This patch adds basic support for flows. The infrastructure is prepared
>>to easily add another flow matching types. So far, only the key one is
>>implemented.
>>
>>Signed-off-by: Jiri Pirko 
>>---
>
>[snip]
>
>>
>>+struct swdev_flow_match_key {
>>+ struct {
>>+ u32 priority;   /* Packet QoS priority. */
>>+ u32 in_port_ifindex; /* Input switch port ifindex (or 0). */
>>+ } phy;
>>+ struct {
>>+ u8 src[ETH_ALEN];   /* Ethernet source address. */
>>+ u8 dst[ETH_ALEN];   /* Ethernet destination address. */
>>+ __be16 tci; /* 0 if no VLAN, VLAN_TAG_PRESENT set 
>>otherwise. */
>
>Humm, how about QinQ here? I would provision two more 16 bits fields so we
>can do all sorts of VLAN matching.
>
>You might want to allow for a 4 to 8 bytes hardware switch tag as well.

Note this structure is not carved in stone and can be easily adjusted
without any problems any time. So when the time comes and the changes
you are describing will be needed, we can do it.


>
>>+ __be16 type;/* Ethernet frame type. */
>>+ } eth;
>>+ struct {
>>+ u8 proto;   /* IP protocol or lower 8 bits of ARP 
>>opcode. */
>>+ u8 tos; /* IP ToS. */
>>+ u8 ttl; /* IP TTL/hop limit. */
>>+ u8 frag;/* One of OVS_FRAG_TYPE_*. */
>
>Options might be missing?
>
>[snip]
>
>>+
>>+static void print_flow(const struct swdev_flow *flow, struct net_device *dev,
>>+const char *comment)
>>+{
>>+ pr_debug("%s flow %s:\n", dev->name, comment);
>>+ print_flow_match(&flow->match);
>>+ print_flow_actions(flow->action, flow->action_count);
>>+}
>
>I am really not sure how much of this valuable besides early (as in, right
>now) debugging, don't we rather want a generic way to dump a given flow under
>a its native netlink format, does that code has to be here in the first
>place?

Hmm, I think you have a point here, let me think about that.

>--
>Florian
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [patch net-next v2 5/9] net: introduce dummy switch

2014-09-20 Thread Jiri Pirko
Sat, Sep 20, 2014 at 07:21:00AM CEST, f.faine...@gmail.com wrote:
>On 09/19/14 06:49, Jiri Pirko wrote:
>>Dummy switch implementation using switchdev interface
>
>This really looks like a DSA driver that has 0 ports, and is not attached to
>an useful network interface, and which is registering its own set of rtnl
>operations for a purpose that is unclear to me.
>
>I think registering these rtnl ops is misleading as it leads to a false idea
>that this is allowed, and that people are actually encouraged to do that for
>custom switch drivers, and this completely defeats the purpose of coming up
>with a generic API.
>
>If we are to go that route anyway, I really prefer the way Felix did it in
>swconfig, and the fake switch driver did do something useful being attached
>to the loopback interface:
>
>http://lists.openwall.net/netdev/2013/10/22/103
>http://patchwork.ozlabs.org/patch/285478/

I will drop dummyswitch because it serves primary as an example. But
since rocker is here, this is no longer needed.


>
>>
>>Signed-off-by: Jiri Pirko 
>>---
>>  drivers/net/Kconfig  |   7 +++
>>  drivers/net/Makefile |   1 +
>>  drivers/net/dummyswitch.c| 130 
>> +++
>>  include/uapi/linux/if_link.h |   9 +++
>>  4 files changed, 147 insertions(+)
>>  create mode 100644 drivers/net/dummyswitch.c
>>
>>diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
>>index c6f6f69..7822c74 100644
>>--- a/drivers/net/Kconfig
>>+++ b/drivers/net/Kconfig
>>@@ -71,6 +71,13 @@ config DUMMY
>>To compile this driver as a module, choose M here: the module
>>will be called dummy.
>>
>>+config NET_DUMMY_SWITCH
>>+ tristate "Dummy switch net driver support"
>>+ depends on NET_SWITCHDEV
>>+ ---help---
>>+   To compile this driver as a module, choose M here: the module
>>+   will be called dummyswitch.
>>+
>>  config EQUALIZER
>>  tristate "EQL (serial line load balancing) support"
>>  ---help---
>>diff --git a/drivers/net/Makefile b/drivers/net/Makefile
>>index 61aefdd..3c835ba 100644
>>--- a/drivers/net/Makefile
>>+++ b/drivers/net/Makefile
>>@@ -7,6 +7,7 @@
>>  #
>>  obj-$(CONFIG_BONDING) += bonding/
>>  obj-$(CONFIG_DUMMY) += dummy.o
>>+obj-$(CONFIG_NET_DUMMY_SWITCH) += dummyswitch.o
>>  obj-$(CONFIG_EQUALIZER) += eql.o
>>  obj-$(CONFIG_IFB) += ifb.o
>>  obj-$(CONFIG_MACVLAN) += macvlan.o
>>diff --git a/drivers/net/dummyswitch.c b/drivers/net/dummyswitch.c
>>new file mode 100644
>>index 000..e7a48f4
>>--- /dev/null
>>+++ b/drivers/net/dummyswitch.c
>>@@ -0,0 +1,130 @@
>>+/*
>>+ * drivers/net/dummyswitch.c - Dummy switch device
>>+ * Copyright (c) 2014 Jiri Pirko 
>>+ *
>>+ * This program is free software; you can redistribute it and/or modify
>>+ * it under the terms of the GNU General Public License as published by
>>+ * the Free Software Foundation; either version 2 of the License, or
>>+ * (at your option) any later version.
>>+ */
>>+
>>+#include 
>>+#include 
>>+#include 
>>+#include 
>>+#include 
>>+#include 
>>+
>>+struct dummyswport_priv {
>>+ struct netdev_phys_item_id psid;
>>+};
>>+
>>+static netdev_tx_t dummyswport_start_xmit(struct sk_buff *skb,
>>+   struct net_device *dev)
>>+{
>>+ dev_kfree_skb(skb);
>>+ return NETDEV_TX_OK;
>>+}
>>+
>>+static int dummyswport_swdev_id_get(struct net_device *dev,
>>+ struct netdev_phys_item_id *psid)
>>+{
>>+ struct dummyswport_priv *dsp = netdev_priv(dev);
>>+
>>+ memcpy(psid, &dsp->psid, sizeof(*psid));
>>+ return 0;
>>+}
>>+
>>+static int dummyswport_change_carrier(struct net_device *dev, bool 
>>new_carrier)
>>+{
>>+ if (new_carrier)
>>+ netif_carrier_on(dev);
>>+ else
>>+ netif_carrier_off(dev);
>>+ return 0;
>>+}
>>+
>>+static const struct net_device_ops dummyswport_netdev_ops = {
>>+ .ndo_start_xmit = dummyswport_start_xmit,
>>+ .ndo_swdev_id_get   = dummyswport_swdev_id_get,
>>+ .ndo_change_carrier = dummyswport_change_carrier,
>>+};
>>+
>>+static void dummyswport_setup(struct net_device *dev)
>>+{
>>+ ether_setup(dev);
>>+
>>+ /* Initialize the device structure. */
>>+ dev->netdev_ops = &dummyswport_netdev_ops;
>>+ dev->destructor = free_netdev;
>>+
>>+ /* Fill in device structure with ethernet-generic values. */
>>+ dev->tx_queue_len = 0;
>>+ dev->flags |= IFF_NOARP;
>>+ dev->flags &= ~IFF_MULTICAST;
>>+ dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
>>+ dev->features   |= NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_TSO;
>>+ dev->features   |= NETIF_F_HW_CSUM | NETIF_F_HIGHDMA | NETIF_F_LLTX;
>>+ eth_hw_addr_random(dev);
>>+}
>>+
>>+static int dummyswport_validate(struct nlattr *tb[], struct nlattr *data[])
>>+{
>>+ if (tb[IFLA_ADDRESS])
>>+ return -EINVAL;
>>+ if (!data || !data[IFLA_DUMMYSWPORT_PHYS_SWITCH_ID])
>>+ return -EINVAL;
>>+ return 0;
>>+}
>>

Re: [ovs-dev] [patch net-next v2 8/9] switchdev: introduce Netlink API

2014-09-20 Thread Jiri Pirko
Sat, Sep 20, 2014 at 05:41:16AM CEST, ro...@cumulusnetworks.com wrote:
>On 9/19/14, 8:49 AM, Jiri Pirko wrote:
>>Fri, Sep 19, 2014 at 05:25:48PM CEST, j...@mojatatu.com wrote:
>>>On 09/19/14 09:49, Jiri Pirko wrote:
This patch exposes switchdev API using generic Netlink.
Example userspace utility is here:
https://github.com/jpirko/switchdev

>>>Is this just a temporary test tool? Otherwise i dont see reason
>>>for its existence (or the API that it feeds on).
>>Please read the conversation I had with Pravin and Jesse in v1 thread.
>>Long story short they like to have the api separated from ovs datapath
>>so ovs daemon can use it to directly communicate with driver. Also John
>>Fastabend requested a way to work with driver flows without using ovs ->
>>that was the original reason I created switchdev genl api.
>>
>>Regarding the "sw" tool, yes it is for testing purposes now. ovs daemon
>>will use directly switchdev genl api.
>>
>>I hope I cleared this out.
>We already have all the needed rtnetlink kernel api and userspace tools
>around it to support all
>switching asic features. ie, the rtnetlink api is the switchdev api. We can
>do l2, l3, acl's with it.
>Its unclear to me why we need another new netlink api. Which will mean none
>of the existing tools to
>create bridges etc will work on a switchdev.

No one is proposing such API. Note that what I'm trying to solve in my
patchset is FLOW world. There is only one API there, ovs genl. But the
usage of that for hw offload purposes was nacked by ovs maintainer. Plus
couple of people wanted to run the offloading independently on ovs
instance. Therefore I introduced the switchdev genl, which takes care of
that. No plan to extend it for other things you mentioned, just flows.


>Which seems like going in the direction exactly opposite to what we had
>discussed earlier.

Nope. The previous discussion ignored flows.


>
>If a non-ovs flow interface is needed from userspace, we can extend the
>existing interface to include flows.

How? You mean to extend rtnetlink? What advantage it would bring
comparing to separate genl iface?


>I don't understand why we should replace the existing rtnetlink switchdev api
>to accommodate flows.

Sorry, I do not undertand what "existing rtnetlink switchdev api" you
have on mind. Would you care to explain?


>
>Thanks,
>Roopa
>
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [patch net-next v2 8/9] switchdev: introduce Netlink API

2014-09-20 Thread Scott Feldman

On Sep 19, 2014, at 8:41 PM, Roopa Prabhu  wrote:

> On 9/19/14, 8:49 AM, Jiri Pirko wrote:
>> Fri, Sep 19, 2014 at 05:25:48PM CEST, j...@mojatatu.com wrote:
>>> On 09/19/14 09:49, Jiri Pirko wrote:
 This patch exposes switchdev API using generic Netlink.
 Example userspace utility is here:
 https://github.com/jpirko/switchdev
 
>>> Is this just a temporary test tool? Otherwise i dont see reason
>>> for its existence (or the API that it feeds on).
>> Please read the conversation I had with Pravin and Jesse in v1 thread.
>> Long story short they like to have the api separated from ovs datapath
>> so ovs daemon can use it to directly communicate with driver. Also John
>> Fastabend requested a way to work with driver flows without using ovs ->
>> that was the original reason I created switchdev genl api.
>> 
>> Regarding the "sw" tool, yes it is for testing purposes now. ovs daemon
>> will use directly switchdev genl api.
>> 
>> I hope I cleared this out.
> We already have all the needed rtnetlink kernel api and userspace tools 
> around it to support all
> switching asic features. ie, the rtnetlink api is the switchdev api. We can 
> do l2, l3, acl's with it.
> Its unclear to me why we need another new netlink api. Which will mean none 
> of the existing tools to
> create bridges etc will work on a switchdev.
> Which seems like going in the direction exactly opposite to what we had 
> discussed earlier.

Existing rtnetlink isn’t available to swdev without some kind of snooping the 
echoes from the various kernel components (bridge, fib, etc).  With swdev_flow, 
as Jiri has defined it, there is an additional conversion needed to bridge the 
gap (bad expression, I know) between rtnetlink and swdev_flow.  This conversion 
happens in the kernel components.  For example, the bridge module, still driven 
from userspace by existing rtnetlink, will formulate the necessary swdev_flow 
insert/remove calls to the swdev driver such that HW will offload the fwd path.

You have:
user -> rtnetlink -> kernel -> netlink echo -> [some process] -> [some 
driver] -> HW

Jiri has:
user -> rtnetlink -> kernel -> swdev_* -> swdev driver -> HW

> If a non-ovs flow interface is needed from userspace, we can extend the 
> existing interface to include flows.
> I don't understand why we should replace the existing rtnetlink switchdev api 
> to accommodate flows.
> 
> Thanks,
> Roopa
> 


-scott



___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [patch net-next v2 8/9] switchdev: introduce Netlink API

2014-09-20 Thread Jiri Pirko
Sat, Sep 20, 2014 at 12:12:12AM CEST, john.r.fastab...@intel.com wrote:
>On 09/19/2014 10:57 AM, Jamal Hadi Salim wrote:
>> On 09/19/14 11:49, Jiri Pirko wrote:
>>> Fri, Sep 19, 2014 at 05:25:48PM CEST, j...@mojatatu.com wrote:
>> 
 Is this just a temporary test tool? Otherwise i dont see reason
 for its existence (or the API that it feeds on).
>>>
>>> Please read the conversation I had with Pravin and Jesse in v1 thread.
>>> Long story short they like to have the api separated from ovs datapath
>>> so ovs daemon can use it to directly communicate with driver. Also John
>>> Fastabend requested a way to work with driver flows without using ovs ->
>>> that was the original reason I created switchdev genl api.
>>>
>>> Regarding the "sw" tool, yes it is for testing purposes now. ovs daemon
>>> will use directly switchdev genl api.
>>>
>>> I hope I cleared this out.
>>>
>> 
>> It is - thanks Jiri.
>> 
>> cheers,
>> jamal
>
>Hi Jiri,
>
>I was considering a slightly different approach where the
>device would report via netlink the fields/actions it
>supported rather than creating pre-defined enums for every
>possible key.
>
>I already need to have an API to report fields/matches
>that are being supported why not have the device report
>the headers as header fields (len, offset) and the
>associated parse graph the hardware uses? Vendors should
>have this already to describe/design their real hardware.

Hmm, let me think about this a bit more. I will try to figure out how to
handle that. Sound logic though. Will try to incorporate the idea in the
patchset.


>
>As always its better to have code and when I get some
>time I'll try to write it up. Maybe its just a separate
>classifier although I don't actually want two hardware
>flow APIs.

Understood.

>
>I see you dropped the RFC tag are you proposing we include
>this now?

v11 is my bet :)

>
>.John
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [patch net-next v2 8/9] switchdev: introduce Netlink API

2014-09-20 Thread Jiri Pirko
Sat, Sep 20, 2014 at 12:18:02AM CEST, j...@mojatatu.com wrote:
>On 09/19/14 18:12, John Fastabend wrote:
>>On 09/19/2014 10:57 AM, Jamal Hadi Salim wrote:
>>>On 09/19/14 11:49, Jiri Pirko wrote:
Fri, Sep 19, 2014 at 05:25:48PM CEST, j...@mojatatu.com wrote:
>>>
>Is this just a temporary test tool? Otherwise i dont see reason
>for its existence (or the API that it feeds on).

Please read the conversation I had with Pravin and Jesse in v1 thread.
Long story short they like to have the api separated from ovs datapath
so ovs daemon can use it to directly communicate with driver. Also John
Fastabend requested a way to work with driver flows without using ovs ->
that was the original reason I created switchdev genl api.

Regarding the "sw" tool, yes it is for testing purposes now. ovs daemon
will use directly switchdev genl api.

I hope I cleared this out.

>>>
>>>It is - thanks Jiri.
>>>
>>>cheers,
>>>jamal
>>
>>Hi Jiri,
>>
>>I was considering a slightly different approach where the
>>device would report via netlink the fields/actions it
>>supported rather than creating pre-defined enums for every
>>possible key.
>>
>>I already need to have an API to report fields/matches
>>that are being supported why not have the device report
>>the headers as header fields (len, offset) and the
>>associated parse graph the hardware uses? Vendors should
>>have this already to describe/design their real hardware.
>>
>>As always its better to have code and when I get some
>>time I'll try to write it up. Maybe its just a separate
>>classifier although I don't actually want two hardware
>>flow APIs.
>>
>>I see you dropped the RFC tag are you proposing we include
>>this now?
>>
>
>
>Actually I just realized i missed something very basic that
>Jiri said. I think i understand the tool being there for testing
>but i am assumed the same about the genlink api.
>Jiri, are you saying that genlink api is there to
>stay?

Yes, that I say. It is needed for flow manipulation, because such api does
not exist. As I stated earlier, I do not want to use switchdev genl for
anything other than flow manipulation.

>
>cheers,
>jamal
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [patch net-next v2 8/9] switchdev: introduce Netlink API

2014-09-20 Thread Jiri Pirko
Sat, Sep 20, 2014 at 07:39:51AM CEST, f.faine...@gmail.com wrote:
>On 09/19/14 15:18, Jamal Hadi Salim wrote:
>>On 09/19/14 18:12, John Fastabend wrote:
>>>On 09/19/2014 10:57 AM, Jamal Hadi Salim wrote:
On 09/19/14 11:49, Jiri Pirko wrote:
>Fri, Sep 19, 2014 at 05:25:48PM CEST, j...@mojatatu.com wrote:

>>Is this just a temporary test tool? Otherwise i dont see reason
>>for its existence (or the API that it feeds on).
>
>Please read the conversation I had with Pravin and Jesse in v1 thread.
>Long story short they like to have the api separated from ovs datapath
>so ovs daemon can use it to directly communicate with driver. Also John
>Fastabend requested a way to work with driver flows without using
>ovs ->
>that was the original reason I created switchdev genl api.
>
>Regarding the "sw" tool, yes it is for testing purposes now. ovs daemon
>will use directly switchdev genl api.
>
>I hope I cleared this out.
>

It is - thanks Jiri.

cheers,
jamal
>>>
>>>Hi Jiri,
>>>
>>>I was considering a slightly different approach where the
>>>device would report via netlink the fields/actions it
>>>supported rather than creating pre-defined enums for every
>>>possible key.
>>>
>>>I already need to have an API to report fields/matches
>>>that are being supported why not have the device report
>>>the headers as header fields (len, offset) and the
>>>associated parse graph the hardware uses? Vendors should
>>>have this already to describe/design their real hardware.
>>>
>>>As always its better to have code and when I get some
>>>time I'll try to write it up. Maybe its just a separate
>>>classifier although I don't actually want two hardware
>>>flow APIs.
>>>
>>>I see you dropped the RFC tag are you proposing we include
>>>this now?
>>>
>>
>>
>>Actually I just realized i missed something very basic that
>>Jiri said. I think i understand the tool being there for testing
>>but i am assumed the same about the genlink api.
>>Jiri, are you saying that genlink api is there to
>>stay?
>
>So, I really have mixed feelings about this netlink API, in particular
>because it is not clear to me where is the line between what should be a
>network device ndo operation, what should be an ethtool command, what should
>be a netlink message, and the rest.

Well as I said, this api should serve for flow manipulation only,
therefore swdev flow related ndos are used.


>
>I can certainly acknowledge the fact that manipulating flows is not ideal
>with the current set of tools, but really once we are there with netlink, how
>far are we from not having any network devices at all, and how does that
>differ from OpenWrt's swconfig in the end [1]?


I'm all ears on proposals how to make flow manipulation better.


>
>[1]: https://lwn.net/Articles/571390/
>--
>Florian
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [patch net-next v2 8/9] switchdev: introduce Netlink API

2014-09-20 Thread Jamal Hadi Salim

On 09/20/14 04:17, Jiri Pirko wrote:


Yes, that I say. It is needed for flow manipulation, because such api does
not exist.


Come on Jiri!
The ovs guys are against this and now no *api exists*?
Write a 15 tuple classifier tc classifier and use it. I will be more
than happy to help you. I will get to it when we have basics L2 working
on real devices.


As I stated earlier, I do not want to use switchdev genl for
anything other than flow manipulation.



Totally unacceptable in my books. If the OVS guys want some way out
to be able to ride on some vendor sdks then that is their problem.
We shouldnt allow for such loopholes. This is why/how TOE never made it
in the kernel.

cheers,
jamal
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [patch net-next v2 8/9] switchdev: introduce Netlink API

2014-09-20 Thread Jamal Hadi Salim

On 09/20/14 04:10, Scott Feldman wrote:


On Sep 19, 2014, at 8:41 PM, Roopa Prabhu  wrote:





Existing rtnetlink isn’t available to swdev without some kind of snooping the 
echoes from the various kernel components
(bridge, fib, etc).


You have made this claim before Scott and I am still not following.
Why do we need to echo things to get FDB or FIB to work? device ops for 
FDB offload for example already exist. I think they need to be

revamped, but that consensus can be reasonably reached. Why do we
need this flow api for such activities?

cheers,
jamal

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [patch net-next v2 8/9] switchdev: introduce Netlink API

2014-09-20 Thread Thomas Graf
On 09/20/14 at 10:14am, Jiri Pirko wrote:
> Sat, Sep 20, 2014 at 12:12:12AM CEST, john.r.fastab...@intel.com wrote:
> >I was considering a slightly different approach where the
> >device would report via netlink the fields/actions it
> >supported rather than creating pre-defined enums for every
> >possible key.
> >
> >I already need to have an API to report fields/matches
> >that are being supported why not have the device report
> >the headers as header fields (len, offset) and the
> >associated parse graph the hardware uses? Vendors should
> >have this already to describe/design their real hardware.
> 
> Hmm, let me think about this a bit more. I will try to figure out how to
> handle that. Sound logic though. Will try to incorporate the idea in the
> patchset.

I think this is the right track.

I agree with Jamal that there is no need for a new permanent and
separate Netlink interface for this. I think this would best be described
as a structure of nested Netlink attributes in the form John proposes
which is then embedded into existing Netlink interfaces such as rtnetlink
and OVS genl.

OVS can register new genl ops to check capabilities and insert
hardware flows which allows implementation of the offload decision in
user space and allows for arbitary combination of hardware and software
flows. It also allows to run a eBPF software data path in combination
with a hardware flow setup.

rtnetlink can embed the nested attribute structure into existing APIs
to allow feature capability detection from user space, statistic
reporting and optional direct hardware offload if a transaprent
offload is not feasible. Would that work for you John?

I think we should focus on getting the layering right and make it
generic enough so we allow evolving naturally.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [patch net-next v2 8/9] switchdev: introduce Netlink API

2014-09-20 Thread Thomas Graf
On 09/20/14 at 06:19am, Jamal Hadi Salim wrote:
> The ovs guys are against this and now no *api exists*?
> Write a 15 tuple classifier tc classifier and use it. I will be more
> than happy to help you. I will get to it when we have basics L2 working
> on real devices.

Nothing speaks against having such a tc classifier. In fact, having
the interface consist of only an embedded Netlink attribute structure
would allow for such a classifier in a very straight forward way.

That doesn't mean everybody should be forced to use the stateful
tc interface.

> Totally unacceptable in my books. If the OVS guys want some way out
> to be able to ride on some vendor sdks then that is their problem.
> We shouldnt allow for such loopholes. This is why/how TOE never made it
> in the kernel.

No need for false accusations here. Nobody ever mentioned vendor SDKs.

The statement was that the requirement of deriving hardware flows from
software flows *in the kernel* is not flexible enough for the future
for reasons such as:

1) The OVS software data path might be based on eBPF in the future and
   it is unclear how we could derive hardware flows from that
   transparently.

2) Depending on hardware capabilities. Hardware flows might need to be
   assisted by software flow counterparts and it is believed that it
   is the wrong approach to push all the necessary context for the
   decision down into the kernel. This can be argued about and I don't
   feel strongly either way.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [patch net-next v2 8/9] switchdev: introduce Netlink API

2014-09-20 Thread Jamal Hadi Salim

On 09/20/14 07:01, Thomas Graf wrote:


Nothing speaks against having such a tc classifier. In fact, having
the interface consist of only an embedded Netlink attribute structure
would allow for such a classifier in a very straight forward way.

That doesn't mean everybody should be forced to use the stateful
tc interface.




Agreed. The response was to Jiri's strange statement that now that
he cant use OVS, there is no such api. I point to tc as very capable of
such usage.


No need for false accusations here. Nobody ever mentioned vendor SDKs.



I am sorry to have tied the two together. Maybe not OVS but the approach
described is heaven for vendor SDKs.


The statement was that the requirement of deriving hardware flows from
software flows *in the kernel* is not flexible enough for the future
for reasons such as:

1) The OVS software data path might be based on eBPF in the future and
it is unclear how we could derive hardware flows from that
transparently.



Who says you cant put BPF in hardware?
And why is OVS defining how BPF should evolve or how it should be used?


2) Depending on hardware capabilities. Hardware flows might need to be
assisted by software flow counterparts and it is believed that it
is the wrong approach to push all the necessary context for the
decision down into the kernel. This can be argued about and I don't
feel strongly either way.



Pointing to the current FDB offload: You can select to bypass
and not use s/ware.

cheers,
jamal
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [patch net-next v2 8/9] switchdev: introduce Netlink API

2014-09-20 Thread Thomas Graf
On 09/20/14 at 07:32am, Jamal Hadi Salim wrote:
> I am sorry to have tied the two together. Maybe not OVS but the approach
> described is heaven for vendor SDKs.

I fail to see the connection. You can use switch vendor SDK no matter
how we define the kernel APIs. They already exist and have been
designed in a way to be completely indepenedent from the kernel.

Are you referring to vendor specific decisions in user space in
general? I believe that the whole point of swdev is to provide *that*
level of abstraction so decisions can be made in a vendor neutral way.

> >The statement was that the requirement of deriving hardware flows from
> >software flows *in the kernel* is not flexible enough for the future
> >for reasons such as:
> >
> >1) The OVS software data path might be based on eBPF in the future and
> >it is unclear how we could derive hardware flows from that
> >transparently.
> >
> 
> Who says you cant put BPF in hardware?

I don't think anybody is saying that. P4 is likely a reality soon. But
we definitely want hardware offload in a BPF world even if the hardware
can't do BPF yet.

> And why is OVS defining how BPF should evolve or how it should be used?

Not sure I understand. OVS would be a user of eBPF just like tracing,
xt_BPF, socket filter, ...

> >2) Depending on hardware capabilities. Hardware flows might need to be
> >assisted by software flow counterparts and it is believed that it
> >is the wrong approach to push all the necessary context for the
> >decision down into the kernel. This can be argued about and I don't
> >feel strongly either way.
> >
> 
> Pointing to the current FDB offload: You can select to bypass
> and not use s/ware.

As I said, this can be argued about. It would require to push a lot of
context into the kernel though. The FDB offload is relatively trivial
in comparison to the complexity OVS user space can handle. I can't think
of any reasons why to complicate the kernel further with OVS specific
knowledge as long as we can guarantee the vendor abstraction.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [patch net-next v2 8/9] switchdev: introduce Netlink API

2014-09-20 Thread Jamal Hadi Salim

On 09/20/14 07:51, Thomas Graf wrote:


I fail to see the connection. You can use switch vendor SDK no matter
how we define the kernel APIs. They already exist and have been
designed in a way to be completely indepenedent from the kernel.

Are you referring to vendor specific decisions in user space in
general? I believe that the whole point of swdev is to provide *that*
level of abstraction so decisions can be made in a vendor neutral way.



I am not against the swdev idea. I think we have disagreements
for the general classification/action interface how that should look
like - but that is resolvable with correct interfaces.
The vendor neutral way *already exists* via current netlink
abstractions that existing tools use. When we need to add new
interfaces then we should.


I don't think anybody is saying that. P4 is likely a reality soon. But
we definitely want hardware offload in a BPF world even if the hardware
can't do BPF yet.




I dont think we have contradictions. We are speaking past each other.
You implied that in the future OVS s/w path might be based on BPF.
I implied BPF itself could be offloaded and stands on its own merit
and should work if we have the correct interface. As an example,
I dont care about P4 or OVS - but i have no problem if they use
the common interfaces provided by Linux. i.e
If i want to build  a little cpu running the BPF instruction set
and use that as my offload then that interface should work and if
it doesnt i should provide extensions.


Not sure I understand. OVS would be a user of eBPF just like tracing,
xt_BPF, socket filter, ...



Ok, we are on the same page then.


As I said, this can be argued about. It would require to push a lot of
context into the kernel though. The FDB offload is relatively trivial
in comparison to the complexity OVS user space can handle. I can't think
of any reasons why to complicate the kernel further with OVS specific
knowledge as long as we can guarantee the vendor abstraction.



I disagree. OVS maybe complex in that sense (I am sorry i am making
an assumption based on what you are saying) but i dont think there is
any other kernel subsystem that has this challenge.
Note: i am pointing to fdb only because it carries the concept of "put
this in hardware and/or software". I agree the fdb maybe reasonably
simpler.

cheers,
jamal
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [patch net-next v2 8/9] switchdev: introduce Netlink API

2014-09-20 Thread Roopa Prabhu

On 9/20/14, 1:09 AM, Jiri Pirko wrote:

Sat, Sep 20, 2014 at 05:41:16AM CEST, ro...@cumulusnetworks.com wrote:

On 9/19/14, 8:49 AM, Jiri Pirko wrote:

Fri, Sep 19, 2014 at 05:25:48PM CEST, j...@mojatatu.com wrote:

On 09/19/14 09:49, Jiri Pirko wrote:

This patch exposes switchdev API using generic Netlink.
Example userspace utility is here:
https://github.com/jpirko/switchdev


Is this just a temporary test tool? Otherwise i dont see reason
for its existence (or the API that it feeds on).

Please read the conversation I had with Pravin and Jesse in v1 thread.
Long story short they like to have the api separated from ovs datapath
so ovs daemon can use it to directly communicate with driver. Also John
Fastabend requested a way to work with driver flows without using ovs ->
that was the original reason I created switchdev genl api.

Regarding the "sw" tool, yes it is for testing purposes now. ovs daemon
will use directly switchdev genl api.

I hope I cleared this out.

We already have all the needed rtnetlink kernel api and userspace tools
around it to support all
switching asic features. ie, the rtnetlink api is the switchdev api. We can
do l2, l3, acl's with it.
Its unclear to me why we need another new netlink api. Which will mean none
of the existing tools to
create bridges etc will work on a switchdev.

No one is proposing such API. Note that what I'm trying to solve in my
patchset is FLOW world. There is only one API there, ovs genl. But the
usage of that for hw offload purposes was nacked by ovs maintainer. Plus
couple of people wanted to run the offloading independently on ovs
instance. Therefore I introduced the switchdev genl, which takes care of
that. No plan to extend it for other things you mentioned, just flows.

ok, That was not clear to me. Introducing a new genl api and calling it the
switchd dev api can result it non-flow creep into it in the future.




Which seems like going in the direction exactly opposite to what we had
discussed earlier.

Nope. The previous discussion ignored flows.

If a non-ovs flow interface is needed from userspace, we can extend the
existing interface to include flows.

How? You mean to extend rtnetlink? What advantage it would bring
comparing to separate genl iface?
yes. Advantage would be that we dont have yet another parallel switchdev 
netlink api.




I don't understand why we should replace the existing rtnetlink switchdev api
to accommodate flows.

Sorry, I do not undertand what "existing rtnetlink switchdev api" you
have on mind. Would you care to explain?


I am taking about existing rtnetlink api that bridge, ip link uses to 
talk l2 and l3 to the kernel.

RTM_NEWROUTE etc.

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [patch net-next v2 8/9] switchdev: introduce Netlink API

2014-09-20 Thread Roopa Prabhu

On 9/20/14, 1:10 AM, Scott Feldman wrote:

On Sep 19, 2014, at 8:41 PM, Roopa Prabhu  wrote:


On 9/19/14, 8:49 AM, Jiri Pirko wrote:

Fri, Sep 19, 2014 at 05:25:48PM CEST, j...@mojatatu.com wrote:

On 09/19/14 09:49, Jiri Pirko wrote:

This patch exposes switchdev API using generic Netlink.
Example userspace utility is here:
https://github.com/jpirko/switchdev


Is this just a temporary test tool? Otherwise i dont see reason
for its existence (or the API that it feeds on).

Please read the conversation I had with Pravin and Jesse in v1 thread.
Long story short they like to have the api separated from ovs datapath
so ovs daemon can use it to directly communicate with driver. Also John
Fastabend requested a way to work with driver flows without using ovs ->
that was the original reason I created switchdev genl api.

Regarding the "sw" tool, yes it is for testing purposes now. ovs daemon
will use directly switchdev genl api.

I hope I cleared this out.

We already have all the needed rtnetlink kernel api and userspace tools around 
it to support all
switching asic features. ie, the rtnetlink api is the switchdev api. We can do 
l2, l3, acl's with it.
Its unclear to me why we need another new netlink api. Which will mean none of 
the existing tools to
create bridges etc will work on a switchdev.
Which seems like going in the direction exactly opposite to what we had 
discussed earlier.

Existing rtnetlink isn’t available to swdev without some kind of snooping the 
echoes from the various kernel components (bridge, fib, etc).  With swdev_flow, 
as Jiri has defined it, there is an additional conversion needed to bridge the 
gap (bad expression, I know) between rtnetlink and swdev_flow.  This conversion 
happens in the kernel components.  For example, the bridge module, still driven 
from userspace by existing rtnetlink, will formulate the necessary swdev_flow 
insert/remove calls to the swdev driver such that HW will offload the fwd path.

You have:
 user -> rtnetlink -> kernel -> netlink echo -> [some process] -> [some 
driver] -> HW

Jiri has:
 user -> rtnetlink -> kernel -> swdev_* -> swdev driver -> HW


Keeping the goal to not change or not add a new userspace API in mind,

I have :
user -> rtnetlink -> kernel -> ndo_op -> swdev driver  -> HW

Jiri has:
user -> genl (newapi) -> kernel -> swdev_* -> swdev driver -> HW






___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] Contribute Auto-Attach SPBm support to Open vSwitch

2014-09-20 Thread Ben Pfaff
On Fri, Sep 19, 2014 at 04:11:51PM +, Flynn, Dennis R (Dennis) wrote:
> This is our first engagement with the OVS team. We have familiarized
> ourselves with the upstream procedures as defined within the file
> .../CONTRIBUTING and are working to ensure conformance to OVS coding
> standards. We have viewed various videos and tutorials describing the
> OVS development processes and best practices. We are not sanctioned
> OVS source code committers.

Usually the best way to contribute code is to post patches.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [patch net-next v2 8/9] switchdev: introduce Netlink API

2014-09-20 Thread Scott Feldman

On Sep 20, 2014, at 5:51 AM, Roopa Prabhu  wrote:

> On 9/20/14, 1:10 AM, Scott Feldman wrote:
>> On Sep 19, 2014, at 8:41 PM, Roopa Prabhu 
>>  wrote:
>> 
>> 
>>> On 9/19/14, 8:49 AM, Jiri Pirko wrote:
>>> 
 Fri, Sep 19, 2014 at 05:25:48PM CEST, j...@mojatatu.com
  wrote:
 
> On 09/19/14 09:49, Jiri Pirko wrote:
> 
>> This patch exposes switchdev API using generic Netlink.
>> Example userspace utility is here:
>> 
>> https://github.com/jpirko/switchdev
>> 
>> 
>> 
> Is this just a temporary test tool? Otherwise i dont see reason
> for its existence (or the API that it feeds on).
> 
 Please read the conversation I had with Pravin and Jesse in v1 thread.
 Long story short they like to have the api separated from ovs datapath
 so ovs daemon can use it to directly communicate with driver. Also John
 Fastabend requested a way to work with driver flows without using ovs ->
 that was the original reason I created switchdev genl api.
 
 Regarding the "sw" tool, yes it is for testing purposes now. ovs daemon
 will use directly switchdev genl api.
 
 I hope I cleared this out.
 
>>> We already have all the needed rtnetlink kernel api and userspace tools 
>>> around it to support all
>>> switching asic features. ie, the rtnetlink api is the switchdev api. We can 
>>> do l2, l3, acl's with it.
>>> Its unclear to me why we need another new netlink api. Which will mean none 
>>> of the existing tools to
>>> create bridges etc will work on a switchdev.
>>> Which seems like going in the direction exactly opposite to what we had 
>>> discussed earlier.
>>> 
>> Existing rtnetlink isn’t available to swdev without some kind of snooping 
>> the echoes from the various kernel components (bridge, fib, etc).  With 
>> swdev_flow, as Jiri has defined it, there is an additional conversion needed 
>> to bridge the gap (bad expression, I know) between rtnetlink and swdev_flow. 
>>  This conversion happens in the kernel components.  For example, the bridge 
>> module, still driven from userspace by existing rtnetlink, will formulate 
>> the necessary swdev_flow insert/remove calls to the swdev driver such that 
>> HW will offload the fwd path.
>> 
>> You have:
>> user -> rtnetlink -> kernel -> netlink echo -> [some process] -> [some 
>> driver] -> HW
>> 
>> 
>> Jiri has:
>> user -> rtnetlink -> kernel -> swdev_* -> swdev driver -> HW
>> 
>> 
> Keeping the goal to not change or not add a new userspace API in mind,
> I have :
> user -> rtnetlink -> kernel -> ndo_op -> swdev driver  -> HW
> 

Then you have the same as Jiri, for the traditional L2/L3 case.

> Jiri has:
> user -> genl (newapi) -> kernel -> swdev_* -> swdev driver -> HW

Jiri’s genl is for userspace apps that are talking rtnetlink, like OVS.  It’s 
not a substitute for rtnetlink, it’s an alternative.  The complete picture is:

user -> swdev genl -
\
 \
  ---> kernel -> ndo_swdev_* -> swdev driver -> HW
 /
/
user -> rtnetlink --


-scott



___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [patch net-next v2 8/9] switchdev: introduce Netlink API

2014-09-20 Thread Jiri Pirko
Sat, Sep 20, 2014 at 07:21:10PM CEST, sfel...@cumulusnetworks.com wrote:
>
>On Sep 20, 2014, at 5:51 AM, Roopa Prabhu  wrote:
>
>> On 9/20/14, 1:10 AM, Scott Feldman wrote:
>>> On Sep 19, 2014, at 8:41 PM, Roopa Prabhu 
>>>  wrote:
>>> 
>>> 
 On 9/19/14, 8:49 AM, Jiri Pirko wrote:
 
> Fri, Sep 19, 2014 at 05:25:48PM CEST, j...@mojatatu.com
>  wrote:
> 
>> On 09/19/14 09:49, Jiri Pirko wrote:
>> 
>>> This patch exposes switchdev API using generic Netlink.
>>> Example userspace utility is here:
>>> 
>>> https://github.com/jpirko/switchdev
>>> 
>>> 
>>> 
>> Is this just a temporary test tool? Otherwise i dont see reason
>> for its existence (or the API that it feeds on).
>> 
> Please read the conversation I had with Pravin and Jesse in v1 thread.
> Long story short they like to have the api separated from ovs datapath
> so ovs daemon can use it to directly communicate with driver. Also John
> Fastabend requested a way to work with driver flows without using ovs ->
> that was the original reason I created switchdev genl api.
> 
> Regarding the "sw" tool, yes it is for testing purposes now. ovs daemon
> will use directly switchdev genl api.
> 
> I hope I cleared this out.
> 
 We already have all the needed rtnetlink kernel api and userspace tools 
 around it to support all
 switching asic features. ie, the rtnetlink api is the switchdev api. We 
 can do l2, l3, acl's with it.
 Its unclear to me why we need another new netlink api. Which will mean 
 none of the existing tools to
 create bridges etc will work on a switchdev.
 Which seems like going in the direction exactly opposite to what we had 
 discussed earlier.
 
>>> Existing rtnetlink isn’t available to swdev without some kind of snooping 
>>> the echoes from the various kernel components (bridge, fib, etc).  With 
>>> swdev_flow, as Jiri has defined it, there is an additional conversion 
>>> needed to bridge the gap (bad expression, I know) between rtnetlink and 
>>> swdev_flow.  This conversion happens in the kernel components.  For 
>>> example, the bridge module, still driven from userspace by existing 
>>> rtnetlink, will formulate the necessary swdev_flow insert/remove calls to 
>>> the swdev driver such that HW will offload the fwd path.
>>> 
>>> You have:
>>> user -> rtnetlink -> kernel -> netlink echo -> [some process] -> [some 
>>> driver] -> HW
>>> 
>>> 
>>> Jiri has:
>>> user -> rtnetlink -> kernel -> swdev_* -> swdev driver -> HW
>>> 
>>> 
>> Keeping the goal to not change or not add a new userspace API in mind,
>> I have :
>> user -> rtnetlink -> kernel -> ndo_op -> swdev driver  -> HW
>> 
>
>Then you have the same as Jiri, for the traditional L2/L3 case.
>
>> Jiri has:
>> user -> genl (newapi) -> kernel -> swdev_* -> swdev driver -> HW
>
>Jiri’s genl is for userspace apps that are talking rtnetlink, like OVS.  It’s 
>not a substitute for rtnetlink, it’s an alternative.  The complete picture is:

Not an alternative, an addition.

>
>user -> swdev genl -
>\
> \
>  ---> kernel -> ndo_swdev_* -> swdev driver -> HW
> /
>/
>user -> rtnetlink --

True is that, as Thomas pointed out, we can probably nest this into
rtnl_link messages. That might work.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] RFC: A plan for stats.

2014-09-20 Thread Ethan Jackson
I'm doing some performance testing which heavily stresses the OVS DPDK slow
path, and notice we spend a ridiculous amount of CPU time on the rule_dpif stats
mutex.  For my test case, by commenting out the mutex I literally doubled the
performance.  I'm in a bit of a rush, so as a short term hack, I simply copied
over the per-thread stats code which dpif-netdev uses, but it got me thinking of
a general plan for how we should handle stats in userspace which I'll outline
here:

To start, I think the ovsthread_stats construction needs to be pulled into it's
own library and made higher level.  Right now it's a general mechanism for
storing arbitrary per thread data.  I think we really want something which
implements an API something like "get_stats", "credit_stats", "clear_stats",
etc.  That could be used directly by both ofproto-dpif and dpif-netdev very
cleanly.  You'll see in the hacked together patch I've included below, it's
basically a direct copy of all the relevant dpif-netdev code.

That done, I think we can actually get rid of the flow stats mutex by using a
technique similar to what we do for cmaps.  We're guaranteed that only one
writer ever touches a particular bucket, so it should be sufficient to ensure
all of the data fits on a particular cache line, and that we update a sequence
number to notify readers of changes.

Thoughts?

Ethan


---
 ofproto/ofproto-dpif.c | 125 -
 1 file changed, 83 insertions(+), 42 deletions(-)

diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index 6a59098..e9f4f16 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -25,10 +25,10 @@
 #include "bond.h"
 #include "bundle.h"
 #include "byte-order.h"
+#include "cfm.h"
 #include "connectivity.h"
 #include "connmgr.h"
 #include "coverage.h"
-#include "cfm.h"
 #include "dpif.h"
 #include "dynamic-string.h"
 #include "fail-open.h"
@@ -44,13 +44,13 @@
 #include "netdev.h"
 #include "netlink.h"
 #include "nx-match.h"
-#include "odp-util.h"
 #include "odp-execute.h"
-#include "ofp-util.h"
-#include "ofpbuf.h"
+#include "odp-util.h"
 #include "ofp-actions.h"
 #include "ofp-parse.h"
 #include "ofp-print.h"
+#include "ofp-util.h"
+#include "ofpbuf.h"
 #include "ofproto-dpif-ipfix.h"
 #include "ofproto-dpif-mirror.h"
 #include "ofproto-dpif-monitor.h"
@@ -58,6 +58,7 @@
 #include "ofproto-dpif-sflow.h"
 #include "ofproto-dpif-upcall.h"
 #include "ofproto-dpif-xlate.h"
+#include "ovs-thread.h"
 #include "poll-loop.h"
 #include "seq.h"
 #include "simap.h"
@@ -76,6 +77,13 @@ COVERAGE_DEFINE(packet_in_overflow);
 
 struct flow_miss;
 
+struct rule_dpif_stats {
+struct ovs_mutex mutex;
+uint64_t n_packets;
+uint64_t n_bytes;
+long long int used;
+};
+
 struct rule_dpif {
 struct rule up;
 
@@ -83,8 +91,7 @@ struct rule_dpif {
  *
  *   - Do include packets and bytes from datapath flows which have not
  *   recently been processed by a revalidator. */
-struct ovs_mutex stats_mutex;
-struct dpif_flow_stats stats OVS_GUARDED;
+struct ovsthread_stats stats;
 
 /* If non-zero then the recirculation id that has
  * been allocated for use with this rule.
@@ -96,8 +103,7 @@ struct rule_dpif {
 /* RULE_CAST() depends on this. */
 BUILD_ASSERT_DECL(offsetof(struct rule_dpif, up) == 0);
 
-static void rule_get_stats(struct rule *, uint64_t *packets, uint64_t *bytes,
-   long long int *used);
+static void rule_get_stats(struct rule_dpif *, struct dpif_flow_stats *);
 static struct rule_dpif *rule_dpif_cast(const struct rule *);
 static void rule_expire(struct rule_dpif *);
 
@@ -3425,13 +3431,10 @@ rule_expire(struct rule_dpif *rule)
 }
 
 if (reason < 0 && idle_timeout) {
-long long int used;
+struct dpif_flow_stats stats;
 
-ovs_mutex_lock(&rule->stats_mutex);
-used = rule->stats.used;
-ovs_mutex_unlock(&rule->stats_mutex);
-
-if (now > used + idle_timeout * 1000) {
+rule_get_stats(rule, &stats);
+if (now > stats.used + idle_timeout * 1000) {
 reason = OFPRR_IDLE_TIMEOUT;
 }
 }
@@ -3493,15 +3496,26 @@ ofproto_dpif_execute_actions(struct ofproto_dpif 
*ofproto,
 return error;
 }
 
+static void *
+rule_dpif_stats_new_cb(void)
+{
+struct rule_dpif_stats *bucket = xzalloc_cacheline(sizeof *bucket);
+ovs_mutex_init(&bucket->mutex);
+return bucket;
+}
+
 void
 rule_dpif_credit_stats(struct rule_dpif *rule,
const struct dpif_flow_stats *stats)
 {
-ovs_mutex_lock(&rule->stats_mutex);
-rule->stats.n_packets += stats->n_packets;
-rule->stats.n_bytes += stats->n_bytes;
-rule->stats.used = MAX(rule->stats.used, stats->used);
-ovs_mutex_unlock(&rule->stats_mutex);
+struct rule_dpif_stats *bucket;
+
+bucket = ovsthread_stats_bucket_get(&rule->stats, rule_dpif_stats_new_cb);
+ovs_mutex_lock(&bucket->mutex);
+bucket->n_packets += stats->n_packets

[ovs-dev] [RFC 1/2] u64-stats-lock: Introduce u64_stats_lock

2014-09-20 Thread Daniele Di Proietto
Sometimes we need to read and write 64-bit values (statistics) from multiple
threads. On 64-bit architectures it si possible to read 64-bit data while it is
being written without worrying about consistency.
On 32-bit architectures have to use some form of synchronization to make sure
that we are reading a consistent 64-bit value.

Therefore, 'u64_stats_lock' is introduced, which maps to a noop for 64-bits
architecture and to a mutex for other architetures.

Signed-off-by: Daniele Di Proietto 
---
 lib/automake.mk  |  1 +
 lib/u64-stats-lock.h | 65 
 2 files changed, 66 insertions(+)
 create mode 100644 lib/u64-stats-lock.h

diff --git a/lib/automake.mk b/lib/automake.mk
index b83cceb..7f265ef 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -234,6 +234,7 @@ lib_libopenvswitch_la_SOURCES = \
lib/token-bucket.c \
lib/token-bucket.h \
lib/type-props.h \
+   lib/u64-stats-lock.h \
lib/unaligned.h \
lib/unicode.c \
lib/unicode.h \
diff --git a/lib/u64-stats-lock.h b/lib/u64-stats-lock.h
new file mode 100644
index 000..0db40a5
--- /dev/null
+++ b/lib/u64-stats-lock.h
@@ -0,0 +1,65 @@
+/*
+ * Copyright (c) 2014 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef U64_STATS_LOCK_H
+#define U64_STATS_LOCK_H 1
+
+#include "ovs-thread.h"
+
+#ifdef  __cplusplus
+extern "C" {
+#endif
+
+/* This type of lock should be used for accessing atomically 64-bit values from
+ * multiples threads. It maps to noop when compiling for 64-bit architectures,
+ * otherwise it is a mutex. The basic idea is similar to linux/u64_stats_sync.h
+ */
+
+#if !__GNUC__ || !defined(__x86_64__)
+#define U64_STATS_LOCK_NEEDS_MUTEX
+#endif
+
+struct OVS_LOCKABLE u64_stats_lock {
+#ifdef U64_STATS_LOCK_NEEDS_MUTEX
+struct ovs_mutex mutex;
+#endif
+};
+
+static inline void
+u64_stats_lock_acquire(struct u64_stats_lock *l OVS_UNUSED)
+OVS_ACQUIRES(l)
+OVS_NO_THREAD_SAFETY_ANALYSIS
+{
+#ifdef U64_STATS_LOCK_NEEDS_MUTEX
+ovs_mutex_lock(&l->mutex);
+#endif
+}
+
+static inline void
+u64_stats_lock_release(struct u64_stats_lock *l OVS_UNUSED)
+OVS_RELEASES(l)
+OVS_NO_THREAD_SAFETY_ANALYSIS
+{
+#ifdef U64_STATS_LOCK_NEEDS_MUTEX
+ovs_mutex_unlock(&l->mutex);
+#endif
+}
+
+#ifdef  __cplusplus
+}
+#endif
+
+#endif /* u64-stats-lock.h */
-- 
2.1.0.rc1

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [RFC 2/2] dpif-netdev: Use per pmdthread datapath statistics

2014-09-20 Thread Daniele Di Proietto
Signed-off-by: Daniele Di Proietto 
---
 lib/dpif-netdev.c | 70 ++-
 1 file changed, 23 insertions(+), 47 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 90fe01c..e5f2aa0 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -62,6 +62,7 @@
 #include "shash.h"
 #include "sset.h"
 #include "timeval.h"
+#include "u64-stats-lock.h"
 #include "unixctl.h"
 #include "util.h"
 #include "vlog.h"
@@ -178,11 +179,6 @@ struct dp_netdev {
 struct classifier cls;
 struct cmap flow_table OVS_GUARDED; /* Flow table. */
 
-/* Statistics.
- *
- * ovsthread_stats is internally synchronized. */
-struct ovsthread_stats stats; /* Contains 'struct dp_netdev_stats *'. */
-
 /* Ports.
  *
  * Protected by RCU.  Take the mutex to add or remove ports. */
@@ -225,9 +221,9 @@ enum dp_stat_type {
 
 /* Contained by struct dp_netdev's 'stats' member.  */
 struct dp_netdev_stats {
-struct ovs_mutex mutex;  /* Protects 'n'. */
+struct u64_stats_lock lock;
 
-/* Indexed by DP_STAT_*, protected by 'mutex'. */
+/* Indexed by DP_STAT_*. */
 unsigned long long int n[DP_N_STATS] OVS_GUARDED;
 };
 
@@ -357,6 +353,7 @@ struct dp_netdev_pmd_thread {
  * need to be protected (e.g. by 'dp_netdev_mutex').  All other
  * instances will only be accessed by its own pmd thread. */
 struct emc_cache flow_cache;
+struct dp_netdev_stats stats;   /* Per thread datapath statistics */
 struct latch exit_latch;/* For terminating the pmd thread. */
 atomic_uint change_seq; /* For reloading pmd ports. */
 pthread_t thread;
@@ -560,8 +557,6 @@ create_dp_netdev(const char *name, const struct dpif_class 
*class,
 classifier_init(&dp->cls, NULL);
 cmap_init(&dp->flow_table);
 
-ovsthread_stats_init(&dp->stats);
-
 ovs_mutex_init(&dp->port_mutex);
 cmap_init(&dp->ports);
 dp->port_seq = seq_create();
@@ -625,8 +620,6 @@ dp_netdev_free(struct dp_netdev *dp)
 OVS_REQUIRES(dp_netdev_mutex)
 {
 struct dp_netdev_port *port;
-struct dp_netdev_stats *bucket;
-int i;
 
 shash_find_and_delete(&dp_netdevs, dp->name);
 
@@ -641,12 +634,6 @@ dp_netdev_free(struct dp_netdev *dp)
 }
 ovs_mutex_unlock(&dp->port_mutex);
 
-OVSTHREAD_STATS_FOR_EACH_BUCKET (bucket, i, &dp->stats) {
-ovs_mutex_destroy(&bucket->mutex);
-free_cacheline(bucket);
-}
-ovsthread_stats_destroy(&dp->stats);
-
 classifier_destroy(&dp->cls);
 cmap_destroy(&dp->flow_table);
 ovs_mutex_destroy(&dp->flow_mutex);
@@ -701,18 +688,18 @@ static int
 dpif_netdev_get_stats(const struct dpif *dpif, struct dpif_dp_stats *stats)
 {
 struct dp_netdev *dp = get_dp_netdev(dpif);
-struct dp_netdev_stats *bucket;
-size_t i;
+struct dp_netdev_pmd_thread *pmd;
 
 stats->n_flows = cmap_count(&dp->flow_table);
 
 stats->n_hit = stats->n_missed = stats->n_lost = 0;
-OVSTHREAD_STATS_FOR_EACH_BUCKET (bucket, i, &dp->stats) {
-ovs_mutex_lock(&bucket->mutex);
-stats->n_hit += bucket->n[DP_STAT_HIT];
-stats->n_missed += bucket->n[DP_STAT_MISS];
-stats->n_lost += bucket->n[DP_STAT_LOST];
-ovs_mutex_unlock(&bucket->mutex);
+
+CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
+u64_stats_lock_acquire(&pmd->stats.lock);
+stats->n_hit += pmd->stats.n[DP_STAT_HIT];
+stats->n_missed += pmd->stats.n[DP_STAT_MISS];
+stats->n_lost += pmd->stats.n[DP_STAT_LOST];
+u64_stats_lock_release(&pmd->stats.lock);
 }
 stats->n_masks = UINT32_MAX;
 stats->n_mask_hit = UINT64_MAX;
@@ -2395,23 +2382,13 @@ dp_netdev_flow_used(struct dp_netdev_flow *netdev_flow,
 ovs_mutex_unlock(&bucket->mutex);
 }
 
-static void *
-dp_netdev_stats_new_cb(void)
-{
-struct dp_netdev_stats *bucket = xzalloc_cacheline(sizeof *bucket);
-ovs_mutex_init(&bucket->mutex);
-return bucket;
-}
-
 static void
-dp_netdev_count_packet(struct dp_netdev *dp, enum dp_stat_type type, int cnt)
+dp_netdev_count_packet(struct dp_netdev_pmd_thread *pmd,
+   enum dp_stat_type type, int cnt)
 {
-struct dp_netdev_stats *bucket;
-
-bucket = ovsthread_stats_bucket_get(&dp->stats, dp_netdev_stats_new_cb);
-ovs_mutex_lock(&bucket->mutex);
-bucket->n[type] += cnt;
-ovs_mutex_unlock(&bucket->mutex);
+u64_stats_lock_acquire(&pmd->stats.lock);
+pmd->stats.n[type] += cnt;
+u64_stats_lock_release(&pmd->stats.lock);
 }
 
 static int
@@ -2422,10 +2399,6 @@ dp_netdev_upcall(struct dp_netdev *dp, struct 
dpif_packet *packet_,
 {
 struct ofpbuf *packet = &packet_->ofpbuf;
 
-if (type == DPIF_UC_MISS) {
-dp_netdev_count_packet(dp, DP_STAT_MISS, 1);
-}
-
 if (OVS_UNLIKELY(!dp->upcall_cb)) {
 return ENODEV;
 }
@@ -2517,7 +2490,7 @@ packet_batch_execute(struct packet_batch *batch,
 dp_netdev_execute_actions(pmd, batch->packets

Re: [ovs-dev] RFC: A plan for stats.

2014-09-20 Thread Daniele Di Proietto
Hi Ethan,

As you know I¹ve been working on similar problems for the userspace
datapath. Also, I¹m about to post a patch series to reduce the overhead of
counting (global, not per-flow) stats in dpif-netdev.c.
Here are some thoughts :

* I definitely think it¹s a good idea to use ovsthread_stats for
rule_dpif_stats, thanks for working on that.
* It makes sense to improve ovsthread_stats interface like you suggested.
I can spend some time on that, if you think it¹s worth.

* ovsthread_stats just uses an approximation of thread local storage.
Basically it is not convenient to use truly per-thread data because:
  * The availability of per thread keys might be limited. This, of course,
can be worked around, but I don¹t know about performance.
  * Most importantly, many implementation of thread local storage (e.g.
C11) do not allow cross-thread access (there¹s a big comment in
lib/ovs-thread.h).

* For the userspace datapath we can have per thread statistics, because we
stricly control the threads that have access to the datapath (pmd
pthreads, plus non pmd access are already protected by a mutex). My patch
should take care of datapath statistics, and once we have per pmd thread
classifier we will have per pmd thread flows that do not need statistics.

* Since ovsthread_stats doesn¹t use per thread data, I don¹t know if it
will be possible to remove the mutex and use the trick you described.
Maybe I can use that trick in my patch for 32-bit systems (let¹s discuss
this later)

What do you guys think?

Daniele

On 9/20/14, 11:00 AM, "Ethan Jackson"  wrote:

>I'm doing some performance testing which heavily stresses the OVS DPDK
>slow
>path, and notice we spend a ridiculous amount of CPU time on the
>rule_dpif stats
>mutex.  For my test case, by commenting out the mutex I literally doubled
>the
>performance.  I'm in a bit of a rush, so as a short term hack, I simply
>copied
>over the per-thread stats code which dpif-netdev uses, but it got me
>thinking of
>a general plan for how we should handle stats in userspace which I'll
>outline
>here:
>
>To start, I think the ovsthread_stats construction needs to be pulled
>into it's
>own library and made higher level.  Right now it's a general mechanism for
>storing arbitrary per thread data.  I think we really want something which
>implements an API something like "get_stats", "credit_stats",
>"clear_stats",
>etc.  That could be used directly by both ofproto-dpif and dpif-netdev
>very
>cleanly.  You'll see in the hacked together patch I've included below,
>it's
>basically a direct copy of all the relevant dpif-netdev code.
>
>That done, I think we can actually get rid of the flow stats mutex by
>using a
>technique similar to what we do for cmaps.  We're guaranteed that only one
>writer ever touches a particular bucket, so it should be sufficient to
>ensure
>all of the data fits on a particular cache line, and that we update a
>sequence
>number to notify readers of changes.
>
>Thoughts?
>
>Ethan

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] RFC: A plan for stats.

2014-09-20 Thread Ben Pfaff
On Sat, Sep 20, 2014 at 07:23:30PM +, Daniele Di Proietto wrote:
>   * Most importantly, many implementation of thread local storage (e.g.
> C11) do not allow cross-thread access (there?s a big comment in
> lib/ovs-thread.h).

That's what the standard says but I think that it is basically nonsense.
I don't think any implementation we care about prevents cross-thread
access.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [RFC 1/2] u64-stats-lock: Introduce u64_stats_lock

2014-09-20 Thread Jarno Rajahalme

On Sep 20, 2014, at 12:12 PM, Daniele Di Proietto  
wrote:

> Sometimes we need to read and write 64-bit values (statistics) from multiple
> threads. On 64-bit architectures it si possible to read 64-bit data while it 
> is
> being written without worrying about consistency.
> On 32-bit architectures have to use some form of synchronization to make sure
> that we are reading a consistent 64-bit value.
> 
> Therefore, 'u64_stats_lock' is introduced, which maps to a noop for 64-bits
> architecture and to a mutex for other architectures.

Did you consider using atomic variable instead? In some 32-bit systems aligned 
64-bit accesses can be also consistent without locking, and there may be other, 
more efficient mechanisms for ensuring atomicity of single variable access than 
a mutex. In a single-writer scenario you would not use atomic_add, but a 
combination of an atomic_read, non-atomic addition, and atomic_store. Optimizer 
might even combine these to a single instruction?

  Jarno

> 

> Signed-off-by: Daniele Di Proietto 
> ---
> lib/automake.mk  |  1 +
> lib/u64-stats-lock.h | 65 
> 2 files changed, 66 insertions(+)
> create mode 100644 lib/u64-stats-lock.h
> 
> diff --git a/lib/automake.mk b/lib/automake.mk
> index b83cceb..7f265ef 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -234,6 +234,7 @@ lib_libopenvswitch_la_SOURCES = \
>   lib/token-bucket.c \
>   lib/token-bucket.h \
>   lib/type-props.h \
> + lib/u64-stats-lock.h \
>   lib/unaligned.h \
>   lib/unicode.c \
>   lib/unicode.h \
> diff --git a/lib/u64-stats-lock.h b/lib/u64-stats-lock.h
> new file mode 100644
> index 000..0db40a5
> --- /dev/null
> +++ b/lib/u64-stats-lock.h
> @@ -0,0 +1,65 @@
> +/*
> + * Copyright (c) 2014 Nicira, Inc.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + * http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#ifndef U64_STATS_LOCK_H
> +#define U64_STATS_LOCK_H 1
> +
> +#include "ovs-thread.h"
> +
> +#ifdef  __cplusplus
> +extern "C" {
> +#endif
> +
> +/* This type of lock should be used for accessing atomically 64-bit values 
> from
> + * multiples threads. It maps to noop when compiling for 64-bit 
> architectures,
> + * otherwise it is a mutex. The basic idea is similar to 
> linux/u64_stats_sync.h
> + */
> +
> +#if !__GNUC__ || !defined(__x86_64__)
> +#define U64_STATS_LOCK_NEEDS_MUTEX
> +#endif
> +
> +struct OVS_LOCKABLE u64_stats_lock {
> +#ifdef U64_STATS_LOCK_NEEDS_MUTEX
> +struct ovs_mutex mutex;
> +#endif
> +};
> +
> +static inline void
> +u64_stats_lock_acquire(struct u64_stats_lock *l OVS_UNUSED)
> +OVS_ACQUIRES(l)
> +OVS_NO_THREAD_SAFETY_ANALYSIS
> +{
> +#ifdef U64_STATS_LOCK_NEEDS_MUTEX
> +ovs_mutex_lock(&l->mutex);
> +#endif
> +}
> +
> +static inline void
> +u64_stats_lock_release(struct u64_stats_lock *l OVS_UNUSED)
> +OVS_RELEASES(l)
> +OVS_NO_THREAD_SAFETY_ANALYSIS
> +{
> +#ifdef U64_STATS_LOCK_NEEDS_MUTEX
> +ovs_mutex_unlock(&l->mutex);
> +#endif
> +}
> +
> +#ifdef  __cplusplus
> +}
> +#endif
> +
> +#endif /* u64-stats-lock.h */
> -- 
> 2.1.0.rc1
> 
> ___
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [patch net-next v2 8/9] switchdev: introduce Netlink API

2014-09-20 Thread Alexei Starovoitov
On Sat, Sep 20, 2014 at 3:53 AM, Thomas Graf  wrote:
> On 09/20/14 at 10:14am, Jiri Pirko wrote:
>> Sat, Sep 20, 2014 at 12:12:12AM CEST, john.r.fastab...@intel.com wrote:
>> >I was considering a slightly different approach where the
>> >device would report via netlink the fields/actions it
>> >supported rather than creating pre-defined enums for every
>> >possible key.
>> >
>> >I already need to have an API to report fields/matches
>> >that are being supported why not have the device report
>> >the headers as header fields (len, offset) and the
>> >associated parse graph the hardware uses? Vendors should
>> >have this already to describe/design their real hardware.
>>
>> Hmm, let me think about this a bit more. I will try to figure out how to
>> handle that. Sound logic though. Will try to incorporate the idea in the
>> patchset.
>
> I think this is the right track.

I agree with John and Thomas here.
I think HW should not be limited by SW abstractions whether
these abstractions are called flows, n-tuples, bridge or else.
Really looking forward to see "device reporting the headers as
header fields (len, offset) and the associated parse graph"
as the first step.

Another topic that this discussion didn't cover yet is how this
all connects to tunnels and what is 'tunnel offloading'.
imo flow offloading by itself serves only academic interest.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] Returned mail: see transcript for details

2014-09-20 Thread Mail Administrator
Your message was not delivered due to the following reason:

Your message could not be delivered because the destination server was
unreachable within the allowed queue period. The amount of time
a message is queued before it is returned depends on local configura-
tion parameters.

Most likely there is a network problem that prevented delivery, but
it is also possible that the computer is turned off, or does not
have a mail system running right now.

Your message was not delivered within 1 days:
Host 190.228.62.134 is not responding.

The following recipients could not receive this message:


Please reply to postmas...@openvswitch.org
if you feel this message to be in error.

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] RFC: A plan for stats.

2014-09-20 Thread Daniele Di Proietto


On 9/20/14, 12:31 PM, "Ben Pfaff"  wrote:

>On Sat, Sep 20, 2014 at 07:23:30PM +, Daniele Di Proietto wrote:
>>   * Most importantly, many implementation of thread local storage (e.g.
>> C11) do not allow cross-thread access (there?s a big comment in
>> lib/ovs-thread.h).
>
>That's what the standard says but I think that it is basically nonsense.
>I don't think any implementation we care about prevents cross-thread
>access.

If we can work around that, I definitely agree with Ethan¹s plan

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [patch net-next v2 8/9] switchdev: introduce Netlink API

2014-09-20 Thread Roopa Prabhu

On 9/20/14, 10:38 AM, Jiri Pirko wrote:

Sat, Sep 20, 2014 at 07:21:10PM CEST, sfel...@cumulusnetworks.com wrote:

On Sep 20, 2014, at 5:51 AM, Roopa Prabhu  wrote:


On 9/20/14, 1:10 AM, Scott Feldman wrote:

On Sep 19, 2014, at 8:41 PM, Roopa Prabhu 
  wrote:



On 9/19/14, 8:49 AM, Jiri Pirko wrote:


Fri, Sep 19, 2014 at 05:25:48PM CEST, j...@mojatatu.com
  wrote:


On 09/19/14 09:49, Jiri Pirko wrote:


This patch exposes switchdev API using generic Netlink.
Example userspace utility is here:

https://github.com/jpirko/switchdev




Is this just a temporary test tool? Otherwise i dont see reason
for its existence (or the API that it feeds on).


Please read the conversation I had with Pravin and Jesse in v1 thread.
Long story short they like to have the api separated from ovs datapath
so ovs daemon can use it to directly communicate with driver. Also John
Fastabend requested a way to work with driver flows without using ovs ->
that was the original reason I created switchdev genl api.

Regarding the "sw" tool, yes it is for testing purposes now. ovs daemon
will use directly switchdev genl api.

I hope I cleared this out.


We already have all the needed rtnetlink kernel api and userspace tools around 
it to support all
switching asic features. ie, the rtnetlink api is the switchdev api. We can do 
l2, l3, acl's with it.
Its unclear to me why we need another new netlink api. Which will mean none of 
the existing tools to
create bridges etc will work on a switchdev.
Which seems like going in the direction exactly opposite to what we had 
discussed earlier.


Existing rtnetlink isn’t available to swdev without some kind of snooping the 
echoes from the various kernel components (bridge, fib, etc).  With swdev_flow, 
as Jiri has defined it, there is an additional conversion needed to bridge the 
gap (bad expression, I know) between rtnetlink and swdev_flow.  This conversion 
happens in the kernel components.  For example, the bridge module, still driven 
from userspace by existing rtnetlink, will formulate the necessary swdev_flow 
insert/remove calls to the swdev driver such that HW will offload the fwd path.

You have:
 user -> rtnetlink -> kernel -> netlink echo -> [some process] -> [some 
driver] -> HW


Jiri has:
 user -> rtnetlink -> kernel -> swdev_* -> swdev driver -> HW



Keeping the goal to not change or not add a new userspace API in mind,
I have :
 user -> rtnetlink -> kernel -> ndo_op -> swdev driver  -> HW


Then you have the same as Jiri, for the traditional L2/L3 case.


Jiri has:
 user -> genl (newapi) -> kernel -> swdev_* -> swdev driver -> HW

Jiri’s genl is for userspace apps that are talking rtnetlink, like OVS.  It’s 
not a substitute for rtnetlink, it’s an alternative.  The complete picture is:

Not an alternative, an addition.


user -> swdev genl -
\
 \
  ---> kernel -> ndo_swdev_* -> swdev driver -> HW
 /
/
user -> rtnetlink --

True is that, as Thomas pointed out, we can probably nest this into
rtnl_link messages. That might work.
That's the thing i was hinting as well. You can extend the existing 
infrastructure/api  instead of adding a new one.

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH] datapath: Remove pkt_key from OVS_CB.

2014-09-20 Thread Pravin Shelar
On Thu, Sep 18, 2014 at 11:31 PM, Andy Zhou  wrote:
> look good in general. a few small comments in line.
>
> Acked-by: Andy Zhou 
>
> On Wed, Sep 17, 2014 at 6:58 PM, Pravin B Shelar  wrote:
>> OVS keeps pointer to packet key in skb->cb, but the packet key is
>> store on stack. This could make code bit tricky. So it is better to
>> get rid of the pointer.
>>
>> Signed-off-by: Pravin B Shelar 
>> ---
>>  datapath/actions.c| 294 
>> +++---
>>  datapath/datapath.c   |  35 +++---
>>  datapath/datapath.h   |   8 +-
>>  datapath/flow.c   |   1 -
>>  datapath/vport-lisp.c |  22 ++--
>>  datapath/vport.c  |   2 +-
>>  6 files changed, 144 insertions(+), 218 deletions(-)
>>
>> diff --git a/datapath/actions.c b/datapath/actions.c
>> index 8d18848..0fc32d6 100644
>> --- a/datapath/actions.c
>> +++ b/datapath/actions.c
>> @@ -40,6 +40,10 @@
>>  #include "vlan.h"
>>  #include "vport.h"
>>
>> +static int do_execute_actions(struct datapath *dp, struct sk_buff *skb,
>> + struct sw_flow_key *key,
>> + const struct nlattr *attr, int len);
>> +
>>  struct deferred_action {
>> struct sk_buff *skb;
>> const struct nlattr *actions;
>> @@ -72,8 +76,7 @@ static bool action_fifo_is_empty(struct action_fifo *fifo)
>> return (fifo->head == fifo->tail);
>>  }
>>
>> -static struct deferred_action *
>> -action_fifo_get(struct action_fifo *fifo)
>> +static struct deferred_action *action_fifo_get(struct action_fifo *fifo)
>>  {
>> if (action_fifo_is_empty(fifo))
>> return NULL;
>> @@ -81,8 +84,7 @@ action_fifo_get(struct action_fifo *fifo)
>> return &fifo->fifo[fifo->tail++];
>>  }
>>
>> -static struct deferred_action *
>> -action_fifo_put(struct action_fifo *fifo)
>> +static struct deferred_action *action_fifo_put(struct action_fifo *fifo)
>>  {
>> if (fifo->head >= DEFERRED_ACTION_FIFO_SIZE - 1)
>> return NULL;
>> @@ -90,15 +92,10 @@ action_fifo_put(struct action_fifo *fifo)
>> return &fifo->fifo[fifo->head++];
>>  }
>>
>> -static void flow_key_clone(struct sk_buff *skb, struct sw_flow_key *new_key)
>> -{
>> -   *new_key = *OVS_CB(skb)->pkt_key;
>> -   OVS_CB(skb)->pkt_key = new_key;
>> -}
>> -
>>  /* Return true if fifo is not full */
>> -static bool add_deferred_actions(struct sk_buff *skb,
>> -const struct nlattr *attr)
>> +static struct deferred_action *add_deferred_actions(struct sk_buff *skb,
>> +   struct sw_flow_key *key,
>> +   const struct nlattr 
>> *attr)
> the comment above needs update as well.
>>  {
>> struct action_fifo *fifo;
>> struct deferred_action *da;
>> @@ -108,109 +105,22 @@ static bool add_deferred_actions(struct sk_buff *skb,
>> if (da) {
>> da->skb = skb;
>> da->actions = attr;
>> -   flow_key_clone(skb, &da->pkt_key);
>> +   da->pkt_key = *key;
>> }
>>
>> -   return (da != NULL);
>> -}
>> -
>> -static void flow_key_set_recirc_id(struct sk_buff *skb, u32 recirc_id)
>> -{
>> -   OVS_CB(skb)->pkt_key->recirc_id = recirc_id;
>> -}
>> -
>> -static void flow_key_set_priority(struct sk_buff *skb, u32 priority)
>> -{
>> -   OVS_CB(skb)->pkt_key->phy.priority = priority;
>> -}
>> -
>> -static void flow_key_set_skb_mark(struct sk_buff *skb, u32 skb_mark)
>> -{
>> -   OVS_CB(skb)->pkt_key->phy.skb_mark = skb_mark;
>> -}
>> -
>> -static void flow_key_set_eth_src(struct sk_buff *skb, const u8 addr[])
>> -{
>> -   ether_addr_copy(OVS_CB(skb)->pkt_key->eth.src, addr);
>> -}
>> -
>> -static void flow_key_set_eth_dst(struct sk_buff *skb, const u8 addr[])
>> -{
>> -   ether_addr_copy(OVS_CB(skb)->pkt_key->eth.dst, addr);
>> -}
>> -
>> -static void flow_key_set_vlan_tci(struct sk_buff *skb, __be16 tci)
>> -{
>> -   OVS_CB(skb)->pkt_key->eth.tci = tci;
>> -}
>> -
>> -static void flow_key_set_mpls_top_lse(struct sk_buff *skb, __be32 top_lse)
>> -{
>> -   OVS_CB(skb)->pkt_key->mpls.top_lse = top_lse;
>> -}
>> -
>> -static void flow_key_set_ipv4_src(struct sk_buff *skb, __be32 addr)
>> -{
>> -   OVS_CB(skb)->pkt_key->ipv4.addr.src = addr;
>> -}
>> -
>> -static void flow_key_set_ipv4_dst(struct sk_buff *skb, __be32 addr)
>> -{
>> -   OVS_CB(skb)->pkt_key->ipv4.addr.src = addr;
>> -}
>> -
>> -static void flow_key_set_ip_tos(struct sk_buff *skb, u8 tos)
>> -{
>> -   OVS_CB(skb)->pkt_key->ip.tos = tos;
>> -}
>> -
>> -static void flow_key_set_ip_ttl(struct sk_buff *skb, u8 ttl)
>> -{
>> -   OVS_CB(skb)->pkt_key->ip.ttl = ttl;
>> -}
>> -
>> -static void flow_key_set_ipv6_src(struct sk_buff *skb,
>> - const __be32 addr[4])
>> -{
>> -   memcpy(&OVS_CB(skb)->pkt_key->ipv6.addr.src, addr, 
>> sizeof(__be32[4]));
>> -}
>> -
>> -static void fl

Re: [ovs-dev] [PATCH] datapath: Remove support to set vport stats.

2014-09-20 Thread Pravin Shelar
On Fri, Sep 19, 2014 at 2:13 PM, Andy Zhou  wrote:
> Looks good.  A few minor comments in line.
> Acked-by: Andy Zhou 
>

I fixed according to comments and pushed to master.

> It would be good to add some more back ground information in the commit 
> message.
>
> On Wed, Sep 17, 2014 at 4:08 PM, Pravin B Shelar  wrote:
>> This was required for old compatibility code. Now vswitchd
>> has dropped it. This support was always deprecated, so
>> finally removing it.
>>
>> Signed-off-by: Pravin B Shelar 
>> ---
>>  datapath/datapath.c |  8 
>>  datapath/vport.c| 46 ++
>>  datapath/vport.h|  6 +-
>>  3 files changed, 15 insertions(+), 45 deletions(-)
>>
>> diff --git a/datapath/datapath.c b/datapath/datapath.c
>> index ed9d7bd..3e5ff72 100644
>> --- a/datapath/datapath.c
>> +++ b/datapath/datapath.c
>> @@ -1831,10 +1831,6 @@ static int ovs_vport_cmd_new(struct sk_buff *skb, 
>> struct genl_info *info)
>> if (IS_ERR(vport))
>> goto exit_unlock_free;
>>
>> -   err = 0;
>> -   if (a[OVS_VPORT_ATTR_STATS])
>> -   ovs_vport_set_stats(vport, 
>> nla_data(a[OVS_VPORT_ATTR_STATS]));
>> -
>> err = ovs_vport_cmd_fill_info(vport, reply, info->snd_portid,
>>   info->snd_seq, 0, OVS_VPORT_CMD_NEW);
>> BUG_ON(err < 0);
>> @@ -1878,10 +1874,6 @@ static int ovs_vport_cmd_set(struct sk_buff *skb, 
>> struct genl_info *info)
>> goto exit_unlock_free;
>> }
>>
>> -   if (a[OVS_VPORT_ATTR_STATS])
>> -   ovs_vport_set_stats(vport, 
>> nla_data(a[OVS_VPORT_ATTR_STATS]));
>> -
>> -
>> if (a[OVS_VPORT_ATTR_UPCALL_PID]) {
>> err = ovs_vport_set_upcall_portids(vport,
>>
>> a[OVS_VPORT_ATTR_UPCALL_PID]);
>> diff --git a/datapath/vport.c b/datapath/vport.c
>> index cf7f917..f7e66d1 100644
>> --- a/datapath/vport.c
>> +++ b/datapath/vport.c
>> @@ -243,26 +243,6 @@ void ovs_vport_del(struct vport *vport)
>>  }
>>
>>  /**
>> - * ovs_vport_set_stats - sets offset device stats
>> - *
>> - * @vport: vport on which to set stats
>> - * @stats: stats to set
>> - *
>> - * Provides a set of transmit, receive, and error stats to be added as an
>> - * offset to the collected data when stats are retrieved.  Some devices may 
>> not
>> - * support setting the stats, in which case the result will always be
>> - * -EOPNOTSUPP.
>> - *
>> - * Must be called with ovs_mutex.
>> - */
>> -void ovs_vport_set_stats(struct vport *vport, struct ovs_vport_stats *stats)
>> -{
>> -   spin_lock_bh(&vport->stats_lock);
>> -   vport->offset_stats = *stats;
>> -   spin_unlock_bh(&vport->stats_lock);
>> -}
>> -
>> -/**
>>   * ovs_vport_get_stats - retrieve device stats
>>   *
>>   * @vport: vport from which to retrieve the stats
>> @@ -276,25 +256,27 @@ void ovs_vport_get_stats(struct vport *vport, struct 
>> ovs_vport_stats *stats)
>>  {
>> int i;
>>
>> -   /* We potentially have 3 sources of stats that need to be
>> +   stats->rx_bytes = 0;
>> +   stats->rx_packets   = 0;
>> +   stats->tx_bytes = 0;
>> +   stats->tx_packets   = 0;
> It may be better to move this block below the spin_unlock_bh, so that clear 
> and
> set are continuous.
>> +
>> +   /* We potentially have two surces of stats that need to be
>>  * combined: those we have collected (split into err_stats and
>> -* percpu_stats), offset_stats from set_stats(), and device
>> -* error stats from netdev->get_stats() (for errors that happen
>> -* downstream and therefore aren't reported through our
>> -* vport_record_error() function).
>> -* Stats from first two sources are merged and reported by ovs over
>> +* percpu_stats), and device error stats from netdev->get_stats()
>> +* (for errors that happen downstream and therefore aren't
>> +* reported through our vport_record_error() function).
>> +* Stats from first source are reported by ovs over
>>  * OVS_VPORT_ATTR_STATS.
>>  * netdev-stats can be directly read over netlink-ioctl.
>>  */
>>
>> spin_lock_bh(&vport->stats_lock);
>>
>> -   *stats = vport->offset_stats;
>> -
>> -   stats->rx_errors+= vport->err_stats.rx_errors;
>> -   stats->tx_errors+= vport->err_stats.tx_errors;
>> -   stats->tx_dropped   += vport->err_stats.tx_dropped;
>> -   stats->rx_dropped   += vport->err_stats.rx_dropped;
>> +   stats->rx_errors= vport->err_stats.rx_errors;
>> +   stats->tx_errors= vport->err_stats.tx_errors;
>> +   stats->tx_dropped   = vport->err_stats.tx_dropped;
>> +   stats->rx_dropped   = vport->err_stats.rx_dropped;
>>
>> spin_unlock_bh(&vport->stats_lock);
>>
>> diff --git a/datapath/vport.h b/datapath/vport.h
>> inde

Re: [ovs-dev] [PATCH v2] datapath: Provide compatibility for kernels up to 3.17

2014-09-20 Thread Pravin Shelar
On Thu, Sep 18, 2014 at 5:48 AM, Thomas Graf  wrote:
> Port datapath to work with kernrels up to 3.17 and use 3.16.2 as
> the new kernel for CI testing.
>
> Tested with 3.14, 3.16.2, and net-next (3.17).
>
> Signed-off-by: Thomas Graf 
> Co-authored-by: Madhu Challa 
> ---
> v2:
>  - Swapped ignore_df local_df compat direction
>  - Provide alloc_netdev() compat macro to avoid #ifdef
>  - iptunnel_xmit() version for > 3.12 && < 3.15
>

Thanks. I just made one change and pushed to master.

>  .travis.yml |  2 +-
>  .travis/build.sh|  6 ++--
>  acinclude.m4| 12 ++--
>  datapath/linux/Modules.mk   |  1 +
>  datapath/linux/compat/include/linux/if.h| 12 
>  datapath/linux/compat/include/linux/netdevice.h | 21 ++
>  datapath/linux/compat/include/linux/skbuff.h|  4 +++
>  datapath/linux/compat/include/net/ip_tunnels.h  | 14 +-
>  datapath/linux/compat/include/net/udp.h | 37 
> +
>  datapath/linux/compat/include/net/vxlan.h   |  5 
>  datapath/linux/compat/ip_tunnels_core.c |  2 +-
>  datapath/linux/compat/vxlan.c   |  3 +-
>  datapath/vport-geneve.c |  8 --
>  datapath/vport-gre.c|  4 +--
>  datapath/vport-internal_dev.c   |  4 +--
>  datapath/vport-lisp.c   |  4 +--
>  datapath/vport-netdev.c |  3 +-
>  datapath/vport-vxlan.c  | 12 ++--
>  18 files changed, 114 insertions(+), 40 deletions(-)
>  create mode 100644 datapath/linux/compat/include/net/udp.h
>

.

>
> +#ifndef HAVE_NET_NAME_UNKNOWN
> +#undef alloc_netdev
> +#define NET_NAME_UNKNOWN 0
> +#define alloc_netdev(sizeof_priv, name, name_assign_type, setup) \
> +alloc_netdev_mqs(sizeof_priv, name_assign_type, setup, 1, 1)
> +#endif
> +
I changed name_assign_type to name.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH] datapath: Restore OVS_CB after skb_segment.

2014-09-20 Thread Pravin B Shelar
OVS needs to segments large skb before sending it for miss
packet handling to userspace. but skb_gso_segment uses
skb->cb. This corrupted OVS_CB which result in following panic.

[  735.419921] BUG: unable to handle kernel paging request at 001401b2
[  735.423168] Oops:  [#1] SMP DEBUG_PAGEALLOC
[  735.445097] RIP: 0010:[]  [] 
ovs_nla_put_flow+0x37/0x7c0 [openvswitch]
[  735.468858] Call Trace:
[  735.470384]  [] queue_userspace_packet+0x332/0x4d0 
[openvswitch]
[  735.471741]  [] queue_gso_packets+0xf5/0x180 [openvswitch]
[  735.481862]  [] ovs_dp_upcall+0x65/0x70 [openvswitch]
[  735.483031]  [] ovs_dp_process_packet+0x181/0x1b0 
[openvswitch]
[  735.484391]  [] ovs_vport_receive+0x65/0x90 [openvswitch]
[  735.492638]  [] internal_dev_xmit+0x68/0x110 [openvswitch]
[  735.495334]  [] dev_hard_start_xmit+0x2e6/0x8b0
[  735.496503]  [] __dev_queue_xmit+0x3c7/0x920
[  735.499827]  [] dev_queue_xmit+0x10/0x20
[  735.500798]  [] ip_finish_output+0x6a0/0x950
[  735.502818]  [] ip_output+0x68/0x110
[  735.503835]  [] ip_local_out+0x29/0x90
[  735.504801]  [] ip_queue_xmit+0x1d6/0x640
[  735.507015]  [] tcp_transmit_skb+0x477/0xac0
[  735.508260]  [] tcp_write_xmit+0x136/0xba0
[  735.510829]  [] __tcp_push_pending_frames+0x2e/0xc0
[  735.512296]  [] tcp_sendmsg+0xa63/0xd50
[  735.513526]  [] inet_sendmsg+0x10c/0x220
[  735.516025]  [] sock_sendmsg+0x9c/0xe0
[  735.518066]  [] SYSC_sendto+0x121/0x1c0
[  735.521398]  [] SyS_sendto+0xe/0x10
[  735.522473]  [] system_call_fastpath+0x16/0x1b

Reported-by: Andy Zhou 
Signed-off-by: Pravin B Shelar 
---
 datapath/datapath.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/datapath/datapath.c b/datapath/datapath.c
index 9850130..ed40c6a 100644
--- a/datapath/datapath.c
+++ b/datapath/datapath.c
@@ -337,9 +337,12 @@ static int queue_gso_packets(struct datapath *dp, struct 
sk_buff *skb,
unsigned short gso_type = skb_shinfo(skb)->gso_type;
struct sw_flow_key later_key;
struct sk_buff *segs, *nskb;
+   struct ovs_skb_cb ovs_cb;
int err;
 
+   ovs_cb = *OVS_CB(skb);
segs = __skb_gso_segment(skb, NETIF_F_SG, false);
+   *OVS_CB(skb) = ovs_cb;
if (IS_ERR(segs))
return PTR_ERR(segs);
 
@@ -355,6 +358,7 @@ static int queue_gso_packets(struct datapath *dp, struct 
sk_buff *skb,
/* Queue all of the segments. */
skb = segs;
do {
+   *OVS_CB(skb) = ovs_cb;
if (gso_type & SKB_GSO_UDP && skb != segs)
key = &later_key;
 
-- 
1.8.3.1

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev