Re: [ovs-dev] [PATCH v2 3/3] datapath: add layer 3 flow/port support

2014-04-17 Thread thomas.morin
Hi Lori, all,

While I've attempted to adapt the layer 3 port patch to GRE tunneling, 
and obtained nice results for IP-over-GRE (as I reported last week), 
I've had a harder time reusing this framework for -over-GRE, in 
particular for MPLS-over-GRE.

Given its name, I maybe shouldn't have expected -over-GRE to 
work on a "layer 3 port", but I did, given that this patch seems to 
really be a generic framework for popping/pushing the Ethernet header 
before/after tunnelling encap/decap, which is essentially what we need 
for any non-Ethernet tunnel.

Unfortunately, as it is, the patch assumes in some places that the 
tunnel payload will be IP if it is not Ethernet, and does not generally 
support the tunnel payload being some other ethertype.

Practically speaking, making your layer 3 port code work for 
MOLS-over-GRE is not entirely trivial:
- on emission, compose_output_action__ pops the Ethernet header before 
push_mpls is called (which is done during commit_odp_actions) ; this 
does not work since push_mpls actually relies on the presence of the 
Ethernet header to set the Ethernet ethertype to an MPLS ethertype.  One 
way to fix it would be to adapt MPLS code to support the case where the 
frame is layer3. Another is to have compose_output_action__ call 
odp_put_pop_eth_action after commit_odp_actions. I implemented the 
latter and it works. This latter option has the benefit preserving 
compatibility with other actions that can be done before outputting and 
which rely on the Ethernet header being there (such as changing a 
Ethernet dst) -- such operations do not make sense if the header is 
popped afterward, but I think we should try to not break if the user is 
to define such a flow.
- on reception, although I haven't fully tried to reach a resolution, 
there is at least an issue in flow_extract: with the current patch flow 
extract does not parse MPLS for a layer3 frame. I guess that could be 
solved although I haven't clarified exactly yet how flow_extract would 
guess what payload is in the ofppuf->packet (as I understand, this is 
initially determined in the kernel datapath and stored in skb->protocol, 
but it has to be propagated to userspace).

Based on the above, I wonder if the code couldn't be made overall 
simpler by *always* pushing an Ethernet header on reception of a 
non-Ethernet tunnelled packet (not only when, after flow identification, 
we conclude that the in_port is layer 3 while the out_port is not). 
Similarly, on emission, the popping of the Ethernet header would not 
have to be conditioned to the in/out ports being layer 3 or not.  The 
coexistence of the layer 3 port code with operations relying on the fact 
that the packet being manipulated is Ethernet, would become a non-issue 
; the code related to these operations could remain completely untouched.

In practice, that could be done by doing pop_eth based on flow actions 
(as currently done by your patch), but by doing push_eth unconditionally 
on reception of a packet from a layer3 tunnel (*not* as a flow action as 
currently done in your patch).

Comments ?

-Thomas
_

Ce message et ses pieces jointes peuvent contenir des informations 
confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce 
message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages 
electroniques etant susceptibles d'alteration,
Orange decline toute responsabilite si ce message a ete altere, deforme ou 
falsifie. Merci.

This message and its attachments may contain confidential or privileged 
information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete 
this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been 
modified, changed or falsified.
Thank you.

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [patch net-next RFC v3 01/10] openvswitch: split flow structures into ovs specific and generic ones

2014-04-17 Thread Jiri Pirko
After this, flow related structures can be used in other code.

Signed-off-by: Jiri Pirko 
---
 include/linux/sw_flow.h| 108 
 net/openvswitch/datapath.c |  45 +++
 net/openvswitch/datapath.h |   4 +-
 net/openvswitch/flow.c |  14 ++---
 net/openvswitch/flow.h | 123 +
 net/openvswitch/flow_netlink.c |  24 
 net/openvswitch/flow_netlink.h |   4 +-
 net/openvswitch/flow_table.c   | 100 +
 net/openvswitch/flow_table.h   |  18 +++---
 net/openvswitch/vport-gre.c|   4 +-
 net/openvswitch/vport-vxlan.c  |   2 +-
 net/openvswitch/vport.c|   2 +-
 net/openvswitch/vport.h|   2 +-
 13 files changed, 245 insertions(+), 205 deletions(-)
 create mode 100644 include/linux/sw_flow.h

diff --git a/include/linux/sw_flow.h b/include/linux/sw_flow.h
new file mode 100644
index 000..3bbd5aa
--- /dev/null
+++ b/include/linux/sw_flow.h
@@ -0,0 +1,108 @@
+/*
+ * Copyright (c) 2007-2012 Nicira, Inc.
+ * Copyright (c) 2014 Jiri Pirko 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA
+ */
+
+#ifndef _LINUX_SW_FLOW_H_
+#define _LINUX_SW_FLOW_H_
+
+struct sw_flow_key_ipv4_tunnel {
+   __be64 tun_id;
+   __be32 ipv4_src;
+   __be32 ipv4_dst;
+   __be16 tun_flags;
+   u8   ipv4_tos;
+   u8   ipv4_ttl;
+};
+
+struct sw_flow_key {
+   struct sw_flow_key_ipv4_tunnel tun_key;  /* Encapsulating tunnel key. */
+   struct {
+   u32 priority;   /* Packet QoS priority. */
+   u32 skb_mark;   /* SKB mark. */
+   u16 in_port;/* Input switch port (or DP_MAX_PORTS). 
*/
+   } phy;
+   struct {
+   u8 src[ETH_ALEN];   /* Ethernet source address. */
+   u8 dst[ETH_ALEN];   /* Ethernet destination address. */
+   __be16 tci; /* 0 if no VLAN, VLAN_TAG_PRESENT set 
otherwise. */
+   __be16 type;/* Ethernet frame type. */
+   } eth;
+   struct {
+   u8 proto;   /* IP protocol or lower 8 bits of ARP 
opcode. */
+   u8 tos; /* IP ToS. */
+   u8 ttl; /* IP TTL/hop limit. */
+   u8 frag;/* One of OVS_FRAG_TYPE_*. */
+   } ip;
+   union {
+   struct {
+   struct {
+   __be32 src; /* IP source address. */
+   __be32 dst; /* IP destination address. */
+   } addr;
+   union {
+   struct {
+   __be16 src; /* TCP/UDP/SCTP 
source port. */
+   __be16 dst; /* TCP/UDP/SCTP 
destination port. */
+   __be16 flags;   /* TCP flags. */
+   } tp;
+   struct {
+   u8 sha[ETH_ALEN];   /* ARP source 
hardware address. */
+   u8 tha[ETH_ALEN];   /* ARP target 
hardware address. */
+   } arp;
+   };
+   } ipv4;
+   struct {
+   struct {
+   struct in6_addr src;/* IPv6 source address. 
*/
+   struct in6_addr dst;/* IPv6 destination 
address. */
+   } addr;
+   __be32 label;   /* IPv6 flow label. */
+   struct {
+   __be16 src; /* TCP/UDP/SCTP source 
port. */
+   __be16 dst; /* TCP/UDP/SCTP 
destination port. */
+   __be16 flags;   /* TCP flags. */
+   } tp;
+   struct {
+   struct in6_addr target; /* ND target address. */
+   u8 sll[ETH_ALEN];   /* ND source link layer 
address. */
+   u8 tll[ETH_ALEN];   /* ND target link layer 
address. */
+   } nd;
+ 

[ovs-dev] [patch net-next RFC v3 03/10] net: introduce generic switch devices support

2014-04-17 Thread Jiri Pirko
The goal of this is to provide a possibility to suport various switch
chips. Drivers should implement relevant ndos to do so. So far, only one
ndo for getting physical switch id is in place.

Signed-off-by: Jiri Pirko 
---
 Documentation/networking/switchdev.txt | 53 ++
 include/linux/netdevice.h  | 10 +++
 include/linux/switchdev.h  | 30 +++
 net/Kconfig|  6 
 net/core/Makefile  |  1 +
 net/core/switchdev.c   | 32 
 6 files changed, 132 insertions(+)
 create mode 100644 Documentation/networking/switchdev.txt
 create mode 100644 include/linux/switchdev.h
 create mode 100644 net/core/switchdev.c

diff --git a/Documentation/networking/switchdev.txt 
b/Documentation/networking/switchdev.txt
new file mode 100644
index 000..20345cc
--- /dev/null
+++ b/Documentation/networking/switchdev.txt
@@ -0,0 +1,53 @@
+Switch device drivers HOWTO
+===
+
+First lets describe a topology a bit. Imagine the following example:
+
+   +++---+
+   | SOME switch chip   ||  CPU  |
+   +++---+
+   port1 port2 port3 port4 MNGMNT| PCI-E |
+ | | | | |   +---+
+PHY   PHY| | | |  NIC0 NIC1
+ | | | |   ||
+ | | +- PCI-E -+   ||
+ | +--- MII ---+|
+ +- MII +
+
+In this example, there are two independent lines between the switch silicon
+and CPU. NIC0 and NIC1 drivers are not aware of a switch presence. They are
+separate from the switch driver. SOME switch chip is by managed by a driver
+via PCI-E device MNGMNT. Note that MNGMNT device, NIC0 and NIC1 may be
+connected to some other type of bus.
+
+Now, for the previous example show the representation in kernel:
+
+   +++---+
+   | SOME switch chip   ||  CPU  |
+   +++---+
+   sw0p0 sw0p1 sw0p2 sw0p3 MNGMNT| PCI-E |
+ | | | | |   +---+
+PHY   PHY| | | |  eth0 eth1
+ | | | |   ||
+ | | +- PCI-E -+   ||
+ | +--- MII ---+|
+ +- MII +
+
+Lets call the example switch driver for SOME switch chip "SOMEswitch". This
+driver takes care of PCI-E device MNGMNT. There is a netdevice instance sw0pX
+created for each port of a switch. These netdevices are instances
+of "SOMEswitch" driver. sw0pX netdevices serve as a "representation"
+of the switch chip. eth0 and eth1 are instances of some other existing driver.
+
+The only difference of the switch-port netdevice from the ordinary netdevice
+is that is implements couple more NDOs:
+
+   ndo_swdev_get_id - This returns the same ID for two port netdevices of
+  the same physical switch chip. This is mandatory to
+  be implemented by all switch drivers and serves
+  the caller for recognition of a port netdevice.
+   ndo_swdev_* - Functions that serve for a manipulation of the switch chip
+ itself. They are not port-specific. Caller might use
+ arbitrary port netdevice of the same switch and it will
+ make no difference.
+   ndo_swportdev_* - Functions that serve for a port-specific manipulation.
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 8cf4f5e..83e38a0 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -991,6 +991,12 @@ typedef u16 (*select_queue_fallback_t)(struct net_device 
*dev,
  * Callback to use for xmit over the accelerated station. This
  * is used in place of ndo_start_xmit on accelerated net
  * devices.
+ *
+ * int (*ndo_swdev_get_id)(struct net_device *dev,
+ *struct netdev_phys_item_id *psid);
+ * Called to get an ID of the switch chip this port is part of.
+ * If driver implements this, it indicates that it represents a port
+ * of a switch chip.
  */
 struct net_device_ops {
int (*ndo_init)(struct net_device *dev);
@@ -1137,6 +1143,10 @@ struct net_device_ops {
netdev_tx_t (*ndo_dfwd_start_xmit) (struct sk_buff *skb,
struct net_device *dev,
void *priv);
+#ifdef CONFIG_NET_SWITCHDEV
+   int (*ndo_swdev_get_id)(struct net_device *

[ovs-dev] [patch net-next RFC v3 00/10] introduce infrastructure for support of switch chip datapath

2014-04-17 Thread Jiri Pirko
The basic idea is to introduce a generic infractructure to support various
switch chips in kernel.

The current patchset RFC is a heavy rework of the previous ones. The goal is to
be low-intrusive and to use as much as possible from the existing
infrastructure.

Also the idea is to benefit from currently existing Open vSwitch userspace
infrastructure prividing a HW offload for it. Note that this is just one
usecase. Other sw*dev_ndos can be easily added to support many more usecases,

Please see Documentation/networking/switchdev.txt for more details about
the model.

Next step would be to introduce a flag via which OVS userspace could tell if the
flow should be inserted in HW or just in the SW datapath. That would extend 
current
2 level cashing (kernel, userspace) to 3 level (hw, kernel, userspace).

Next step would be to improve internal kernel notifiers and introduce a listener
which would call sw*dev_ndos to setup the switch according to what the user
wants. This would allow a usecase then a user configures bridges, bonds, etc and
the switch chip underneath gets configured by that.

Any suggestions, feedbacks welcome.

Jiri Pirko (10):
  openvswitch: split flow structures into ovs specific and generic ones
  net: rename netdev_phys_port_id to more generic name
  net: introduce generic switch devices support
  rtnl: expose physical switch id for particular device
  switchdev: introduce basic support for flows
  net: introduce dummy switch
  dsa: implement ndo_swdev_get_id
  net: add netdev_for_each_all_lower_dev_rcu helper
  openvswitch: introduce vport_op get_netdev
  openvswitch: add support for datapath hardware offload

 Documentation/networking/switchdev.txt   |  53 +++
 drivers/net/Kconfig  |   7 +
 drivers/net/Makefile |   1 +
 drivers/net/dummyswitch.c| 126 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |   2 +-
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c   |   2 +-
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c |   2 +-
 include/linux/netdevice.h|  74 --
 include/linux/sw_flow.h  | 122 
 include/linux/switchdev.h|  55 
 include/uapi/linux/if_link.h |  10 ++
 net/Kconfig  |   6 +
 net/core/Makefile|   1 +
 net/core/dev.c   |  28 ++--
 net/core/net-sysfs.c |   2 +-
 net/core/rtnetlink.c |  30 +++-
 net/core/switchdev.c |  90 
 net/dsa/slave.c  |  16 +++
 net/openvswitch/Makefile |   3 +-
 net/openvswitch/datapath.c   |  63 +
 net/openvswitch/datapath.h   |   4 +-
 net/openvswitch/dp_notify.c  |   7 +-
 net/openvswitch/flow.c   |  14 +-
 net/openvswitch/flow.h   | 123 
 net/openvswitch/flow_netlink.c   |  24 ++--
 net/openvswitch/flow_netlink.h   |   4 +-
 net/openvswitch/flow_table.c | 100 ++---
 net/openvswitch/flow_table.h |  18 +--
 net/openvswitch/hw_offload.c | 170 +++
 net/openvswitch/hw_offload.h |  31 +
 net/openvswitch/vport-gre.c  |   4 +-
 net/openvswitch/vport-internal_dev.c |  53 ---
 net/openvswitch/vport-netdev.c   |  16 +++
 net/openvswitch/vport-netdev.h   |  12 --
 net/openvswitch/vport-vxlan.c|   2 +-
 net/openvswitch/vport.c  |   2 +-
 net/openvswitch/vport.h  |   4 +-
 37 files changed, 1004 insertions(+), 277 deletions(-)
 create mode 100644 Documentation/networking/switchdev.txt
 create mode 100644 drivers/net/dummyswitch.c
 create mode 100644 include/linux/sw_flow.h
 create mode 100644 include/linux/switchdev.h
 create mode 100644 net/core/switchdev.c
 create mode 100644 net/openvswitch/hw_offload.c
 create mode 100644 net/openvswitch/hw_offload.h

-- 
1.9.0

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [patch net-next RFC v3 04/10] rtnl: expose physical switch id for particular device

2014-04-17 Thread Jiri Pirko
The the netdevice represents a port in a switch, it will expose
IFLA_PHYS_SWITCH_ID value via rtnl. Two netdevices with a same value
belong to one physical switch.

Signed-off-by: Jiri Pirko 
---
 include/uapi/linux/if_link.h |  1 +
 net/core/rtnetlink.c | 26 +-
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 9a7f7ac..c11f492 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -145,6 +145,7 @@ enum {
IFLA_CARRIER,
IFLA_PHYS_PORT_ID,
IFLA_CARRIER_CHANGES,
+   IFLA_PHYS_SWITCH_ID,
__IFLA_MAX
 };
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 231e043..7170e24 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -43,6 +43,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -829,7 +830,8 @@ static noinline size_t if_nlmsg_size(const struct 
net_device *dev,
   + rtnl_port_size(dev) /* IFLA_VF_PORTS + IFLA_PORT_SELF */
   + rtnl_link_get_size(dev) /* IFLA_LINKINFO */
   + rtnl_link_get_af_size(dev) /* IFLA_AF_SPEC */
-  + nla_total_size(MAX_PHYS_ITEM_ID_LEN); /* IFLA_PHYS_PORT_ID */
+  + nla_total_size(MAX_PHYS_ITEM_ID_LEN) /* IFLA_PHYS_PORT_ID */
+  + nla_total_size(MAX_PHYS_ITEM_ID_LEN); /* IFLA_PHYS_SWITCH_ID */
 }
 
 static int rtnl_vf_ports_fill(struct sk_buff *skb, struct net_device *dev)
@@ -926,6 +928,24 @@ static int rtnl_phys_port_id_fill(struct sk_buff *skb, 
struct net_device *dev)
return 0;
 }
 
+static int rtnl_phys_switch_id_fill(struct sk_buff *skb, struct net_device 
*dev)
+{
+   int err;
+   struct netdev_phys_item_id psid;
+
+   err = swdev_get_id(dev, &psid);
+   if (err) {
+   if (err == -EOPNOTSUPP)
+   return 0;
+   return err;
+   }
+
+   if (nla_put(skb, IFLA_PHYS_SWITCH_ID, psid.id_len, psid.id))
+   return -EMSGSIZE;
+
+   return 0;
+}
+
 static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
int type, u32 pid, u32 seq, u32 change,
unsigned int flags, u32 ext_filter_mask)
@@ -998,6 +1018,9 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct 
net_device *dev,
if (rtnl_phys_port_id_fill(skb, dev))
goto nla_put_failure;
 
+   if (rtnl_phys_switch_id_fill(skb, dev))
+   goto nla_put_failure;
+
attr = nla_reserve(skb, IFLA_STATS,
sizeof(struct rtnl_link_stats));
if (attr == NULL)
@@ -1151,6 +1174,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
[IFLA_NUM_RX_QUEUES]= { .type = NLA_U32 },
[IFLA_PHYS_PORT_ID] = { .type = NLA_BINARY, .len = 
MAX_PHYS_ITEM_ID_LEN },
[IFLA_CARRIER_CHANGES]  = { .type = NLA_U32 },  /* ignored */
+   [IFLA_PHYS_SWITCH_ID]   = { .type = NLA_BINARY, .len = 
MAX_PHYS_ITEM_ID_LEN },
 };
 
 static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
-- 
1.9.0

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [patch net-next RFC v3 05/10] switchdev: introduce basic support for flows

2014-04-17 Thread Jiri Pirko
This patch adds a couple of ndos which can be used to work with flows.
Note that user can use random port netdevice to access the switch.

Signed-off-by: Jiri Pirko 
---
 include/linux/netdevice.h | 30 
 include/linux/switchdev.h | 25 
 net/core/switchdev.c  | 58 +++
 3 files changed, 113 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 83e38a0..14bd8b3 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -49,6 +49,8 @@
 
 #include 
 #include 
+#include 
+
 #include 
 
 struct netpoll_info;
@@ -997,6 +999,26 @@ typedef u16 (*select_queue_fallback_t)(struct net_device 
*dev,
  * Called to get an ID of the switch chip this port is part of.
  * If driver implements this, it indicates that it represents a port
  * of a switch chip.
+ *
+ * int (*ndo_swdev_flow_insert)(struct net_device *dev,
+ * const struct sw_flow *flow);
+ * Called to insert a flow into switch device. If driver does
+ * not implement this, it is assumed that the hw does not have
+ * a capability to work with flows.
+ *
+ * int (*ndo_swdev_flow_remove)(struct net_device *dev,
+ * const struct sw_flow *flow);
+ * Called to remove a flow from switch device. If driver does
+ * not implement this, it is assumed that the hw does not have
+ * a capability to work with flows.
+ *
+ * int (*ndo_swdev_flow_action_set)(struct net_device *dev,
+ * const struct sw_flow *flow,
+ * const struct sw_flow_action *action,
+ * size_t action_count);
+ * Called to set actions for a flow inserted in swtich device.
+ * If driver does not implement this, it is assumed that the hw does
+ * not have a capability to work with flows.
  */
 struct net_device_ops {
int (*ndo_init)(struct net_device *dev);
@@ -1146,6 +1168,14 @@ struct net_device_ops {
 #ifdef CONFIG_NET_SWITCHDEV
int (*ndo_swdev_get_id)(struct net_device *dev,
struct netdev_phys_item_id 
*psid);
+   int (*ndo_swdev_flow_insert)(struct net_device *dev,
+const struct sw_flow 
*flow);
+   int (*ndo_swdev_flow_remove)(struct net_device *dev,
+const struct sw_flow 
*flow);
+   int (*ndo_swdev_flow_action_set)(struct net_device 
*dev,
+const struct 
sw_flow *flow,
+const struct 
sw_flow_action *action,
+size_t 
action_count);
 #endif
 };
 
diff --git a/include/linux/switchdev.h b/include/linux/switchdev.h
index b147dde..6f8ce06 100644
--- a/include/linux/switchdev.h
+++ b/include/linux/switchdev.h
@@ -16,6 +16,11 @@
 #ifdef CONFIG_NET_SWITCHDEV
 
 int swdev_get_id(struct net_device *dev, struct netdev_phys_item_id *psid);
+int swdev_flow_insert(struct net_device *dev, const struct sw_flow *flow);
+int swdev_flow_remove(struct net_device *dev, const struct sw_flow *flow);
+int swdev_flow_action_set(struct net_device *dev, const struct sw_flow *flow,
+ const struct sw_flow_action *action,
+ size_t action_count);
 
 #else
 
@@ -25,6 +30,26 @@ static inline int swdev_get_id(struct net_device *dev,
return -EOPNOTSUPP;
 }
 
+static inline int swdev_flow_insert(struct net_device *dev,
+   const struct sw_flow *flow)
+{
+   return -EOPNOTSUPP;
+}
+
+static inline int swdev_flow_remove(struct net_device *dev,
+   const struct sw_flow *flow)
+{
+   return -EOPNOTSUPP;
+}
+
+static inline int swdev_flow_action_set(struct net_device *dev,
+   const struct sw_flow *flow,
+   const struct sw_flow_action *action,
+   size_t action_count)
+{
+   return -EOPNOTSUPP;
+}
+
 #endif
 
 #endif /* _LINUX_SWITCHDEV_H_ */
diff --git a/net/core/switchdev.c b/net/core/switchdev.c
index 2d275f5..cb5b6a2 100644
--- a/net/core/switchdev.c
+++ b/net/core/switchdev.c
@@ -30,3 +30,61 @@ int swdev_get_id(struct net_device *dev, struct 
netdev_phys_item_id *psid)
return ops->ndo_swdev_get_id(dev, psid);
 }
 EXPORT_SYMBOL(swdev_get_id);
+
+/**
+ * swdev_flow_insert - Insert a flow into switch
+ * @dev: port device
+ * @flow: flow descriptor
+ *
+ * Insert a flow into switch this port is part of.
+ */
+int swdev_flow_insert(struct net_device *dev, const struct sw_flow *flow)
+{
+

[ovs-dev] [patch net-next RFC v3 07/10] dsa: implement ndo_swdev_get_id

2014-04-17 Thread Jiri Pirko
Signed-off-by: Jiri Pirko 
---
 net/dsa/slave.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 02c0e17..22855f3 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -171,6 +171,19 @@ static int dsa_slave_ioctl(struct net_device *dev, struct 
ifreq *ifr, int cmd)
return -EOPNOTSUPP;
 }
 
+static int dsa_slave_swdev_get_id(struct net_device *dev,
+ struct netdev_phys_item_id *psid)
+{
+   struct dsa_slave_priv *p = netdev_priv(dev);
+   struct dsa_switch *ds = p->parent;
+   u64 tmp = (u64) ds;
+
+   /* TODO: add more sophisticated id generation */
+   memcpy(&psid->id, &tmp, sizeof(tmp));
+   psid->id_len = sizeof(tmp);
+
+   return 0;
+}
 
 /* ethtool operations ***/
 static int
@@ -303,6 +316,7 @@ static const struct net_device_ops dsa_netdev_ops = {
.ndo_set_rx_mode= dsa_slave_set_rx_mode,
.ndo_set_mac_address= dsa_slave_set_mac_address,
.ndo_do_ioctl   = dsa_slave_ioctl,
+   .ndo_swdev_get_id   = dsa_slave_swdev_get_id,
 };
 #endif
 #ifdef CONFIG_NET_DSA_TAG_EDSA
@@ -315,6 +329,7 @@ static const struct net_device_ops edsa_netdev_ops = {
.ndo_set_rx_mode= dsa_slave_set_rx_mode,
.ndo_set_mac_address= dsa_slave_set_mac_address,
.ndo_do_ioctl   = dsa_slave_ioctl,
+   .ndo_swdev_get_id   = dsa_slave_swdev_get_id,
 };
 #endif
 #ifdef CONFIG_NET_DSA_TAG_TRAILER
@@ -327,6 +342,7 @@ static const struct net_device_ops trailer_netdev_ops = {
.ndo_set_rx_mode= dsa_slave_set_rx_mode,
.ndo_set_mac_address= dsa_slave_set_mac_address,
.ndo_do_ioctl   = dsa_slave_ioctl,
+   .ndo_swdev_get_id   = dsa_slave_swdev_get_id,
 };
 #endif
 
-- 
1.9.0

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [patch net-next RFC v3 10/10] openvswitch: add support for datapath hardware offload

2014-04-17 Thread Jiri Pirko
Benefit from the possibility to work with flows in switch devices and
use the swdev api to offload flow datapath.

Signed-off-by: Jiri Pirko 
---
 include/linux/sw_flow.h  |  16 +++-
 net/openvswitch/Makefile |   3 +-
 net/openvswitch/datapath.c   |  16 +++-
 net/openvswitch/hw_offload.c | 170 +++
 net/openvswitch/hw_offload.h |  31 
 5 files changed, 231 insertions(+), 5 deletions(-)
 create mode 100644 net/openvswitch/hw_offload.c
 create mode 100644 net/openvswitch/hw_offload.h

diff --git a/include/linux/sw_flow.h b/include/linux/sw_flow.h
index 3bbd5aa..12a24f9 100644
--- a/include/linux/sw_flow.h
+++ b/include/linux/sw_flow.h
@@ -102,7 +102,21 @@ struct sw_flow {
struct sw_flow_mask *mask;
 };
 
+enum sw_flow_action_type {
+   SW_FLOW_ACTION_TYPE_OUTPUT,
+   SW_FLOW_ACTION_TYPE_VLAN_PUSH,
+   SW_FLOW_ACTION_TYPE_VLAN_POP,
+};
+
 struct sw_flow_action {
-}
+   enum sw_flow_action_type type;
+   union {
+   struct net_device *output_dev;
+   struct {
+   __be16 vlan_proto;
+   u16 vlan_tci;
+   } vlan;
+   };
+};
 
 #endif /* _LINUX_SW_FLOW_H_ */
diff --git a/net/openvswitch/Makefile b/net/openvswitch/Makefile
index 3591cb5..5152437 100644
--- a/net/openvswitch/Makefile
+++ b/net/openvswitch/Makefile
@@ -13,7 +13,8 @@ openvswitch-y := \
flow_table.o \
vport.o \
vport-internal_dev.o \
-   vport-netdev.o
+   vport-netdev.o \
+   hw_offload.o
 
 ifneq ($(CONFIG_OPENVSWITCH_VXLAN),)
 openvswitch-y += vport-vxlan.o
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 10ffb0a..f1e0792 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -59,6 +59,7 @@
 #include "flow_netlink.h"
 #include "vport-internal_dev.h"
 #include "vport-netdev.h"
+#include "hw_offload.h"
 
 int ovs_net_id __read_mostly;
 
@@ -840,13 +841,15 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, 
struct genl_info *info)
flow->flow.key = masked_key;
flow->flow.unmasked_key = key;
rcu_assign_pointer(flow->sf_acts, acts);
+   acts = NULL;
 
/* Put flow in bucket. */
error = ovs_flow_tbl_insert(&dp->table, flow, &mask);
-   if (error) {
-   acts = NULL;
+   if (error)
goto err_flow_free;
-   }
+   error = ovs_hw_flow_insert(dp, flow, flow->sf_acts);
+   if (error)
+   goto err_flow_tbl_remove;
 
reply = ovs_flow_cmd_build_info(flow, dp, info, 
OVS_FLOW_CMD_NEW);
} else {
@@ -868,6 +871,10 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, 
struct genl_info *info)
if (!ovs_flow_cmp_unmasked_key(flow, &match))
goto err_unlock_ovs;
 
+   error = ovs_hw_flow_actions_update(dp, flow, acts);
+   if (error)
+   goto err_unlock_ovs;
+
/* Update actions. */
old_acts = ovsl_dereference(flow->sf_acts);
rcu_assign_pointer(flow->sf_acts, acts);
@@ -888,6 +895,8 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, 
struct genl_info *info)
 0, PTR_ERR(reply));
return 0;
 
+err_flow_tbl_remove:
+   ovs_flow_tbl_remove(&dp->table, flow);
 err_flow_free:
ovs_flow_free(flow, false);
 err_unlock_ovs:
@@ -985,6 +994,7 @@ static int ovs_flow_cmd_del(struct sk_buff *skb, struct 
genl_info *info)
goto unlock;
}
 
+   ovs_hw_flow_remove(dp, flow);
ovs_flow_tbl_remove(&dp->table, flow);
 
err = ovs_flow_cmd_fill_info(flow, dp, reply, info->snd_portid,
diff --git a/net/openvswitch/hw_offload.c b/net/openvswitch/hw_offload.c
new file mode 100644
index 000..c4c64d0
--- /dev/null
+++ b/net/openvswitch/hw_offload.c
@@ -0,0 +1,170 @@
+/*
+ * Copyright (c) 2014 Jiri Pirko 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "datapath.h"
+#include "vport-netdev.h"
+
+static struct net_device *__dp_master(struct datapath *dp)
+{
+   struct vport *local;
+
+   local =

[ovs-dev] [patch net-next RFC v3 08/10] net: add netdev_for_each_all_lower_dev_rcu helper

2014-04-17 Thread Jiri Pirko
Signed-off-by: Jiri Pirko 
---
 include/linux/netdevice.h | 18 +-
 net/core/dev.c| 26 ++
 2 files changed, 27 insertions(+), 17 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 14bd8b3..bee9673 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3076,15 +3076,23 @@ extern int  weight_p;
 extern int bpf_jit_enable;
 
 bool netdev_has_upper_dev(struct net_device *dev, struct net_device 
*upper_dev);
-struct net_device *netdev_all_upper_get_next_dev_rcu(struct net_device *dev,
-struct list_head **iter);
+struct net_device *netdev_all_adj_get_next_dev_rcu(struct net_device *dev,
+  struct list_head **iter,
+  struct list_head *adj_list);
 
 /* iterate through upper list, must be called under RCU read lock */
 #define netdev_for_each_all_upper_dev_rcu(dev, updev, iter) \
-   for (iter = &(dev)->all_adj_list.upper, \
-updev = netdev_all_upper_get_next_dev_rcu(dev, &(iter)); \
+   for (iter = NULL, \
+updev = netdev_all_adj_get_next_dev_rcu(dev, &(iter), 
&(dev)->all_adj_list.upper); \
 updev; \
-updev = netdev_all_upper_get_next_dev_rcu(dev, &(iter)))
+updev = netdev_all_adj_get_next_dev_rcu(dev, &(iter), 
&(dev)->all_adj_list.upper))
+
+/* iterate through lower list, must be called under RCU read lock */
+#define netdev_for_each_all_lower_dev_rcu(dev, lodev, iter) \
+   for (iter = NULL, \
+lodev = netdev_all_adj_get_next_dev_rcu(dev, &(iter), 
&(dev)->all_adj_list.lower); \
+lodev; \
+lodev = netdev_all_adj_get_next_dev_rcu(dev, &(iter), 
&(dev)->all_adj_list.lower))
 
 void *netdev_lower_get_next_private(struct net_device *dev,
struct list_head **iter);
diff --git a/net/core/dev.c b/net/core/dev.c
index a0de81c..62bcec9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4542,30 +4542,32 @@ void *netdev_adjacent_get_private(struct list_head 
*adj_list)
 EXPORT_SYMBOL(netdev_adjacent_get_private);
 
 /**
- * netdev_all_upper_get_next_dev_rcu - Get the next dev from upper list
+ * netdev_all_adj_get_next_dev_rcu - Get the next dev from adjacent list
  * @dev: device
  * @iter: list_head ** of the current position
+ * @adj_list: adjacent list to go through
  *
- * Gets the next device from the dev's upper list, starting from iter
+ * Gets the next device from the dev's adjacent list, starting from iter
  * position. The caller must hold RCU read lock.
  */
-struct net_device *netdev_all_upper_get_next_dev_rcu(struct net_device *dev,
-struct list_head **iter)
+struct net_device *netdev_all_adj_get_next_dev_rcu(struct net_device *dev,
+  struct list_head **iter,
+  struct list_head *adj_list)
 {
-   struct netdev_adjacent *upper;
+   struct netdev_adjacent *adj;
 
WARN_ON_ONCE(!rcu_read_lock_held() && !lockdep_rtnl_is_held());
 
-   upper = list_entry_rcu((*iter)->next, struct netdev_adjacent, list);
-
-   if (&upper->list == &dev->all_adj_list.upper)
+   if (!*iter)
+   *iter = adj_list;
+   if ((*iter)->next == adj_list)
return NULL;
 
-   *iter = &upper->list;
-
-   return upper->dev;
+   adj = list_entry_rcu((*iter)->next, struct netdev_adjacent, list);
+   *iter = &adj->list;
+   return adj->dev;
 }
-EXPORT_SYMBOL(netdev_all_upper_get_next_dev_rcu);
+EXPORT_SYMBOL(netdev_all_adj_get_next_dev_rcu);
 
 /**
  * netdev_lower_get_next_private - Get the next ->private from the
-- 
1.9.0

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [patch net-next RFC v3 09/10] openvswitch: introduce vport_op get_netdev

2014-04-17 Thread Jiri Pirko
This will allow to query easily if the vport has netdev. Also it allows
to unexpose netdev_vport_priv and struct netdev_vport.

Signed-off-by: Jiri Pirko 
---
 net/openvswitch/datapath.c   |  2 +-
 net/openvswitch/dp_notify.c  |  7 ++---
 net/openvswitch/vport-internal_dev.c | 53 
 net/openvswitch/vport-netdev.c   | 16 +++
 net/openvswitch/vport-netdev.h   | 12 
 net/openvswitch/vport.h  |  2 ++
 6 files changed, 57 insertions(+), 35 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index fcbdb52..10ffb0a 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -148,7 +148,7 @@ static int get_dpifindex(struct datapath *dp)
 
local = ovs_vport_rcu(dp, OVSP_LOCAL);
if (local)
-   ifindex = netdev_vport_priv(local)->dev->ifindex;
+   ifindex = local->ops->get_netdev(local)->ifindex;
else
ifindex = 0;
 
diff --git a/net/openvswitch/dp_notify.c b/net/openvswitch/dp_notify.c
index 2c631fe..d2cc24b 100644
--- a/net/openvswitch/dp_notify.c
+++ b/net/openvswitch/dp_notify.c
@@ -58,13 +58,12 @@ void ovs_dp_notify_wq(struct work_struct *work)
struct hlist_node *n;
 
hlist_for_each_entry_safe(vport, n, &dp->ports[i], 
dp_hash_node) {
-   struct netdev_vport *netdev_vport;
+   struct net_device *dev;
 
if (vport->ops->type != OVS_VPORT_TYPE_NETDEV)
continue;
-
-   netdev_vport = netdev_vport_priv(vport);
-   if (!(netdev_vport->dev->priv_flags & 
IFF_OVS_DATAPATH))
+   dev = vport->ops->get_netdev(vport);
+   if (!(dev->priv_flags & IFF_OVS_DATAPATH))
dp_detach_port_notify(vport);
}
}
diff --git a/net/openvswitch/vport-internal_dev.c 
b/net/openvswitch/vport-internal_dev.c
index 729c687..86ba186 100644
--- a/net/openvswitch/vport-internal_dev.c
+++ b/net/openvswitch/vport-internal_dev.c
@@ -31,6 +31,16 @@
 #include "vport-internal_dev.h"
 #include "vport-netdev.h"
 
+struct internal_dev_vport {
+   struct rcu_head rcu;
+   struct net_device *dev;
+};
+
+static struct internal_dev_vport *internal_dev_vport_priv(const struct vport 
*vport)
+{
+   return vport_priv(vport);
+}
+
 struct internal_dev {
struct vport *vport;
 };
@@ -145,48 +155,49 @@ static void do_setup(struct net_device *netdev)
 static struct vport *internal_dev_create(const struct vport_parms *parms)
 {
struct vport *vport;
-   struct netdev_vport *netdev_vport;
+   struct internal_dev_vport *int_vport;
struct internal_dev *internal_dev;
+   struct net_device *dev;
int err;
 
-   vport = ovs_vport_alloc(sizeof(struct netdev_vport),
+   vport = ovs_vport_alloc(sizeof(struct internal_dev_vport),
&ovs_internal_vport_ops, parms);
if (IS_ERR(vport)) {
err = PTR_ERR(vport);
goto error;
}
 
-   netdev_vport = netdev_vport_priv(vport);
+   int_vport = internal_dev_vport_priv(vport);
 
-   netdev_vport->dev = alloc_netdev(sizeof(struct internal_dev),
-parms->name, do_setup);
-   if (!netdev_vport->dev) {
+   dev = alloc_netdev(sizeof(struct internal_dev), parms->name, do_setup);
+   if (!dev) {
err = -ENOMEM;
goto error_free_vport;
}
+   int_vport->dev = dev;
 
-   dev_net_set(netdev_vport->dev, ovs_dp_get_net(vport->dp));
-   internal_dev = internal_dev_priv(netdev_vport->dev);
+   dev_net_set(dev, ovs_dp_get_net(vport->dp));
+   internal_dev = internal_dev_priv(dev);
internal_dev->vport = vport;
 
/* Restrict bridge port to current netns. */
if (vport->port_no == OVSP_LOCAL)
-   netdev_vport->dev->features |= NETIF_F_NETNS_LOCAL;
+   dev->features |= NETIF_F_NETNS_LOCAL;
 
rtnl_lock();
-   err = register_netdevice(netdev_vport->dev);
+   err = register_netdevice(dev);
if (err)
goto error_free_netdev;
 
-   dev_set_promiscuity(netdev_vport->dev, 1);
+   dev_set_promiscuity(dev, 1);
rtnl_unlock();
-   netif_start_queue(netdev_vport->dev);
+   netif_start_queue(dev);
 
return vport;
 
 error_free_netdev:
rtnl_unlock();
-   free_netdev(netdev_vport->dev);
+   free_netdev(dev);
 error_free_vport:
ovs_vport_free(vport);
 error:
@@ -195,21 +206,21 @@ error:
 
 static void internal_dev_destroy(struct vport *vport)
 {
-   struct netdev_vport *netdev_vport = netdev_vport_priv(vport);
+   struct inte

[ovs-dev] [patch net-next RFC v3 06/10] net: introduce dummy switch

2014-04-17 Thread Jiri Pirko
Dummy switch implementation using switchdev interface

Signed-off-by: Jiri Pirko 
---
 drivers/net/Kconfig  |   7 +++
 drivers/net/Makefile |   1 +
 drivers/net/dummyswitch.c| 126 +++
 include/uapi/linux/if_link.h |   9 
 4 files changed, 143 insertions(+)
 create mode 100644 drivers/net/dummyswitch.c

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 89402c3..a9629a7 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -71,6 +71,13 @@ config DUMMY
  To compile this driver as a module, choose M here: the module
  will be called dummy.
 
+config NET_DUMMY_SWITCH
+   tristate "Dummy switch net driver support"
+   depends on NET_SWITCHDEV
+   ---help---
+ To compile this driver as a module, choose M here: the module
+ will be called dummyswitch.
+
 config EQUALIZER
tristate "EQL (serial line load balancing) support"
---help---
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 3fef8a8..d5d4ce6 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -7,6 +7,7 @@
 #
 obj-$(CONFIG_BONDING) += bonding/
 obj-$(CONFIG_DUMMY) += dummy.o
+obj-$(CONFIG_NET_DUMMY_SWITCH) += dummyswitch.o
 obj-$(CONFIG_EQUALIZER) += eql.o
 obj-$(CONFIG_IFB) += ifb.o
 obj-$(CONFIG_MACVLAN) += macvlan.o
diff --git a/drivers/net/dummyswitch.c b/drivers/net/dummyswitch.c
new file mode 100644
index 000..f0d1195
--- /dev/null
+++ b/drivers/net/dummyswitch.c
@@ -0,0 +1,126 @@
+/*
+ * drivers/net/dummyswitch.c - Dummy switch device
+ * Copyright (c) 2014 Jiri Pirko 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+struct dummyswport_priv {
+   struct netdev_phys_item_id psid;
+};
+
+static netdev_tx_t dummyswport_start_xmit(struct sk_buff *skb,
+ struct net_device *dev)
+{
+   dev_kfree_skb(skb);
+   return NETDEV_TX_OK;
+}
+
+static int dummyswport_swdev_get_id(struct net_device *dev,
+   struct netdev_phys_item_id *psid)
+{
+   struct dummyswport_priv *dsp = netdev_priv(dev);
+
+   memcpy(psid, &dsp->psid, sizeof(*psid));
+   return 0;
+}
+
+static const struct net_device_ops dummyswport_netdev_ops = {
+   .ndo_start_xmit = dummyswport_start_xmit,
+   .ndo_swdev_get_id   = dummyswport_swdev_get_id,
+};
+
+static void dummyswport_setup(struct net_device *dev)
+{
+   ether_setup(dev);
+
+   /* Initialize the device structure. */
+   dev->netdev_ops = &dummyswport_netdev_ops;
+   dev->destructor = free_netdev;
+
+   /* Fill in device structure with ethernet-generic values. */
+   dev->tx_queue_len = 0;
+   dev->flags |= IFF_NOARP;
+   dev->flags &= ~IFF_MULTICAST;
+   dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
+   dev->features   |= NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_TSO;
+   dev->features   |= NETIF_F_HW_CSUM | NETIF_F_HIGHDMA | NETIF_F_LLTX;
+   eth_hw_addr_random(dev);
+}
+
+static int dummyswport_validate(struct nlattr *tb[], struct nlattr *data[])
+{
+   if (tb[IFLA_ADDRESS])
+   return -EINVAL;
+   if (!data || !data[IFLA_DYMMYSWPORT_PHYS_SWITCH_ID])
+   return -EINVAL;
+   return 0;
+}
+
+static int dummyswport_newlink(struct net *src_net, struct net_device *dev,
+  struct nlattr *tb[], struct nlattr *data[])
+{
+   struct dummyswport_priv *dsp = netdev_priv(dev);
+   int err;
+
+   dsp->psid.id_len = nla_len(data[IFLA_DYMMYSWPORT_PHYS_SWITCH_ID]);
+   memcpy(dsp->psid.id, nla_data(data[IFLA_DYMMYSWPORT_PHYS_SWITCH_ID]),
+  dsp->psid.id_len);
+
+   err = register_netdevice(dev);
+   if (err)
+   return err;
+
+   netif_carrier_on(dev);
+
+   return 0;
+}
+
+static const struct nla_policy dummyswport_policy[IFLA_DUMMYSWPORT_MAX + 1] = {
+   [IFLA_DYMMYSWPORT_PHYS_SWITCH_ID] = { .type = NLA_BINARY,
+ .len = MAX_PHYS_ITEM_ID_LEN },
+};
+
+static struct rtnl_link_ops dummyswport_link_ops __read_mostly = {
+   .kind   = "dummyswport",
+   .priv_size  = sizeof(struct dummyswport_priv),
+   .setup  = dummyswport_setup,
+   .validate   = dummyswport_validate,
+   .newlink= dummyswport_newlink,
+   .policy = dummyswport_policy,
+   .maxtype= IFLA_DUMMYSWPORT_MAX,
+};
+
+
+/*
+ * Module init/exit
+ */
+
+static int __init dummysw_module_init(void)
+{
+   return rtnl_link_register(&dummyswport_link_ops);
+}
+
+static void __exit dummysw_module_exit(void)
+{
+   rtnl_link_unregister(&dummysw

[ovs-dev] [patch net-next RFC v3 0/5] iproute2: support switch chip infrastructure

2014-04-17 Thread Jiri Pirko
Jiri Pirko (5):
  iproute2: arpd: use ll_addr_a2n and ll_addr_n2a
  iproute2: utils: change hexstring_n2a and hexstring_a2n to do not work
with ":"
  iproute2: ipa: show switch id
  iproute2: add support for dummyswport
  iproute2: ipa: show port id

 include/linux/if_link.h | 11 ++
 ip/Makefile |  3 ++-
 ip/ipaddress.c  | 16 ++
 ip/iplink_dummyswport.c | 56 +
 lib/utils.c | 46 
 misc/arpd.c |  6 +++---
 6 files changed, 101 insertions(+), 37 deletions(-)
 create mode 100644 ip/iplink_dummyswport.c

-- 
1.9.0

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [patch iproute2 RFC v3 2/5] iproute2: utils: change hexstring_n2a and hexstring_a2n to do not work with ":"

2014-04-17 Thread Jiri Pirko
Signed-off-by: Jiri Pirko 
---
 lib/utils.c | 46 +-
 1 file changed, 13 insertions(+), 33 deletions(-)

diff --git a/lib/utils.c b/lib/utils.c
index 4e9c719..e9e1040 100644
--- a/lib/utils.c
+++ b/lib/utils.c
@@ -743,10 +743,6 @@ char *hexstring_n2a(const __u8 *str, int len, char *buf, 
int blen)
sprintf(ptr, "%02x", str[i]);
ptr += 2;
blen -= 2;
-   if (i != len-1 && blen > 1) {
-   *ptr++ = ':';
-   blen--;
-   }
}
return buf;
 }
@@ -754,38 +750,22 @@ char *hexstring_n2a(const __u8 *str, int len, char *buf, 
int blen)
 __u8* hexstring_a2n(const char *str, __u8 *buf, int blen)
 {
int cnt = 0;
+   char *endptr;
 
-   for (;;) {
-   unsigned acc;
-   char ch;
-
-   acc = 0;
-
-   while ((ch = *str) != ':' && ch != 0) {
-   if (ch >= '0' && ch <= '9')
-   ch -= '0';
-   else if (ch >= 'a' && ch <= 'f')
-   ch -= 'a'-10;
-   else if (ch >= 'A' && ch <= 'F')
-   ch -= 'A'-10;
-   else
-   return NULL;
-   acc = (acc<<4) + ch;
-   str++;
-   }
-
-   if (acc > 255)
+   if (strlen(str) % 2)
+   return NULL;
+   while (cnt < blen && strlen(str) > 1) {
+   unsigned int tmp;
+   char tmpstr[3];
+
+   strncpy(tmpstr, str, 2);
+   tmpstr[2] = '\0';
+   tmp = strtoul(tmpstr, &endptr, 16);
+   if (errno != 0 || tmp > 0xFF || *endptr != '\0')
return NULL;
-   if (cnt < blen) {
-   buf[cnt] = acc;
-   cnt++;
-   }
-   if (ch == 0)
-   break;
-   ++str;
+   buf[cnt++] = tmp;
+   str += 2;
}
-   if (cnt < blen)
-   memset(buf+cnt, 0, blen-cnt);
return buf;
 }
 
-- 
1.9.0

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [patch iproute2 RFC v3 1/5] iproute2: arpd: use ll_addr_a2n and ll_addr_n2a

2014-04-17 Thread Jiri Pirko
Signed-off-by: Jiri Pirko 
---
 misc/arpd.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/misc/arpd.c b/misc/arpd.c
index bfe7de9..0839e3f 100644
--- a/misc/arpd.c
+++ b/misc/arpd.c
@@ -36,6 +36,7 @@
 
 #include "libnetlink.h"
 #include "utils.h"
+#include "rt_names.h"
 
 int resolve_hosts;
 
@@ -721,8 +722,7 @@ int main(int argc, char **argv)
goto do_abort;
}
 
-   dbdat.data = hexstring_a2n(macbuf, b1, 6);
-   if (dbdat.data == NULL)
+   if (ll_addr_a2n((char *) b1, 6, macbuf) != 6)
goto do_abort;
dbdat.size = 6;
 
@@ -747,7 +747,7 @@ int main(int argc, char **argv)
printf("%-8d %-15s %s\n",
   key->iface,
   inet_ntoa(*(struct 
in_addr*)&key->addr),
-  hexstring_n2a(dbdat.data, 6, b1, 
18));
+  ll_addr_n2a(dbdat.data, 6, 
ARPHRD_ETHER, b1, 18));
} else {
printf("%-8d %-15s FAILED: %dsec ago\n",
   key->iface,
-- 
1.9.0

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [patch iproute2 RFC v3 3/5] iproute2: ipa: show switch id

2014-04-17 Thread Jiri Pirko
Signed-off-by: Jiri Pirko 
---
 include/linux/if_link.h | 2 ++
 ip/ipaddress.c  | 8 
 2 files changed, 10 insertions(+)

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index f08505c..2932453 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -144,6 +144,8 @@ enum {
IFLA_NUM_RX_QUEUES,
IFLA_CARRIER,
IFLA_PHYS_PORT_ID,
+   IFLA_CARRIER_CHANGES,
+   IFLA_PHYS_SWITCH_ID,
__IFLA_MAX
 };
 
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 14d1720..43f7296 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -470,6 +470,14 @@ int print_linkinfo(const struct sockaddr_nl *who,
fprintf(fp, "master %s ", 
ll_idx_n2a(*(int*)RTA_DATA(tb[IFLA_MASTER]), b1));
}
 
+   if (tb[IFLA_PHYS_SWITCH_ID]) {
+   SPRINT_BUF(b1);
+   fprintf(fp, "swid %s ",
+   hexstring_n2a(RTA_DATA(tb[IFLA_PHYS_SWITCH_ID]),
+ RTA_PAYLOAD(tb[IFLA_PHYS_SWITCH_ID]),
+ b1, sizeof(b1)));
+   }
+
if (tb[IFLA_OPERSTATE])
print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));
 
-- 
1.9.0

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [patch iproute2 RFC v3 4/5] iproute2: add support for dummyswport

2014-04-17 Thread Jiri Pirko
Signed-off-by: Jiri Pirko 
---
 include/linux/if_link.h |  9 
 ip/Makefile |  3 ++-
 ip/iplink_dummyswport.c | 56 +
 3 files changed, 67 insertions(+), 1 deletion(-)
 create mode 100644 ip/iplink_dummyswport.c

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 2932453..f2fdb85 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -544,4 +544,13 @@ enum {
 
 #define IFLA_HSR_MAX (__IFLA_HSR_MAX - 1)
 
+/* DUMMYSWPORT section */
+enum {
+   IFLA_DUMMYSWPORT_UNSPEC,
+   IFLA_DYMMYSWPORT_PHYS_SWITCH_ID,
+   __IFLA_DUMMYSWPORT_MAX,
+};
+
+#define IFLA_DUMMYSWPORT_MAX (__IFLA_DUMMYSWPORT_MAX - 1)
+
 #endif /* _LINUX_IF_LINK_H */
diff --git a/ip/Makefile b/ip/Makefile
index 713adf5..35546e5 100644
--- a/ip/Makefile
+++ b/ip/Makefile
@@ -5,7 +5,8 @@ IPOBJ=ip.o ipaddress.o ipaddrlabel.o iproute.o iprule.o 
ipnetns.o \
 iplink_vlan.o link_veth.o link_gre.o iplink_can.o \
 iplink_macvlan.o iplink_macvtap.o ipl2tp.o link_vti.o \
 iplink_vxlan.o tcp_metrics.o iplink_ipoib.o ipnetconf.o link_ip6tnl.o \
-link_iptnl.o link_gre6.o iplink_bond.o iplink_bond_slave.o iplink_hsr.o
+link_iptnl.o link_gre6.o iplink_bond.o iplink_bond_slave.o iplink_hsr.o \
+iplink_dummyswport.o \
 
 RTMONOBJ=rtmon.o
 
diff --git a/ip/iplink_dummyswport.c b/ip/iplink_dummyswport.c
new file mode 100644
index 000..64f1a02
--- /dev/null
+++ b/ip/iplink_dummyswport.c
@@ -0,0 +1,56 @@
+/*
+ * iplink_dummyswport.cdummy switch port device support
+ *
+ *  This program is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU General Public License
+ *  as published by the Free Software Foundation; either version
+ *  2 of the License, or (at your option) any later version.
+ *
+ * Authors: Jiri Pirko 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "rt_names.h"
+#include "utils.h"
+#include "ip_common.h"
+
+static void explain(void)
+{
+   fprintf(stderr, "Usage: ... dummyswport swid SWITCH_ID\n");
+}
+
+static int dummyswport_parse_opt(struct link_util *lu, int argc, char **argv,
+struct nlmsghdr *n)
+{
+   while (argc > 0) {
+   if (matches(*argv, "swid") == 0) {
+   __u8 buf[32];
+
+   NEXT_ARG();
+   if (!hexstring_a2n(*argv, buf, sizeof(buf)))
+   invarg("invalid \"swid\" value\n", *argv);
+   addattr_l(n, 1024, IFLA_DYMMYSWPORT_PHYS_SWITCH_ID,
+ buf, strlen(*argv) / 2);
+   } else {
+   fprintf(stderr, "dummyswport: unknown option 
\"%s\"?\n", *argv);
+   explain();
+   return -1;
+   }
+   argc--, argv++;
+   }
+   return 0;
+}
+
+struct link_util dummyswport_link_util = {
+   .id = "dummyswport",
+   .maxattr= IFLA_DUMMYSWPORT_MAX,
+   .parse_opt  = dummyswport_parse_opt,
+};
-- 
1.9.0

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [patch iproute2 RFC v3 5/5] iproute2: ipa: show port id

2014-04-17 Thread Jiri Pirko
Signed-off-by: Jiri Pirko 
---
 ip/ipaddress.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 43f7296..f794d02 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -470,6 +470,14 @@ int print_linkinfo(const struct sockaddr_nl *who,
fprintf(fp, "master %s ", 
ll_idx_n2a(*(int*)RTA_DATA(tb[IFLA_MASTER]), b1));
}
 
+   if (tb[IFLA_PHYS_PORT_ID]) {
+   SPRINT_BUF(b1);
+   fprintf(fp, "portid %s ",
+   hexstring_n2a(RTA_DATA(tb[IFLA_PHYS_PORT_ID]),
+ RTA_PAYLOAD(tb[IFLA_PHYS_PORT_ID]),
+ b1, sizeof(b1)));
+   }
+
if (tb[IFLA_PHYS_SWITCH_ID]) {
SPRINT_BUF(b1);
fprintf(fp, "swid %s ",
-- 
1.9.0

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [patch net-next RFC v3 04/10] rtnl: expose physical switch id for particular device

2014-04-17 Thread Yegor Yefremov
On Thu, Apr 17, 2014 at 2:14 PM, Jiri Pirko  wrote:
> The the netdevice represents a port in a switch, it will expose

remove one 'the'

> IFLA_PHYS_SWITCH_ID value via rtnl. Two netdevices with a same value

s/a same/the same

> belong to one physical switch.
>
> Signed-off-by: Jiri Pirko 

Reviewed-by: Yegor Yefremov 

> ---
>  include/uapi/linux/if_link.h |  1 +
>  net/core/rtnetlink.c | 26 +-
>  2 files changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> index 9a7f7ac..c11f492 100644
> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -145,6 +145,7 @@ enum {
> IFLA_CARRIER,
> IFLA_PHYS_PORT_ID,
> IFLA_CARRIER_CHANGES,
> +   IFLA_PHYS_SWITCH_ID,
> __IFLA_MAX
>  };
>
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index 231e043..7170e24 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -43,6 +43,7 @@
>
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -829,7 +830,8 @@ static noinline size_t if_nlmsg_size(const struct 
> net_device *dev,
>+ rtnl_port_size(dev) /* IFLA_VF_PORTS + IFLA_PORT_SELF */
>+ rtnl_link_get_size(dev) /* IFLA_LINKINFO */
>+ rtnl_link_get_af_size(dev) /* IFLA_AF_SPEC */
> -  + nla_total_size(MAX_PHYS_ITEM_ID_LEN); /* IFLA_PHYS_PORT_ID */
> +  + nla_total_size(MAX_PHYS_ITEM_ID_LEN) /* IFLA_PHYS_PORT_ID */
> +  + nla_total_size(MAX_PHYS_ITEM_ID_LEN); /* IFLA_PHYS_SWITCH_ID 
> */
>  }
>
>  static int rtnl_vf_ports_fill(struct sk_buff *skb, struct net_device *dev)
> @@ -926,6 +928,24 @@ static int rtnl_phys_port_id_fill(struct sk_buff *skb, 
> struct net_device *dev)
> return 0;
>  }
>
> +static int rtnl_phys_switch_id_fill(struct sk_buff *skb, struct net_device 
> *dev)
> +{
> +   int err;
> +   struct netdev_phys_item_id psid;
> +
> +   err = swdev_get_id(dev, &psid);
> +   if (err) {
> +   if (err == -EOPNOTSUPP)
> +   return 0;
> +   return err;
> +   }
> +
> +   if (nla_put(skb, IFLA_PHYS_SWITCH_ID, psid.id_len, psid.id))
> +   return -EMSGSIZE;
> +
> +   return 0;
> +}
> +
>  static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
> int type, u32 pid, u32 seq, u32 change,
> unsigned int flags, u32 ext_filter_mask)
> @@ -998,6 +1018,9 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct 
> net_device *dev,
> if (rtnl_phys_port_id_fill(skb, dev))
> goto nla_put_failure;
>
> +   if (rtnl_phys_switch_id_fill(skb, dev))
> +   goto nla_put_failure;
> +
> attr = nla_reserve(skb, IFLA_STATS,
> sizeof(struct rtnl_link_stats));
> if (attr == NULL)
> @@ -1151,6 +1174,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] 
> = {
> [IFLA_NUM_RX_QUEUES]= { .type = NLA_U32 },
> [IFLA_PHYS_PORT_ID] = { .type = NLA_BINARY, .len = 
> MAX_PHYS_ITEM_ID_LEN },
> [IFLA_CARRIER_CHANGES]  = { .type = NLA_U32 },  /* ignored */
> +   [IFLA_PHYS_SWITCH_ID]   = { .type = NLA_BINARY, .len = 
> MAX_PHYS_ITEM_ID_LEN },
>  };
>
>  static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
> --
> 1.9.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [patch iproute2 RFC v3 4/5] iproute2: add support for dummyswport

2014-04-17 Thread Nicolas Dichtel

Le 17/04/2014 14:17, Jiri Pirko a écrit :

Signed-off-by: Jiri Pirko 
---
  include/linux/if_link.h |  9 
  ip/Makefile |  3 ++-
  ip/iplink_dummyswport.c | 56 +
  3 files changed, 67 insertions(+), 1 deletion(-)
  create mode 100644 ip/iplink_dummyswport.c


Don't forget to update iplink_usage() in ip/iplink.c, with the new type.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [patch net-next RFC v3 02/10] net: rename netdev_phys_port_id to more generic name

2014-04-17 Thread Stephen Hemminger
On Thu, 17 Apr 2014 14:14:29 +0200
Jiri Pirko  wrote:

> So this can be reused for identification of other "items" as well.
> 
> Signed-off-by: Jiri Pirko 
> ---
>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |  2 +-
>  drivers/net/ethernet/mellanox/mlx4/en_netdev.c   |  2 +-
>  drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c |  2 +-
>  include/linux/netdevice.h| 16 
>  net/core/dev.c   |  2 +-
>  net/core/net-sysfs.c |  2 +-
>  net/core/rtnetlink.c |  6 +++---
>  7 files changed, 16 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c 
> b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
> index a78edac..a4b25b1 100644
> --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
> +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
> @@ -12360,7 +12360,7 @@ static int bnx2x_validate_addr(struct net_device *dev)
>  }
>  
>  static int bnx2x_get_phys_port_id(struct net_device *netdev,
> -   struct netdev_phys_port_id *ppid)
> +   struct netdev_phys_item_id *ppid)
>  {
>   struct bnx2x *bp = netdev_priv(netdev);
>  
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c 
> b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> index f085c2d..e784bb4 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> @@ -2251,7 +2251,7 @@ static int mlx4_en_set_vf_link_state(struct net_device 
> *dev, int vf, int link_st
>  
>  #define PORT_ID_BYTE_LEN 8
>  static int mlx4_en_get_phys_port_id(struct net_device *dev,
> - struct netdev_phys_port_id *ppid)
> + struct netdev_phys_item_id *ppid)
>  {
>   struct mlx4_en_priv *priv = netdev_priv(dev);
>   struct mlx4_dev *mdev = priv->mdev->dev;
> diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c 
> b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
> index 309d056..55af16a 100644
> --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
> +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
> @@ -450,7 +450,7 @@ static void qlcnic_82xx_cancel_idc_work(struct 
> qlcnic_adapter *adapter)
>  }
>  
>  static int qlcnic_get_phys_port_id(struct net_device *netdev,
> -struct netdev_phys_port_id *ppid)
> +struct netdev_phys_item_id *ppid)
>  {
>   struct qlcnic_adapter *adapter = netdev_priv(netdev);
>   struct qlcnic_hardware_context *ahw = adapter->ahw;
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 775cc95..8cf4f5e 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -735,13 +735,13 @@ struct netdev_fcoe_hbainfo {
>  };
>  #endif
>  
> -#define MAX_PHYS_PORT_ID_LEN 32
> +#define MAX_PHYS_ITEM_ID_LEN 32
>  

Why the rename?
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [patch net-next RFC v3 02/10] net: rename netdev_phys_port_id to more generic name

2014-04-17 Thread Jiri Pirko
Thu, Apr 17, 2014 at 05:01:10PM CEST, step...@networkplumber.org wrote:
>On Thu, 17 Apr 2014 14:14:29 +0200
>Jiri Pirko  wrote:
>
>> So this can be reused for identification of other "items" as well.
>> 
>> Signed-off-by: Jiri Pirko 
>> ---
>>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |  2 +-
>>  drivers/net/ethernet/mellanox/mlx4/en_netdev.c   |  2 +-
>>  drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c |  2 +-
>>  include/linux/netdevice.h| 16 
>>  net/core/dev.c   |  2 +-
>>  net/core/net-sysfs.c |  2 +-
>>  net/core/rtnetlink.c |  6 +++---
>>  7 files changed, 16 insertions(+), 16 deletions(-)
>> 
>> diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c 
>> b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
>> index a78edac..a4b25b1 100644
>> --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
>> +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
>> @@ -12360,7 +12360,7 @@ static int bnx2x_validate_addr(struct net_device 
>> *dev)
>>  }
>>  
>>  static int bnx2x_get_phys_port_id(struct net_device *netdev,
>> -  struct netdev_phys_port_id *ppid)
>> +  struct netdev_phys_item_id *ppid)
>>  {
>>  struct bnx2x *bp = netdev_priv(netdev);
>>  
>> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c 
>> b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
>> index f085c2d..e784bb4 100644
>> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
>> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
>> @@ -2251,7 +2251,7 @@ static int mlx4_en_set_vf_link_state(struct net_device 
>> *dev, int vf, int link_st
>>  
>>  #define PORT_ID_BYTE_LEN 8
>>  static int mlx4_en_get_phys_port_id(struct net_device *dev,
>> -struct netdev_phys_port_id *ppid)
>> +struct netdev_phys_item_id *ppid)
>>  {
>>  struct mlx4_en_priv *priv = netdev_priv(dev);
>>  struct mlx4_dev *mdev = priv->mdev->dev;
>> diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c 
>> b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
>> index 309d056..55af16a 100644
>> --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
>> +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
>> @@ -450,7 +450,7 @@ static void qlcnic_82xx_cancel_idc_work(struct 
>> qlcnic_adapter *adapter)
>>  }
>>  
>>  static int qlcnic_get_phys_port_id(struct net_device *netdev,
>> -   struct netdev_phys_port_id *ppid)
>> +   struct netdev_phys_item_id *ppid)
>>  {
>>  struct qlcnic_adapter *adapter = netdev_priv(netdev);
>>  struct qlcnic_hardware_context *ahw = adapter->ahw;
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index 775cc95..8cf4f5e 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -735,13 +735,13 @@ struct netdev_fcoe_hbainfo {
>>  };
>>  #endif
>>  
>> -#define MAX_PHYS_PORT_ID_LEN 32
>> +#define MAX_PHYS_ITEM_ID_LEN 32
>>  
>
>Why the rename?


I use the same struct in "net: introduce generic switch devices support"
to identify switch. So that is no longer only a "port id"
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] OpenFlow rule deletion during port destroy

2014-04-17 Thread Zoltan Kiss

On 16/04/14 18:00, Justin Pettit wrote:

On April 16, 2014 at 9:00:15 AM, Zoltan Kiss (zoltan.k...@citrix.com) wrote:


My actual problem is that an important rule gets deleted:

cookie=0x0, duration=1581.083s, table=0, n_packets=52804,
n_bytes=88968151, idle_age=0, priority=0,in_port=ANY actions=NORMAL

...

There is no sign the controller did anything about deleting those rules,
but somehow it still happened. Does anyone knows
Unfortunately it is hard to reproduce the problem, it is only
intermittent in one of our testcases.


I'm not sure that it's related, but it sounds similar to bug NIC-512 that I 
filed with Citrix over a year ago.  Here's the relevant part:
Many thanks Justin it was this problem indeed! Now I poked the right 
people to fix this quite forgotten problem!



I did commit a change that should lessen the impact:

   
http://openvswitch.org/cgi-bin/gitweb.cgi?p=openvswitch;a=commitdiff;h=05dddba
Yes, this would solve the problem, but this patch removed it on the 
assumption that ofputil_port_from_string() solved it anyway:


http://git.openvswitch.org/cgi-bin/gitweb.cgi?p=openvswitch;a=commitdiff;h=33ab38d9

But it doesn't, in fact it does some pretty mad conversions:
- ofputil_port_from_string pass a uint32_t* to str_to_uint
- it is casted to an int*, which is fortunately 32 bit in most places, 
but it is a dangerous assumption

- the string is converted to a 'long long' and then copied into that int*
- so if the string was "-1", the value of port32 will be 0x

I think the best place to catch this problem would be to check in 
str_to_uint() if the returned '(int *) u' is a negative number, and 
return false in that case. What do you think?


Zoli
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCHv10] ofproto-dpif: Remove the flow_dumper thread.

2014-04-17 Thread Ethan Jackson
Someone other than me needs to review this as I co-authored it.
Jarno, do you have time to look at it?

Ethan

On Wed, Apr 16, 2014 at 9:06 PM, Joe Stringer  wrote:
> From: Ethan Jackson 
>
> Previously, we had a separate flow_dumper thread that fetched flows from
> the datapath to distribute to revalidator threads. This patch takes the
> logic for dumping and pushes it into the revalidator threads, resulting
> in simpler code with similar performance to the current code.
>
> One thread, the "leader", is responsible for beginning and ending each
> flow dump, maintaining the flow_limit, and checking whether the
> revalidator threads need to exit. All revalidator threads dump,
> revalidate, delete datapath flows and garbage collect ukeys.
>
> Co-authored-by: Joe Stringer 
> Signed-off-by: Joe Stringer 
> ---
> v10: Minor whitespace and documentation fixups.
> v9: Update testsuite for also printing actions on flow_dump.
> v8: Rebase.
> v7: Add back logic (present in master) that deletes all flows older than
>  100ms if we are currently exceeding the flow limit.
> Rebase.
> v6: Shift ukeys hmaps from revalidators into udpif.
> Documentation tidyups.
> v5: Handle ukey creation race.
> Style fixes.
> v4: Rebase.
> v3: First post.
> ---
>  ofproto/ofproto-dpif-upcall.c |  633 
> -
>  tests/ofproto-dpif.at |8 +-
>  2 files changed, 305 insertions(+), 336 deletions(-)
>
> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
> index 938cfde..b463f0e 100644
> --- a/ofproto/ofproto-dpif-upcall.c
> +++ b/ofproto/ofproto-dpif-upcall.c
> @@ -67,34 +67,27 @@ struct handler {
>'mutex'. */
>  };
>
> -/* A thread that processes each kernel flow handed to it by the flow_dumper
> - * thread, updates OpenFlow statistics, and updates or removes the kernel 
> flow
> - * as necessary. */
> +/* A thread that processes datapath flows, updates OpenFlow statistics, and
> + * updates or removes them if necessary. */
>  struct revalidator {
>  struct udpif *udpif;   /* Parent udpif. */
>  char *name;/* Thread name. */
>
>  pthread_t thread;  /* Thread ID. */
> -struct hmap ukeys; /* Datapath flow keys. */
> -
> -uint64_t dump_seq;
> -
> -struct ovs_mutex mutex;/* Mutex guarding the following. */
> -pthread_cond_t wake_cond;
> -struct list udumps OVS_GUARDED;/* Unprocessed udumps. */
> -size_t n_udumps OVS_GUARDED;   /* Number of unprocessed udumps. */
> +struct hmap *ukeys;/* Points into udpif->ukeys for this
> +  revalidator. Used for GC phase. */
>  };
>
>  /* An upcall handler for ofproto_dpif.
>   *
>   * udpif has two logically separate pieces:
>   *
> - *- A "dispatcher" thread that reads upcalls from the kernel and 
> dispatches
> - *  them to one of several "handler" threads (see struct handler).
> + *- Miss handling threads led by a "dispatcher" thread that reads upcalls
> + *  from the kernel and dispatches them to one of several "handler" 
> threads
> + *  (see struct handler).
>   *
> - *- A "flow_dumper" thread that reads the kernel flow table and 
> dispatches
> - *  flows to one of several "revalidator" threads (see struct
> - *  revalidator). */
> + *- Revalidation threads which read the datapath flow table and maintains
> + *  them. */
>  struct udpif {
>  struct list list_node; /* In all_udpifs list. */
>
> @@ -104,7 +97,6 @@ struct udpif {
>  uint32_t secret;   /* Random seed for upcall hash. */
>
>  pthread_t dispatcher;  /* Dispatcher thread ID. */
> -pthread_t flow_dumper; /* Flow dumper thread ID. */
>
>  struct handler *handlers;  /* Upcall handlers. */
>  size_t n_handlers;
> @@ -112,14 +104,24 @@ struct udpif {
>  struct revalidator *revalidators;  /* Flow revalidators. */
>  size_t n_revalidators;
>
> -uint64_t last_reval_seq;   /* 'reval_seq' at last revalidation. 
> */
> -struct seq *reval_seq; /* Incremented to force revalidation. 
> */
> -
> -struct seq *dump_seq;  /* Increments each dump iteration. */
> -
>  struct latch exit_latch;   /* Tells child threads to exit. */
>
> +/* Revalidation. */
> +struct seq *reval_seq; /* Incremented to force revalidation. 
> */
> +bool need_revalidate;  /* As indicated by 'reval_seq'. */
> +bool reval_exit;   /* Set by leader on 'exit_latch. */
> +pthread_barrier_t reval_barrier;   /* Barrier used by revalidators. */
> +struct dpif_flow_dump dump;/* DPIF flow dump state. */
>  long long int dump_duration;   /* Duration of the last flow dump. */
> +struct seq *dump_seq;  /* Increme

[ovs-dev] [PATCH V4 1/3] bfd/cfm: Check status change before update status to database.

2014-04-17 Thread Alex Wang
This commit adds boolean flag in bfd/cfm module for checking
status change.  If there is no status change, the current
update to OVS database will skip the bfd/cfm session.

In the experiment with 5K bfd sessions, when one session is
flapping at rate of every 0.3 second, this patch reduces the
cpu utilization of the ovs-vswitchd thread from 13 to 6.

Signed-off-by: Alex Wang 
---
V3 -> V4:
- mirror the comment format of putting 'EOPNOTSUPP...' in newline.
- state in comment that '*status' is indeterminate if get_cfm_status()
  returns non-zero value.
- fix the comment of get_cfm_status(), since the caller do free the
  'status->rmps' array.
- update the comment of ofproto_port_get_bfd_status().
- always destroy the smap in instant_stats_run().

V2 -> V3:
- rebase.

PATCH -> V2:
- replace the callback function to boolean flag.
- add experiment results.
---
 lib/bfd.c  |   35 ---
 lib/bfd.h  |1 +
 lib/cfm.c  |   32 ++--
 lib/cfm.h  |1 +
 ofproto/ofproto-dpif.c |   37 ++---
 ofproto/ofproto-provider.h |   27 +--
 ofproto/ofproto.c  |   30 +-
 ofproto/ofproto.h  |6 +++---
 vswitchd/bridge.c  |   16 +++-
 9 files changed, 138 insertions(+), 47 deletions(-)

diff --git a/lib/bfd.c b/lib/bfd.c
index 8bfe385..e3e3ae5 100644
--- a/lib/bfd.c
+++ b/lib/bfd.c
@@ -213,6 +213,10 @@ struct bfd {
 long long int decay_detect_time; /* Decay detection time. */
 
 uint64_t flap_count;  /* Counts bfd forwarding flaps. */
+
+/* True when the variables returned by bfd_get_status() are changed
+ * since last check. */
+bool status_changed;
 };
 
 static struct ovs_mutex mutex = OVS_MUTEX_INITIALIZER;
@@ -240,6 +244,7 @@ static void bfd_put_details(struct ds *, const struct bfd *)
 static uint64_t bfd_rx_packets(const struct bfd *) OVS_REQUIRES(mutex);
 static void bfd_try_decay(struct bfd *) OVS_REQUIRES(mutex);
 static void bfd_decay_update(struct bfd *) OVS_REQUIRES(mutex);
+static void bfd_status_changed(struct bfd *) OVS_REQUIRES(mutex);
 
 static void bfd_forwarding_if_rx_update(struct bfd *) OVS_REQUIRES(mutex);
 static void bfd_unixctl_show(struct unixctl_conn *, int argc,
@@ -279,6 +284,20 @@ bfd_account_rx(struct bfd *bfd, const struct 
dpif_flow_stats *stats)
 }
 }
 
+/* Returns and resets the 'bfd->status_changed'. */
+bool
+bfd_check_status_change(struct bfd *bfd) OVS_EXCLUDED(mutex)
+{
+bool ret;
+
+ovs_mutex_lock(&mutex);
+ret = bfd->status_changed;
+bfd->status_changed = false;
+ovs_mutex_unlock(&mutex);
+
+return ret;
+}
+
 /* Returns a 'smap' of key value pairs representing the status of 'bfd'
  * intended for the OVS database. */
 void
@@ -749,7 +768,7 @@ bfd_process_packet(struct bfd *bfd, const struct flow *flow,
 }
 
 if (bfd->rmt_state != rmt_state) {
-seq_change(connectivity_seq_get());
+bfd_status_changed(bfd);
 }
 
 bfd->rmt_disc = ntohl(msg->my_disc);
@@ -872,7 +891,7 @@ bfd_forwarding__(struct bfd *bfd) OVS_REQUIRES(mutex)
 && bfd->rmt_diag != DIAG_RCPATH_DOWN;
 if (bfd->last_forwarding != last_forwarding) {
 bfd->flap_count++;
-seq_change(connectivity_seq_get());
+bfd_status_changed(bfd);
 }
 return bfd->last_forwarding;
 }
@@ -1085,7 +1104,7 @@ bfd_set_state(struct bfd *bfd, enum state state, enum 
diag diag)
 bfd_decay_update(bfd);
 }
 
-seq_change(connectivity_seq_get());
+bfd_status_changed(bfd);
 }
 }
 
@@ -1132,6 +1151,14 @@ bfd_decay_update(struct bfd * bfd) OVS_REQUIRES(mutex)
 bfd->decay_detect_time = MAX(bfd->decay_min_rx, 2000) + time_msec();
 }
 
+/* Records the status change and changes the global connectivity seq. */
+static void
+bfd_status_changed(struct bfd *bfd) OVS_REQUIRES(mutex)
+{
+seq_change(connectivity_seq_get());
+bfd->status_changed = true;
+}
+
 static void
 bfd_forwarding_if_rx_update(struct bfd *bfd) OVS_REQUIRES(mutex)
 {
@@ -1279,9 +1306,11 @@ bfd_unixctl_set_forwarding_override(struct unixctl_conn 
*conn, int argc,
 goto out;
 }
 bfd->forwarding_override = forwarding_override;
+bfd_status_changed(bfd);
 } else {
 HMAP_FOR_EACH (bfd, node, all_bfds) {
 bfd->forwarding_override = forwarding_override;
+bfd_status_changed(bfd);
 }
 }
 
diff --git a/lib/bfd.h b/lib/bfd.h
index 4e7d4cb..039b4dd 100644
--- a/lib/bfd.h
+++ b/lib/bfd.h
@@ -49,6 +49,7 @@ void bfd_unref(struct bfd *);
 
 void bfd_account_rx(struct bfd *, const struct dpif_flow_stats *);
 bool bfd_forwarding(struct bfd *);
+bool bfd_check_status_change(struct bfd *);
 void bfd_get_status(const struct bfd *, struct smap *);
 void bfd_set_netdev(struct bfd *, const struct netdev 

[ovs-dev] [PATCH V4 2/3] ofproto-dpif: Use sequence number to wake up main thread for packet-in I/O.

2014-04-17 Thread Alex Wang
This commit adds per 'struct ofproto_dpif' sequence number for
packet-in I/O.  Whenever a packet-in is inserted into the 'pins'
queue, the inserting thread will change the sequence number to
wake up the main thread.

Signed-off-by: Alex Wang 

---
V4:
- this is a functional change separated from the next patch.
---
 ofproto/ofproto-dpif.c |   17 +
 1 file changed, 17 insertions(+)

diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index 5b0f6bd..f15c399 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -317,6 +317,8 @@ struct ofproto_dpif {
 
 /* Work queues. */
 struct guarded_list pins;  /* Contains "struct ofputil_packet_in"s. */
+struct seq *pins_seq;  /* For notifying 'pins' reception. */
+uint64_t pins_seqno;
 };
 
 /* All existing ofproto_dpif instances, indexed by ->up.name. */
@@ -375,6 +377,9 @@ ofproto_dpif_send_packet_in(struct ofproto_dpif *ofproto,
 COVERAGE_INC(packet_in_overflow);
 free(CONST_CAST(void *, pin->up.packet));
 free(pin);
+} else {
+/* Wakes up main thread for packet-in I/O. */
+seq_change(ofproto->pins_seq);
 }
 }
 
@@ -1148,6 +1153,9 @@ construct(struct ofproto *ofproto_)
 sset_init(&ofproto->port_poll_set);
 ofproto->port_poll_errno = 0;
 ofproto->change_seq = 0;
+ofproto->pins_seq = seq_create();
+ofproto->pins_seqno = seq_read(ofproto->pins_seq);
+
 
 SHASH_FOR_EACH_SAFE (node, next, &init_ofp_ports) {
 struct iface_hint *iface_hint = node->data;
@@ -1321,6 +1329,8 @@ destruct(struct ofproto *ofproto_)
 ovs_mutex_destroy(&ofproto->stats_mutex);
 ovs_mutex_destroy(&ofproto->vsp_mutex);
 
+seq_destroy(ofproto->pins_seq);
+
 close_dpif_backer(ofproto->backer);
 }
 
@@ -1353,6 +1363,12 @@ run(struct ofproto *ofproto_)
 }
 }
 
+/* Always updates the ofproto->pins_seqno to avoid frequent wakeup during
+ * flow restore.  Even though nothing is processed during flow restore,
+ * all queued 'pins' will be handled immediately when flow restore
+ * completes. */
+ofproto->pins_seqno = seq_read(ofproto->pins_seq);
+
 if (ofproto->netflow) {
 netflow_run(ofproto->netflow);
 }
@@ -1462,6 +1478,7 @@ wait(struct ofproto *ofproto_)
 }
 
 seq_wait(udpif_dump_seq(ofproto->backer->udpif), ofproto->dump_seq);
+seq_wait(ofproto->pins_seq, ofproto->pins_seqno);
 }
 
 static void
-- 
1.7.9.5

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH V4 3/3] bridge: Remove the 'Instant' stats.

2014-04-17 Thread Alex Wang
This commit removes the 'Instant' stats related logic in bridge.c.
Instead, the corresponding status is updated immediately after the
global connectivity sequence number changes.

This change brings the following effects:

1. There is no change to the database update speed, since both the master
   (in ofproto_wait) and this patch wait on the global connectivity sequence
   number.

2. The main thread will no longer be waken up every 100 ms for 'Instant'
   stats check.  The related overhead is eliminated.

3. The update is more efficient, since the netdev's sequence number is
   used to avoid updating unchanged netdev status.

4. The 'Instant' stats logic checks the return status of database commit.
   If status is TXN_INCOMPLETE, future update is suspended until the previous
   commit finishes.  This patch imports this optimization to prevent
   overwhelming traffic when the status is flapping fast.

Signed-off-by: Alex Wang 

---
V3 -> V4:
- include only refactor changes in the commit.

V2 -> V3:
- refine the commit log.
- when the current status database transaction returns "TXN_INCOMPLETE",
  do not allow future status updates until the hanging transaction finishes.
  This is extremely helpful in that it can batch multiple future status
  changes into one transaction.  And that is main reason for the elimination
  of backlogged (jasonrpc) database transactions.

PATCH -> V2:
- refine the commit message.
- refine the comments.
---
 vswitchd/bridge.c |  261 +++--
 1 file changed, 112 insertions(+), 149 deletions(-)

diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
index 295e1be..cd2301e 100644
--- a/vswitchd/bridge.c
+++ b/vswitchd/bridge.c
@@ -77,6 +77,7 @@ struct iface {
 char *name; /* Host network device name. */
 struct netdev *netdev;  /* Network device. */
 ofp_port_t ofp_port;/* OpenFlow port number. */
+uint64_t change_seq;
 
 /* These members are valid only within bridge_reconfigure(). */
 const char *type;   /* Usually same as cfg->type. */
@@ -263,7 +264,8 @@ static void iface_configure_qos(struct iface *, const 
struct ovsrec_qos *);
 static void iface_configure_cfm(struct iface *);
 static void iface_refresh_cfm_stats(struct iface *);
 static void iface_refresh_stats(struct iface *);
-static void iface_refresh_status(struct iface *);
+static void iface_refresh_netdev_status(struct iface *);
+static void iface_refresh_ofproto_status(struct iface *);
 static bool iface_is_synthetic(const struct iface *);
 static ofp_port_t iface_get_requested_ofp_port(
 const struct ovsrec_interface *);
@@ -1506,7 +1508,7 @@ iface_create(struct bridge *br, const struct 
ovsrec_interface *iface_cfg,
 
 /* Populate initial status in database. */
 iface_refresh_stats(iface);
-iface_refresh_status(iface);
+iface_refresh_netdev_status(iface);
 
 /* Add bond fake iface if necessary. */
 if (port_is_bond_fake_iface(port)) {
@@ -1764,22 +1766,27 @@ dpid_from_hash(const void *data, size_t n)
 }
 
 static void
-iface_refresh_status(struct iface *iface)
+iface_refresh_netdev_status(struct iface *iface)
 {
 struct smap smap;
 
 enum netdev_features current;
-int64_t bps;
-int mtu;
-int64_t mtu_64;
+enum netdev_flags flags;
+const char *link_state;
 uint8_t mac[ETH_ADDR_LEN];
-int64_t ifindex64;
-int error;
+int64_t bps, mtu_64, ifindex64, link_resets;
+int mtu, error;
 
 if (iface_is_synthetic(iface)) {
 return;
 }
 
+if (iface->change_seq == netdev_get_change_seq(iface->netdev)) {
+return;
+}
+
+iface->change_seq = netdev_get_change_seq(iface->netdev);
+
 smap_init(&smap);
 
 if (!netdev_get_status(iface->netdev, &smap)) {
@@ -1790,6 +1797,21 @@ iface_refresh_status(struct iface *iface)
 
 smap_destroy(&smap);
 
+error = netdev_get_flags(iface->netdev, &flags);
+if (!error) {
+const char *state = flags & NETDEV_UP ? "up" : "down";
+
+ovsrec_interface_set_admin_state(iface->cfg, state);
+} else {
+ovsrec_interface_set_admin_state(iface->cfg, NULL);
+}
+
+link_state = netdev_get_carrier(iface->netdev) ? "up" : "down";
+ovsrec_interface_set_link_state(iface->cfg, link_state);
+
+link_resets = netdev_get_carrier_resets(iface->netdev);
+ovsrec_interface_set_link_resets(iface->cfg, &link_resets, 1);
+
 error = netdev_get_features(iface->netdev, ¤t, NULL, NULL, NULL);
 bps = !error ? netdev_features_to_bps(current, 0) : 0;
 if (bps) {
@@ -1829,6 +1851,36 @@ iface_refresh_status(struct iface *iface)
 ovsrec_interface_set_ifindex(iface->cfg, &ifindex64, 1);
 }
 
+static void
+iface_refresh_ofproto_status(struct iface *iface)
+{
+struct smap smap;
+int current, error;
+
+if (iface_is_synthetic(iface)) {
+return;
+}
+
+current = ofproto_port_is_lacp_current(iface->port->bridge->ofproto,
+  

Re: [ovs-dev] [PATCH V3 2/2] bridge: Remove the 'Instant' stats.

2014-04-17 Thread Alex Wang
>
>  @@ -2481,8 +2436,17 @@ bridge_wait(void)
>>>  poll_timer_wait_until(iface_stats_timer);
>>>  }
>>>
>>> +/* If the status database transaction is "TXN_INCOMPLETE" in this
>>> run,
>>> + * register a timeout in "STATUS_CHECK_AGAIN_MSEC".  Else, wait on
>>> the
>>> + * global connectivity sequence number.  Note, this also helps batch
>>> + * multiple status changes into one transaction. */
>>> +if (status_txn) {
>>> +poll_timer_wait_until(time_msec() + STATUS_CHECK_AGAIN_MSEC);
>>> +} else {
>>> +seq_wait(connectivity_seq_get(), connectivity_seqno);
>>> +}
>>>
>>
>> So, this is like a backoff? If there's so much database update happening
>> that we can't immediately transact (I equate this with TXN_INCOMPLETE),
>> then don't try again for another 500ms?
>> Otherwise, respond immediately to any changes to connectivity?
>>
>
>
> Yes, this is like a backoff.   we are doing the same backoff in current
> master code using the 'INSTANT_INTERVAL_MSEC'.
>
> Upon further thinking, I decide to go back to using the 100ms backoff
> interval.
>
> The reason is that
>
> 1. the main thread will always be waken up immediately after connectivity
> change (since ofproto_wait() always wait on it).  So,  when connectivity seq
> is changing fast, the backoff will be of no use and the main thread
> will keep checking the 'TXN_INCOMPLETE' transaction.
> 2. backoff is only useful when the previous update status is
> 'TXN_INCOMPLETE' and now we only have infrequent connectivity changes.  we
> need
> to wakeup periodically and check for completion of previous update.
>
> So, the backoff interval should not make a big difference.
>
> My previous experiment showed that there is slight reduction of backlog
> when the interval is 500ms.  (10K tunnel, flap the forwarding flag every
> 0.3 sec)
> I'll experiment again to confirm it.
>
> For my next version, I'll only include the refactoring changes.
>


Hey Joe,

with more experiment (10K bfd sessions, flapping forwarding flap of all
sessions every 0.3 sec) I have following observation:
1. Slight backlog can be observed via 'top' out memory growth on both
master and my patch.
2. Master's backlog is growing faster, since there are some netdev related
entries updated in the instant_stats_run().
3. There is no observable change in backlog growing rate between 100ms
backoff and 500ms backoff, on my patch.

So, I dropped the change of backoff interval and kept using 100ms in my
recent V4 patch (http://openvswitch.org/pipermail/dev/2014-April/039054.html
)

For the slight memory backlog, I think my patch makes it hard to happen
(need to flap all 10K tunnels) than on master (just flap one tunnel).  So,
I'll leave it so is for now.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [recirculation 2/3] datapath: add hash action

2014-04-17 Thread Jesse Gross
On Wed, Apr 16, 2014 at 7:45 PM, Pravin Shelar  wrote:
> On Thu, Apr 17, 2014 at 8:12 AM, Jesse Gross  wrote:
>> On Wed, Apr 16, 2014 at 7:39 PM, Pravin Shelar  wrote:
>>> On Wed, Apr 16, 2014 at 2:09 AM, Andy Zhou  wrote:
 Thanks for the review. I will send V2.

 On Tue, Apr 15, 2014 at 7:14 AM, Pravin Shelar  wrote:
> On Sat, Apr 12, 2014 at 3:30 AM, Andy Zhou  wrote:
>> Implements Linux kernel datapath hash action. Hash action computes
>> hash and stores it into current packet key.
>>
>> Signed-off-by: Andy Zhou 
>>
>> [OVS_KEY_ATTR_ARP] = sizeof(struct ovs_key_arp),
>> [OVS_KEY_ATTR_ND] = sizeof(struct ovs_key_nd),
>> +   [OVS_KEY_ATTR_DP_HASH] = sizeof(u32),
>> [OVS_KEY_ATTR_TUNNEL] = -1,
>>  };
>>
>> @@ -455,6 +461,19 @@ static int ipv4_tun_to_nlattr(struct sk_buff *skb,
>>  static int metadata_from_nlattrs(struct sw_flow_match *match,  u64 
>> *attrs,
>>  const struct nlattr **a, bool is_mask)
>>  {
>> +
>> +   if (*attrs & (1ULL << OVS_KEY_ATTR_DP_HASH)) {
>> +   u32 hash_val = nla_get_u32(a[OVS_KEY_ATTR_DP_HASH]);
>> +
>> +   if (!is_mask && !hash_val) {
>> +   OVS_NLERR("Hash value can not be zero\n");
>> +   return -EINVAL;
>> +   }
>> +
>> +   SW_FLOW_KEY_PUT(match, dp_hash, hash_val, is_mask);
>> +   *attrs &= ~(1ULL << OVS_KEY_ATTR_DP_HASH);
>> +   }
> Is hash allowed to be partially masked?
 Yes.
>>>
>>> How can arbitrary hash value which is outside control of vswitchd can
>>> be partially masked. I think we can either exact match it or
>>> completely wildcard. Am I missing something?
>>
>> If you are trying to split between bond slaves then you probably want
>> to mask off some number of bits based on the number of interfaces to
>> randomly divide the traffic.
> I see.
> But then we should have some control over hash function? Or is it too
> expensive ?

There's some control over the hash function by being able to change the basis.

However, for the most part it's not really needed assuming that the
hash is reasonably well distributed. There's nothing particularly
special about the userspace hash functions that we currently use. We
usually use 256 hash buckets currently, so you can just mask off all
but 8 bits and then setup 256 flows to steer each bit combination to
roughly balance the traffic.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [recirc datapath V3 RFC 1/2] datapath: add hash action

2014-04-17 Thread Jesse Gross
On Wed, Apr 16, 2014 at 11:49 PM, Andy Zhou  wrote:
> diff --git a/datapath/actions.c b/datapath/actions.c
> index 82cfd2d..820075f 100644
> --- a/datapath/actions.c
> +++ b/datapath/actions.c
> +static void execute_hash(struct sk_buff *skb, const struct nlattr *attr)
> +{
> +   struct sw_flow_key *key = OVS_CB(skb)->pkt_key;
> +   u32 hash = 0;
> +
> +   /* OVS_HASH_ALG_L4 is the only possible hash algorithm.  */
> +   hash = skb_get_rxhash(skb);
> +   if (!hash)
> +   hash = 0x1;
> +
> +   key->ovs_flow_hash = hash;

I just realized that this ignores the basis. Isn't that a problem?

We could potentially rehash the hash - we do this in
flow_table.c:find_bucket() although it's not a perfect solution.

> @@ -511,7 +524,7 @@ static int do_execute_actions(struct datapath *dp, struct 
> sk_buff *skb,
> const struct nlattr *attr, int len, bool keep_skb)
>  {
> /* Every output action needs a separate clone of 'skb', but the common
> -* case is just a single output action, so that doing a clone and
> +* case is just a single output action, so that doin a clone and

This introduces a typo.

> diff --git a/datapath/flow_netlink.c b/datapath/flow_netlink.c
> index 5c32cd0..f464a5b 100644
> --- a/datapath/flow_netlink.c
> +++ b/datapath/flow_netlink.c
> @@ -123,6 +123,7 @@ static bool match_validate(const struct sw_flow_match 
> *match,
> | (1ULL << OVS_KEY_ATTR_ICMP)
> | (1ULL << OVS_KEY_ATTR_ICMPV6)
> | (1ULL << OVS_KEY_ATTR_ARP)
> +   | (1ULL << OVS_KEY_ATTR_DP_HASH)

I don't think this should be in the list, since these are for things
that require additional validation.

> @@ -130,7 +131,6 @@ static bool match_validate(const struct sw_flow_match 
> *match,
>| (1ULL << OVS_KEY_ATTR_IN_PORT)
>| (1ULL << OVS_KEY_ATTR_ETHERTYPE));
>
> -   /* Check key attributes. */

I think this comment should remain.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [recirc datapath V3 RFC 2/2] datapath: add recirc action

2014-04-17 Thread Jesse Gross
On Wed, Apr 16, 2014 at 11:49 PM, Andy Zhou  wrote:
> diff --git a/datapath/actions.c b/datapath/actions.c
> index 820075f..9307ee1 100644
> --- a/datapath/actions.c
> +++ b/datapath/actions.c
> +static int execute_recirc(struct datapath *dp, struct sk_buff *skb,
> +const struct nlattr *a)
> +{
> +   struct sw_flow_key recirc_key;
> +   const struct vport *p = OVS_CB(skb)->input_vport;
> +   uint32_t hash = OVS_CB(skb)->pkt_key->ovs_flow_hash;
> +   int err;
> +
> +   err = ovs_flow_extract(skb, p->port_no, &recirc_key);

I think there's a possibility of OVS_CB(skb)->input_vport being NULL
if the packet comes from userspace through ovs_packet_cmd_execute().

> diff --git a/datapath/flow_netlink.c b/datapath/flow_netlink.c
> index f464a5b..6edf54b 100644
> --- a/datapath/flow_netlink.c
> +++ b/datapath/flow_netlink.c
> @@ -902,6 +911,10 @@ int ovs_nla_put_flow(const struct sw_flow_key *swkey,
>  output->ovs_flow_hash)))
> goto nla_put_failure;
>
> +   if ((swkey->recirc_id) &&
> +   (nla_put_u32(skb, OVS_KEY_ATTR_RECIRC_ID, output->recirc_id)))
> +   goto nla_put_failure;

I think the extra sets of parentheses around each clause is
unnecessary - and actually I see this is in the hash patch as well.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [recirc datapath V3 RFC 1/2] datapath: add hash action

2014-04-17 Thread Andy Zhou
On Thu, Apr 17, 2014 at 2:48 PM, Jesse Gross  wrote:
> On Wed, Apr 16, 2014 at 11:49 PM, Andy Zhou  wrote:
>> diff --git a/datapath/actions.c b/datapath/actions.c
>> index 82cfd2d..820075f 100644
>> --- a/datapath/actions.c
>> +++ b/datapath/actions.c
>> +static void execute_hash(struct sk_buff *skb, const struct nlattr *attr)
>> +{
>> +   struct sw_flow_key *key = OVS_CB(skb)->pkt_key;
>> +   u32 hash = 0;
>> +
>> +   /* OVS_HASH_ALG_L4 is the only possible hash algorithm.  */
>> +   hash = skb_get_rxhash(skb);
>> +   if (!hash)
>> +   hash = 0x1;
>> +
>> +   key->ovs_flow_hash = hash;
>
> I just realized that this ignores the basis. Isn't that a problem?
>
> We could potentially rehash the hash - we do this in
> flow_table.c:find_bucket() although it's not a perfect solution.
This is a reasonable solution. I will make the change.  WIll fix the
reset in the next rev.  Thanks.
>
>> @@ -511,7 +524,7 @@ static int do_execute_actions(struct datapath *dp, 
>> struct sk_buff *skb,
>> const struct nlattr *attr, int len, bool keep_skb)
>>  {
>> /* Every output action needs a separate clone of 'skb', but the 
>> common
>> -* case is just a single output action, so that doing a clone and
>> +* case is just a single output action, so that doin a clone and
>
> This introduces a typo.
>
>> diff --git a/datapath/flow_netlink.c b/datapath/flow_netlink.c
>> index 5c32cd0..f464a5b 100644
>> --- a/datapath/flow_netlink.c
>> +++ b/datapath/flow_netlink.c
>> @@ -123,6 +123,7 @@ static bool match_validate(const struct sw_flow_match 
>> *match,
>> | (1ULL << OVS_KEY_ATTR_ICMP)
>> | (1ULL << OVS_KEY_ATTR_ICMPV6)
>> | (1ULL << OVS_KEY_ATTR_ARP)
>> +   | (1ULL << OVS_KEY_ATTR_DP_HASH)
>
> I don't think this should be in the list, since these are for things
> that require additional validation.
>
>> @@ -130,7 +131,6 @@ static bool match_validate(const struct sw_flow_match 
>> *match,
>>| (1ULL << OVS_KEY_ATTR_IN_PORT)
>>| (1ULL << OVS_KEY_ATTR_ETHERTYPE));
>>
>> -   /* Check key attributes. */
>
> I think this comment should remain.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 1/2] dpif-linux: Implement the API functions to allow multiple handler threads read upcall.

2014-04-17 Thread Ethan Jackson
I don't like that dpif_linux_run() has to take the write lock every
time it's called.  Maybe you could make the dpif->refresh_channels
variable atomic, check it, and only take the lock if it's true?

I think a lot of these functions could use annotations saying whether
they require a read/write lock from the upcall lock.

Otherwise looks fine to me.

Ethan

On Tue, Apr 15, 2014 at 1:23 PM, Alex Wang  wrote:
> Signed-off-by: Alex Wang 
>
> ---
> V7 -> V8:
> - rebase.
>
> V6 -> V7:
> - rebase.
> - set n_handlers to 0 in destroy_all_channels().
> - fix the indexing error in dpif_linux_recv_purge().
>
> V5 -> V6:
> - move the 'struct dpif_epoll' in 'struct dpif_handler'
> - fix typos based on review.
> - refine refresh_channels() comments.
> - rename 'n_pids' of 'struct dpif_linux_vport' to 'n_upcall_pids'.
>
> V4 -> V5:
> - rebase.
>
> V3 -> V4:
> - add check of handler_id range.
>
> V2 -> V3:
> - use OVS_DP_ATTR_USER_FEATURES to inform datapath about the type of
>   OVS_VPORT_ATTR_UPCALL_PID attribute.
> - close epoll fd when destroying all channels.
>
> PATCH -> V2:
> - rebase.
>
> major changes since RFC:
> - free the 'upcall_pids' pointer in dpif_linux_refresh_channels().
> - for cache access efficiency, in 'recv()' index the handlers
>   array first and then port channels array.
> ---
>  lib/dpif-linux.c |  549 
> +++---
>  lib/dpif-linux.h |3 +-
>  2 files changed, 361 insertions(+), 191 deletions(-)
>
> diff --git a/lib/dpif-linux.c b/lib/dpif-linux.c
> index e8efdac..87f180e 100644
> --- a/lib/dpif-linux.c
> +++ b/lib/dpif-linux.c
> @@ -36,6 +36,7 @@
>  #include "dpif-provider.h"
>  #include "dynamic-string.h"
>  #include "flow.h"
> +#include "fat-rwlock.h"
>  #include "netdev.h"
>  #include "netdev-linux.h"
>  #include "netdev-vport.h"
> @@ -132,7 +133,13 @@ struct dpif_channel {
>  long long int last_poll;/* Last time this channel was polled. */
>  };
>
> -static void report_loss(struct dpif *, struct dpif_channel *);
> +struct dpif_handler {
> +struct dpif_channel *channels;/* Array of channels for each handler. */
> +struct epoll_event *epoll_events;
> +int epoll_fd; /* epoll fd that includes channel socks. */
> +int n_events; /* Num events returned by epoll_wait(). */
> +int event_offset; /* Offset into 'epoll_events'. */
> +};
>
>  /* Datapath interface for the openvswitch Linux kernel module. */
>  struct dpif_linux {
> @@ -140,19 +147,20 @@ struct dpif_linux {
>  int dp_ifindex;
>
>  /* Upcall messages. */
> -struct ovs_mutex upcall_lock;
> -int uc_array_size;  /* Size of 'channels' and 'epoll_events'. */
> -struct dpif_channel *channels;
> -struct epoll_event *epoll_events;
> -int epoll_fd;   /* epoll fd that includes channel socks. */
> -int n_events;   /* Num events returned by epoll_wait(). */
> -int event_offset;   /* Offset into 'epoll_events'. */
> +struct fat_rwlock upcall_lock;
> +struct dpif_handler *handlers;
> +uint32_t n_handlers;   /* Num of upcall handlers. */
> +int uc_array_size; /* Size of 'handler->channels' and */
> +   /* 'handler->epoll_events'. */
>
>  /* Change notification. */
>  struct nl_sock *port_notifier; /* vport multicast group subscriber. */
>  bool refresh_channels;
>  };
>
> +static void report_loss(struct dpif *, struct dpif_channel *, uint32_t 
> ch_idx,
> +uint32_t handler_id);
> +
>  static struct vlog_rate_limit error_rl = VLOG_RATE_LIMIT_INIT(, 5);
>
>  /* Generic Netlink family numbers for OVS.
> @@ -172,8 +180,7 @@ static int dpif_linux_init(void);
>  static int open_dpif(const struct dpif_linux_dp *, struct dpif **);
>  static uint32_t dpif_linux_port_get_pid(const struct dpif *,
>  odp_port_t port_no, uint32_t hash);
> -static int dpif_linux_refresh_channels(struct dpif *);
> -
> +static int dpif_linux_refresh_channels(struct dpif *, uint32_t n_handlers);
>  static void dpif_linux_vport_to_ofpbuf(const struct dpif_linux_vport *,
> struct ofpbuf *);
>  static int dpif_linux_vport_from_ofpbuf(struct dpif_linux_vport *,
> @@ -238,6 +245,7 @@ dpif_linux_open(const struct dpif_class *class 
> OVS_UNUSED, const char *name,
>  }
>  dp_request.name = name;
>  dp_request.user_features |= OVS_DP_F_UNALIGNED;
> +dp_request.user_features |= OVS_DP_F_VPORT_PIDS;
>  error = dpif_linux_dp_transact(&dp_request, &dp, &buf);
>  if (error) {
>  return error;
> @@ -255,8 +263,7 @@ open_dpif(const struct dpif_linux_dp *dp, struct dpif 
> **dpifp)
>
>  dpif = xzalloc(sizeof *dpif);
>  dpif->port_notifier = NULL;
> -ovs_mutex_init(&dpif->upcall_lock);
> -dpif->epoll_fd = -1;
> +fat_rwlock_init(&dpif->upcall_lock);
>
>  dpif_init(&dpif->

Re: [ovs-dev] [PATCH 1/2] dpif-linux: Implement the API functions to allow multiple handler threads read upcall.

2014-04-17 Thread Ethan Jackson
All that sounds good to me, go ahead and add my acked by then.

Acked-by: Ethan Jackson 


On Thu, Apr 17, 2014 at 4:04 PM, Alex Wang  wrote:
> Thx for the review, Ethan,
>
>
> On Thu, Apr 17, 2014 at 3:50 PM, Ethan Jackson  wrote:
>>
>> I don't like that dpif_linux_run() has to take the write lock every
>> time it's called.  Maybe you could make the dpif->refresh_channels
>> variable atomic, check it, and only take the lock if it's true
>
>
>
> Yeah, just checked that refresh_channels can only be set/used by main
> thread.  So, I'll move
> the lock acquisition inside if-statement.
>
>
>>
>> I think a lot of these functions could use annotations saying whether
>> they require a read/write lock from the upcall lock.
>
>
>
> Thx for reminding me, I'll add them.  my bad,
>
>
>
>>
>>
>>
>> Otherwise looks fine to me.
>>
>> Ethan
>>
>> On Tue, Apr 15, 2014 at 1:23 PM, Alex Wang  wrote:
>> > Signed-off-by: Alex Wang 
>> >
>> > ---
>> > V7 -> V8:
>> > - rebase.
>> >
>> > V6 -> V7:
>> > - rebase.
>> > - set n_handlers to 0 in destroy_all_channels().
>> > - fix the indexing error in dpif_linux_recv_purge().
>> >
>> > V5 -> V6:
>> > - move the 'struct dpif_epoll' in 'struct dpif_handler'
>> > - fix typos based on review.
>> > - refine refresh_channels() comments.
>> > - rename 'n_pids' of 'struct dpif_linux_vport' to 'n_upcall_pids'.
>> >
>> > V4 -> V5:
>> > - rebase.
>> >
>> > V3 -> V4:
>> > - add check of handler_id range.
>> >
>> > V2 -> V3:
>> > - use OVS_DP_ATTR_USER_FEATURES to inform datapath about the type of
>> >   OVS_VPORT_ATTR_UPCALL_PID attribute.
>> > - close epoll fd when destroying all channels.
>> >
>> > PATCH -> V2:
>> > - rebase.
>> >
>> > major changes since RFC:
>> > - free the 'upcall_pids' pointer in dpif_linux_refresh_channels().
>> > - for cache access efficiency, in 'recv()' index the handlers
>> >   array first and then port channels array.
>> > ---
>> >  lib/dpif-linux.c |  549
>> > +++---
>> >  lib/dpif-linux.h |3 +-
>> >  2 files changed, 361 insertions(+), 191 deletions(-)
>> >
>> > diff --git a/lib/dpif-linux.c b/lib/dpif-linux.c
>> > index e8efdac..87f180e 100644
>> > --- a/lib/dpif-linux.c
>> > +++ b/lib/dpif-linux.c
>> > @@ -36,6 +36,7 @@
>> >  #include "dpif-provider.h"
>> >  #include "dynamic-string.h"
>> >  #include "flow.h"
>> > +#include "fat-rwlock.h"
>> >  #include "netdev.h"
>> >  #include "netdev-linux.h"
>> >  #include "netdev-vport.h"
>> > @@ -132,7 +133,13 @@ struct dpif_channel {
>> >  long long int last_poll;/* Last time this channel was polled.
>> > */
>> >  };
>> >
>> > -static void report_loss(struct dpif *, struct dpif_channel *);
>> > +struct dpif_handler {
>> > +struct dpif_channel *channels;/* Array of channels for each
>> > handler. */
>> > +struct epoll_event *epoll_events;
>> > +int epoll_fd; /* epoll fd that includes channel
>> > socks. */
>> > +int n_events; /* Num events returned by
>> > epoll_wait(). */
>> > +int event_offset; /* Offset into 'epoll_events'. */
>> > +};
>> >
>> >  /* Datapath interface for the openvswitch Linux kernel module. */
>> >  struct dpif_linux {
>> > @@ -140,19 +147,20 @@ struct dpif_linux {
>> >  int dp_ifindex;
>> >
>> >  /* Upcall messages. */
>> > -struct ovs_mutex upcall_lock;
>> > -int uc_array_size;  /* Size of 'channels' and
>> > 'epoll_events'. */
>> > -struct dpif_channel *channels;
>> > -struct epoll_event *epoll_events;
>> > -int epoll_fd;   /* epoll fd that includes channel
>> > socks. */
>> > -int n_events;   /* Num events returned by epoll_wait().
>> > */
>> > -int event_offset;   /* Offset into 'epoll_events'. */
>> > +struct fat_rwlock upcall_lock;
>> > +struct dpif_handler *handlers;
>> > +uint32_t n_handlers;   /* Num of upcall handlers. */
>> > +int uc_array_size; /* Size of 'handler->channels' and
>> > */
>> > +   /* 'handler->epoll_events'. */
>> >
>> >  /* Change notification. */
>> >  struct nl_sock *port_notifier; /* vport multicast group subscriber.
>> > */
>> >  bool refresh_channels;
>> >  };
>> >
>> > +static void report_loss(struct dpif *, struct dpif_channel *, uint32_t
>> > ch_idx,
>> > +uint32_t handler_id);
>> > +
>> >  static struct vlog_rate_limit error_rl = VLOG_RATE_LIMIT_INIT(, 5);
>> >
>> >  /* Generic Netlink family numbers for OVS.
>> > @@ -172,8 +180,7 @@ static int dpif_linux_init(void);
>> >  static int open_dpif(const struct dpif_linux_dp *, struct dpif **);
>> >  static uint32_t dpif_linux_port_get_pid(const struct dpif *,
>> >  odp_port_t port_no, uint32_t
>> > hash);
>> > -static int dpif_linux_refresh_channels(struct dpif *);
>> > -
>> > +static int dpif_linux_refresh_channels(struct dpif *, uint32_t
>> > n_handlers);
>> >  static void dpi

Re: [ovs-dev] [PATCH 2/2] ofproto-dpif-upcall: Remove the dispatcher thread.

2014-04-17 Thread Ethan Jackson
I'd remove the comment about the flow dumper being removed in future.
Doesn't really fit as a code comment.

The comment to upcall_destroy() is rather weird, that function doesn't
know how it's allocated.  At any rate, there's only one caller, I'd be
inclined to just get rid of that function and do it directly.

After the dpif_recv() function, there's a stale comment about
upcall_destroy() which doesn't make sense anymore.

The whole is_valid bit is ugly and a sign that we have the wrong
architecture.  I think the problem is that we're rather artifically
separating the dpif_recv calls, from the upcall classification which
happens in handle_upcalls.  I think the following steps would make the
code cleaner:

1) Get rid of struct upcall entirely (yay).
2) Move all the data in struct upcall onto the stack of handle_upcalls().
3) Have each iteration of the first loop of handle_upcalls() call
dpif_recv() directly instead of taking an array of upcalls as an
argument.

What do you think?
Ethan

On Tue, Apr 15, 2014 at 1:23 PM, Alex Wang  wrote:
> With the foundation laid in previous commits, this commit
> removes the 'dispatcher' thread by allowing 'handler'
> threads to read upcalls directly from dpif.
>
> This commit significantly simplifies the flow miss handling
> code and brings slight improvement to flow setup rate.
>
> Signed-off-by: Alex Wang 
>
> ---
> V7 -> V8:
> - rebase.
>
> V6 -> V7:
> - rebase.
>
> V5 -> V6:
> - remove the input argument 'free_upcall' in upcall-destroy().
> - destroy all upcalls at the end of handle_upcalls().
>
> V4 -> V5:
> - rebase.
>
> V3 -> V4:
> - rebase.
>
> V2 -> V3:
> - rebase.
>
> PATCH -> V2:
> - rebase.
> ---
>  ofproto/ofproto-dpif-upcall.c |  280 
> +++--
>  ofproto/ofproto-dpif-xlate.c  |6 +-
>  2 files changed, 78 insertions(+), 208 deletions(-)
>
> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
> index 938cfde..951bb64 100644
> --- a/ofproto/ofproto-dpif-upcall.c
> +++ b/ofproto/ofproto-dpif-upcall.c
> @@ -45,26 +45,13 @@
>
>  VLOG_DEFINE_THIS_MODULE(ofproto_dpif_upcall);
>
> -COVERAGE_DEFINE(upcall_queue_overflow);
> -
> -/* A thread that processes each upcall handed to it by the dispatcher thread,
> - * forwards the upcall's packet, and possibly sets up a kernel flow as a
> - * cache. */
> +/* A thread that reads upcalls from dpif, forwards each upcall's packet,
> + * and possibly sets up a kernel flow as a cache. */
>  struct handler {
>  struct udpif *udpif;   /* Parent udpif. */
>  pthread_t thread;  /* Thread ID. */
>  char *name;/* Thread name. */
> -
> -struct ovs_mutex mutex;/* Mutex guarding the following. */
> -
> -/* Atomic queue of unprocessed upcalls. */
> -struct list upcalls OVS_GUARDED;
> -size_t n_upcalls OVS_GUARDED;
> -
> -bool need_signal;  /* Only changed by the dispatcher. */
> -
> -pthread_cond_t wake_cond;  /* Wakes 'thread' while holding
> -  'mutex'. */
> +uint32_t handler_id;   /* Handler id. */
>  };
>
>  /* A thread that processes each kernel flow handed to it by the flow_dumper
> @@ -87,12 +74,18 @@ struct revalidator {
>
>  /* An upcall handler for ofproto_dpif.
>   *
> - * udpif has two logically separate pieces:
> + * udpif keeps records of two kind of logically separate units:
> + *
> + *- An array of 'struct handler's for upcall handling and flow
> + *  installation.
>   *
> - *- A "dispatcher" thread that reads upcalls from the kernel and 
> dispatches
> - *  them to one of several "handler" threads (see struct handler).
> + *- An array of 'struct revalidator's for flow revalidation and
> + *  stats collection.
>   *
> - *- A "flow_dumper" thread that reads the kernel flow table and 
> dispatches
> + * the following module is currently in use but is going to be removed
> + * in the near feature:
> + *
> + *- "flow_dumper" thread that reads the kernel flow table and dispatches
>   *  flows to one of several "revalidator" threads (see struct
>   *  revalidator). */
>  struct udpif {
> @@ -103,7 +96,6 @@ struct udpif {
>
>  uint32_t secret;   /* Random seed for upcall hash. */
>
> -pthread_t dispatcher;  /* Dispatcher thread ID. */
>  pthread_t flow_dumper; /* Flow dumper thread ID. */
>
>  struct handler *handlers;  /* Upcall handlers. */
> @@ -143,7 +135,7 @@ enum upcall_type {
>  };
>
>  struct upcall {
> -struct list list_node;  /* For queuing upcalls. */
> +bool is_valid;  /* If the upcall can be used. */
>  struct flow_miss *flow_miss;/* This upcall's flow_miss. */
>
>  /* Raw upcall plus data for keeping track of the memory backing it. */
> @@ -221,10 +213,9 @@ static void upcall_destroy(struct upcall *);
>  static struct vlog_rate_limit 

[ovs-dev] [PATCH V9 1/3] dpif-linux: Implement the API functions to allow multiple handler threads read upcall.

2014-04-17 Thread Alex Wang
Signed-off-by: Alex Wang 

---
V8 -> v9:
- move the fat-lock acquisition inside the if-statement in dpif_linux_run().
- fix an error: dpif_linux_recv_purge() should acquire wrlock.

V7 -> V8:
- rebase.

V6 -> V7:
- rebase.
- set n_handlers to 0 in destroy_all_channels().
- fix the indexing error in dpif_linux_recv_purge().

V5 -> V6:
- move the 'struct dpif_epoll' in 'struct dpif_handler'
- fix typos based on review.
- refine refresh_channels() comments.
- rename 'n_pids' of 'struct dpif_linux_vport' to 'n_upcall_pids'.

V4 -> V5:
- rebase.

V3 -> V4:
- add check of handler_id range.

V2 -> V3:
- use OVS_DP_ATTR_USER_FEATURES to inform datapath about the type of
  OVS_VPORT_ATTR_UPCALL_PID attribute.
- close epoll fd when destroying all channels.

PATCH -> V2:
- rebase.

major changes since RFC:
- free the 'upcall_pids' pointer in dpif_linux_refresh_channels().
- for cache access efficiency, in 'recv()' index the handlers
  array first and then port channels array.
---
 lib/dpif-linux.c |  549 +++---
 lib/dpif-linux.h |3 +-
 2 files changed, 361 insertions(+), 191 deletions(-)

diff --git a/lib/dpif-linux.c b/lib/dpif-linux.c
index e8efdac..c119c12 100644
--- a/lib/dpif-linux.c
+++ b/lib/dpif-linux.c
@@ -36,6 +36,7 @@
 #include "dpif-provider.h"
 #include "dynamic-string.h"
 #include "flow.h"
+#include "fat-rwlock.h"
 #include "netdev.h"
 #include "netdev-linux.h"
 #include "netdev-vport.h"
@@ -132,7 +133,13 @@ struct dpif_channel {
 long long int last_poll;/* Last time this channel was polled. */
 };
 
-static void report_loss(struct dpif *, struct dpif_channel *);
+struct dpif_handler {
+struct dpif_channel *channels;/* Array of channels for each handler. */
+struct epoll_event *epoll_events;
+int epoll_fd; /* epoll fd that includes channel socks. */
+int n_events; /* Num events returned by epoll_wait(). */
+int event_offset; /* Offset into 'epoll_events'. */
+};
 
 /* Datapath interface for the openvswitch Linux kernel module. */
 struct dpif_linux {
@@ -140,19 +147,20 @@ struct dpif_linux {
 int dp_ifindex;
 
 /* Upcall messages. */
-struct ovs_mutex upcall_lock;
-int uc_array_size;  /* Size of 'channels' and 'epoll_events'. */
-struct dpif_channel *channels;
-struct epoll_event *epoll_events;
-int epoll_fd;   /* epoll fd that includes channel socks. */
-int n_events;   /* Num events returned by epoll_wait(). */
-int event_offset;   /* Offset into 'epoll_events'. */
+struct fat_rwlock upcall_lock;
+struct dpif_handler *handlers;
+uint32_t n_handlers;   /* Num of upcall handlers. */
+int uc_array_size; /* Size of 'handler->channels' and */
+   /* 'handler->epoll_events'. */
 
 /* Change notification. */
 struct nl_sock *port_notifier; /* vport multicast group subscriber. */
 bool refresh_channels;
 };
 
+static void report_loss(struct dpif *, struct dpif_channel *, uint32_t ch_idx,
+uint32_t handler_id);
+
 static struct vlog_rate_limit error_rl = VLOG_RATE_LIMIT_INIT(, 5);
 
 /* Generic Netlink family numbers for OVS.
@@ -172,8 +180,7 @@ static int dpif_linux_init(void);
 static int open_dpif(const struct dpif_linux_dp *, struct dpif **);
 static uint32_t dpif_linux_port_get_pid(const struct dpif *,
 odp_port_t port_no, uint32_t hash);
-static int dpif_linux_refresh_channels(struct dpif *);
-
+static int dpif_linux_refresh_channels(struct dpif *, uint32_t n_handlers);
 static void dpif_linux_vport_to_ofpbuf(const struct dpif_linux_vport *,
struct ofpbuf *);
 static int dpif_linux_vport_from_ofpbuf(struct dpif_linux_vport *,
@@ -238,6 +245,7 @@ dpif_linux_open(const struct dpif_class *class OVS_UNUSED, 
const char *name,
 }
 dp_request.name = name;
 dp_request.user_features |= OVS_DP_F_UNALIGNED;
+dp_request.user_features |= OVS_DP_F_VPORT_PIDS;
 error = dpif_linux_dp_transact(&dp_request, &dp, &buf);
 if (error) {
 return error;
@@ -255,8 +263,7 @@ open_dpif(const struct dpif_linux_dp *dp, struct dpif 
**dpifp)
 
 dpif = xzalloc(sizeof *dpif);
 dpif->port_notifier = NULL;
-ovs_mutex_init(&dpif->upcall_lock);
-dpif->epoll_fd = -1;
+fat_rwlock_init(&dpif->upcall_lock);
 
 dpif_init(&dpif->dpif, &dpif_linux_class, dp->name,
   dp->dp_ifindex, dp->dp_ifindex);
@@ -267,64 +274,110 @@ open_dpif(const struct dpif_linux_dp *dp, struct dpif 
**dpifp)
 return 0;
 }
 
+/* Destroys the netlink sockets pointed by the elements in 'socksp'
+ * and frees the 'socksp'.  */
 static void
-destroy_channels(struct dpif_linux *dpif)
+vport_del_socksp(struct nl_sock **socksp, uint32_t n_socks)
 {
-unsigned int i;
+size_t i;
 
-if (dpif->epoll_fd < 0) {
-

[ovs-dev] [PATCH V9 3/3] dpif-linux: Add thread-safety annotations.

2014-04-17 Thread Alex Wang
Signed-off-by: Alex Wang 

---
V9:
- add thread-safety annotations.
---
 lib/dpif-linux.c |   60 +-
 1 file changed, 46 insertions(+), 14 deletions(-)

diff --git a/lib/dpif-linux.c b/lib/dpif-linux.c
index b70ddef..a575b78 100644
--- a/lib/dpif-linux.c
+++ b/lib/dpif-linux.c
@@ -180,7 +180,8 @@ static int dpif_linux_init(void);
 static int open_dpif(const struct dpif_linux_dp *, struct dpif **);
 static uint32_t dpif_linux_port_get_pid(const struct dpif *,
 odp_port_t port_no, uint32_t hash);
-static int dpif_linux_refresh_channels(struct dpif_linux *, uint32_t 
n_handlers);
+static int dpif_linux_refresh_channels(struct dpif_linux *,
+   uint32_t n_handlers);
 static void dpif_linux_vport_to_ofpbuf(const struct dpif_linux_vport *,
struct ofpbuf *);
 static int dpif_linux_vport_from_ofpbuf(struct dpif_linux_vport *,
@@ -459,7 +460,7 @@ vport_del_channels(struct dpif_linux *dpif, odp_port_t 
port_no)
 }
 
 static void
-destroy_all_channels(struct dpif_linux *dpif)
+destroy_all_channels(struct dpif_linux *dpif) OVS_REQ_WRLOCK(dpif->upcall_lock)
 {
 unsigned int i;
 
@@ -625,6 +626,7 @@ netdev_to_ovs_vport_type(const struct netdev *netdev)
 static int
 dpif_linux_port_add__(struct dpif_linux *dpif, struct netdev *netdev,
   odp_port_t *port_nop)
+OVS_REQ_WRLOCK(dpif->upcall_lock)
 {
 const struct netdev_tunnel_config *tnl_cfg;
 char namebuf[NETDEV_VPORT_NAME_BUFSIZE];
@@ -731,6 +733,7 @@ dpif_linux_port_add(struct dpif *dpif_, struct netdev 
*netdev,
 
 static int
 dpif_linux_port_del__(struct dpif_linux *dpif, odp_port_t port_no)
+OVS_REQ_WRLOCK(dpif->upcall_lock)
 {
 struct dpif_linux_vport vport;
 int error;
@@ -809,14 +812,13 @@ dpif_linux_port_query_by_name(const struct dpif *dpif_, 
const char *devname,
 }
 
 static uint32_t
-dpif_linux_port_get_pid(const struct dpif *dpif_, odp_port_t port_no,
-uint32_t hash)
+dpif_linux_port_get_pid__(const struct dpif_linux *dpif, odp_port_t port_no,
+  uint32_t hash)
+OVS_REQ_RDLOCK(dpif->upcall_lock)
 {
-struct dpif_linux *dpif = dpif_linux_cast(dpif_);
 uint32_t port_idx = odp_to_u32(port_no);
 uint32_t pid = 0;
 
-fat_rwlock_rdlock(&dpif->upcall_lock);
 if (dpif->handlers) {
 /* The ODPP_NONE "reserved" port number uses the "ovs-system"'s
  * channel, since it is not heavily loaded. */
@@ -825,11 +827,24 @@ dpif_linux_port_get_pid(const struct dpif *dpif_, 
odp_port_t port_no,
 
 pid = nl_sock_pid(h->channels[idx].sock);
 }
-fat_rwlock_unlock(&dpif->upcall_lock);
 
 return pid;
 }
 
+static uint32_t
+dpif_linux_port_get_pid(const struct dpif *dpif_, odp_port_t port_no,
+uint32_t hash)
+{
+const struct dpif_linux *dpif = dpif_linux_cast(dpif_);
+uint32_t ret;
+
+fat_rwlock_rdlock(&dpif->upcall_lock);
+ret = dpif_linux_port_get_pid__(dpif, port_no, hash);
+fat_rwlock_unlock(&dpif->upcall_lock);
+
+return ret;
+}
+
 static int
 dpif_linux_flow_flush(struct dpif *dpif_)
 {
@@ -1461,6 +1476,7 @@ dpif_linux_operate(struct dpif *dpif_, struct dpif_op 
**ops, size_t n_ops)
  * backing kernel vports. */
 static int
 dpif_linux_refresh_channels(struct dpif_linux *dpif, uint32_t n_handlers)
+OVS_REQ_WRLOCK(dpif->upcall_lock)
 {
 unsigned long int *keep_channels;
 struct dpif_linux_vport vport;
@@ -1586,6 +1602,7 @@ dpif_linux_refresh_channels(struct dpif_linux *dpif, 
uint32_t n_handlers)
 
 static int
 dpif_linux_recv_set__(struct dpif_linux *dpif, bool enable)
+OVS_REQ_WRLOCK(dpif->upcall_lock)
 {
 if ((dpif->handlers != NULL) == enable) {
 return 0;
@@ -1702,6 +1719,7 @@ parse_odp_packet(struct ofpbuf *buf, struct dpif_upcall 
*upcall,
 static int
 dpif_linux_recv__(struct dpif_linux *dpif, uint32_t handler_id,
   struct dpif_upcall *upcall, struct ofpbuf *buf)
+OVS_REQ_RDLOCK(dpif->upcall_lock)
 {
 struct dpif_handler *handler;
 int read_tries = 0;
@@ -1787,25 +1805,30 @@ dpif_linux_recv(struct dpif *dpif_, uint32_t handler_id,
 }
 
 static void
-dpif_linux_recv_wait(struct dpif *dpif_, uint32_t handler_id)
+dpif_linux_recv_wait__(struct dpif_linux *dpif, uint32_t handler_id)
+OVS_REQ_RDLOCK(dpif->upcall_lock)
 {
-struct dpif_linux *dpif = dpif_linux_cast(dpif_);
-
-fat_rwlock_rdlock(&dpif->upcall_lock);
 if (dpif->handlers && handler_id < dpif->n_handlers) {
 struct dpif_handler *handler = &dpif->handlers[handler_id];
 
 poll_fd_wait(handler->epoll_fd, POLLIN);
 }
-fat_rwlock_unlock(&dpif->upcall_lock);
 }
 
 static void
-dpif_linux_recv_purge(struct dpif *dpif_)
+dpif_linux_recv_wait(struct dpif *dpif_, uint32_t handler_id)
 {
 struct dpif_linux *dpif = dpif_linux_cast(dpif_);
 
-fat_rwlock_wrlock(&d

[ovs-dev] [PATCH V9 2/3] dpif-linux: Pass 'struct dpif_linux *' to internal static functions.

2014-04-17 Thread Alex Wang
This commit reformats the dpif-linux module so that all internal
static functions take 'struct dpif_linux *' as input argument.
This will allow the adding of thread-safety annotations.

Signed-off-by: Alex Wang 

---
V9:
- necessary changes for adding thread-safety annotations.
---
 lib/dpif-linux.c |  113 +++---
 1 file changed, 56 insertions(+), 57 deletions(-)

diff --git a/lib/dpif-linux.c b/lib/dpif-linux.c
index c119c12..b70ddef 100644
--- a/lib/dpif-linux.c
+++ b/lib/dpif-linux.c
@@ -158,8 +158,8 @@ struct dpif_linux {
 bool refresh_channels;
 };
 
-static void report_loss(struct dpif *, struct dpif_channel *, uint32_t ch_idx,
-uint32_t handler_id);
+static void report_loss(struct dpif_linux *, struct dpif_channel *,
+uint32_t ch_idx, uint32_t handler_id);
 
 static struct vlog_rate_limit error_rl = VLOG_RATE_LIMIT_INIT(, 5);
 
@@ -180,7 +180,7 @@ static int dpif_linux_init(void);
 static int open_dpif(const struct dpif_linux_dp *, struct dpif **);
 static uint32_t dpif_linux_port_get_pid(const struct dpif *,
 odp_port_t port_no, uint32_t hash);
-static int dpif_linux_refresh_channels(struct dpif *, uint32_t n_handlers);
+static int dpif_linux_refresh_channels(struct dpif_linux *, uint32_t 
n_handlers);
 static void dpif_linux_vport_to_ofpbuf(const struct dpif_linux_vport *,
struct ofpbuf *);
 static int dpif_linux_vport_from_ofpbuf(struct dpif_linux_vport *,
@@ -538,7 +538,7 @@ dpif_linux_run(struct dpif *dpif_)
 if (dpif->refresh_channels) {
 dpif->refresh_channels = false;
 fat_rwlock_wrlock(&dpif->upcall_lock);
-dpif_linux_refresh_channels(dpif_, dpif->n_handlers);
+dpif_linux_refresh_channels(dpif, dpif->n_handlers);
 fat_rwlock_unlock(&dpif->upcall_lock);
 }
 }
@@ -623,10 +623,9 @@ netdev_to_ovs_vport_type(const struct netdev *netdev)
 }
 
 static int
-dpif_linux_port_add__(struct dpif *dpif_, struct netdev *netdev,
+dpif_linux_port_add__(struct dpif_linux *dpif, struct netdev *netdev,
   odp_port_t *port_nop)
 {
-struct dpif_linux *dpif = dpif_linux_cast(dpif_);
 const struct netdev_tunnel_config *tnl_cfg;
 char namebuf[NETDEV_VPORT_NAME_BUFSIZE];
 const char *name = netdev_vport_get_dpif_port(netdev,
@@ -654,7 +653,7 @@ dpif_linux_port_add__(struct dpif *dpif_, struct netdev 
*netdev,
 if (request.type == OVS_VPORT_TYPE_UNSPEC) {
 VLOG_WARN_RL(&error_rl, "%s: cannot create port `%s' because it has "
  "unsupported type `%s'",
- dpif_name(dpif_), name, type);
+ dpif_name(&dpif->dpif), name, type);
 vport_del_socksp(socksp, dpif->n_handlers);
 return EINVAL;
 }
@@ -684,7 +683,7 @@ dpif_linux_port_add__(struct dpif *dpif_, struct netdev 
*netdev,
 } else {
 if (error == EBUSY && *port_nop != ODPP_NONE) {
 VLOG_INFO("%s: requested port %"PRIu32" is in use",
-  dpif_name(dpif_), *port_nop);
+  dpif_name(&dpif->dpif), *port_nop);
 }
 
 vport_del_socksp(socksp, dpif->n_handlers);
@@ -695,7 +694,7 @@ dpif_linux_port_add__(struct dpif *dpif_, struct netdev 
*netdev,
 error = vport_add_channels(dpif, *port_nop, socksp);
 if (error) {
 VLOG_INFO("%s: could not add channel for port %s",
-  dpif_name(dpif_), name);
+  dpif_name(&dpif->dpif), name);
 
 /* Delete the port. */
 dpif_linux_vport_init(&request);
@@ -724,16 +723,15 @@ dpif_linux_port_add(struct dpif *dpif_, struct netdev 
*netdev,
 int error;
 
 fat_rwlock_wrlock(&dpif->upcall_lock);
-error = dpif_linux_port_add__(dpif_, netdev, port_nop);
+error = dpif_linux_port_add__(dpif, netdev, port_nop);
 fat_rwlock_unlock(&dpif->upcall_lock);
 
 return error;
 }
 
 static int
-dpif_linux_port_del__(struct dpif *dpif_, odp_port_t port_no)
+dpif_linux_port_del__(struct dpif_linux *dpif, odp_port_t port_no)
 {
-struct dpif_linux *dpif = dpif_linux_cast(dpif_);
 struct dpif_linux_vport vport;
 int error;
 
@@ -755,14 +753,14 @@ dpif_linux_port_del(struct dpif *dpif_, odp_port_t 
port_no)
 int error;
 
 fat_rwlock_wrlock(&dpif->upcall_lock);
-error = dpif_linux_port_del__(dpif_, port_no);
+error = dpif_linux_port_del__(dpif, port_no);
 fat_rwlock_unlock(&dpif->upcall_lock);
 
 return error;
 }
 
 static int
-dpif_linux_port_query__(const struct dpif *dpif, odp_port_t port_no,
+dpif_linux_port_query__(const struct dpif_linux *dpif, odp_port_t port_no,
 const char *port_name, struct dpif_port *dpif_port)
 {
 struct dpif_linux_vport request;
@@ -772,7 +770,7 @@ dpif_linux_port_query__(const struct dpif *dpif, odp_port_t 
port_no,
 
 dpif_linux_vpor

[ovs-dev] [PATCH 0/5] lib: Extract packet to a miniflow

2014-04-17 Thread Jarno Rajahalme
Userspace datapath can be made faster by streamlining the flow
extraction code.  We already use miniflows internally within the
classifier.  This series of patches changes flow extraction to use
miniflow directly, and then also use the extracted miniflow as a
lookup key in the classifier.  This means less memory is touched by
each packet passing through the datapath.  Only when a packet miss is
generated is the miniflow key expanded to a full struct flow.

Jarno Rajahalme (5):
  lib/flow: Introduce miniflow_extract().
  lib/flow: Add miniflow accessors and miniflow_get_tcp_flags().
  lib/flow: Possibly faster miniflow_hash_in_minimask()
  classifier: Support miniflow as a key.
  dpif-netdev: Use miniflow as a flow key.

 lib/byte-order.h   |   12 +
 lib/classifier.c   |   62 +++
 lib/classifier.h   |3 +
 lib/dpif-netdev.c  |   52 ++-
 lib/flow.c |  842 +++-
 lib/flow.h |  206 --
 lib/match.c|2 +-
 lib/nx-match.c |2 +-
 lib/ofp-util.c |2 +-
 lib/packets.h  |   52 +--
 ofproto/ofproto-dpif-monitor.h |1 +
 ofproto/ofproto-dpif-xlate.c   |2 +-
 tests/test-classifier.c|2 +-
 13 files changed, 802 insertions(+), 438 deletions(-)

-- 
1.7.10.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH 1/5] lib/flow: Introduce miniflow_extract().

2014-04-17 Thread Jarno Rajahalme
miniflow_extract() extracts packet headers directly to a miniflow,
which is a compressed form of the struct flow.  This does not require
a large struct to be cleared to begin with, and accesses less memory.
These performance benefits should allow this to be used in the DPDK
datapath.

miniflow_extract() takes a miniflow as an input/output parameter.  On
input the buffer for values to be extracted must be properly
initialized.  On output the map contains ones for all the fields that
have been extracted.

Some struct flow fields are reordered to make miniflow_extract to
progress in the logical order.

Some explicit "inline" keywords are necessary for GCC to optimize this
properly.  Also, macros are used for same reason instead of inline
functions for pushing data to the miniflow.

Signed-off-by: Jarno Rajahalme 
---
 lib/byte-order.h |   12 +
 lib/flow.c   |  756 +-
 lib/flow.h   |   38 ++-
 lib/match.c  |2 +-
 lib/nx-match.c   |2 +-
 lib/ofp-util.c   |2 +-
 lib/packets.h|1 +
 ofproto/ofproto-dpif-xlate.c |2 +-
 8 files changed, 500 insertions(+), 315 deletions(-)

diff --git a/lib/byte-order.h b/lib/byte-order.h
index e5ea8db..544f46f 100644
--- a/lib/byte-order.h
+++ b/lib/byte-order.h
@@ -78,4 +78,16 @@ uint32_byteswap(uint32_t crc) {
  ovs_be64) (VALUE)) & UINT64_C(0xff00)) >> 56))
 #endif
 
+#if WORDS_BIGENDIAN
+#define BYTES_TO_BE32(B1, B2, B3, B4) \
+(OVS_FORCE ovs_be32)((uint32_t)(B1) << 24 | (B2) << 16 | (B3) << 8 | (B4))
+#define BE16S_TO_BE32(B1, B2) \
+(OVS_FORCE ovs_be32)((uint32_t)(B1) << 16 | (B2))
+#else
+#define BYTES_TO_BE32(B1, B2, B3, B4) \
+(OVS_FORCE ovs_be32)((uint32_t)(B1) | (B2) << 8 | (B3) << 16 | (B4) << 24)
+#define BE16S_TO_BE32(B1, B2) \
+(OVS_FORCE ovs_be32)((uint32_t)(B1) | (B2) << 16)
+#endif
+
 #endif /* byte-order.h */
diff --git a/lib/flow.c b/lib/flow.c
index 6c6978d..8cb2b37 100644
--- a/lib/flow.c
+++ b/lib/flow.c
@@ -50,255 +50,245 @@ const uint8_t flow_segment_u32s[4] = {
 FLOW_U32S
 };
 
-static struct arp_eth_header *
-pull_arp(struct ofpbuf *packet)
-{
-return ofpbuf_try_pull(packet, ARP_ETH_HEADER_LEN);
-}
-
-static struct ip_header *
-pull_ip(struct ofpbuf *packet)
-{
-if (ofpbuf_size(packet) >= IP_HEADER_LEN) {
-struct ip_header *ip = ofpbuf_data(packet);
-int ip_len = IP_IHL(ip->ip_ihl_ver) * 4;
-if (ip_len >= IP_HEADER_LEN && ofpbuf_size(packet) >= ip_len) {
-return ofpbuf_pull(packet, ip_len);
-}
-}
-return NULL;
-}
-
-static struct icmp_header *
-pull_icmp(struct ofpbuf *packet)
-{
-return ofpbuf_try_pull(packet, ICMP_HEADER_LEN);
-}
-
-static struct icmp6_hdr *
-pull_icmpv6(struct ofpbuf *packet)
-{
-return ofpbuf_try_pull(packet, sizeof(struct icmp6_hdr));
-}
-
-static void
-parse_mpls(struct ofpbuf *b, struct flow *flow)
-{
-struct mpls_hdr *mh;
-int idx = 0;
+/* miniflow_extract() assumes the following to be true to optimize the
+ * extraction process. */
+BUILD_ASSERT_DECL(offsetof(struct flow, dl_type) + 2
+  == offsetof(struct flow, vlan_tci) &&
+  offsetof(struct flow, dl_type) / 4
+  == offsetof(struct flow, vlan_tci) / 4 );
+
+BUILD_ASSERT_DECL(offsetof(struct flow, nw_frag) + 3
+  == offsetof(struct flow, nw_proto) &&
+  offsetof(struct flow, nw_tos) + 2
+  == offsetof(struct flow, nw_proto) &&
+  offsetof(struct flow, nw_ttl) + 1
+  == offsetof(struct flow, nw_proto) &&
+  offsetof(struct flow, nw_frag) / 4
+  == offsetof(struct flow, nw_tos) / 4 &&
+  offsetof(struct flow, nw_ttl) / 4
+  == offsetof(struct flow, nw_tos) / 4 &&
+  offsetof(struct flow, nw_proto) / 4
+  == offsetof(struct flow, nw_tos) / 4);
+
+/* TCP flags in the first half of a BE32, zeroes in the other half. */
+BUILD_ASSERT_DECL(offsetof(struct flow, tcp_flags) + 2
+  == offsetof(struct flow, pad) &&
+  offsetof(struct flow, tcp_flags) / 4
+  == offsetof(struct flow, pad) / 4);
+#if WORDS_BIGENDIAN
+#define TCP_FLAGS_BE32(tcp_ctl) ((OVS_FORCE ovs_be32)TCP_FLAGS_BE16(tcp_ctl) \
+ << 16)
+#else
+#define TCP_FLAGS_BE32(tcp_ctl) ((OVS_FORCE ovs_be32)TCP_FLAGS_BE16(tcp_ctl))
+#endif
+
+BUILD_ASSERT_DECL(offsetof(struct flow, tp_src) + 2
+  == offsetof(struct flow, tp_dst) &&
+  offsetof(struct flow, tp_src) / 4
+  == offsetof(struct flow, tp_dst) / 4);
+
+/* Removes 'size' bytes from the head end of '*datap', of size '*sizep', which
+ * must contain at least 'size' bytes of data.  Returns the first byte of data
+ * removed. */
+static inline const void *
+data_p

[ovs-dev] [PATCH] lib/ofp-util: Restore the check for minus sign in port number strings.

2014-04-17 Thread Jarno Rajahalme
Commit 33ab38d9 (meta-flow: Simplify mf_from_ofp_port_string())
inadvertently removed a check for minus sign at the beginning of a
port number string introduced by commit 05dddba (meta-flow: Don't
allow negative port numbers).  This check is still needed, so put it
back, but to ofputil_port_from_string() this time.

Signed-off-by: Jarno Rajahalme 
---
 lib/ofp-util.c |6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/lib/ofp-util.c b/lib/ofp-util.c
index af2d853..a990f46 100644
--- a/lib/ofp-util.c
+++ b/lib/ofp-util.c
@@ -5394,8 +5394,12 @@ ofputil_port_to_ofp11(ofp_port_t ofp10_port)
 bool
 ofputil_port_from_string(const char *s, ofp_port_t *portp)
 {
-uint32_t port32;
+unsigned int port32; /* int is at least 32 bits wide. */
 
+if (*s == '-') {
+VLOG_WARN("Negative values are not supported as port numbers.");
+return false;
+}
 *portp = 0;
 if (str_to_uint(s, 10, &port32)) {
 if (port32 < ofp_to_u16(OFPP_MAX)) {
-- 
1.7.10.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH 5/5] dpif-netdev: Use miniflow as a flow key.

2014-04-17 Thread Jarno Rajahalme
Use miniflow as a flow key in the userspace datapath classifier.  The
miniflow is expanded for upcalls, but for existing datapath flows, the
key need not be expanded.

Signed-off-by: Jarno Rajahalme 
---
 lib/dpif-netdev.c |   52 +---
 lib/flow.c|   31 +--
 lib/flow.h|2 ++
 3 files changed, 64 insertions(+), 21 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 76141e3..790fe23 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -353,10 +353,11 @@ static int dpif_netdev_open(const struct dpif_class *, 
const char *name,
 bool create, struct dpif **);
 static int dp_netdev_output_userspace(struct dp_netdev *dp, struct ofpbuf *,
   int queue_no, int type,
-  const struct flow *,
+  const struct miniflow *,
   const struct nlattr *userdata);
 static void dp_netdev_execute_actions(struct dp_netdev *dp,
-  const struct flow *, struct ofpbuf *, 
bool may_steal,
+  const struct miniflow *,
+  struct ofpbuf *, bool may_steal,
   struct pkt_metadata *,
   const struct nlattr *actions,
   size_t actions_len);
@@ -1063,13 +1064,15 @@ dp_netdev_flow_cast(const struct cls_rule *cr)
 }
 
 static struct dp_netdev_flow *
-dp_netdev_lookup_flow(const struct dp_netdev *dp, const struct flow *flow)
+dp_netdev_lookup_flow(const struct dp_netdev *dp, const struct miniflow *key)
 OVS_EXCLUDED(dp->cls.rwlock)
 {
 struct dp_netdev_flow *netdev_flow;
+struct cls_rule *rule;
 
 fat_rwlock_rdlock(&dp->cls.rwlock);
-netdev_flow = dp_netdev_flow_cast(classifier_lookup(&dp->cls, flow, NULL));
+rule = classifier_lookup_miniflow_first(&dp->cls, key);
+netdev_flow = dp_netdev_flow_cast(rule);
 fat_rwlock_unlock(&dp->cls.rwlock);
 
 return netdev_flow;
@@ -1294,6 +1297,7 @@ dpif_netdev_flow_put(struct dpif *dpif, const struct 
dpif_flow_put *put)
 struct dp_netdev *dp = get_dp_netdev(dpif);
 struct dp_netdev_flow *netdev_flow;
 struct flow flow;
+struct miniflow miniflow;
 struct flow_wildcards wc;
 int error;
 
@@ -1307,9 +1311,10 @@ dpif_netdev_flow_put(struct dpif *dpif, const struct 
dpif_flow_put *put)
 if (error) {
 return error;
 }
+miniflow_init(&miniflow, &flow);
 
 ovs_mutex_lock(&dp->flow_mutex);
-netdev_flow = dp_netdev_lookup_flow(dp, &flow);
+netdev_flow = dp_netdev_lookup_flow(dp, &miniflow);
 if (!netdev_flow) {
 if (put->flags & DPIF_FP_CREATE) {
 if (hmap_count(&dp->flow_table) < MAX_FLOWS) {
@@ -1522,7 +1527,8 @@ dpif_netdev_execute(struct dpif *dpif, struct 
dpif_execute *execute)
 {
 struct dp_netdev *dp = get_dp_netdev(dpif);
 struct pkt_metadata *md = &execute->md;
-struct flow key;
+struct miniflow key;
+uint32_t buf[FLOW_U32S];
 
 if (ofpbuf_size(execute->packet) < ETH_HEADER_LEN ||
 ofpbuf_size(execute->packet) > UINT16_MAX) {
@@ -1530,7 +1536,8 @@ dpif_netdev_execute(struct dpif *dpif, struct 
dpif_execute *execute)
 }
 
 /* Extract flow key. */
-flow_extract(execute->packet, md, &key);
+miniflow_initialize(&key, buf);
+miniflow_extract(execute->packet, md, &key);
 
 ovs_rwlock_rdlock(&dp->port_rwlock);
 dp_netdev_execute_actions(dp, &key, execute->packet, false, md,
@@ -1963,9 +1970,9 @@ dp_netdev_flow_stats_new_cb(void)
 static void
 dp_netdev_flow_used(struct dp_netdev_flow *netdev_flow,
 const struct ofpbuf *packet,
-const struct flow *key)
+const struct miniflow *key)
 {
-uint16_t tcp_flags = ntohs(key->tcp_flags);
+uint16_t tcp_flags = miniflow_get_tcp_flags(key);
 long long int now = time_msec();
 struct dp_netdev_flow_stats *bucket;
 
@@ -2005,13 +2012,16 @@ dp_netdev_input(struct dp_netdev *dp, struct ofpbuf 
*packet,
 OVS_REQ_RDLOCK(dp->port_rwlock)
 {
 struct dp_netdev_flow *netdev_flow;
-struct flow key;
+struct miniflow key;
+uint32_t buf[FLOW_U32S];
 
 if (ofpbuf_size(packet) < ETH_HEADER_LEN) {
 ofpbuf_delete(packet);
 return;
 }
-flow_extract(packet, md, &key);
+miniflow_initialize(&key, buf);
+miniflow_extract(packet, md, &key);
+
 netdev_flow = dp_netdev_lookup_flow(dp, &key);
 if (netdev_flow) {
 struct dp_netdev_actions *actions;
@@ -2025,7 +2035,8 @@ dp_netdev_input(struct dp_netdev *dp, struct ofpbuf 
*packet,
 } else if (dp->handler_queues) {
 dp_netdev_count_packet(dp, DP_STAT_MISS);
 dp_netdev_output_userspace(dp, packet,
-

[ovs-dev] [PATCH 3/5] lib/flow: Possibly faster miniflow_hash_in_minimask()

2014-04-17 Thread Jarno Rajahalme
Upcoming patches add classifier lookups using miniflows, this is
heavily used for it.

Signed-off-by: Jarno Rajahalme 
---
 lib/flow.c |   20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/lib/flow.c b/lib/flow.c
index c6e5e07..03d86ae 100644
--- a/lib/flow.c
+++ b/lib/flow.c
@@ -1815,11 +1815,27 @@ miniflow_hash_in_minimask(const struct miniflow *flow,
 const uint32_t *p = mask->masks.values;
 uint32_t hash;
 uint64_t map;
+const uint32_t *fp = flow->values;
+uint64_t fmap = flow->map;
+uint64_t rm1bit;
 
 hash = basis;
 
-for (map = mask->masks.map; map; map = zero_rightmost_1bit(map)) {
-hash = mhash_add(hash, miniflow_get(flow, raw_ctz(map)) & *p++);
+for (map = mask->masks.map; map; map -= rm1bit) {
+uint32_t flow_u32 = 0;
+
+rm1bit = rightmost_1bit(map);
+if (fmap & rm1bit) {
+uint64_t trash = fmap & (rm1bit - 1);
+
+/* Skip data in 'flow' that is of no interest, once. */
+if (trash) {
+fp += count_1bits(trash);
+fmap -= trash;
+}
+flow_u32 = *fp;
+}
+hash = mhash_add(hash, flow_u32 & *p++);
 }
 
 return mhash_finish(hash, (p - mask->masks.values) * 4);
-- 
1.7.10.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH 2/5] lib/flow: Add miniflow accessors and miniflow_get_tcp_flags().

2014-04-17 Thread Jarno Rajahalme
Add inlined generic accessors for miniflow integer type fields, and a
new miniflow_get_tcp_flags() usinge these.  These will be used in a
later patch.

Some definitions also used in lib/packets.h had to be moved there to
resolve circular include dependencies.  Similarly, some inline
functions using struct flow are now in lib/flow.h.  IMO this is
cleaner, since now the lib/flow.h need not be included from
lib/packets.h.

Signed-off-by: Jarno Rajahalme 
---
 lib/flow.c |   45 ++---
 lib/flow.h |  140 +---
 lib/packets.h  |   51 ++-
 ofproto/ofproto-dpif-monitor.h |1 +
 tests/test-classifier.c|2 +-
 5 files changed, 143 insertions(+), 96 deletions(-)

diff --git a/lib/flow.c b/lib/flow.c
index 8cb2b37..c6e5e07 100644
--- a/lib/flow.c
+++ b/lib/flow.c
@@ -1692,42 +1692,15 @@ miniflow_expand(const struct miniflow *src, struct flow 
*dst)
 flow_union_with_miniflow(dst, src);
 }
 
-static const uint32_t *
-miniflow_get__(const struct miniflow *flow, unsigned int u32_ofs)
-{
-if (!(flow->map & (UINT64_C(1) << u32_ofs))) {
-static const uint32_t zero = 0;
-return &zero;
-}
-return flow->values +
-   count_1bits(flow->map & ((UINT64_C(1) << u32_ofs) - 1));
-}
-
 /* Returns the uint32_t that would be at byte offset '4 * u32_ofs' if 'flow'
  * were expanded into a "struct flow". */
-uint32_t
+static uint32_t
 miniflow_get(const struct miniflow *flow, unsigned int u32_ofs)
 {
-return *miniflow_get__(flow, u32_ofs);
-}
-
-/* Returns the ovs_be16 that would be at byte offset 'u8_ofs' if 'flow' were
- * expanded into a "struct flow". */
-static ovs_be16
-miniflow_get_be16(const struct miniflow *flow, unsigned int u8_ofs)
-{
-const uint32_t *u32p = miniflow_get__(flow, u8_ofs / 4);
-const ovs_be16 *be16p = (const ovs_be16 *) u32p;
-return be16p[u8_ofs % 4 != 0];
-}
-
-/* Returns the VID within the vlan_tci member of the "struct flow" represented
- * by 'flow'. */
-uint16_t
-miniflow_get_vid(const struct miniflow *flow)
-{
-ovs_be16 tci = miniflow_get_be16(flow, offsetof(struct flow, vlan_tci));
-return vlan_tci_to_vid(tci);
+return (flow->map & UINT64_C(1) << u32_ofs)
+? *(flow->values +
+count_1bits(flow->map & ((UINT64_C(1) << u32_ofs) - 1)))
+: 0;
 }
 
 /* Returns true if 'a' and 'b' are the same flow, false otherwise.  */
@@ -1976,14 +1949,6 @@ minimask_get(const struct minimask *mask, unsigned int 
u32_ofs)
 return miniflow_get(&mask->masks, u32_ofs);
 }
 
-/* Returns the VID mask within the vlan_tci member of the "struct
- * flow_wildcards" represented by 'mask'. */
-uint16_t
-minimask_get_vid_mask(const struct minimask *mask)
-{
-return miniflow_get_vid(&mask->masks);
-}
-
 /* Returns true if 'a' and 'b' are the same flow mask, false otherwise.  */
 bool
 minimask_equal(const struct minimask *a, const struct minimask *b)
diff --git a/lib/flow.h b/lib/flow.h
index 4f42f0c..c834cb5 100644
--- a/lib/flow.h
+++ b/lib/flow.h
@@ -24,6 +24,7 @@
 #include "byte-order.h"
 #include "openflow/nicira-ext.h"
 #include "openflow/openflow.h"
+#include "packets.h"
 #include "hash.h"
 #include "util.h"
 
@@ -60,23 +61,6 @@ BUILD_ASSERT_DECL(FLOW_NW_FRAG_LATER == NX_IP_FRAG_LATER);
 
 const char *flow_tun_flag_to_string(uint32_t flags);
 
-struct flow_tnl {
-ovs_be64 tun_id;
-ovs_be32 ip_src;
-ovs_be32 ip_dst;
-uint16_t flags;
-uint8_t ip_tos;
-uint8_t ip_ttl;
-};
-
-/* Unfortunately, a "struct flow" sometimes has to handle OpenFlow port
- * numbers and other times datapath (dpif) port numbers.  This union allows
- * access to both. */
-union flow_in_port {
-odp_port_t odp_port;
-ofp_port_t ofp_port;
-};
-
 /* Maximum number of supported MPLS labels. */
 #define FLOW_MAX_MPLS_LABELS 3
 
@@ -418,8 +402,21 @@ void miniflow_destroy(struct miniflow *);
 
 void miniflow_expand(const struct miniflow *, struct flow *);
 
-uint32_t miniflow_get(const struct miniflow *, unsigned int u32_ofs);
-uint16_t miniflow_get_vid(const struct miniflow *);
+/* These accessors use byte offsets, which are assumed to be compile-time
+ * constants. */
+static inline uint8_t miniflow_get_u8(const struct miniflow *,
+   unsigned int ofs);
+static inline uint16_t miniflow_get_u16(const struct miniflow *,
+unsigned int ofs);
+static inline ovs_be16 miniflow_get_be16(const struct miniflow *,
+ unsigned int ofs);
+static inline uint32_t miniflow_get_u32(const struct miniflow *,
+unsigned int ofs);
+static inline ovs_be32 miniflow_get_be32(const struct miniflow *,
+ unsigned int ofs);
+
+static inline uint16_t miniflow_get_vid(const struct miniflow *);
+static inline uint16_t miniflow_get_tcp_flags(const struct mi

[ovs-dev] [PATCH 4/5] classifier: Support miniflow as a key.

2014-04-17 Thread Jarno Rajahalme
Support struct miniflow as a key for datapath flow lookup.

The new classifier interface classifier_lookup_miniflow_first() takes
a miniflow as a key and stops at the first match with no regard to
flow prioritites.  This works only if the classifier has no
conflicting rules (as is the case with the userspace datapath
classifier).

Signed-off-by: Jarno Rajahalme 
---
 lib/classifier.c |   62 ++
 lib/classifier.h |3 +++
 lib/flow.c   |   24 +++--
 lib/flow.h   |   26 +++
 4 files changed, 94 insertions(+), 21 deletions(-)

diff --git a/lib/classifier.c b/lib/classifier.c
index 55ca5ea..31a0e8b 100644
--- a/lib/classifier.c
+++ b/lib/classifier.c
@@ -566,6 +566,68 @@ classifier_lookup(const struct classifier *cls, const 
struct flow *flow,
 return best;
 }
 
+/* Returns true if 'target' satisifies 'match', that is, if each bit for which
+ * 'match' specifies a particular value has the correct value in 'target'. */
+static bool
+minimatch_matches_miniflow(const struct minimatch *match,
+   const struct miniflow *target)
+{
+const uint32_t *flowp = (const uint32_t *)match->flow.values;
+const uint32_t *maskp = (const uint32_t *)match->mask.masks.values;
+uint32_t target_u32;
+
+MINIFLOW_FOR_EACH_IN_MAP(target_u32, target, match->mask.masks.map) {
+if ((*flowp++ ^ target_u32) & *maskp++) {
+return false;
+}
+}
+
+return true;
+}
+
+static inline struct cls_rule *
+find_match_miniflow(const struct cls_subtable *subtable,
+const struct miniflow *flow,
+uint32_t hash)
+{
+struct cls_rule *rule;
+
+HMAP_FOR_EACH_WITH_HASH (rule, hmap_node, hash, &subtable->rules) {
+if (minimatch_matches_miniflow(&rule->match, flow)) {
+return rule;
+}
+}
+
+return NULL;
+}
+
+/* Finds and returns the highest-priority rule in 'cls' that matches
+ * 'miniflow'.  Returns a null pointer if no rules in 'cls' match 'flow'.
+ * If multiple rules of equal priority match 'flow', returns one arbitrarily.
+ *
+ * This function is optimized for the userspace datapath, which only ever has
+ * one priority value for it's flows!
+ */
+struct cls_rule *classifier_lookup_miniflow_first(const struct classifier *cls,
+  const struct miniflow *flow)
+{
+struct cls_subtable *subtable;
+
+LIST_FOR_EACH (subtable, list_node, &cls->subtables_priority) {
+struct cls_rule *rule;
+
+rule = find_match_miniflow(subtable, flow,
+   miniflow_hash_in_minimask(flow,
+ &subtable->mask,
+ 0));
+if (rule) {
+return rule;
+}
+}
+
+return NULL;
+}
+
 /* Finds and returns a rule in 'cls' with exactly the same priority and
  * matching criteria as 'target'.  Returns a null pointer if 'cls' doesn't
  * contain an exact match. */
diff --git a/lib/classifier.h b/lib/classifier.h
index c3c1c3b..9022fab 100644
--- a/lib/classifier.h
+++ b/lib/classifier.h
@@ -346,6 +346,9 @@ struct cls_rule *classifier_lookup(const struct classifier 
*cls,
const struct flow *,
struct flow_wildcards *)
 OVS_REQ_RDLOCK(cls->rwlock);
+struct cls_rule *classifier_lookup_miniflow_first(const struct classifier *cls,
+  const struct miniflow *)
+OVS_REQ_RDLOCK(cls->rwlock);
 bool classifier_rule_overlaps(const struct classifier *cls,
   const struct cls_rule *)
 OVS_REQ_RDLOCK(cls->rwlock);
diff --git a/lib/flow.c b/lib/flow.c
index 03d86ae..8319dde 100644
--- a/lib/flow.c
+++ b/lib/flow.c
@@ -1813,28 +1813,10 @@ miniflow_hash_in_minimask(const struct miniflow *flow,
   const struct minimask *mask, uint32_t basis)
 {
 const uint32_t *p = mask->masks.values;
-uint32_t hash;
-uint64_t map;
-const uint32_t *fp = flow->values;
-uint64_t fmap = flow->map;
-uint64_t rm1bit;
-
-hash = basis;
-
-for (map = mask->masks.map; map; map -= rm1bit) {
-uint32_t flow_u32 = 0;
-
-rm1bit = rightmost_1bit(map);
-if (fmap & rm1bit) {
-uint64_t trash = fmap & (rm1bit - 1);
+uint32_t hash = basis;
+uint32_t flow_u32;
 
-/* Skip data in 'flow' that is of no interest, once. */
-if (trash) {
-fp += count_1bits(trash);
-fmap -= trash;
-}
-flow_u32 = *fp;
-}
+MINIFLOW_FOR_EACH_IN_MAP(flow_u32, flow, mask->masks.map) {
 hash = mhash_add(hash, flow_u32 & *p++);
 }
 
diff --git a/lib/flow.h b/lib/flow.h
index c834cb5..e8066aa 100644
--- a/lib/flow.h
+++ b/li

Re: [ovs-dev] OpenFlow rule deletion during port destroy

2014-04-17 Thread Jarno Rajahalme

On Apr 17, 2014, at 11:26 AM, Zoltan Kiss  wrote:

> On 16/04/14 18:00, Justin Pettit wrote:
>> On April 16, 2014 at 9:00:15 AM, Zoltan Kiss (zoltan.k...@citrix.com) wrote:
>> 
>>> My actual problem is that an important rule gets deleted:
>>> 
>>> cookie=0x0, duration=1581.083s, table=0, n_packets=52804,
>>> n_bytes=88968151, idle_age=0, priority=0,in_port=ANY actions=NORMAL
>>> 
>>> ...
>>> 
>>> There is no sign the controller did anything about deleting those rules,
>>> but somehow it still happened. Does anyone knows
>>> Unfortunately it is hard to reproduce the problem, it is only
>>> intermittent in one of our testcases.
>> 
>> I'm not sure that it's related, but it sounds similar to bug NIC-512 that I 
>> filed with Citrix over a year ago.  Here's the relevant part:
> Many thanks Justin it was this problem indeed! Now I poked the right people 
> to fix this quite forgotten problem!
> 
>> I did commit a change that should lessen the impact:
>> 
>>   
>> http://openvswitch.org/cgi-bin/gitweb.cgi?p=openvswitch;a=commitdiff;h=05dddba
> Yes, this would solve the problem, but this patch removed it on the 
> assumption that ofputil_port_from_string() solved it anyway:
> 
> http://git.openvswitch.org/cgi-bin/gitweb.cgi?p=openvswitch;a=commitdiff;h=33ab38d9
> 
> But it doesn't, in fact it does some pretty mad conversions:
> - ofputil_port_from_string pass a uint32_t* to str_to_uint
> - it is casted to an int*, which is fortunately 32 bit in most places, but it 
> is a dangerous assumption
> - the string is converted to a 'long long' and then copied into that int*
> - so if the string was "-1", the value of port32 will be 0x
> 
> I think the best place to catch this problem would be to check in 
> str_to_uint() if the returned '(int *) u' is a negative number, and return 
> false in that case. What do you think?
> 

It seems that the check for the minus sign needs to be done in 
ofputil_port_from_string(). The check for negative return values from 
str_to_uint() would not work as the special port numbers for OpenFlow 1.1+ are 
in that “negative” range.

I just posted this patch, could you verify it solves your problem:

http://openvswitch.org/pipermail/dev/2014-April/039070.html

  Jarno


___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH] ofproto-dpif-upcall: Don't use stack garbage

2014-04-17 Thread YAMAMOTO Takashi
Catched by "learning action - self-modifying flow with hard_timeout"
test case.

The bug introduced by commit b256dc52.
("ofproto-dpif-xlate: Cache xlate_actions() effects.")

Signed-off-by: YAMAMOTO Takashi 
---
 ofproto/ofproto-dpif-upcall.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index 0d7dd8e..96a53e6 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -1379,7 +1379,6 @@ revalidate_ukey(struct udpif *udpif, struct 
udpif_flow_dump *udump,
 xoutp = NULL;
 actions = NULL;
 netflow = NULL;
-may_learn = push.n_packets > 0;
 
 /* If we don't need to revalidate, we can simply push the stats contained
  * in the udump, otherwise we'll have to get the actions so we can check
@@ -1413,6 +1412,7 @@ revalidate_ukey(struct udpif *udpif, struct 
udpif_flow_dump *udump,
 goto exit;
 }
 
+may_learn = push.n_packets > 0;
 if (ukey->xcache && !udump->need_revalidate) {
 xlate_push_stats(ukey->xcache, may_learn, &push);
 ok = true;
-- 
1.8.3.1

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH] lib/ofp-util: Restore the check for minus sign in port number strings.

2014-04-17 Thread Justin Pettit
It might be useful to print the port string that was handed to it, since the 
other errors in that function do that.  Something like:

  VLOG_WARN("port %s is a negative value, which is not a valid port number", s);

Assuming that seems reasonable to you and Zoltan thinks it would prevent the 
issue he saw:

Acked-by: Justin Pettit 

--Justin


On April 17, 2014 at 6:37:42 PM, Jarno Rajahalme (jrajaha...@nicira.com) wrote:
> Commit 33ab38d9 (meta-flow: Simplify mf_from_ofp_port_string())
> inadvertently removed a check for minus sign at the beginning of a
> port number string introduced by commit 05dddba (meta-flow: Don't
> allow negative port numbers). This check is still needed, so put it
> back, but to ofputil_port_from_string() this time.
>  
> Signed-off-by: Jarno Rajahalme  
> ---
> lib/ofp-util.c | 6 +-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>  
> diff --git a/lib/ofp-util.c b/lib/ofp-util.c
> index af2d853..a990f46 100644
> --- a/lib/ofp-util.c
> +++ b/lib/ofp-util.c
> @@ -5394,8 +5394,12 @@ ofputil_port_to_ofp11(ofp_port_t ofp10_port)
> bool
> ofputil_port_from_string(const char *s, ofp_port_t *portp)
> {
> - uint32_t port32;
> + unsigned int port32; /* int is at least 32 bits wide. */
>  
> + if (*s == '-') {
> + VLOG_WARN("Negative values are not supported as port numbers.");
> + return false;
> + }
> *portp = 0;
> if (str_to_uint(s, 10, &port32)) {
> if (port32 < ofp_to_u16(OFPP_MAX)) {
> --
> 1.7.10.4
>  
> ___
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev
>  

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v3] Add Support for 802.1qad (qinq) Allows TPID of 0x88a8

2014-04-17 Thread Avinash
Hi,

Thanks Tom for the patch. The patch is clear and concise. We are
looking forward for the testing results of the working patch.

Currently, if a VLAN packet is received by the openvswitch, where the
input switch port is having vlan_mode as:
a. trunk, the packet is forwarded to the respective destination MAC address.
b. access, the packet is dropped.

As mentioned by manikanta, how about having a new vlan mode for the
switch port to distinguish ports with/without QinQ support.

Any other ideas ?

If this design of having new vlan mode looks good, we are also
interested in implementing stacked VLAN support (OF1.2+ spec) with
this design.

--
Avinash

On 4/15/14, Manikanta Srinivas  wrote:
> Thanks for the reply. Please find my responses inline.
>
>> Date: Mon, 14 Apr 2014 07:11:11 -0700
>> Subject: Re: [ovs-dev] [PATCH v3] Add Support for 802.1qad (qinq) Allows
>> TPID of 0x88a8
>> From: az...@nicira.com
>> To: srinivas...@outlook.com
>> CC: thomasfherb...@gmail.com; dev@openvswitch.org
>>
>> On Mon, Apr 14, 2014 at 4:27 AM, Manikanta Srinivas
>>  wrote:
>> > Hi,
>> >
>> > Thanks for your efforts. We are interested in QinQ implementation of
>> > openvswitch. After going through the patch, we are left with following
>> > queries.
>> >
>> > 1. We think there should be a separate mode to support qinq tunnel. This
>> > can
>> > be achieved by implementing a new configuration parameter in ovsdb. Do
>> > you
>> > have any plans to implement this approach or any other ideas ?
>>
>> What kind of new configuration do you have in mind?
>
> My understanding on QinQ:
>
> When a VLAN tag packet arrives the switch port, then if the port is:
>
> a. normal access: the packet is dropped. (as there is no qinq support)
> b. qinq access: new outer VLAN tag is appended over the C-Vlan tag.
> c. trunk (having qinq support): the packet is forwarded based on the
> destination MAC address.
>
> If the above mentioned cases are valid, then we might need a new
> configuration parameter to support qinq ?
>
>>
>> >
>> > 2. The structure "sw_flow_key" should be extended to represent the
>> > information of outer VLAN (service VLAN) in flow key. Similar change is
>> > also
>> > required in flow structure (lib/flow.h) used in user space.
>>
>> We don't strictly need those changes for Thomas' patch.  open-flow1.0
>> and 1.1 support
>> exactly one vlan tag.  Open flow 1.1 requires 802.1ad tag to be
>> recognized.
>> Thomas' patch is a great step forward to support Openflow 1.1.
>>
>> Openflow 1.2+ removed packet parsing requirements. So we could support
>> the model you proposed.
>> However, this would limit us to support fix number of VLAN tags, for
>> example, double tagging as you have suggested.
>> This is not a bad idea and should cover most of the use cases out
>> there.  On the other hand,  Openflow
>> spec does not rule out supporing multiple vlan tags. Since we are
>> going to support MLPS label stacking, we could also support
>> vlan stacking without much added efforts.
>>
>
> Thanks. If the specification says so, there is no issue. Also the mentioned
> approach would limit
> for only double VLAN tagging and might not support for multiple tags.
> May be I can have better understanding after seeing the testing results.
>
>> >
>> > Please correct our understanding if we went wrong somewhere.
>>
>> This is good time to voice your opinion and use cases.  We'd like to
>> flush out the design for openflow 1.2+ support.
>> Thank you for your interest and participation.
>>
>> >
>> > Thanks and Regards,
>> > Manikanta Srinivas
>> >
>   


-- 
AVINASH
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev