date:20150717

Re: [ovs-dev] [RFC net-next 22/22] openvswitch: Use regular GRE net_device instead of vport

2015-07-17 Thread Thomas Graf

On 07/16/15 at 02:36pm, Pravin Shelar wrote:
> On Thu, Jul 16, 2015 at 7:52 AM, Thomas Graf  wrote:
> > I'm inclined to change this and use an in-kernel API as well to
> > create the net_device just like VXLAN does in patch 21.
> >
> > Pravin, what do you think?
> 
> About the vxlan APIs we also need to direct netlink interface for
> userspace to configure vxlan device. This will allow us to remove
> vxlan compat code from ovs vport-netdev.c in future.

Do you mean creating the tunnel devices from user space? This would
break existing users of the OVS Netlink interface. How do you want
to prevent that?
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH 1/2] netdev-dpdk: Restore txq/rxq number if initialization fails.

2015-07-17 Thread Stokes, Ian

> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Daniele Di
> Proietto
> Sent: Thursday, July 16, 2015 7:48 PM
> To: dev@openvswitch.org
> Subject: [ovs-dev] [PATCH 1/2] netdev-dpdk: Restore txq/rxq number if
> initialization fails.
> 
> netdev_dpdk_set_multiq() should not set the number of configured rxq
> and txq if the driver initialization fails (meaning that the driver
> failed to set up the queues).  Otherwise, on a subsequent call to
> netdev_dpdk_set_multiq(), the code may believe that the queues have
> already been set up and there's no work to be done.
> 
> This commit fixes the problem by restoring the old values if
> dpdk_eth_dev_init() fails.
> 
> Reported-by: Ian Stokes 
> Signed-off-by: Daniele Di Proietto 
> ---
>  lib/netdev-dpdk.c | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 8b843db..5ae805e 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -743,6 +743,7 @@ netdev_dpdk_set_multiq(struct netdev *netdev_, unsigned int n_txq,
>  {
>      struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
>      int err = 0;
> +    int old_rxq, old_txq;
> 
>      if (netdev->up.n_txq == n_txq && netdev->up.n_rxq == n_rxq) {
>          return err;
> @@ -753,12 +754,20 @@ netdev_dpdk_set_multiq(struct netdev *netdev_, unsigned int n_txq,
> 
>      rte_eth_dev_stop(netdev->port_id);
> 
> +    old_txq = netdev->up.n_txq;
> +    old_rxq = netdev->up.n_rxq;
>      netdev->up.n_txq = n_txq;
>      netdev->up.n_rxq = n_rxq;
> 
>      rte_free(netdev->tx_q);
>      err = dpdk_eth_dev_init(netdev);
>      netdev_dpdk_alloc_txq(netdev, netdev->real_n_txq);
> +    if (err) {
> +        /* If there has been an error, it means that the requested queues
> +         * have not been created.  Restore the old numbers. */
> +        netdev->up.n_txq = old_txq;
> +        netdev->up.n_rxq = old_rxq;
> +    }
> 
>      netdev->txq_needs_locking = netdev->real_n_txq != netdev->up.n_txq;
> 
> --
> 2.1.4
> 

Thanks for the quick response.

I've tested this and it works as expected now.

Tested-by: Ian Stokes 

Re: [ovs-dev] [PATCH 2/2] netdev-dpdk: Retry tx/rx queue setup until we don't get any failure.

2015-07-17 Thread Stokes, Ian

> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Daniele Di
> Proietto
> Sent: Thursday, July 16, 2015 7:48 PM
> To: dev@openvswitch.org
> Subject: [ovs-dev] [PATCH 2/2] netdev-dpdk: Retry tx/rx queue setup
> until we don't get any failure.
> 
> It has been observed that some DPDK devices (e.g. Intel xl710) report a
> high number of queues but make some of them available only for special
> functions (SRIOV).  Therefore the queues will be counted by
> rte_eth_dev_info_get(), but rte_eth_tx_queue_setup() will fail.
> 
> This commit works around the issue by retrying the device initialization
> with a smaller number of queues if a queue fails to be set up.
> 
> Reported-by: Ian Stokes 
> Signed-off-by: Daniele Di Proietto 
> ---
>  lib/netdev-dpdk.c | 100 +++---
>  1 file changed, 73 insertions(+), 27 deletions(-)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 5ae805e..3444bb1 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -423,52 +423,98 @@ dpdk_watchdog(void *dummy OVS_UNUSED)
>  }
> 
>  static int
> +dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
> +{
> +    int diag = 0;
> +    int i;
> +
> +    /* A device may report more queues than it makes available (this has
> +     * been observed for Intel xl710, which reserves some of them for
> +     * SRIOV):  rte_eth_*_queue_setup will fail if a queue is not
> +     * available.  When this happens we can retry the configuration
> +     * and request less queues */
> +    while (n_rxq && n_txq) {
> +        if (diag) {
> +            VLOG_INFO("Retrying setup with (rxq:%d txq:%d)", n_rxq, n_txq);
> +        }
> +
> +        diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, &port_conf);
> +        if (diag) {
> +            break;
> +        }
> +
> +        for (i = 0; i < n_txq; i++) {
> +            diag = rte_eth_tx_queue_setup(dev->port_id, i, NIC_PORT_TX_Q_SIZE,
> +                                          dev->socket_id, NULL);
> +            if (diag) {
> +                VLOG_INFO("Interface %s txq(%d) setup error: %s",
> +                          dev->up.name, i, rte_strerror(-diag));
> +                break;
> +            }
> +        }
> +
> +        if (i != n_txq) {
> +            /* Retry with less tx queues */
> +            n_txq = i;
> +            continue;
> +        }
> +
> +        for (i = 0; i < n_rxq; i++) {
> +            diag = rte_eth_rx_queue_setup(dev->port_id, i, NIC_PORT_RX_Q_SIZE,
> +                                          dev->socket_id, NULL,
> +                                          dev->dpdk_mp->mp);
> +            if (diag) {
> +                VLOG_INFO("Interface %s rxq(%d) setup error: %s",
> +                          dev->up.name, i, rte_strerror(-diag));
> +                break;
> +            }
> +        }
> +
> +        if (i != n_rxq) {
> +            /* Retry with less rx queues */
> +            n_rxq = i;
> +            continue;
> +        }
> +
> +        dev->up.n_rxq = n_rxq;
> +        dev->real_n_txq = n_txq;
> +
> +        return 0;
> +    }
> +
> +    return diag;
> +}
> +
> +
> +static int
>  dpdk_eth_dev_init(struct netdev_dpdk *dev) OVS_REQUIRES(dpdk_mutex)
>  {
>      struct rte_pktmbuf_pool_private *mbp_priv;
>      struct rte_eth_dev_info info;
>      struct ether_addr eth_addr;
>      int diag;
> -    int i;
> +    int n_rxq, n_txq;
> 
>      if (dev->port_id < 0 || dev->port_id >= rte_eth_dev_count()) {
>          return ENODEV;
>      }
> 
>      rte_eth_dev_info_get(dev->port_id, &info);
> -    dev->up.n_rxq = MIN(info.max_rx_queues, dev->up.n_rxq);
> -    dev->real_n_txq = MIN(info.max_tx_queues, dev->up.n_txq);
> 
> -    diag = rte_eth_dev_configure(dev->port_id, dev->up.n_rxq, dev->real_n_txq,
> -                                 &port_conf);
> +    n_rxq = MIN(info.max_rx_queues, dev->up.n_rxq);
> +    n_txq = MIN(info.max_tx_queues, dev->up.n_txq);
> +
> +    diag = dpdk_eth_dev_queue_setup(dev, n_rxq, n_txq);
>      if (diag) {
> -        VLOG_ERR("eth dev config error %d. rxq:%d txq:%d", diag, dev->up.n_rxq,
> -                 dev->real_n_txq);
> +        VLOG_ERR("Interface %s(rxq:%d txq:%d) configure error: %s",
> +                 dev->up.name, n_rxq, n_txq, rte_strerror(-diag));
>          return -diag;
>      }
> 
> -    for (i = 0; i < dev->real_n_txq; i++) {
> -        diag = rte_eth_tx_queue_setup(dev->port_id, i, NIC_PORT_TX_Q_SIZE,
> -                                      dev->socket_id, NULL);
> -        if (diag) {
> -            VLOG_ERR("eth dev tx queue setup error %d",diag);
> -            return -diag;
> -        }
> -    }
> -
> -    for (i = 0; i < dev->up.n_rxq; i++) {
> -        diag = rte_eth_rx_queue_setup(dev->port_id, i, NIC_PORT_RX_Q_SIZE,
> -                                      dev->socket_id,
> -                                      NULL, dev->dpdk_mp->mp);
> -        if (diag) {

Re: [ovs-dev] ovs-ofctl mod-table commands supporting OF1.4 Eviction and Vacancy-Events

2015-07-17 Thread Saloni Jain

Hi Ben,

The main problem I am currently facing in the implementation is the encoding
and decoding of the table-mod config value.
For table-config, as per the specification, we can send only three values:
OFPTC14_EVICTION, OFPTC14_VACANCY_EVENTS and 0.

As per the current implementation:
Eviction on -- "ovs-ofctl -O openflow14 mod-table   evict" -- OFPTC14_EVICTION
is sent as the config value.
Eviction off -- "ovs-ofctl -O openflow14 mod-table   noevict" -- 0 is sent as
the config value.

Similarly, for the vacancy events implementation:
Vacancy on -- "ovs-ofctl -O openflow14 mod-table   vacancy-" --
OFPTC14_VACANCY_EVENTS is sent as the table config value.
Vacancy off -- "ovs-ofctl -O openflow14 mod-table   novacancy" -- 0 should be
sent as the table config value.

Because 0 is sent as the table-mod config value for both "noevict" and
"novacancy", decoding is ambiguous: when 0 is received as the table-config
parameter for a table that has both eviction and vacancy events enabled, we
cannot tell whether to turn off eviction or vacancy events for that table.

So, instead of separate "ovs-ofctl -O openflow14 mod-table   noevict" and
"ovs-ofctl -O openflow14 mod-table   novacancy" commands, I think there should
be a single command, "ovs-ofctl -O openflow14 mod-table   clear", which sends 0
as the config parameter and clears both the vacancy-events and eviction
table-config properties.
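
The decoding ambiguity is easy to see if table_config is treated as a bitmap. A minimal sketch, assuming the OpenFlow 1.4 bit values for eviction (bit 2) and vacancy events (bit 3), defined locally rather than taken from OVS headers, and hypothetical helpers `encode_stateless()` and `apply_setting()` with simplified keywords: a stateless encoder maps "noevict" and "novacancy" to the same 0, while a read-modify-write that starts from the table's current config (the OFPMP_TABLE_DESC approach discussed in the quoted message below) changes only the intended bit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* OpenFlow 1.4 table_config bits, defined locally for this sketch. */
#define TC14_EVICTION        (UINT32_C(1) << 2)
#define TC14_VACANCY_EVENTS  (UINT32_C(1) << 3)

/* Stateless encoding: each command yields an absolute config value, so
 * "noevict" and "novacancy" both become 0 and cannot be told apart. */
static uint32_t
encode_stateless(const char *setting)
{
    assert(setting != NULL);
    if (!strcmp(setting, "evict")) {
        return TC14_EVICTION;
    } else if (!strcmp(setting, "vacancy")) {
        return TC14_VACANCY_EVENTS;
    }
    return 0;                   /* "noevict", "novacancy", ... */
}

/* Read-modify-write: start from the table's current config and change
 * only the bit named by the command. */
static uint32_t
apply_setting(uint32_t current, const char *setting)
{
    assert(setting != NULL);
    if (!strcmp(setting, "evict")) {
        return current | TC14_EVICTION;
    } else if (!strcmp(setting, "noevict")) {
        return current & ~TC14_EVICTION;
    } else if (!strcmp(setting, "vacancy")) {
        return current | TC14_VACANCY_EVENTS;
    } else if (!strcmp(setting, "novacancy")) {
        return current & ~TC14_VACANCY_EVENTS;
    }
    return current;
}
```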

Thanks and Regards,
Saloni Jain
Tata Consultancy Services
Mailto: saloni.j...@tcs.com
Website: http://www.tcs.com


-Ben Pfaff  wrote: -
To: Saloni Jain 
From: Ben Pfaff 
Date: 07/16/2015 11:13PM
Cc: dev@openvswitch.org, Deepankar Gupta , Partha 
Datta 
Subject: Re: [ovs-dev] ovs-ofctl mod-table commands supporting OF1.4 Eviction 
and Vacancy-Events

On Thu, Jul 16, 2015 at 12:26:20PM +0530, Saloni Jain wrote:
> So what I think that instead of "ovs-ofctl table-mod" to use an
> OFPMP_TABLE_DESC request to obtain the current configuration, then
> modify it according to the user's request, "ovs-vswitchd" should query
> and store the current table-configuration and modify it with user's
> request, as in this case both  "primary controller" and "service
> controller" will be taken into consideration.

Please describe the details of how you intend to do that.  I can't see
how it would work.

[ovs-dev] [PATCH net-next 00/22] Lightweight & flow based encapsulation

2015-07-17 Thread Thomas Graf

This series combines the work previously posted by Roopa, Robert and
myself. It follows what we discussed at NFWS. The motivation
of this series is to:

 * Consolidate code between OVS and the rest of the kernel and get
   rid of OVS vports and instead represent them as pure net_devices.
 * Introduce a lightweight tunneling mechanism which enables flow
   based encapsulation to improve scalability on both RX and TX.
 * Do the above in an encapsulation-agnostic way so that the
   encapsulation type is eventually abstracted away from the user.
 * Use the same forwarding decision for both native forwarding and
   encapsulation, thus allowing a switch between native IPv6 and
   UDP encapsulation based on the endpoint without requiring additional
   logic.

The fundamental changes introduced in this series are:
 * A new RTA_ENCAP Netlink attribute for routes carrying encapsulation
   instructions. Depending on the specified type, the instructions
   apply to UDP encapsulations, MPLS and possibly others in the future.
 * Depending on the encapsulation type, the output function of the
   dst is directly overwritten or the dst merely attaches metadata and
   relies on a subsequent net_device to apply it to the packet. The
   latter is typically used if an inner and outer IP header exist which
   require two subsequent routing lookups to be performed.
 * A new metadata_dst structure which can be attached to skbs to
   carry metadata in between subsystems. This new metadata transport
   is used to provide a single interface for VXLAN, routing and OVS
   to communicate through metadata.

The OVS interfaces remain as-is but will transparently create a real
VXLAN net_device in the background. iproute2 is extended with new
use cases:

  VXLAN:
  ip route add 40.1.1.1/32 encap vxlan id 10 dst 50.1.1.2 dev vxlan0

  MPLS:
  ip route add 10.1.1.0/30 encap mpls 200 via inet 10.1.1.1 dev swp1

Changes since RFC:
 * Addressed comments
 * Folded in various fixes provided by Roopa, Joe, and Wei-Chun Chao
 * New static key to only collect metadata on receive if a filter exists
   which matches on the relevant fields.

Roopa Prabhu (9):
  rtnetlink: introduce new RTA_ENCAP_TYPE and RTA_ENCAP attributes
  lwtunnel: infrastructure for handling light weight tunnels like mpls
  ipv4: support for fib route lwtunnel encap attributes
  ipv6: support for fib route lwtunnel encap attributes
  lwtunnel: support dst output redirect function
  ipv4: redirect dst output to lwtunnel output
  ipv6: rt6_info output redirect to tunnel output
  mpls: export mpls functions for use by mpls iptunnels
  mpls: ip tunnel support

Thomas Graf (13):
  ip_tunnel: Make ovs_tunnel_info and ovs_key_ipv4_tunnel generic
  icmp: Don't leak original dst into ip_route_input()
  dst: Metadata destinations
  arp: Inherit metadata dst when creating ARP requests
  vxlan: Flow based tunneling
  route: Extend flow representation with tunnel key
  route: Per route IP tunnel metadata via lightweight tunnel
  fib: Add fib rule match on tunnel id
  vxlan: Factor out device configuration
  openvswitch: Make tunnel set action attach a metadata dst
  openvswitch: Move dev pointer into vport itself
  openvswitch: Abstract vport name through ovs_vport_name()
  openvswitch: Use regular VXLAN net_device device

 drivers/net/vxlan.c  | 678 +--
 include/linux/lwtunnel.h |   6 +
 include/linux/mpls_iptunnel.h|   6 +
 include/linux/skbuff.h   |   1 +
 include/net/dst.h|   6 +-
 include/net/dst_metadata.h   |  55 +++
 include/net/fib_rules.h  |   1 +
 include/net/flow.h   |   7 +
 include/net/ip6_fib.h|   3 +
 include/net/ip_fib.h |   5 +-
 include/net/ip_tunnels.h |  95 -
 include/net/lwtunnel.h   | 144 
 include/net/mpls_iptunnel.h  |  29 ++
 include/net/route.h  |   1 +
 include/net/rtnetlink.h  |   1 +
 include/net/vxlan.h  |  85 -
 include/uapi/linux/fib_rules.h   |   2 +-
 include/uapi/linux/if_link.h |   1 +
 include/uapi/linux/lwtunnel.h|  16 +
 include/uapi/linux/mpls_iptunnel.h   |  28 ++
 include/uapi/linux/openvswitch.h |   2 +-
 include/uapi/linux/rtnetlink.h   |  17 +
 net/Kconfig  |   7 +
 net/core/Makefile|   1 +
 net/core/dev.c   |   2 +-
 net/core/dst.c   |  84 -
 net/core/fib_rules.c |  24 +-
 net/core/lwtunnel.c  | 235 
 net/core/rtnetlink.c |  26 +-
 net/ipv4/arp.c   |  65 ++--
 net/ipv4/fib_frontend.c  |   8 +
 net/ipv4/fib_semantics.c |  96 -
 net/ipv4/icmp.c  |   1 +
 net/ipv4/ip_input.c  |   3 +-
 net/ipv4/ip_tunnel_core.c| 130 ++

[ovs-dev] [PATCH net-next 05/22] lwtunnel: support dst output redirect function

2015-07-17 Thread Thomas Graf

From: Roopa Prabhu 

This patch introduces the lwtunnel_output function, which calls the
corresponding lwtunnel's output function to transmit the packet.

It currently adds two variants, lwtunnel_output and lwtunnel_output6, for
IPv4 and IPv6 respectively. This is subject to change once lwtstate
resides in dst or dst_metadata (as per upstream discussions).

Signed-off-by: Roopa Prabhu 
---
 include/net/lwtunnel.h | 12 +++
 net/core/lwtunnel.c| 56 ++
 2 files changed, 68 insertions(+)

diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h
index df24b36..918e03c 100644
--- a/include/net/lwtunnel.h
+++ b/include/net/lwtunnel.h
@@ -69,6 +69,8 @@ int lwtunnel_fill_encap(struct sk_buff *skb,
 int lwtunnel_get_encap_size(struct lwtunnel_state *lwtstate);
 struct lwtunnel_state *lwtunnel_state_alloc(int hdr_len);
 int lwtunnel_cmp_encap(struct lwtunnel_state *a, struct lwtunnel_state *b);
+int lwtunnel_output(struct sock *sk, struct sk_buff *skb);
+int lwtunnel_output6(struct sock *sk, struct sk_buff *skb);
 
 #else
 
@@ -127,6 +129,16 @@ static inline int lwtunnel_cmp_encap(struct lwtunnel_state *a,
return 0;
 }
 
+static inline int lwtunnel_output(struct sock *sk, struct sk_buff *skb)
+{
+   return -EOPNOTSUPP;
+}
+
+static inline int lwtunnel_output6(struct sock *sk, struct sk_buff *skb)
+{
+   return -EOPNOTSUPP;
+}
+
 #endif
 
 #endif /* __NET_LWTUNNEL_H */
diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c
index d7ae3a2..bb58826 100644
--- a/net/core/lwtunnel.c
+++ b/net/core/lwtunnel.c
@@ -25,6 +25,7 @@
 
 #include 
 #include 
+#include 
 
 struct lwtunnel_state *lwtunnel_state_alloc(int encap_len)
 {
@@ -177,3 +178,58 @@ int lwtunnel_cmp_encap(struct lwtunnel_state *a, struct lwtunnel_state *b)
return ret;
 }
 EXPORT_SYMBOL(lwtunnel_cmp_encap);
+
+int __lwtunnel_output(struct sock *sk, struct sk_buff *skb,
+ struct lwtunnel_state *lwtstate)
+{
+   const struct lwtunnel_encap_ops *ops;
+   int ret = -EINVAL;
+
+   if (!lwtstate)
+   goto drop;
+
+   if (lwtstate->type == LWTUNNEL_ENCAP_NONE ||
+   lwtstate->type > LWTUNNEL_ENCAP_MAX)
+   return 0;
+
+   ret = -EOPNOTSUPP;
+   rcu_read_lock();
+   ops = rcu_dereference(lwtun_encaps[lwtstate->type]);
+   if (likely(ops && ops->output))
+   ret = ops->output(sk, skb);
+   rcu_read_unlock();
+
+   if (ret == -EOPNOTSUPP)
+   goto drop;
+
+   return ret;
+
+drop:
+   kfree_skb(skb);
+
+   return ret;
+}
+
+int lwtunnel_output6(struct sock *sk, struct sk_buff *skb)
+{
+   struct rt6_info *rt = (struct rt6_info *)skb_dst(skb);
+   struct lwtunnel_state *lwtstate = NULL;
+
+   if (rt)
+   lwtstate = rt->rt6i_lwtstate;
+
+   return __lwtunnel_output(sk, skb, lwtstate);
+}
+EXPORT_SYMBOL(lwtunnel_output6);
+
+int lwtunnel_output(struct sock *sk, struct sk_buff *skb)
+{
+   struct rtable *rt = (struct rtable *)skb_dst(skb);
+   struct lwtunnel_state *lwtstate = NULL;
+
+   if (rt)
+   lwtstate = rt->rt_lwtstate;
+
+   return __lwtunnel_output(sk, skb, lwtstate);
+}
+EXPORT_SYMBOL(lwtunnel_output);
-- 
2.4.3
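
`__lwtunnel_output()` indexes a per-type ops table and hands the packet to the registered `output` hook, dropping it when no handler exists. A userspace model of that dispatch, assuming hypothetical `struct pkt`, `struct encap_ops`, `encap_ops_table` and an example handler `mpls_like_output()`; the RCU locking and the `sk` argument of the kernel version are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

enum encap_type { ENCAP_NONE, ENCAP_MPLS, ENCAP_IP, ENCAP_MAX = ENCAP_IP };

struct pkt {
    int len;
};

struct encap_ops {
    int (*output)(struct pkt *p);   /* consumes the packet on success */
};

/* Per-type handler table; NULL means no module registered that type. */
static const struct encap_ops *encap_ops_table[ENCAP_MAX + 1];

/* Example handler: counts and consumes the packet. */
static int sent_packets;

static int
mpls_like_output(struct pkt *p)
{
    sent_packets++;
    free(p);
    return 0;
}

static const struct encap_ops mpls_like_ops = { mpls_like_output };

/* Dispatch to the registered hook.  As in the patch, "no encap" returns
 * 0 and leaves the packet with the caller, while an unhandled type
 * frees it (the drop path). */
static int
encap_output(struct pkt *p, int type)
{
    const struct encap_ops *ops;
    int ret = -EOPNOTSUPP;

    assert(p != NULL);
    if (type == ENCAP_NONE || type > ENCAP_MAX) {
        return 0;
    }

    ops = encap_ops_table[type];
    if (ops && ops->output) {
        ret = ops->output(p);
    }
    if (ret == -EOPNOTSUPP) {
        free(p);
    }
    return ret;
}
```

In the kernel, the drop path must release an `sk_buff` with `kfree_skb()` rather than plain `kfree()`.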


[ovs-dev] [PATCH net-next 01/22] rtnetlink: introduce new RTA_ENCAP_TYPE and RTA_ENCAP attributes

2015-07-17 Thread Thomas Graf

From: Roopa Prabhu 

This patch introduces two new RTA attributes to attach encap
data to fib routes.

Example iproute2 command to attach MPLS encap data to IPv4 routes:

$ ip route add 10.1.1.0/30 encap mpls 200 via inet 10.1.1.1 dev swp1

Signed-off-by: Roopa Prabhu 
Suggested-by: Eric W. Biederman 
---
 include/uapi/linux/rtnetlink.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index fdd8f07..0d3d3cc 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -308,6 +308,8 @@ enum rtattr_type_t {
RTA_VIA,
RTA_NEWDST,
RTA_PREF,
+   RTA_ENCAP_TYPE,
+   RTA_ENCAP,
__RTA_MAX
 };
 
-- 
2.4.3
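
The new attributes follow the usual netlink layout: RTA_ENCAP_TYPE is a plain u16 and RTA_ENCAP is a nested attribute whose contents depend on the encap type. A userspace sketch of packing them, with a locally defined `struct nla_hdr` and alignment macros that follow `<linux/netlink.h>` semantics (4-byte header, 4-byte alignment); the numeric attribute values are assumptions for illustration, and the single 32-bit inner attribute merely stands in for the encap-specific payload.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace mirror of struct nlattr and its alignment rules. */
struct nla_hdr {
    uint16_t nla_len;           /* header + payload, without padding */
    uint16_t nla_type;
};
#define NLA_ALIGN(len) (((len) + 3) & ~3)
#define NLA_HDRLEN     4        /* sizeof(struct nla_hdr), already aligned */

/* Assumed attribute numbers, for illustration only. */
#define SKETCH_RTA_ENCAP_TYPE 21
#define SKETCH_RTA_ENCAP      22
#define SKETCH_ENCAP_MPLS     1
#define SKETCH_INNER_ATTR     1

/* Append one attribute at buf + off and return the next aligned offset. */
static size_t
put_attr(uint8_t *buf, size_t off, uint16_t type, const void *data,
         uint16_t len)
{
    struct nla_hdr h = { (uint16_t) (NLA_HDRLEN + len), type };
    int pad = NLA_ALIGN(h.nla_len) - h.nla_len;

    memcpy(buf + off, &h, sizeof h);
    memcpy(buf + off + NLA_HDRLEN, data, len);
    memset(buf + off + h.nla_len, 0, pad);
    return off + h.nla_len + pad;
}

/* Emit RTA_ENCAP_TYPE (u16) followed by a nested RTA_ENCAP holding one
 * inner attribute; returns the number of bytes written. */
static size_t
build_encap(uint8_t *buf, uint32_t value)
{
    uint16_t encap_type = SKETCH_ENCAP_MPLS;
    struct nla_hdr nest;
    size_t off, nest_off;

    assert(buf != NULL);
    off = put_attr(buf, 0, SKETCH_RTA_ENCAP_TYPE, &encap_type,
                   sizeof encap_type);

    nest_off = off;             /* reserve room for the nest header */
    off += NLA_HDRLEN;
    off = put_attr(buf, off, SKETCH_INNER_ATTR, &value, sizeof value);

    /* Patch in the nest header now that its total length is known. */
    nest.nla_len = (uint16_t) (off - nest_off);
    nest.nla_type = SKETCH_RTA_ENCAP;
    memcpy(buf + nest_off, &nest, sizeof nest);
    return off;
}
```

The u16 type attribute occupies 8 bytes after padding, and the nest header's length (12) covers its own 4-byte header plus the 8-byte inner attribute, for 20 bytes in total.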


[ovs-dev] [PATCH net-next 07/22] ipv6: rt6_info output redirect to tunnel output

2015-07-17 Thread Thomas Graf

From: Roopa Prabhu 

This is similar to the IPv4 redirect of dst output to the lwtunnel
output function for encapsulation and transmit.

Signed-off-by: Roopa Prabhu 
---
 net/ipv6/route.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index b3431b7..7f2214f 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1780,6 +1780,7 @@ int ip6_route_add(struct fib6_config *cfg)
goto out;
lwtunnel_state_get(lwtstate);
rt->rt6i_lwtstate = lwtstate;
+   rt->dst.output = lwtunnel_output6;
}
 
ipv6_addr_prefix(&rt->rt6i_dst.addr, &cfg->fc_dst, cfg->fc_dst_len);
-- 
2.4.3


[ovs-dev] [PATCH net-next 06/22] ipv4: redirect dst output to lwtunnel output

2015-07-17 Thread Thomas Graf

From: Roopa Prabhu 

For input routes with tunnel encap state, this patch redirects the
dst output function to lwtunnel_output, which later resolves to
the corresponding lwtunnel output function.

This has been tested to work with MPLS IP tunnels.

Open items: support for tunnel MTU, PMTU and fragmentation can be
added by hooking into the corresponding (IPv4, IPv6) dst ops.
We may do this differently when lwtstate moves to dst or dst_metadata
as per upstream discussions.

Signed-off-by: Roopa Prabhu 
---
 net/ipv4/route.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 226570b..cd3157c 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1633,6 +1633,8 @@ static int __mkroute_input(struct sk_buff *skb,
rth->dst.output = ip_output;
 
rt_set_nexthop(rth, daddr, res, fnhe, res->fi, res->type, itag);
+   if (lwtunnel_output_redirect(rth->rt_lwtstate))
+   rth->dst.output = lwtunnel_output;
skb_dst_set(skb, &rth->dst);
 out:
err = 0;
-- 
2.4.3
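
Both redirect patches depend on `dst.output` being a function pointer fixed when the route is created, so the transmit path simply calls it. A minimal model, assuming a hypothetical `struct route` and `route_init()`; the `encap->type != 0` test is a stand-in for the `lwtunnel_output_redirect()` check, whose definition is not shown here.

```c
#include <assert.h>
#include <stddef.h>

struct route;
typedef int (*output_fn)(const struct route *rt);

struct encap_state {
    int type;                   /* nonzero: traffic must be encapsulated */
};

struct route {
    const struct encap_state *encap;
    output_fn output;           /* plays the role of dst.output */
};

/* Distinct return values let a caller see which path ran. */
static int
plain_output(const struct route *rt)
{
    (void) rt;
    return 1;
}

static int
tunnel_output(const struct route *rt)
{
    return rt->encap ? 2 : -1;
}

/* Install the default hook, then redirect it when the route carries
 * encap state, decided once at route setup time. */
static void
route_init(struct route *rt, const struct encap_state *encap)
{
    assert(rt != NULL);
    rt->encap = encap;
    rt->output = plain_output;          /* default, like ip_output */
    if (encap && encap->type != 0) {    /* stand-in for the redirect check */
        rt->output = tunnel_output;
    }
}
```

Callers always invoke `rt->output(rt)`; whether encapsulation happens never has to be re-tested per packet.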


[ovs-dev] [PATCH net-next 03/22] ipv4: support for fib route lwtunnel encap attributes

2015-07-17 Thread Thomas Graf

From: Roopa Prabhu 

This patch adds support in ipv4 fib functions to parse user
provided encap attributes and attach encap state data to fib_nh
and rtable.

Signed-off-by: Roopa Prabhu 
---
 include/net/ip_fib.h |  5 ++-
 include/net/route.h  |  1 +
 net/ipv4/fib_frontend.c  |  8 
 net/ipv4/fib_semantics.c | 96 +++-
 net/ipv4/route.c | 16 +++-
 5 files changed, 122 insertions(+), 4 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 49c142b..5e01960 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -44,7 +44,9 @@ struct fib_config {
u32 fc_flow;
u32 fc_nlflags;
struct nl_info  fc_nlinfo;
- };
+   struct nlattr   *fc_encap;
+   u16 fc_encap_type;
+};
 
 struct fib_info;
 struct rtable;
@@ -89,6 +91,7 @@ struct fib_nh {
struct rtable __rcu * __percpu *nh_pcpu_rth_output;
struct rtable __rcu *nh_rth_input;
struct fnhe_hash_bucket __rcu *nh_exceptions;
+   struct lwtunnel_state   *nh_lwtstate;
 };
 
 /*
diff --git a/include/net/route.h b/include/net/route.h
index fe22d03..2d45f41 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -66,6 +66,7 @@ struct rtable {
 
struct list_headrt_uncached;
struct uncached_list*rt_uncached_list;
+   struct lwtunnel_state   *rt_lwtstate;
 };
 
 static inline bool rt_is_input_route(const struct rtable *rt)
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 6bbc549..9b2019c 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -591,6 +591,8 @@ const struct nla_policy rtm_ipv4_policy[RTA_MAX + 1] = {
[RTA_METRICS]   = { .type = NLA_NESTED },
[RTA_MULTIPATH] = { .len = sizeof(struct rtnexthop) },
[RTA_FLOW]  = { .type = NLA_U32 },
+   [RTA_ENCAP_TYPE]= { .type = NLA_U16 },
+   [RTA_ENCAP] = { .type = NLA_NESTED },
 };
 
 static int rtm_to_fib_config(struct net *net, struct sk_buff *skb,
@@ -656,6 +658,12 @@ static int rtm_to_fib_config(struct net *net, struct sk_buff *skb,
case RTA_TABLE:
cfg->fc_table = nla_get_u32(attr);
break;
+   case RTA_ENCAP:
+   cfg->fc_encap = attr;
+   break;
+   case RTA_ENCAP_TYPE:
+   cfg->fc_encap_type = nla_get_u16(attr);
+   break;
}
}
 
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index c7358ea..6754c64 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -42,6 +42,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "fib_lookup.h"
 
@@ -208,6 +209,7 @@ static void free_fib_info_rcu(struct rcu_head *head)
change_nexthops(fi) {
if (nexthop_nh->nh_dev)
dev_put(nexthop_nh->nh_dev);
+   lwtunnel_state_put(nexthop_nh->nh_lwtstate);
free_nh_exceptions(nexthop_nh);
rt_fibinfo_free_cpus(nexthop_nh->nh_pcpu_rth_output);
rt_fibinfo_free(&nexthop_nh->nh_rth_input);
@@ -266,6 +268,7 @@ static inline int nh_comp(const struct fib_info *fi, const struct fib_info *ofi)
 #ifdef CONFIG_IP_ROUTE_CLASSID
nh->nh_tclassid != onh->nh_tclassid ||
 #endif
+   lwtunnel_cmp_encap(nh->nh_lwtstate, onh->nh_lwtstate) ||
((nh->nh_flags ^ onh->nh_flags) & ~RTNH_COMPARE_MASK))
return -1;
onh++;
@@ -366,6 +369,7 @@ static inline size_t fib_nlmsg_size(struct fib_info *fi)
payload += nla_total_size((RTAX_MAX * nla_total_size(4)));
 
if (fi->fib_nhs) {
+   size_t nh_encapsize = 0;
/* Also handles the special case fib_nhs == 1 */
 
/* each nexthop is packed in an attribute */
@@ -374,8 +378,21 @@ static inline size_t fib_nlmsg_size(struct fib_info *fi)
/* may contain flow and gateway attribute */
nhsize += 2 * nla_total_size(4);
 
+   /* grab encap info */
+   for_nexthops(fi) {
+   if (nh->nh_lwtstate) {
+   /* RTA_ENCAP_TYPE */
+   nh_encapsize += lwtunnel_get_encap_size(
+   nh->nh_lwtstate);
+   /* RTA_ENCAP */
+   nh_encapsize +=  nla_total_size(2);
+   }
+   } endfor_nexthops(fi);
+
/* all nexthops are packed in a nested attribute */
-   payload += nla_total_size(fi->fib_nhs * nhsize);
+   payload += nla_total_size((fi->fib_nhs * nhsize) +
+ nh_encap
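
`fib_nlmsg_size()` must reserve room for every attribute the dump can emit, so each nexthop with encap state adds space for the u16 RTA_ENCAP_TYPE (`nla_total_size(2)`) and for the encap payload. A sketch of that arithmetic, with `nla_total_size()` re-implemented locally per the netlink rules (4-byte header, 4-byte alignment) and a hypothetical per-nexthop payload size standing in for `lwtunnel_get_encap_size()`, whose exact return value is not shown in this patch.

```c
#include <assert.h>
#include <stddef.h>

#define NLA_ALIGNTO    4
#define NLA_ALIGN(len) (((len) + NLA_ALIGNTO - 1) & ~(size_t) (NLA_ALIGNTO - 1))
#define NLA_HDRLEN     NLA_ALIGN(4)     /* struct nlattr is 4 bytes */

/* Userspace stand-in for the kernel's nla_total_size(): header plus
 * payload, rounded up to the 4-byte attribute alignment. */
static size_t
nla_total_size(size_t payload)
{
    return NLA_ALIGN(NLA_HDRLEN + payload);
}

/* Extra message space for per-nexthop encap attributes.
 * encap_payload[i] is nexthop i's encap payload size (0 = no encap). */
static size_t
encap_attrs_size(const size_t *encap_payload, int n_nexthops)
{
    size_t total = 0;
    int i;

    assert(n_nexthops == 0 || encap_payload != NULL);
    for (i = 0; i < n_nexthops; i++) {
        if (encap_payload[i]) {
            total += nla_total_size(encap_payload[i]);  /* RTA_ENCAP */
            total += nla_total_size(2);                 /* RTA_ENCAP_TYPE (u16) */
        }
    }
    return total;
}
```

For two encapped nexthops with 12- and 4-byte payloads and one plain nexthop, that is (16 + 8) + (8 + 8) = 40 extra bytes.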

[ovs-dev] [PATCH net-next 04/22] ipv6: support for fib route lwtunnel encap attributes

2015-07-17 Thread Thomas Graf

From: Roopa Prabhu 

This patch adds support in ipv6 fib functions to parse Netlink
RTA encap attributes and attach encap state data to rt6_info.

Signed-off-by: Roopa Prabhu 
---
 include/net/ip6_fib.h |  3 +++
 net/ipv6/ip6_fib.c|  2 ++
 net/ipv6/route.c  | 33 ++---
 3 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 3b76849..276328e 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -51,6 +51,8 @@ struct fib6_config {
struct nlattr   *fc_mp;
 
struct nl_info  fc_nlinfo;
+   struct nlattr   *fc_encap;
+   u16 fc_encap_type;
 };
 
 struct fib6_node {
@@ -131,6 +133,7 @@ struct rt6_info {
/* more non-fragment space at head required */
unsigned short  rt6i_nfheader_len;
u8  rt6i_protocol;
+   struct lwtunnel_state   *rt6i_lwtstate;
 };
 
 static inline struct inet6_dev *ip6_dst_idev(struct dst_entry *dst)
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 55d1986..d715f2e 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -177,6 +178,7 @@ static void rt6_free_pcpu(struct rt6_info *non_pcpu_rt)
 static void rt6_release(struct rt6_info *rt)
 {
if (atomic_dec_and_test(&rt->rt6i_ref)) {
+   lwtunnel_state_put(rt->rt6i_lwtstate);
rt6_free_pcpu(rt);
dst_free(&rt->dst);
}
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 6090969..b3431b7 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -58,6 +58,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -1770,6 +1771,17 @@ int ip6_route_add(struct fib6_config *cfg)
 
rt->dst.output = ip6_output;
 
+   if (cfg->fc_encap) {
+   struct lwtunnel_state *lwtstate;
+
+   err = lwtunnel_build_state(dev, cfg->fc_encap_type,
+  cfg->fc_encap, &lwtstate);
+   if (err)
+   goto out;
+   lwtunnel_state_get(lwtstate);
+   rt->rt6i_lwtstate = lwtstate;
+   }
+
ipv6_addr_prefix(&rt->rt6i_dst.addr, &cfg->fc_dst, cfg->fc_dst_len);
rt->rt6i_dst.plen = cfg->fc_dst_len;
if (rt->rt6i_dst.plen == 128)
@@ -2595,6 +2607,8 @@ static const struct nla_policy rtm_ipv6_policy[RTA_MAX+1] = {
[RTA_METRICS]   = { .type = NLA_NESTED },
[RTA_MULTIPATH] = { .len = sizeof(struct rtnexthop) },
[RTA_PREF]  = { .type = NLA_U8 },
+   [RTA_ENCAP_TYPE]= { .type = NLA_U16 },
+   [RTA_ENCAP] = { .type = NLA_NESTED },
 };
 
 static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
@@ -2689,6 +2703,12 @@ static int rtm_to_fib6_config(struct sk_buff *skb, struct nlmsghdr *nlh,
cfg->fc_flags |= RTF_PREF(pref);
}
 
+   if (tb[RTA_ENCAP])
+   cfg->fc_encap = tb[RTA_ENCAP];
+
+   if (tb[RTA_ENCAP_TYPE])
+   cfg->fc_encap_type = nla_get_u16(tb[RTA_ENCAP_TYPE]);
+
err = 0;
 errout:
return err;
@@ -2721,6 +2741,10 @@ beginning:
r_cfg.fc_gateway = nla_get_in6_addr(nla);
r_cfg.fc_flags |= RTF_GATEWAY;
}
+   r_cfg.fc_encap = nla_find(attrs, attrlen, RTA_ENCAP);
+   nla = nla_find(attrs, attrlen, RTA_ENCAP_TYPE);
+   if (nla)
+   r_cfg.fc_encap_type = nla_get_u16(nla);
}
err = add ? ip6_route_add(&r_cfg) : ip6_route_del(&r_cfg);
if (err) {
@@ -2783,7 +2807,7 @@ static int inet6_rtm_newroute(struct sk_buff *skb, struct nlmsghdr *nlh)
return ip6_route_add(&cfg);
 }
 
-static inline size_t rt6_nlmsg_size(void)
+static inline size_t rt6_nlmsg_size(struct rt6_info *rt)
 {
return NLMSG_ALIGN(sizeof(struct rtmsg))
   + nla_total_size(16) /* RTA_SRC */
@@ -2797,7 +2821,8 @@ static inline size_t rt6_nlmsg_size(void)
   + RTAX_MAX * nla_total_size(4) /* RTA_METRICS */
   + nla_total_size(sizeof(struct rta_cacheinfo))
   + nla_total_size(TCP_CA_NAME_MAX) /* RTAX_CC_ALGO */
-  + nla_total_size(1); /* RTA_PREF */
+  + nla_total_size(1) /* RTA_PREF */
+  + lwtunnel_get_encap_size(rt->rt6i_lwtstate);
 }
 
 static int rt6_fill_node(struct net *net,
@@ -2945,6 +2970,8 @@ static int rt6_fill_node(struct net *net,
if (nla_put_u8(skb, RTA_PREF, IPV6_EXTRACT_PREF(rt->rt6i_flags)))
goto nla_put_failure;
 
+   lwtunnel_fill_encap(skb, rt->rt6i_lwtstate);
+
nlmsg_end(skb, nlh);
return 0;
 
@@ -3071,7 +3098,7 @@ void inet6_rt_notify(int event, st
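
For multipath routes the patch looks up RTA_ENCAP and RTA_ENCAP_TYPE inside each `rtnexthop` with `nla_find()`, which walks a flat run of attributes. A userspace sketch of that walk, with a locally defined `find_attr()` standing in for `nla_find()` (it is not the kernel implementation) and `attr_get_u16()` standing in for `nla_get_u16()`; attribute numbers in any usage are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace mirror of struct nlattr from <linux/netlink.h>. */
struct nla_hdr {
    uint16_t nla_len;           /* header + payload, without padding */
    uint16_t nla_type;
};
#define NLA_ALIGN(len) (((len) + 3) & ~(size_t) 3)
#define NLA_HDRLEN     NLA_ALIGN(sizeof(struct nla_hdr))

/* Return a pointer to the first attribute of the given type in
 * buf[0..len), or NULL if absent or if a header is malformed. */
static const uint8_t *
find_attr(const uint8_t *buf, size_t len, uint16_t type)
{
    size_t off = 0;

    assert(buf != NULL);
    while (off + NLA_HDRLEN <= len) {
        struct nla_hdr h;

        memcpy(&h, buf + off, sizeof h);   /* avoids unaligned access */
        if (h.nla_len < NLA_HDRLEN || off + h.nla_len > len) {
            return NULL;
        }
        if (h.nla_type == type) {
            return buf + off;
        }
        off += NLA_ALIGN(h.nla_len);
    }
    return NULL;
}

/* Read a u16 payload, as nla_get_u16() does for RTA_ENCAP_TYPE. */
static uint16_t
attr_get_u16(const uint8_t *attr)
{
    uint16_t v;

    memcpy(&v, attr + NLA_HDRLEN, sizeof v);
    return v;
}
```

As in the patch, a missing RTA_ENCAP simply yields NULL, so `fc_encap` stays unset for nexthops without encap data.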

[ovs-dev] [PATCH net-next 08/22] mpls: export mpls functions for use by mpls iptunnels

2015-07-17 Thread Thomas Graf

From: Roopa Prabhu 

Signed-off-by: Roopa Prabhu 
---
 net/mpls/af_mpls.c  | 11 ---
 net/mpls/internal.h |  9 +++--
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 1f93a59..6e66911 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -58,10 +58,11 @@ static inline struct mpls_dev *mpls_dev_get(const struct net_device *dev)
return rcu_dereference_rtnl(dev->mpls_ptr);
 }
 
-static bool mpls_output_possible(const struct net_device *dev)
+bool mpls_output_possible(const struct net_device *dev)
 {
return dev && (dev->flags & IFF_UP) && netif_carrier_ok(dev);
 }
+EXPORT_SYMBOL_GPL(mpls_output_possible);
 
 static unsigned int mpls_rt_header_size(const struct mpls_route *rt)
 {
@@ -69,13 +70,14 @@ static unsigned int mpls_rt_header_size(const struct mpls_route *rt)
return rt->rt_labels * sizeof(struct mpls_shim_hdr);
 }
 
-static unsigned int mpls_dev_mtu(const struct net_device *dev)
+unsigned int mpls_dev_mtu(const struct net_device *dev)
 {
/* The amount of data the layer 2 frame can hold */
return dev->mtu;
 }
+EXPORT_SYMBOL_GPL(mpls_dev_mtu);
 
-static bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
+bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
 {
if (skb->len <= mtu)
return false;
@@ -85,6 +87,7 @@ static bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
 
return true;
 }
+EXPORT_SYMBOL_GPL(mpls_pkt_too_big);
 
 static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
struct mpls_entry_decoded dec)
@@ -626,6 +629,7 @@ int nla_put_labels(struct sk_buff *skb, int attrtype,
 
return 0;
 }
+EXPORT_SYMBOL_GPL(nla_put_labels);
 
 int nla_get_labels(const struct nlattr *nla,
   u32 max_labels, u32 *labels, u32 label[])
@@ -671,6 +675,7 @@ int nla_get_labels(const struct nlattr *nla,
*labels = nla_labels;
return 0;
 }
+EXPORT_SYMBOL_GPL(nla_get_labels);
 
 static int rtm_to_route_config(struct sk_buff *skb,  struct nlmsghdr *nlh,
   struct mpls_route_config *cfg)
diff --git a/net/mpls/internal.h b/net/mpls/internal.h
index 8cabeb5..2681a4b 100644
--- a/net/mpls/internal.h
+++ b/net/mpls/internal.h
@@ -50,7 +50,12 @@ static inline struct mpls_entry_decoded mpls_entry_decode(struct mpls_shim_hdr *
return result;
 }
 
-int nla_put_labels(struct sk_buff *skb, int attrtype,  u8 labels, const u32 label[]);
-int nla_get_labels(const struct nlattr *nla, u32 max_labels, u32 *labels, u32 label[]);
+int nla_put_labels(struct sk_buff *skb, int attrtype,  u8 labels,
+  const u32 label[]);
+int nla_get_labels(const struct nlattr *nla, u32 max_labels, u32 *labels,
+  u32 label[]);
+bool mpls_output_possible(const struct net_device *dev);
+unsigned int mpls_dev_mtu(const struct net_device *dev);
+bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu);
 
 #endif /* MPLS_INTERNAL_H */
-- 
2.4.3

[ovs-dev] [PATCH net-next 09/22] mpls: ip tunnel support

2015-07-17 Thread Thomas Graf

From: Roopa Prabhu 

This implementation uses the lwtunnel infrastructure to register
hooks for mpls tunnel encaps.

It takes cues from the iptunnel_encaps infrastructure and from the
earlier mpls iptunnel RFC patches by Eric W. Biederman and Robert
Shearman.
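As background for the encap this patch registers: each label in struct mpls_iptunnel_encap becomes one 4-byte MPLS label stack entry on the wire (RFC 3032: 20-bit label, 3-bit traffic class, 1 bottom-of-stack bit, 8-bit TTL). Because net/mpls/mpls_iptunnel.c is truncated in this archive, the following is only a userspace sketch of that layout, assuming a zero traffic class; push_labels(), mpls_entry() and encap_size() are hypothetical helpers, not functions from the patch.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

#define MAX_NEW_LABELS 2

/* Mirrors the layout of struct mpls_iptunnel_encap from the patch. */
struct mpls_iptunnel_encap {
	uint32_t label[MAX_NEW_LABELS];
	uint32_t labels;
};

/* Hypothetical helper: header bytes the encap adds (4 per label). */
static unsigned int encap_size(const struct mpls_iptunnel_encap *en)
{
	return (unsigned int)(en->labels * sizeof(uint32_t));
}

/* Hypothetical helper: one RFC 3032 label stack entry in host order. */
static uint32_t mpls_entry(uint32_t label, uint8_t tc, int bos, uint8_t ttl)
{
	return ((label & 0xFFFFFu) << 12) | ((uint32_t)(tc & 0x7) << 9) |
	       ((uint32_t)(bos ? 1 : 0) << 8) | ttl;
}

/* Write the stack in network order; the last label is bottom of stack. */
static unsigned int push_labels(const struct mpls_iptunnel_encap *en,
				uint8_t ttl, uint32_t out[MAX_NEW_LABELS])
{
	unsigned int i;

	assert(en->labels <= MAX_NEW_LABELS);
	for (i = 0; i < en->labels; i++)
		out[i] = htonl(mpls_entry(en->label[i], 0,
					  i == en->labels - 1, ttl));
	return encap_size(en);
}
```

With labels {100, 200} and TTL 64, push_labels() reports 8 bytes of header, and only the second entry carries the bottom-of-stack bit.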

Signed-off-by: Roopa Prabhu 
---
 include/linux/mpls_iptunnel.h  |   6 +
 include/net/mpls_iptunnel.h|  29 +
 include/uapi/linux/mpls_iptunnel.h |  28 +
 net/mpls/Kconfig   |   8 +-
 net/mpls/Makefile  |   1 +
 net/mpls/mpls_iptunnel.c   | 233 +
 6 files changed, 304 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/mpls_iptunnel.h
 create mode 100644 include/net/mpls_iptunnel.h
 create mode 100644 include/uapi/linux/mpls_iptunnel.h
 create mode 100644 net/mpls/mpls_iptunnel.c

diff --git a/include/linux/mpls_iptunnel.h b/include/linux/mpls_iptunnel.h
new file mode 100644
index 000..ef29eb2
--- /dev/null
+++ b/include/linux/mpls_iptunnel.h
@@ -0,0 +1,6 @@
+#ifndef _LINUX_MPLS_IPTUNNEL_H
+#define _LINUX_MPLS_IPTUNNEL_H
+
+#include 
+
+#endif  /* _LINUX_MPLS_IPTUNNEL_H */
diff --git a/include/net/mpls_iptunnel.h b/include/net/mpls_iptunnel.h
new file mode 100644
index 000..4757997
--- /dev/null
+++ b/include/net/mpls_iptunnel.h
@@ -0,0 +1,29 @@
+/*
+ * Copyright (c) 2015 Cumulus Networks, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ */
+
+#ifndef _NET_MPLS_IPTUNNEL_H
+#define _NET_MPLS_IPTUNNEL_H 1
+
+#define MAX_NEW_LABELS 2
+
+struct mpls_iptunnel_encap {
+   u32 label[MAX_NEW_LABELS];
+   u32 labels;
+};
+
+static inline struct mpls_iptunnel_encap *mpls_lwtunnel_encap(struct lwtunnel_state *lwtstate)
+{
+   return (struct mpls_iptunnel_encap *)lwtstate->data;
+}
+
+#endif
diff --git a/include/uapi/linux/mpls_iptunnel.h b/include/uapi/linux/mpls_iptunnel.h
new file mode 100644
index 000..d80a049
--- /dev/null
+++ b/include/uapi/linux/mpls_iptunnel.h
@@ -0,0 +1,28 @@
+/*
+ * mpls tunnel api
+ *
+ * Authors:
+ * Roopa Prabhu 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _UAPI_LINUX_MPLS_IPTUNNEL_H
+#define _UAPI_LINUX_MPLS_IPTUNNEL_H
+
+/* MPLS tunnel attributes
+ * [RTA_ENCAP] = {
+ * [MPLS_IPTUNNEL_DST]
+ * }
+ */
+enum {
+   MPLS_IPTUNNEL_UNSPEC,
+   MPLS_IPTUNNEL_DST,
+   __MPLS_IPTUNNEL_MAX,
+};
+#define MPLS_IPTUNNEL_MAX (__MPLS_IPTUNNEL_MAX - 1)
+
+#endif /* _UAPI_LINUX_MPLS_IPTUNNEL_H */
diff --git a/net/mpls/Kconfig b/net/mpls/Kconfig
index 17bde79..5c467ef 100644
--- a/net/mpls/Kconfig
+++ b/net/mpls/Kconfig
@@ -24,7 +24,13 @@ config NET_MPLS_GSO
 
 config MPLS_ROUTING
tristate "MPLS: routing support"
-   help
+   ---help---
 Add support for forwarding of mpls packets.
 
+config MPLS_IPTUNNEL
+   tristate "MPLS: IP over MPLS tunnel support"
+   depends on LWTUNNEL && MPLS_ROUTING
+   ---help---
+mpls ip tunnel support.
+
 endif # MPLS
diff --git a/net/mpls/Makefile b/net/mpls/Makefile
index 65bbe68..9ca9236 100644
--- a/net/mpls/Makefile
+++ b/net/mpls/Makefile
@@ -3,5 +3,6 @@
 #
 obj-$(CONFIG_NET_MPLS_GSO) += mpls_gso.o
 obj-$(CONFIG_MPLS_ROUTING) += mpls_router.o
+obj-$(CONFIG_MPLS_IPTUNNEL) += mpls_iptunnel.o
 
 mpls_router-y := af_mpls.o
diff --git a/net/mpls/mpls_iptunnel.c b/net/mpls/mpls_iptunnel.c
new file mode 100644
index 000..eea096f
--- /dev/null
+++ b/net/mpls/mpls_iptunnel.c
@@ -0,0 +1,233 @@
+/*
+ * mpls tunnels	An implementation mpls tunnels using the light weight tunnel
+ * infrastructure
+ *
+ * Authors:	Roopa Prabhu, 
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "internal.h"
+
+static const struct nla_policy mpls_iptunnel_policy[MPLS_IPTUNNEL_MAX + 1] = {
+   [MPLS_IPTUNNEL_DST] = { .type = NLA_U32 },
+};
+
+static unsigned int mpls_encap_size(struct mpls_iptunnel_encap *en)
+{
+   /* The size of

[ovs-dev] [PATCH net-next 02/22] lwtunnel: infrastructure for handling light weight tunnels like mpls

2015-07-17 Thread Thomas Graf

From: Roopa Prabhu 

Provides infrastructure to parse, dump and store encap information
for lightweight tunnels such as mpls. Encap information for such
tunnels is associated with fib routes.

This infrastructure is based on earlier suggestions from Eric
Biederman to follow the model of the xfrm infrastructure.
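The central piece of such an infrastructure is a per-encap-type table of operations that generic route code dispatches through; the header in this patch declares it as lwtun_encaps[] together with lwtunnel_encap_add_ops() and lwtunnel_encap_del_ops(). Because net/core/lwtunnel.c is truncated in this archive, the following is a userspace model only: it assumes slots are claimed with a compare-and-swap (C11 atomics standing in for the kernel primitives), and every name and error code in it is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical encap type ids for this sketch only. */
enum { ENCAP_NONE, ENCAP_MPLS, ENCAP_IP, ENCAP_MAX = ENCAP_IP };

/* Reduced ops table: the patch's lwtunnel_encap_ops has more callbacks. */
struct encap_ops {
	int (*get_encap_size)(const void *state);
};

/* One slot per encap type; NULL means "no handler registered". */
static _Atomic(const struct encap_ops *) encap_table[ENCAP_MAX + 1];

/* Claim an empty slot atomically so two registrations cannot both win. */
static int encap_add_ops(const struct encap_ops *ops, unsigned int type)
{
	const struct encap_ops *expected = NULL;

	assert(ops != NULL);
	if (type > ENCAP_MAX)
		return -ERANGE;
	return atomic_compare_exchange_strong(&encap_table[type], &expected,
					      ops) ? 0 : -EEXIST;
}

/* Release a slot only if it still holds the caller's ops. */
static int encap_del_ops(const struct encap_ops *ops, unsigned int type)
{
	const struct encap_ops *expected = ops;

	if (type > ENCAP_MAX)
		return -ERANGE;
	return atomic_compare_exchange_strong(&encap_table[type], &expected,
					      (const struct encap_ops *)NULL) ?
	       0 : -ENOENT;
}

/* Dispatch to the handler registered for type, if any. */
static int encap_get_size(unsigned int type, const void *state)
{
	const struct encap_ops *ops;

	if (type > ENCAP_MAX)
		return -EINVAL;
	ops = atomic_load(&encap_table[type]);
	if (!ops || !ops->get_encap_size)
		return -EOPNOTSUPP;
	return ops->get_encap_size(state);
}

/* Example handler: an MPLS-like encap adds 4 bytes per label. */
struct mpls_like_state {
	unsigned int labels;
};

static int mpls_like_size(const void *state)
{
	return (int)(((const struct mpls_like_state *)state)->labels * 4u);
}

static const struct encap_ops mpls_like_ops = {
	.get_encap_size = mpls_like_size,
};
```

The compare-and-swap in both directions is the design point: a module can only unregister the ops it registered, and a second encap cannot silently replace the first.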

Signed-off-by: Roopa Prabhu 
---
 include/linux/lwtunnel.h  |   6 ++
 include/net/lwtunnel.h| 132 +++
 include/uapi/linux/lwtunnel.h |  15 
 net/Kconfig   |   7 ++
 net/core/Makefile |   1 +
 net/core/lwtunnel.c   | 179 ++
 6 files changed, 340 insertions(+)
 create mode 100644 include/linux/lwtunnel.h
 create mode 100644 include/net/lwtunnel.h
 create mode 100644 include/uapi/linux/lwtunnel.h
 create mode 100644 net/core/lwtunnel.c

diff --git a/include/linux/lwtunnel.h b/include/linux/lwtunnel.h
new file mode 100644
index 000..97f32f8
--- /dev/null
+++ b/include/linux/lwtunnel.h
@@ -0,0 +1,6 @@
+#ifndef _LINUX_LWTUNNEL_H_
+#define _LINUX_LWTUNNEL_H_
+
+#include 
+
+#endif /* _LINUX_LWTUNNEL_H_ */
diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h
new file mode 100644
index 000..df24b36
--- /dev/null
+++ b/include/net/lwtunnel.h
@@ -0,0 +1,132 @@
+#ifndef __NET_LWTUNNEL_H
+#define __NET_LWTUNNEL_H 1
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define LWTUNNEL_HASH_BITS   7
+#define LWTUNNEL_HASH_SIZE   (1 << LWTUNNEL_HASH_BITS)
+
+/* lw tunnel state flags */
+#define LWTUNNEL_STATE_OUTPUT_REDIRECT 0x1
+
+struct lwtunnel_state {
+   __u16   type;
+   __u16   flags;
+   atomic_t refcnt;
+   int len;
+   __u8    data[0];
+};
+
+struct lwtunnel_encap_ops {
+   int (*build_state)(struct net_device *dev, struct nlattr *encap,
+  struct lwtunnel_state **ts);
+   int (*output)(struct sock *sk, struct sk_buff *skb);
+   int (*fill_encap)(struct sk_buff *skb,
+ struct lwtunnel_state *lwtstate);
+   int (*get_encap_size)(struct lwtunnel_state *lwtstate);
+   int (*cmp_encap)(struct lwtunnel_state *a, struct lwtunnel_state *b);
+};
+
+extern const struct lwtunnel_encap_ops __rcu *
+   lwtun_encaps[LWTUNNEL_ENCAP_MAX+1];
+
+#ifdef CONFIG_LWTUNNEL
+static inline void lwtunnel_state_get(struct lwtunnel_state *lws)
+{
+   atomic_inc(&lws->refcnt);
+}
+
+static inline void lwtunnel_state_put(struct lwtunnel_state *lws)
+{
+   if (!lws)
+   return;
+
+   if (atomic_dec_and_test(&lws->refcnt))
+   kfree(lws);
+}
+
+static inline bool lwtunnel_output_redirect(struct lwtunnel_state *lwtstate)
+{
+   if (lwtstate && (lwtstate->flags & LWTUNNEL_STATE_OUTPUT_REDIRECT))
+   return true;
+
+   return false;
+}
+
+int lwtunnel_encap_add_ops(const struct lwtunnel_encap_ops *op,
+  unsigned int num);
+int lwtunnel_encap_del_ops(const struct lwtunnel_encap_ops *op,
+  unsigned int num);
+int lwtunnel_build_state(struct net_device *dev, u16 encap_type,
+struct nlattr *encap,
+struct lwtunnel_state **lws);
+int lwtunnel_fill_encap(struct sk_buff *skb,
+   struct lwtunnel_state *lwtstate);
+int lwtunnel_get_encap_size(struct lwtunnel_state *lwtstate);
+struct lwtunnel_state *lwtunnel_state_alloc(int hdr_len);
+int lwtunnel_cmp_encap(struct lwtunnel_state *a, struct lwtunnel_state *b);
+
+#else
+
+static inline void lwtunnel_state_get(struct lwtunnel_state *lws)
+{
+}
+
+static inline void lwtunnel_state_put(struct lwtunnel_state *lws)
+{
+}
+
+static inline bool lwtunnel_output_redirect(struct lwtunnel_state *lwtstate)
+{
+   return false;
+}
+
+static inline int lwtunnel_encap_add_ops(const struct lwtunnel_encap_ops *op,
+unsigned int num)
+{
+   return -EOPNOTSUPP;
+
+}
+
+static inline int lwtunnel_encap_del_ops(const struct lwtunnel_encap_ops *op,
+unsigned int num)
+{
+   return -EOPNOTSUPP;
+}
+
+static inline int lwtunnel_build_state(struct net_device *dev, u16 encap_type,
+  struct nlattr *encap,
+  struct lwtunnel_state **lws)
+{
+   return -EOPNOTSUPP;
+}
+
+static inline int lwtunnel_fill_encap(struct sk_buff *skb,
+ struct lwtunnel_state *lwtstate)
+{
+   return 0;
+}
+
+static inline int lwtunnel_get_encap_size(struct lwtunnel_state *lwtstate)
+{
+   return 0;
+}
+
+static inline struct lwtunnel_state *lwtunnel_state_alloc(int hdr_len)
+{
+   return NULL;
+}
+
+static inline int lwtunnel_cmp_encap(struct lwtunnel_state *a,
+struct lwtunnel_state *b)
+{
+   return 0;
+}
+
+#endif
+
+#endif /* __NET_LWTUNNEL_H */
diff --git a/

[ovs-dev] [PATCH net-next 11/22] icmp: Don't leak original dst into ip_route_input()

2015-07-17 Thread Thomas Graf

ip_route_input() unconditionally overwrites the dst. Hide the original
dst attached to the skb by calling skb_dst_set(skb, NULL) prior to
ip_route_input().

Reported-by: Julian Anastasov 
Signed-off-by: Thomas Graf 
---
 net/ipv4/icmp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index f5203fb..c0556f1 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -496,6 +496,7 @@ static struct rtable *icmp_route_lookup(struct net *net,
}
/* Ugh! */
orefdst = skb_in->_skb_refdst; /* save old refdst */
+   skb_dst_set(skb_in, NULL);
err = ip_route_input(skb_in, fl4_dec.daddr, fl4_dec.saddr,
 RT_TOS(tos), rt2->dst.dev);
 
-- 
2.4.3

[ovs-dev] [PATCH net-next 12/22] dst: Metadata destinations

2015-07-17 Thread Thomas Graf

Introduces a new dst_metadata type which makes it possible to carry
per-packet metadata between forwarding and processing elements via
the skb->dst pointer.

The structure is set up to be a union. Thus, each separate type of
metadata requires its own dst instance. If demand arises to carry
multiple types of metadata concurrently, metadata dst entries can be
made stackable.

The metadata dst entry is refcounted as expected for now, but
non-reference-counted use is possible if the reference is forced
before queueing the skb.

In order to allow allocating dsts with variable length, the existing
dst_alloc() is split into a dst_alloc() and a dst_init() function.
The existing dst_init() function, which initializes the subsystem, is
renamed to dst_subsys_init() to make it clear what is what.

The check before ip_route_input() is changed to ignore metadata dsts
and drop the dst inside the routing function, thus allowing metadata
to be interpreted in a later commit.
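The flag test and the variable-length allocation described above can be modelled in userspace as follows. The structures are heavily reduced stand-ins (the real struct dst_entry has many more fields), metadata_dst_alloc_model(), as_metadata_dst() and valid_dst() are hypothetical names, and only the DST_METADATA bit value is copied from the patch. Keeping the dst_entry as the first member is what makes the pointer cast in as_metadata_dst() valid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DST_METADATA 0x0200  /* same bit value this patch adds to dst.h */

/* Heavily reduced stand-in for struct dst_entry. */
struct dst_entry {
	unsigned short flags;
	int refcnt;
};

struct metadata_dst {
	struct dst_entry dst;   /* must stay first for the casts below */
	size_t opts_len;
	unsigned char opts[];   /* variable-length options follow */
};

static_assert(offsetof(struct metadata_dst, dst) == 0,
	      "dst must be the first member");

/* Split-out initializer, in the spirit of the new dst_init(). */
static void dst_init_model(struct dst_entry *dst, int initial_ref,
			   unsigned short flags)
{
	dst->refcnt = initial_ref;
	dst->flags = flags;
}

/* One allocation sized for header plus options, then initialize. */
static struct metadata_dst *metadata_dst_alloc_model(size_t optslen)
{
	struct metadata_dst *md = calloc(1, sizeof(*md) + optslen);

	if (!md)
		return NULL;
	dst_init_model(&md->dst, 1, DST_METADATA);
	md->opts_len = optslen;
	return md;
}

/* Metadata view of a dst, or NULL if it is a regular route. */
static struct metadata_dst *as_metadata_dst(struct dst_entry *dst)
{
	if (dst && (dst->flags & DST_METADATA))
		return (struct metadata_dst *)dst;
	return NULL;
}

/* Only non-metadata dsts may be used for forwarding decisions. */
static int valid_dst(const struct dst_entry *dst)
{
	return dst && !(dst->flags & DST_METADATA);
}
```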

Signed-off-by: Thomas Graf 
---
 include/net/dst.h  |  6 +++-
 include/net/dst_metadata.h | 32 ++
 net/core/dev.c |  2 +-
 net/core/dst.c | 84 ++
 net/ipv4/ip_input.c|  3 +-
 net/ipv4/route.c   |  2 ++
 6 files changed, 112 insertions(+), 17 deletions(-)
 create mode 100644 include/net/dst_metadata.h

diff --git a/include/net/dst.h b/include/net/dst.h
index 2bc73f8a..2578811 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -57,6 +57,7 @@ struct dst_entry {
 #define DST_FAKE_RTABLE 0x0040
 #define DST_XFRM_TUNNEL 0x0080
 #define DST_XFRM_QUEUE 0x0100
+#define DST_METADATA   0x0200
 
unsigned short  pending_confirm;
 
@@ -356,6 +357,9 @@ static inline int dst_discard(struct sk_buff *skb)
 }
 void *dst_alloc(struct dst_ops *ops, struct net_device *dev, int initial_ref,
int initial_obsolete, unsigned short flags);
+void dst_init(struct dst_entry *dst, struct dst_ops *ops,
+ struct net_device *dev, int initial_ref, int initial_obsolete,
+ unsigned short flags);
 void __dst_free(struct dst_entry *dst);
 struct dst_entry *dst_destroy(struct dst_entry *dst);
 
@@ -457,7 +461,7 @@ static inline struct dst_entry *dst_check(struct dst_entry *dst, u32 cookie)
return dst;
 }
 
-void dst_init(void);
+void dst_subsys_init(void);
 
 /* Flags for xfrm_lookup flags argument. */
 enum {
diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
new file mode 100644
index 000..4f7694f
--- /dev/null
+++ b/include/net/dst_metadata.h
@@ -0,0 +1,32 @@
+#ifndef __NET_DST_METADATA_H
+#define __NET_DST_METADATA_H 1
+
+#include 
+#include 
+#include 
+
+struct metadata_dst {
+   struct dst_entry dst;
+   size_t  opts_len;
+};
+
+static inline struct metadata_dst *skb_metadata_dst(struct sk_buff *skb)
+{
+   struct metadata_dst *md_dst = (struct metadata_dst *) skb_dst(skb);
+
+   if (md_dst && md_dst->dst.flags & DST_METADATA)
+   return md_dst;
+
+   return NULL;
+}
+
+static inline bool skb_valid_dst(const struct sk_buff *skb)
+{
+   struct dst_entry *dst = skb_dst(skb);
+
+   return dst && !(dst->flags & DST_METADATA);
+}
+
+struct metadata_dst *metadata_dst_alloc(u8 optslen, gfp_t flags);
+
+#endif /* __NET_DST_METADATA_H */
diff --git a/net/core/dev.c b/net/core/dev.c
index 8810b6b..61e3dcb 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7659,7 +7659,7 @@ static int __init net_dev_init(void)
open_softirq(NET_RX_SOFTIRQ, net_rx_action);
 
hotcpu_notifier(dev_cpu_callback, 0);
-   dst_init();
+   dst_subsys_init();
rc = 0;
 out:
return rc;
diff --git a/net/core/dst.c b/net/core/dst.c
index e956ce6..917364f 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -22,6 +22,7 @@
 #include 
 
 #include 
+#include 
 
 /*
  * Theory of operations:
@@ -158,19 +159,10 @@ const u32 dst_default_metrics[RTAX_MAX + 1] = {
[RTAX_MAX] = 0xdeadbeef,
 };
 
-
-void *dst_alloc(struct dst_ops *ops, struct net_device *dev,
-   int initial_ref, int initial_obsolete, unsigned short flags)
+void dst_init(struct dst_entry *dst, struct dst_ops *ops,
+ struct net_device *dev, int initial_ref, int initial_obsolete,
+ unsigned short flags)
 {
-   struct dst_entry *dst;
-
-   if (ops->gc && dst_entries_get_fast(ops) > ops->gc_thresh) {
-   if (ops->gc(ops))
-   return NULL;
-   }
-   dst = kmem_cache_alloc(ops->kmem_cachep, GFP_ATOMIC);
-   if (!dst)
-   return NULL;
dst->child = NULL;
dst->dev = dev;
if (dev)
@@ -200,6 +192,25 @@ void *dst_alloc(struct dst_ops *ops, struct net_device *dev,
dst->next = NULL;
if (!(flags & DST_NOCOUNT))
dst_entries_add(ops, 1);
+}
+EXPORT_SYMBOL(dst_init);
+
+void *dst_alloc(str

[ovs-dev] [PATCH net-next 15/22] route: Extend flow representation with tunnel key

2015-07-17 Thread Thomas Graf

Add a new flowi_tunnel structure, a subset of ip_tunnel_key, to
allow routes to match on tunnel metadata. For now, only the tunnel id
is added to flowi_tunnel, which allows routes to be bound to specific
virtual tunnels.
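Further down, ip_route_input_slow() copies the tunnel id into the flow key only when the attached tunnel info is in receive mode (IP_TUNNEL_INFO_RX), so transmit instructions carried by an skb do not influence input route lookups. A reduced userspace model of that rule, with hypothetical type and constant names and the id kept in host byte order instead of __be64:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for IP_TUNNEL_INFO_RX/TX. */
enum tun_mode { TUN_INFO_RX, TUN_INFO_TX };

struct tunnel_info {
	enum tun_mode mode;
	uint64_t tun_id;   /* __be64 in the kernel; host order here */
};

struct flow_key {
	uint64_t tun_id;   /* 0 = not received over a tunnel */
	uint32_t daddr;
};

/* Build the lookup key; only RX metadata describes how the packet arrived. */
static void flow_key_init(struct flow_key *fl, uint32_t daddr,
			  const struct tunnel_info *info)
{
	assert(fl != NULL);
	memset(fl, 0, sizeof(*fl));
	fl->daddr = daddr;
	if (info && info->mode == TUN_INFO_RX)
		fl->tun_id = info->tun_id;
}
```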

Signed-off-by: Thomas Graf 
---
 include/net/flow.h | 7 +++
 net/ipv4/route.c   | 6 ++
 2 files changed, 13 insertions(+)

diff --git a/include/net/flow.h b/include/net/flow.h
index 8109a15..c15fb5e 100644
--- a/include/net/flow.h
+++ b/include/net/flow.h
@@ -19,6 +19,10 @@
 
 #define LOOPBACK_IFINDEX   1
 
+struct flowi_tunnel {
+   __be64  tun_id;
+};
+
 struct flowi_common {
int flowic_oif;
int flowic_iif;
@@ -30,6 +34,7 @@ struct flowi_common {
 #define FLOWI_FLAG_ANYSRC  0x01
 #define FLOWI_FLAG_KNOWN_NH 0x02
__u32   flowic_secid;
+   struct flowi_tunnel flowic_tun_key;
 };
 
 union flowi_uli {
@@ -66,6 +71,7 @@ struct flowi4 {
 #define flowi4_proto   __fl_common.flowic_proto
 #define flowi4_flags   __fl_common.flowic_flags
 #define flowi4_secid   __fl_common.flowic_secid
+#define flowi4_tun_key __fl_common.flowic_tun_key
 
/* (saddr,daddr) must be grouped, same order as in IP header */
__be32  saddr;
@@ -165,6 +171,7 @@ struct flowi {
 #define flowi_proto    u.__fl_common.flowic_proto
 #define flowi_flags    u.__fl_common.flowic_flags
 #define flowi_secid    u.__fl_common.flowic_secid
+#define flowi_tun_key  u.__fl_common.flowic_tun_key
 } __attribute__((__aligned__(BITS_PER_LONG/8)));
 
 static inline struct flowi *flowi4_to_flowi(struct flowi4 *fl4)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 4c8e84e..931015c 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -91,6 +91,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -110,6 +111,7 @@
 #include 
 #endif
 #include 
+#include 
 
 #define RT_FL_TOS(oldflp4) \
((oldflp4)->flowi4_tos & (IPTOS_RT_MASK | RTO_ONLINK))
@@ -1673,6 +1675,7 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 {
struct fib_result res;
struct in_device *in_dev = __in_dev_get_rcu(dev);
+   struct ip_tunnel_info *tun_info;
struct flowi4   fl4;
unsigned int    flags = 0;
u32 itag = 0;
@@ -1690,6 +1693,9 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
   by fib_lookup.
 */
 
+   tun_info = skb_tunnel_info(skb);
+   if (tun_info && tun_info->mode == IP_TUNNEL_INFO_RX)
+   fl4.flowi4_tun_key.tun_id = tun_info->key.tun_id;
skb_dst_drop(skb);
 
if (ipv4_is_multicast(saddr) || ipv4_is_lbcast(saddr))
-- 
2.4.3

[ovs-dev] [PATCH net-next 13/22] arp: Inherit metadata dst when creating ARP requests

2015-07-17 Thread Thomas Graf

If the output device wants to see the dst, inherit the dst of the
original skb and pass it on when generating the ARP request.
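In arp_solicit() below, the original skb is passed on only when the device does not have IFF_XMIT_DST_RELEASE set, and arp_send_dst() then copies its dst into the new request with skb_dst_copy(). A userspace model of that decision, with a hypothetical flag value and a bare reference count in place of the kernel's dst reference handling:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bit; the kernel checks IFF_XMIT_DST_RELEASE in priv_flags. */
#define XMIT_DST_RELEASE 0x1u

struct dst { int refcnt; };
struct pkt { struct dst *dst; };
struct netdev { unsigned int priv_flags; };

/* Share the original packet's dst with the new packet, taking a reference. */
static void pkt_dst_copy(struct pkt *to, const struct pkt *from)
{
	to->dst = from->dst;
	if (to->dst)
		to->dst->refcnt++;
}

/* Build an ARP request; inherit orig's dst only if dev keeps dsts. */
static void arp_request_model(const struct netdev *dev, struct pkt *req,
			      const struct pkt *orig)
{
	const struct pkt *oskb;

	assert(dev != NULL && req != NULL);
	oskb = (dev->priv_flags & XMIT_DST_RELEASE) ? NULL : orig;
	req->dst = NULL;
	if (oskb)
		pkt_dst_copy(req, oskb);
}
```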

Signed-off-by: Thomas Graf 
---
 net/ipv4/arp.c | 65 +-
 1 file changed, 37 insertions(+), 28 deletions(-)

diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 933a928..1d59e50 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -291,6 +291,40 @@ static void arp_error_report(struct neighbour *neigh, struct sk_buff *skb)
kfree_skb(skb);
 }
 
+/* Create and send an arp packet. */
+static void arp_send_dst(int type, int ptype, __be32 dest_ip,
+struct net_device *dev, __be32 src_ip,
+const unsigned char *dest_hw,
+const unsigned char *src_hw,
+const unsigned char *target_hw, struct sk_buff *oskb)
+{
+   struct sk_buff *skb;
+
+   /* arp on this interface. */
+   if (dev->flags & IFF_NOARP)
+   return;
+
+   skb = arp_create(type, ptype, dest_ip, dev, src_ip,
+dest_hw, src_hw, target_hw);
+   if (!skb)
+   return;
+
+   if (oskb)
+   skb_dst_copy(skb, oskb);
+
+   arp_xmit(skb);
+}
+
+void arp_send(int type, int ptype, __be32 dest_ip,
+ struct net_device *dev, __be32 src_ip,
+ const unsigned char *dest_hw, const unsigned char *src_hw,
+ const unsigned char *target_hw)
+{
+   arp_send_dst(type, ptype, dest_ip, dev, src_ip, dest_hw, src_hw,
+target_hw, NULL);
+}
+EXPORT_SYMBOL(arp_send);
+
 static void arp_solicit(struct neighbour *neigh, struct sk_buff *skb)
 {
__be32 saddr = 0;
@@ -346,8 +380,9 @@ static void arp_solicit(struct neighbour *neigh, struct sk_buff *skb)
}
}
 
-   arp_send(ARPOP_REQUEST, ETH_P_ARP, target, dev, saddr,
-dst_hw, dev->dev_addr, NULL);
+   arp_send_dst(ARPOP_REQUEST, ETH_P_ARP, target, dev, saddr,
+dst_hw, dev->dev_addr, NULL,
+dev->priv_flags & IFF_XMIT_DST_RELEASE ? NULL : skb);
 }
 
 static int arp_ignore(struct in_device *in_dev, __be32 sip, __be32 tip)
@@ -597,32 +632,6 @@ void arp_xmit(struct sk_buff *skb)
 EXPORT_SYMBOL(arp_xmit);
 
 /*
- * Create and send an arp packet.
- */
-void arp_send(int type, int ptype, __be32 dest_ip,
- struct net_device *dev, __be32 src_ip,
- const unsigned char *dest_hw, const unsigned char *src_hw,
- const unsigned char *target_hw)
-{
-   struct sk_buff *skb;
-
-   /*
-*  No arp on this interface.
-*/
-
-   if (dev->flags&IFF_NOARP)
-   return;
-
-   skb = arp_create(type, ptype, dest_ip, dev, src_ip,
-dest_hw, src_hw, target_hw);
-   if (!skb)
-   return;
-
-   arp_xmit(skb);
-}
-EXPORT_SYMBOL(arp_send);
-
-/*
  * Process an arp request.
  */
 
-- 
2.4.3

[ovs-dev] [PATCH net-next 18/22] vxlan: Factor out device configuration

2015-07-17 Thread Thomas Graf

This factors the device configuration out of the RTNL newlink API,
which allows VXLAN net_devices to be created from within the kernel.

Signed-off-by: Thomas Graf 
---
 drivers/net/vxlan.c | 332 
 include/net/vxlan.h |  59 ++
 2 files changed, 236 insertions(+), 155 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 23378db..5ae6c0c 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -55,10 +55,6 @@
 
 #define PORT_HASH_BITS 8
 #define PORT_HASH_SIZE  (1remote_port && rdst->remote_port != vxlan->dst_port &&
+   if (rdst->remote_port && rdst->remote_port != vxlan->cfg.dst_port &&
nla_put_be16(skb, NDA_PORT, rdst->remote_port))
goto nla_put_failure;
if (rdst->remote_vni != vxlan->default_dst.remote_vni &&
@@ -756,7 +713,8 @@ static int vxlan_fdb_create(struct vxlan_dev *vxlan,
if (!(flags & NLM_F_CREATE))
return -ENOENT;
 
-   if (vxlan->addrmax && vxlan->addrcnt >= vxlan->addrmax)
+   if (vxlan->cfg.addrmax &&
+   vxlan->addrcnt >= vxlan->cfg.addrmax)
return -ENOSPC;
 
/* Disallow replace to add a multicast entry */
@@ -842,7 +800,7 @@ static int vxlan_fdb_parse(struct nlattr *tb[], struct vxlan_dev *vxlan,
return -EINVAL;
*port = nla_get_be16(tb[NDA_PORT]);
} else {
-   *port = vxlan->dst_port;
+   *port = vxlan->cfg.dst_port;
}
 
if (tb[NDA_VNI]) {
@@ -1028,7 +986,7 @@ static bool vxlan_snoop(struct net_device *dev,
vxlan_fdb_create(vxlan, src_mac, src_ip,
 NUD_REACHABLE,
 NLM_F_EXCL|NLM_F_CREATE,
-vxlan->dst_port,
+vxlan->cfg.dst_port,
 vxlan->default_dst.remote_vni,
 0, NTF_SELF);
spin_unlock(&vxlan->hash_lock);
@@ -1957,7 +1915,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
info = skb_tunnel_info(skb, AF_INET);
 
if (rdst) {
-   dst_port = rdst->remote_port ? rdst->remote_port : vxlan->dst_port;
+   dst_port = rdst->remote_port ? rdst->remote_port : vxlan->cfg.dst_port;
vni = rdst->remote_vni;
dst = &rdst->remote_ip;
} else {
@@ -1967,7 +1925,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
goto drop;
}
 
-   dst_port = info->key.tp_dst ? : vxlan->dst_port;
+   dst_port = info->key.tp_dst ? : vxlan->cfg.dst_port;
vni = be64_to_cpu(info->key.tun_id);
remote_ip.sin.sin_family = AF_INET;
remote_ip.sin.sin_addr.s_addr = info->key.ipv4_dst;
@@ -1985,16 +1943,16 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 
old_iph = ip_hdr(skb);
 
-   ttl = vxlan->ttl;
+   ttl = vxlan->cfg.ttl;
if (!ttl && vxlan_addr_multicast(dst))
ttl = 1;
 
-   tos = vxlan->tos;
+   tos = vxlan->cfg.tos;
if (tos == 1)
tos = ip_tunnel_get_dsfield(old_iph, skb);
 
-   src_port = udp_flow_src_port(dev_net(dev), skb, vxlan->port_min,
-vxlan->port_max, true);
+   src_port = udp_flow_src_port(dev_net(dev), skb, vxlan->cfg.port_min,
+vxlan->cfg.port_max, true);
 
if (dst->sa.sa_family == AF_INET) {
if (info) {
@@ -2020,7 +1978,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
fl4.flowi4_mark = skb->mark;
fl4.flowi4_proto = IPPROTO_UDP;
fl4.daddr = dst->sin.sin_addr.s_addr;
-   fl4.saddr = vxlan->saddr.sin.sin_addr.s_addr;
+   fl4.saddr = vxlan->cfg.saddr.sin.sin_addr.s_addr;
 
rt = ip_route_output_key(vxlan->net, &fl4);
if (IS_ERR(rt)) {
@@ -2076,7 +2034,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
memset(&fl6, 0, sizeof(fl6));
fl6.flowi6_oif = rdst ? rdst->remote_ifindex : 0;
fl6.daddr = dst->sin6.sin6_addr;
-   fl6.saddr = vxlan->saddr.sin6.sin6_addr;
+   fl6.saddr = vxlan->cfg.saddr.sin6.sin6_addr;
fl6.flowi6_mark = skb->mark;
fl6.flowi6_proto = IPPROTO_UDP;
 
@@ -2247,7 +2205,7 @@ static void vxlan_cleanup(unsigned long arg)
if (f->state & NUD_PERMANENT)
continue;
 
-   timeo

[ovs-dev] [PATCH net-next 14/22] vxlan: Flow based tunneling

2015-07-17 Thread Thomas Graf

Allows putting a VXLAN device into a new flow-based mode in which
skbs with an ip_tunnel_info dst metadata attached will be encapsulated
according to the instructions stored in it, with the VXLAN device
defaults taken into consideration.

Similarly, on the receive side, if the VXLAN_F_COLLECT_METADATA flag
is set, the packet processing will populate an ip_tunnel_info struct
for each packet received and attach it to the skb using the new
metadata dst. The metadata structure will contain the outer header and
tunnel header fields which have been stripped off. Layers further up
in the stack such as routing, tc or netfilter can later match on these
fields and perform forwarding. It is the responsibility of upper
layers to ensure that the flag is set if the metadata is needed. The
flag limits the additional cost of metadata collection to when it is
actually in demand.
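The receive-side mapping can be sketched in userspace as follows. It mirrors what the collect-metadata branch in vxlan_udp_encap_recv() below copies into the tunnel key (outer addresses, TOS, TTL, UDP ports, TUNNEL_KEY, TUNNEL_CSUM when the outer UDP checksum is non-zero, and the VNI as tunnel id), but all struct and flag names are hypothetical and fields are kept in host byte order, whereas the kernel stores addresses and ports in network order.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits standing in for TUNNEL_KEY and TUNNEL_CSUM. */
#define TUN_F_KEY  0x1u
#define TUN_F_CSUM 0x2u

/* Fields read from the outer IPv4/UDP headers. */
struct outer_hdrs {
	uint32_t saddr, daddr;
	uint8_t tos, ttl;
	uint16_t sport, dport;
	uint16_t udp_check;   /* 0 = sender did not compute a UDP checksum */
};

struct tun_key {
	uint32_t src, dst;
	uint8_t tos, ttl;
	uint16_t tp_src, tp_dst;
	uint32_t flags;
	uint64_t tun_id;
};

/* raw_vni is the 32-bit VNI word in host order: VNI in the top 24 bits. */
static void fill_rx_key(struct tun_key *key, const struct outer_hdrs *h,
			uint32_t raw_vni)
{
	assert(key && h);
	key->src = h->saddr;
	key->dst = h->daddr;
	key->tos = h->tos;
	key->ttl = h->ttl;
	key->tp_src = h->sport;
	key->tp_dst = h->dport;
	key->flags = TUN_F_KEY;
	key->tun_id = (raw_vni & 0xFFFFFF00u) >> 8;
	if (h->udp_check != 0)
		key->flags |= TUN_F_CSUM;
}
```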

This prepares the VXLAN device to be steered by the routing and other
subsystems, which makes it possible to support encapsulation for a
large number of tunnel endpoints and tunnel ids through a single
net_device and improves scalability.

It also allows OVS to leverage this mode, which in turn allows for
the removal of the OVS-specific VXLAN code.

Because the skb is currently scrubbed in vxlan_rcv(), the attachment
of the new dst metadata is postponed until after scrubbing, which
requires the temporary addition of a new member to vxlan_metadata.
This member is removed again in a later commit after the indirect
VXLAN receive API has been removed.

Signed-off-by: Thomas Graf 
Signed-off-by: Pravin B Shelar 
---
 drivers/net/vxlan.c  | 155 +--
 include/linux/skbuff.h   |   1 +
 include/net/dst_metadata.h   |  13 
 include/net/ip_tunnels.h |  14 
 include/net/vxlan.h  |  10 ++-
 include/uapi/linux/if_link.h |   1 +
 6 files changed, 171 insertions(+), 23 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 34c519e..994d89c 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -49,6 +49,7 @@
 #include 
 #include 
 #endif
+#include 
 
 #define VXLAN_VERSION  "0.1"
 
@@ -140,6 +141,11 @@ struct vxlan_dev {
 static u32 vxlan_salt __read_mostly;
 static struct workqueue_struct *vxlan_wq;
 
+static inline bool vxlan_collect_metadata(struct vxlan_sock *vs)
+{
+   return vs->flags & VXLAN_F_COLLECT_METADATA;
+}
+
 #if IS_ENABLED(CONFIG_IPV6)
 static inline
 bool vxlan_addr_equal(const union vxlan_addr *a, const union vxlan_addr *b)
@@ -1164,10 +1170,13 @@ static struct vxlanhdr *vxlan_remcsum(struct sk_buff *skb, struct vxlanhdr *vh,
 /* Callback from net/ipv4/udp.c to receive packets */
 static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 {
+   struct metadata_dst *tun_dst = NULL;
+   struct ip_tunnel_info *info;
struct vxlan_sock *vs;
struct vxlanhdr *vxh;
u32 flags, vni;
-   struct vxlan_metadata md = {0};
+   struct vxlan_metadata _md;
+   struct vxlan_metadata *md = &_md;
 
/* Need Vxlan and inner Ethernet header to be present */
if (!pskb_may_pull(skb, VXLAN_HLEN))
@@ -1202,6 +1211,33 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
vni &= VXLAN_VNI_MASK;
}
 
+   if (vxlan_collect_metadata(vs)) {
+   const struct iphdr *iph = ip_hdr(skb);
+
+   tun_dst = metadata_dst_alloc(sizeof(*md), GFP_ATOMIC);
+   if (!tun_dst)
+   goto drop;
+
+   info = &tun_dst->u.tun_info;
+   info->key.ipv4_src = iph->saddr;
+   info->key.ipv4_dst = iph->daddr;
+   info->key.ipv4_tos = iph->tos;
+   info->key.ipv4_ttl = iph->ttl;
+   info->key.tp_src = udp_hdr(skb)->source;
+   info->key.tp_dst = udp_hdr(skb)->dest;
+
+   info->mode = IP_TUNNEL_INFO_RX;
+   info->key.tun_flags = TUNNEL_KEY;
+   info->key.tun_id = cpu_to_be64(vni >> 8);
+   if (udp_hdr(skb)->check != 0)
+   info->key.tun_flags |= TUNNEL_CSUM;
+
+   md = ip_tunnel_info_opts(info, sizeof(*md));
+   md->tun_dst = tun_dst;
+   } else {
+   memset(md, 0, sizeof(*md));
+   }
+
/* For backwards compatibility, only allow reserved fields to be
 * used by VXLAN extensions if explicitly requested.
 */
@@ -1209,13 +1245,16 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
struct vxlanhdr_gbp *gbp;
 
gbp = (struct vxlanhdr_gbp *)vxh;
-   md.gbp = ntohs(gbp->policy_id);
+   md->gbp = ntohs(gbp->policy_id);
+
+   if (tun_dst)
+   info->key.tun_flags |= TUNNEL_VXLAN_OPT;
 
if (gbp->dont_learn)
-   md.gbp |= VXLAN_GBP_DONT_LEARN;
+   md->gbp |= VXLAN_GBP_DONT_LEARN;

[ovs-dev] [PATCH net-next 17/22] fib: Add fib rule match on tunnel id

2015-07-17 Thread Thomas Graf

This adds the ability to select a routing table based on the tunnel
id, which makes it possible to maintain a separate routing table for
each virtual tunnel network.

ip rule add from all tunnel-id 100 lookup 100
ip rule add from all tunnel-id 200 lookup 200

A new static key enables the collection of metadata at the tunnel
level only on demand.
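fib_rule_match() treats a rule tun_id of 0 as "match any" and otherwise requires an exact match with the flow's tunnel id, while adding or deleting a rule with a tunnel id calls ip_tunnel_need_metadata() or ip_tunnel_unneed_metadata(). The following userspace model mirrors the two example rules above; the names are hypothetical, a plain counter stands in for the ip_tunnel_metadata_cnt static key, and the fallback table 254 is simply the conventional id of the main table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced rule: only the fields relevant to tunnel id matching. */
struct rule_model {
	uint64_t tun_id;  /* 0 = match any tunnel id */
	uint32_t table;
};

/* Stand-in for the ip_tunnel_metadata_cnt static key. */
static unsigned int metadata_users;

static void rule_added(const struct rule_model *r)
{
	if (r->tun_id)
		metadata_users++;
}

static void rule_deleted(const struct rule_model *r)
{
	if (r->tun_id) {
		assert(metadata_users > 0);
		metadata_users--;
	}
}

/* Tunnel drivers would check this before building per-packet metadata. */
static int collect_metadata(void)
{
	return metadata_users > 0;
}

/* First matching rule wins; fall back to the main table (254). */
static uint32_t lookup_table(const struct rule_model *rules, size_t n,
			     uint64_t flow_tun_id)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (rules[i].tun_id && rules[i].tun_id != flow_tun_id)
			continue;
		return rules[i].table;
	}
	return 254;
}
```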

Signed-off-by: Thomas Graf 
---
 drivers/net/vxlan.c|  3 ++-
 include/net/fib_rules.h|  1 +
 include/net/ip_tunnels.h   | 11 +++
 include/uapi/linux/fib_rules.h |  2 +-
 net/core/fib_rules.c   | 24 ++--
 net/ipv4/ip_tunnel_core.c  | 16 
 6 files changed, 53 insertions(+), 4 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index a350afb..23378db 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -143,7 +143,8 @@ static struct workqueue_struct *vxlan_wq;
 
 static inline bool vxlan_collect_metadata(struct vxlan_sock *vs)
 {
-   return vs->flags & VXLAN_F_COLLECT_METADATA;
+   return vs->flags & VXLAN_F_COLLECT_METADATA ||
+  ip_tunnel_collect_metadata();
 }
 
 #if IS_ENABLED(CONFIG_IPV6)
diff --git a/include/net/fib_rules.h b/include/net/fib_rules.h
index 903a55e..4e8f804 100644
--- a/include/net/fib_rules.h
+++ b/include/net/fib_rules.h
@@ -19,6 +19,7 @@ struct fib_rule {
u8  action;
/* 3 bytes hole, try to use */
u32 target;
+   __be64  tun_id;
struct fib_rule __rcu   *ctarget;
struct net  *fr_net;
 
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 0b7e18c..0a5a776 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -303,6 +303,17 @@ static inline struct ip_tunnel_info *lwt_tun_info(struct lwtunnel_state *lwtstat
return (struct ip_tunnel_info *)lwtstate->data;
 }
 
+extern struct static_key ip_tunnel_metadata_cnt;
+
+/* Returns > 0 if metadata should be collected */
+static inline int ip_tunnel_collect_metadata(void)
+{
+   return static_key_false(&ip_tunnel_metadata_cnt);
+}
+
+void ip_tunnel_need_metadata(void);
+void ip_tunnel_unneed_metadata(void);
+
 #endif /* CONFIG_INET */
 
 #endif /* __NET_IP_TUNNELS_H */
diff --git a/include/uapi/linux/fib_rules.h b/include/uapi/linux/fib_rules.h
index 2b82d7e..96161b8 100644
--- a/include/uapi/linux/fib_rules.h
+++ b/include/uapi/linux/fib_rules.h
@@ -43,7 +43,7 @@ enum {
FRA_UNUSED5,
FRA_FWMARK, /* mark */
FRA_FLOW,   /* flow/class id */
-   FRA_UNUSED6,
+   FRA_TUN_ID,
FRA_SUPPRESS_IFGROUP,
FRA_SUPPRESS_PREFIXLEN,
FRA_TABLE,  /* Extended table id */
diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index 9a12668..ae8306e 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 int fib_default_rule_add(struct fib_rules_ops *ops,
 u32 pref, u32 table, u32 flags)
@@ -186,6 +187,9 @@ static int fib_rule_match(struct fib_rule *rule, struct fib_rules_ops *ops,
if ((rule->mark ^ fl->flowi_mark) & rule->mark_mask)
goto out;
 
+   if (rule->tun_id && (rule->tun_id != fl->flowi_tun_key.tun_id))
+   goto out;
+
ret = ops->match(rule, fl, flags);
 out:
return (rule->flags & FIB_RULE_INVERT) ? !ret : ret;
@@ -330,6 +334,9 @@ static int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr* nlh)
if (tb[FRA_FWMASK])
rule->mark_mask = nla_get_u32(tb[FRA_FWMASK]);
 
+   if (tb[FRA_TUN_ID])
+   rule->tun_id = nla_get_be64(tb[FRA_TUN_ID]);
+
rule->action = frh->action;
rule->flags = frh->flags;
rule->table = frh_get_table(frh, tb);
@@ -407,6 +414,9 @@ static int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr* nlh)
if (unresolved)
ops->unresolved_rules++;
 
+   if (rule->tun_id)
+   ip_tunnel_need_metadata();
+
notify_rule_change(RTM_NEWRULE, rule, ops, nlh, NETLINK_CB(skb).portid);
flush_route_cache(ops);
rules_ops_put(ops);
@@ -473,6 +483,10 @@ static int fib_nl_delrule(struct sk_buff *skb, struct nlmsghdr* nlh)
(rule->mark_mask != nla_get_u32(tb[FRA_FWMASK])))
continue;
 
+   if (tb[FRA_TUN_ID] &&
+   (rule->tun_id != nla_get_be64(tb[FRA_TUN_ID])))
+   continue;
+
if (!ops->compare(rule, frh, tb))
continue;
 
@@ -487,6 +501,9 @@ static int fib_nl_delrule(struct sk_buff *skb, struct nlmsghdr* nlh)
goto errout;
}
 
+   if (rule->tun_id)
+   ip_tunnel_unneed_metadata();
+
list_del_rcu(&rule->list);
 
if (rule->action == FR_ACT_GOTO) {
@@ -535,7 +552,8 @@ static inline size
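
A minimal userspace sketch of the tun_id matching added above, assuming simplified stand-in types (model_rule, model_flow and the model_* helpers are hypothetical, not kernel API): a rule with a nonzero tun_id only matches flows carrying the same tunnel ID, a zero tun_id acts as a wildcard, and FIB_RULE_INVERT flips the result. The counter stands in for the static key behind ip_tunnel_need_metadata()/ip_tunnel_unneed_metadata(), which fib_nl_newrule() and fib_nl_delrule() bump for rules that carry a tunnel ID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for struct fib_rule and struct flowi.
 * Only tunnel-ID matching is modelled; the kernel stores tun_id as __be64,
 * this sketch uses a host-order uint64_t. */
struct model_rule {
    uint64_t tun_id;   /* 0 means "do not match on tunnel ID" */
    bool invert;       /* models FIB_RULE_INVERT */
};

struct model_flow {
    uint64_t tun_id;   /* models fl->flowi_tun_key.tun_id */
};

/* Models the users of ip_tunnel_metadata_cnt; the kernel uses a static key. */
static int metadata_users;

static void model_need_metadata(void) { metadata_users++; }

static void model_unneed_metadata(void)
{
    assert(metadata_users > 0);
    metadata_users--;
}

/* Mirrors ip_tunnel_collect_metadata(): nonzero while any user exists. */
static int model_collect_metadata(void) { return metadata_users > 0; }

/* Mirrors the check added to fib_rule_match(). The other selectors
 * (ops->match) are assumed to match in this sketch. */
static bool model_rule_match(const struct model_rule *rule,
                             const struct model_flow *fl)
{
    bool ret = false;

    if (rule->tun_id && rule->tun_id != fl->tun_id)
        goto out;

    ret = true;
out:
    return rule->invert ? !ret : ret;
}
```

The patch uses a static key rather than a counter, which keeps the disabled case of ip_tunnel_collect_metadata() cheap on the fast path until the first tun_id rule is installed.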

[ovs-dev] [PATCH net-next 16/22] route: Per route IP tunnel metadata via lightweight tunnel

2015-07-17 Thread Thomas Graf

This introduces a new IP tunnel lightweight tunnel type which allows
IP tunnel instructions to be specified per route. Only IPv4 is
supported at this point.

Signed-off-by: Thomas Graf 
---
 drivers/net/vxlan.c|  10 +++-
 include/net/dst_metadata.h |  12 -
 include/net/ip_tunnels.h   |   7 ++-
 include/uapi/linux/lwtunnel.h  |   1 +
 include/uapi/linux/rtnetlink.h |  15 ++
 net/ipv4/ip_tunnel_core.c  | 114 +
 net/ipv4/route.c   |   2 +-
 net/openvswitch/vport.h|   1 +
 8 files changed, 157 insertions(+), 5 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 994d89c..a350afb 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1935,7 +1935,7 @@ static void vxlan_encap_bypass(struct sk_buff *skb, struct vxlan_dev *src_vxlan,
 static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
   struct vxlan_rdst *rdst, bool did_rsc)
 {
-   struct ip_tunnel_info *info = skb_tunnel_info(skb);
+   struct ip_tunnel_info *info;
struct vxlan_dev *vxlan = netdev_priv(dev);
struct sock *sk = vxlan->vn_sock->sock->sk;
struct rtable *rt = NULL;
@@ -1952,6 +1952,9 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
int err;
u32 flags = vxlan->flags;
 
+   /* FIXME: Support IPv6 */
+   info = skb_tunnel_info(skb, AF_INET);
+
if (rdst) {
dst_port = rdst->remote_port ? rdst->remote_port : vxlan->dst_port;
vni = rdst->remote_vni;
@@ -2141,12 +2144,15 @@ tx_free:
 static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
 {
struct vxlan_dev *vxlan = netdev_priv(dev);
-   const struct ip_tunnel_info *info = skb_tunnel_info(skb);
+   const struct ip_tunnel_info *info;
struct ethhdr *eth;
bool did_rsc = false;
struct vxlan_rdst *rdst, *fdst = NULL;
struct vxlan_fdb *f;
 
+   /* FIXME: Support IPv6 */
+   info = skb_tunnel_info(skb, AF_INET);
+
skb_reset_mac_header(skb);
eth = eth_hdr(skb);
 
diff --git a/include/net/dst_metadata.h b/include/net/dst_metadata.h
index e843937..7b03068 100644
--- a/include/net/dst_metadata.h
+++ b/include/net/dst_metadata.h
@@ -23,13 +23,23 @@ static inline struct metadata_dst *skb_metadata_dst(struct sk_buff *skb)
return NULL;
 }
 
-static inline struct ip_tunnel_info *skb_tunnel_info(struct sk_buff *skb)
+static inline struct ip_tunnel_info *skb_tunnel_info(struct sk_buff *skb,
+int family)
 {
struct metadata_dst *md_dst = skb_metadata_dst(skb);
+   struct rtable *rt;
 
if (md_dst)
return &md_dst->u.tun_info;
 
+   switch (family) {
+   case AF_INET:
+   rt = (struct rtable *)skb_dst(skb);
+   if (rt && rt->rt_lwtstate)
+   return lwt_tun_info(rt->rt_lwtstate);
+   break;
+   }
+
return NULL;
 }
 
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index d11530f..0b7e18c 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -9,9 +9,9 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
+#include 
 
 #if IS_ENABLED(CONFIG_IPV6)
 #include 
@@ -298,6 +298,11 @@ static inline void *ip_tunnel_info_opts(struct ip_tunnel_info *info, size_t n)
return info + 1;
 }
 
+static inline struct ip_tunnel_info *lwt_tun_info(struct lwtunnel_state *lwtstate)
+{
+   return (struct ip_tunnel_info *)lwtstate->data;
+}
+
 #endif /* CONFIG_INET */
 
 #endif /* __NET_IP_TUNNELS_H */
diff --git a/include/uapi/linux/lwtunnel.h b/include/uapi/linux/lwtunnel.h
index aa611d9..31377bb 100644
--- a/include/uapi/linux/lwtunnel.h
+++ b/include/uapi/linux/lwtunnel.h
@@ -6,6 +6,7 @@
 enum lwtunnel_encap_types {
LWTUNNEL_ENCAP_NONE,
LWTUNNEL_ENCAP_MPLS,
+   LWTUNNEL_ENCAP_IP,
__LWTUNNEL_ENCAP_MAX,
 };
 
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 0d3d3cc..47d24cb 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -286,6 +286,21 @@ enum rt_class_t {
 
 /* Routing message attributes */
 
+enum ip_tunnel_t {
+   IP_TUN_UNSPEC,
+   IP_TUN_ID,
+   IP_TUN_DST,
+   IP_TUN_SRC,
+   IP_TUN_TTL,
+   IP_TUN_TOS,
+   IP_TUN_SPORT,
+   IP_TUN_DPORT,
+   IP_TUN_FLAGS,
+   __IP_TUN_MAX,
+};
+
+#define IP_TUN_MAX (__IP_TUN_MAX - 1)
+
 enum rtattr_type_t {
RTA_UNSPEC,
RTA_DST,
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index 6a51a71..025b76e 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -190,3 +190,117 @@ struct rtnl_link_stats64 *ip_tunnel_get_stats64(struct net_device *dev,
return tot;
 }
 EXPORT_SYMBOL_GPL(ip_tunnel_get_stats64);
+
+static const stru
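
The lookup order implemented by the patched skb_tunnel_info() can be sketched in plain C as follows. All model_* types are hypothetical simplifications: in the kernel the metadata dst and the IPv4 route share the skb's single dst slot, whereas here they are two separate pointers for readability.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>   /* AF_INET, AF_INET6 */

struct model_tun_info { uint64_t tun_id; };

/* Hypothetical stand-ins for struct metadata_dst, struct lwtunnel_state
 * and the route (struct rtable) attached to an skb. */
struct model_md_dst   { struct model_tun_info tun_info; };
struct model_lwtstate { struct model_tun_info tun_info; };
struct model_route    { struct model_lwtstate *lwtstate; };

struct model_skb {
    struct model_md_dst *md_dst;  /* set when a metadata dst is attached */
    struct model_route *route;    /* set when an IPv4 route is attached */
};

/* 1. Per-packet metadata dst wins.
 * 2. Otherwise, for AF_INET only, use the route's lwtunnel state. */
static struct model_tun_info *model_skb_tunnel_info(struct model_skb *skb,
                                                    int family)
{
    if (skb->md_dst)
        return &skb->md_dst->tun_info;

    switch (family) {
    case AF_INET:
        if (skb->route && skb->route->lwtstate)
            return &skb->route->lwtstate->tun_info;
        break;
    }

    return NULL;   /* IPv6 fallback is still a FIXME in the patch */
}
```

This is why the VXLAN transmit path now passes AF_INET explicitly: without a family the helper cannot know how to interpret the attached dst.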

[ovs-dev] [PATCH net-next 20/22] openvswitch: Move dev pointer into vport itself

2015-07-17 Thread Thomas Graf

This is the first step in representing all OVS vports as regular
struct net_devices. Move the net_device pointer into the vport
structure itself to get rid of struct netdev_vport.

Signed-off-by: Thomas Graf 
Signed-off-by: Pravin B Shelar 
---
 net/openvswitch/datapath.c   |  7 +--
 net/openvswitch/dp_notify.c  |  5 +--
 net/openvswitch/vport-internal_dev.c | 37 +++-
 net/openvswitch/vport-netdev.c   | 86 
 net/openvswitch/vport-netdev.h   | 12 -
 net/openvswitch/vport.h  |  3 +-
 6 files changed, 59 insertions(+), 91 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 0208210..19df28e 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -188,7 +188,7 @@ static int get_dpifindex(const struct datapath *dp)
 
local = ovs_vport_rcu(dp, OVSP_LOCAL);
if (local)
-   ifindex = netdev_vport_priv(local)->dev->ifindex;
+   ifindex = local->dev->ifindex;
else
ifindex = 0;
 
@@ -2219,13 +2219,10 @@ static void __net_exit list_vports_from_net(struct net *net, struct net *dnet,
struct vport *vport;
 
hlist_for_each_entry(vport, &dp->ports[i], dp_hash_node) {
-   struct netdev_vport *netdev_vport;
-
if (vport->ops->type != OVS_VPORT_TYPE_INTERNAL)
continue;
 
-   netdev_vport = netdev_vport_priv(vport);
-   if (dev_net(netdev_vport->dev) == dnet)
+   if (dev_net(vport->dev) == dnet)
list_add(&vport->detach_list, head);
}
}
diff --git a/net/openvswitch/dp_notify.c b/net/openvswitch/dp_notify.c
index 2c631fe..a7a80a6 100644
--- a/net/openvswitch/dp_notify.c
+++ b/net/openvswitch/dp_notify.c
@@ -58,13 +58,10 @@ void ovs_dp_notify_wq(struct work_struct *work)
struct hlist_node *n;
 
hlist_for_each_entry_safe(vport, n, &dp->ports[i], dp_hash_node) {
-   struct netdev_vport *netdev_vport;
-
if (vport->ops->type != OVS_VPORT_TYPE_NETDEV)
continue;
 
-   netdev_vport = netdev_vport_priv(vport);
-   if (!(netdev_vport->dev->priv_flags & IFF_OVS_DATAPATH))
+   if (!(vport->dev->priv_flags & IFF_OVS_DATAPATH))
dp_detach_port_notify(vport);
}
}
diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
index 6a55f71..a2c205d 100644
--- a/net/openvswitch/vport-internal_dev.c
+++ b/net/openvswitch/vport-internal_dev.c
@@ -156,49 +156,44 @@ static void do_setup(struct net_device *netdev)
 static struct vport *internal_dev_create(const struct vport_parms *parms)
 {
struct vport *vport;
-   struct netdev_vport *netdev_vport;
struct internal_dev *internal_dev;
int err;
 
-   vport = ovs_vport_alloc(sizeof(struct netdev_vport),
-   &ovs_internal_vport_ops, parms);
+   vport = ovs_vport_alloc(0, &ovs_internal_vport_ops, parms);
if (IS_ERR(vport)) {
err = PTR_ERR(vport);
goto error;
}
 
-   netdev_vport = netdev_vport_priv(vport);
-
-   netdev_vport->dev = alloc_netdev(sizeof(struct internal_dev),
-parms->name, NET_NAME_UNKNOWN,
-do_setup);
-   if (!netdev_vport->dev) {
+   vport->dev = alloc_netdev(sizeof(struct internal_dev),
+ parms->name, NET_NAME_UNKNOWN, do_setup);
+   if (!vport->dev) {
err = -ENOMEM;
goto error_free_vport;
}
 
-   dev_net_set(netdev_vport->dev, ovs_dp_get_net(vport->dp));
-   internal_dev = internal_dev_priv(netdev_vport->dev);
+   dev_net_set(vport->dev, ovs_dp_get_net(vport->dp));
+   internal_dev = internal_dev_priv(vport->dev);
internal_dev->vport = vport;
 
/* Restrict bridge port to current netns. */
if (vport->port_no == OVSP_LOCAL)
-   netdev_vport->dev->features |= NETIF_F_NETNS_LOCAL;
+   vport->dev->features |= NETIF_F_NETNS_LOCAL;
 
rtnl_lock();
-   err = register_netdevice(netdev_vport->dev);
+   err = register_netdevice(vport->dev);
if (err)
goto error_free_netdev;
 
-   dev_set_promiscuity(netdev_vport->dev, 1);
+   dev_set_promiscuity(vport->dev, 1);
rtnl_unlock();
-   netif_start_queue(netdev_vport->dev);
+   netif_start_queue(vport->dev);

[ovs-dev] [PATCH net-next 22/22] openvswitch: Use regular VXLAN net_device device

2015-07-17 Thread Thomas Graf

This gets rid of all OVS-specific VXLAN code in the receive and
transmit path by using a VXLAN net_device to represent the vport.
Only a small shim layer remains which takes care of handling the
VXLAN-specific OVS Netlink configuration.

Unexports vxlan_sock_add(), vxlan_sock_release(), vxlan_xmit_skb()
since they are no longer needed.

Signed-off-by: Thomas Graf 
Signed-off-by: Pravin B Shelar 
---
 drivers/net/vxlan.c| 242 +++
 include/net/rtnetlink.h|   1 +
 include/net/vxlan.h|  24 +--
 net/core/rtnetlink.c   |  26 ++--
 net/openvswitch/Kconfig|  12 --
 net/openvswitch/Makefile   |   1 -
 net/openvswitch/flow_netlink.c |   6 +-
 net/openvswitch/vport-netdev.c | 201 -
 net/openvswitch/vport-vxlan.c  | 322 -
 net/openvswitch/vport-vxlan.h  |  11 --
 10 files changed, 339 insertions(+), 507 deletions(-)
 delete mode 100644 net/openvswitch/vport-vxlan.c
 delete mode 100644 net/openvswitch/vport-vxlan.h

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 5ae6c0c..76466ef 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -75,6 +75,9 @@ static struct rtnl_link_ops vxlan_link_ops;
 
 static const u8 all_zeros_mac[ETH_ALEN];
 
+static struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
+bool no_share, u32 flags);
+
 /* per-network namespace private data for this module */
 struct vxlan_net {
struct list_head  vxlan_list;
@@ -1027,7 +1030,7 @@ static bool vxlan_group_used(struct vxlan_net *vn, struct vxlan_dev *dev)
return false;
 }
 
-void vxlan_sock_release(struct vxlan_sock *vs)
+static void vxlan_sock_release(struct vxlan_sock *vs)
 {
struct sock *sk = vs->sock->sk;
struct net *net = sock_net(sk);
@@ -1043,7 +1046,6 @@ void vxlan_sock_release(struct vxlan_sock *vs)
 
queue_work(vxlan_wq, &vs->del_work);
 }
-EXPORT_SYMBOL_GPL(vxlan_sock_release);
 
 /* Update multicast group membership when first VNI on
  * multicast address is brought up
@@ -1126,6 +1128,102 @@ static struct vxlanhdr *vxlan_remcsum(struct sk_buff *skb, struct vxlanhdr *vh,
return vh;
 }
 
+static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
+ struct vxlan_metadata *md, u32 vni,
+ struct metadata_dst *tun_dst)
+{
+   struct iphdr *oip = NULL;
+   struct ipv6hdr *oip6 = NULL;
+   struct vxlan_dev *vxlan;
+   struct pcpu_sw_netstats *stats;
+   union vxlan_addr saddr;
+   int err = 0;
+   union vxlan_addr *remote_ip;
+
+   /* For flow based devices, map all packets to VNI 0 */
+   if (vs->flags & VXLAN_F_FLOW_BASED)
+   vni = 0;
+
+   /* Is this VNI defined? */
+   vxlan = vxlan_vs_find_vni(vs, vni);
+   if (!vxlan)
+   goto drop;
+
+   remote_ip = &vxlan->default_dst.remote_ip;
+   skb_reset_mac_header(skb);
+   skb_scrub_packet(skb, !net_eq(vxlan->net, dev_net(vxlan->dev)));
+   skb->protocol = eth_type_trans(skb, vxlan->dev);
+   skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);
+
+   /* Ignore packet loops (and multicast echo) */
+   if (ether_addr_equal(eth_hdr(skb)->h_source, vxlan->dev->dev_addr))
+   goto drop;
+
+   /* Re-examine inner Ethernet packet */
+   if (remote_ip->sa.sa_family == AF_INET) {
+   oip = ip_hdr(skb);
+   saddr.sin.sin_addr.s_addr = oip->saddr;
+   saddr.sa.sa_family = AF_INET;
+#if IS_ENABLED(CONFIG_IPV6)
+   } else {
+   oip6 = ipv6_hdr(skb);
+   saddr.sin6.sin6_addr = oip6->saddr;
+   saddr.sa.sa_family = AF_INET6;
+#endif
+   }
+
+   if (tun_dst) {
+   skb_dst_set(skb, (struct dst_entry *)tun_dst);
+   tun_dst = NULL;
+   }
+
+   if ((vxlan->flags & VXLAN_F_LEARN) &&
+   vxlan_snoop(skb->dev, &saddr, eth_hdr(skb)->h_source))
+   goto drop;
+
+   skb_reset_network_header(skb);
+   /* In flow-based mode, GBP is carried in dst_metadata */
+   if (!(vs->flags & VXLAN_F_FLOW_BASED))
+   skb->mark = md->gbp;
+
+   if (oip6)
+   err = IP6_ECN_decapsulate(oip6, skb);
+   if (oip)
+   err = IP_ECN_decapsulate(oip, skb);
+
+   if (unlikely(err)) {
+   if (log_ecn_error) {
+   if (oip6)
+   net_info_ratelimited("non-ECT from %pI6\n",
+&oip6->saddr);
+   if (oip)
+   net_info_ratelimited("non-ECT from %pI4 with TOS=%#x\n",
+&oip->saddr, oip->tos);
+   }
+   if (err > 1) {
+   ++vxlan->dev->stats.rx_frame_errors;
+   ++vxlan->d
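
As a rough illustration of the device selection in the new vxlan_rcv(): the sketch below, built on hypothetical model_* types with a linear search standing in for vxlan_vs_find_vni(), shows a flow-based socket delivering every VNI to the device registered for VNI 0, while a classic socket only accepts VNIs it knows. ECN handling, MAC learning and loop detection are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_F_FLOW_BASED 0x1u   /* hypothetical stand-in for VXLAN_F_FLOW_BASED */
#define MODEL_MAX_DEVS 4

/* Hypothetical stand-ins for struct vxlan_dev and struct vxlan_sock. */
struct model_vxlan_dev { uint32_t vni; const char *name; };

struct model_vxlan_sock {
    uint32_t flags;
    struct model_vxlan_dev *devs[MODEL_MAX_DEVS];
    size_t n_devs;
};

/* Linear search standing in for vxlan_vs_find_vni()'s hash lookup. */
static struct model_vxlan_dev *model_find_vni(struct model_vxlan_sock *vs,
                                              uint32_t vni)
{
    for (size_t i = 0; i < vs->n_devs; i++)
        if (vs->devs[i]->vni == vni)
            return vs->devs[i];
    return NULL;
}

/* Picks the receiving device like the patched vxlan_rcv() and reports the
 * skb->mark it would set (0 here when the mark is not written). */
static struct model_vxlan_dev *model_rcv(struct model_vxlan_sock *vs,
                                         uint32_t vni, uint32_t gbp,
                                         uint32_t *mark)
{
    struct model_vxlan_dev *dev;

    /* Flow-based devices map all packets to VNI 0; the real VNI is kept
     * in the tunnel metadata instead. */
    if (vs->flags & MODEL_F_FLOW_BASED)
        vni = 0;

    dev = model_find_vni(vs, vni);
    if (!dev)
        return NULL;   /* unknown VNI: the kernel drops the packet */

    /* In flow-based mode GBP is carried in dst_metadata, not skb->mark. */
    *mark = (vs->flags & MODEL_F_FLOW_BASED) ? 0 : gbp;
    return dev;
}
```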

[ovs-dev] [PATCH net-next 19/22] openvswitch: Make tunnel set action attach a metadata dst

2015-07-17 Thread Thomas Graf

Utilize the new metadata dst to attach encapsulation instructions to
the skb. The existing egress_tun_info via the OVS_CB() is left in
place until all tunnel vports have been converted to the new method.

Signed-off-by: Thomas Graf 
Signed-off-by: Pravin B Shelar 
---
 net/openvswitch/actions.c  | 10 ++-
 net/openvswitch/datapath.c |  8 +++---
 net/openvswitch/flow.h |  5 
 net/openvswitch/flow_netlink.c | 64 +-
 net/openvswitch/flow_netlink.h |  1 +
 net/openvswitch/flow_table.c   |  4 ++-
 6 files changed, 79 insertions(+), 13 deletions(-)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 27c1687..cf04c2f 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -733,7 +733,15 @@ static int execute_set_action(struct sk_buff *skb,
 {
/* Only tunnel set execution is supported without a mask. */
if (nla_type(a) == OVS_KEY_ATTR_TUNNEL_INFO) {
-   OVS_CB(skb)->egress_tun_info = nla_data(a);
+   struct ovs_tunnel_info *tun = nla_data(a);
+
+   skb_dst_drop(skb);
+   dst_hold((struct dst_entry *)tun->tun_dst);
+   skb_dst_set(skb, (struct dst_entry *)tun->tun_dst);
+
+   /* FIXME: Remove when all vports have been converted */
+   OVS_CB(skb)->egress_tun_info = &tun->tun_dst->u.tun_info;
+
return 0;
}
 
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index ff8c4a4..0208210 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -1018,7 +1018,7 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
}
ovs_unlock();
 
-   ovs_nla_free_flow_actions(old_acts);
+   ovs_nla_free_flow_actions_rcu(old_acts);
ovs_flow_free(new_flow, false);
}
 
@@ -1030,7 +1030,7 @@ err_unlock_ovs:
ovs_unlock();
kfree_skb(reply);
 err_kfree_acts:
-   kfree(acts);
+   ovs_nla_free_flow_actions(acts);
 err_kfree_flow:
ovs_flow_free(new_flow, false);
 error:
@@ -1157,7 +1157,7 @@ static int ovs_flow_cmd_set(struct sk_buff *skb, struct genl_info *info)
if (reply)
ovs_notify(&dp_flow_genl_family, reply, info);
if (old_acts)
-   ovs_nla_free_flow_actions(old_acts);
+   ovs_nla_free_flow_actions_rcu(old_acts);
 
return 0;
 
@@ -1165,7 +1165,7 @@ err_unlock_ovs:
ovs_unlock();
kfree_skb(reply);
 err_kfree_acts:
-   kfree(acts);
+   ovs_nla_free_flow_actions(acts);
 error:
return error;
 }
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index cadc6c5..b62cdb3 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct sk_buff;
 
@@ -45,6 +46,10 @@ struct sk_buff;
 #define TUN_METADATA_OPTS(flow_key, opt_len) \
((void *)((flow_key)->tun_opts + TUN_METADATA_OFFSET(opt_len)))
 
+struct ovs_tunnel_info {
+   struct metadata_dst *tun_dst;
+};
+
 #define OVS_SW_FLOW_KEY_METADATA_SIZE  \
(offsetof(struct sw_flow_key, recirc_id) +  \
FIELD_SIZEOF(struct sw_flow_key, recirc_id))
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index ecfa530..e7906df 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -1548,11 +1548,48 @@ static struct sw_flow_actions *nla_alloc_flow_actions(int size, bool log)
return sfa;
 }
 
+static void ovs_nla_free_set_action(const struct nlattr *a)
+{
+   const struct nlattr *ovs_key = nla_data(a);
+   struct ovs_tunnel_info *ovs_tun;
+
+   switch (nla_type(ovs_key)) {
+   case OVS_KEY_ATTR_TUNNEL_INFO:
+   ovs_tun = nla_data(ovs_key);
+   dst_release((struct dst_entry *)ovs_tun->tun_dst);
+   break;
+   }
+}
+
+void ovs_nla_free_flow_actions(struct sw_flow_actions *sf_acts)
+{
+   const struct nlattr *a;
+   int rem;
+
+   if (!sf_acts)
+   return;
+
+   nla_for_each_attr(a, sf_acts->actions, sf_acts->actions_len, rem) {
+   switch (nla_type(a)) {
+   case OVS_ACTION_ATTR_SET:
+   ovs_nla_free_set_action(a);
+   break;
+   }
+   }
+
+   kfree(sf_acts);
+}
+
+static void __ovs_nla_free_flow_actions(struct rcu_head *head)
+{
+   ovs_nla_free_flow_actions(container_of(head, struct sw_flow_actions, rcu));
+}
+
 /* Schedules 'sf_acts' to be freed after the next RCU grace period.
  * The caller must hold rcu_read_lock for this to be sensible. */
-void ovs_nla_free_flow_actions(struct sw_flow_actions *sf_acts)
+void ovs_nla_free_flow_actions_rcu(struct sw_flow_actions *sf_acts)
 {
-   kfree_rcu(sf_acts, rcu);
+   call_rcu(&sf_acts->rcu, __ovs_nla_free_flow_actions);
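
The reference counting introduced by this patch can be modelled with a plain integer refcount. In this sketch (model_* names are hypothetical, and the RCU deferral done via call_rcu() is not modelled) the set-tunnel action owns one reference on its metadata dst, executing the action gives the skb a second one, and freeing the actions drops the action's reference. This is also why the error paths switch from kfree() to ovs_nla_free_flow_actions(): a bare kfree() would leak the dst reference held by the action.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted stand-in for struct metadata_dst. */
struct model_dst {
    int refcnt;
    int freed;     /* set instead of freeing, so the effect is observable */
};

struct model_skb { struct model_dst *dst; };

static void model_dst_hold(struct model_dst *d) { d->refcnt++; }

static void model_dst_release(struct model_dst *d)
{
    if (!d)
        return;
    assert(d->refcnt > 0);
    if (--d->refcnt == 0)
        d->freed = 1;
}

/* Like skb_dst_drop(): drop whatever dst the skb currently references. */
static void model_skb_dst_drop(struct model_skb *skb)
{
    model_dst_release(skb->dst);
    skb->dst = NULL;
}

/* Mirrors execute_set_action() for OVS_KEY_ATTR_TUNNEL_INFO: the action
 * keeps its own reference and the skb takes an additional one. */
static void model_execute_set_tunnel(struct model_skb *skb,
                                     struct model_dst *tun_dst)
{
    model_skb_dst_drop(skb);
    model_dst_hold(tun_dst);
    skb->dst = tun_dst;
}

/* Mirrors ovs_nla_free_set_action(): releases the action's reference. */
static void model_free_set_action(struct model_dst *tun_dst)
{
    model_dst_release(tun_dst);
}
```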

[ovs-dev] [PATCH net-next 21/22] openvswitch: Abstract vport name through ovs_vport_name()

2015-07-17 Thread Thomas Graf

This allows the get_name() vport op to be removed later on.

Signed-off-by: Thomas Graf 
---
 net/openvswitch/datapath.c   | 4 ++--
 net/openvswitch/vport-internal_dev.c | 1 -
 net/openvswitch/vport-netdev.c   | 6 --
 net/openvswitch/vport-netdev.h   | 1 -
 net/openvswitch/vport.c  | 4 ++--
 net/openvswitch/vport.h  | 5 +
 6 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 19df28e..ffe984f 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -176,7 +176,7 @@ static inline struct datapath *get_dp(struct net *net, int dp_ifindex)
 const char *ovs_dp_name(const struct datapath *dp)
 {
struct vport *vport = ovs_vport_ovsl_rcu(dp, OVSP_LOCAL);
-   return vport->ops->get_name(vport);
+   return ovs_vport_name(vport);
 }
 
 static int get_dpifindex(const struct datapath *dp)
@@ -1800,7 +1800,7 @@ static int ovs_vport_cmd_fill_info(struct vport *vport, struct sk_buff *skb,
if (nla_put_u32(skb, OVS_VPORT_ATTR_PORT_NO, vport->port_no) ||
nla_put_u32(skb, OVS_VPORT_ATTR_TYPE, vport->ops->type) ||
nla_put_string(skb, OVS_VPORT_ATTR_NAME,
-  vport->ops->get_name(vport)))
+  ovs_vport_name(vport)))
goto nla_put_failure;
 
ovs_vport_get_stats(vport, &vport_stats);
diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
index a2c205d..c058bbf 100644
--- a/net/openvswitch/vport-internal_dev.c
+++ b/net/openvswitch/vport-internal_dev.c
@@ -242,7 +242,6 @@ static struct vport_ops ovs_internal_vport_ops = {
.type   = OVS_VPORT_TYPE_INTERNAL,
.create = internal_dev_create,
.destroy= internal_dev_destroy,
-   .get_name   = ovs_netdev_get_name,
.send   = internal_dev_recv,
 };
 
diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index 1c96966..e682bdc 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -171,11 +171,6 @@ static void netdev_destroy(struct vport *vport)
call_rcu(&vport->rcu, free_port_rcu);
 }
 
-const char *ovs_netdev_get_name(const struct vport *vport)
-{
-   return vport->dev->name;
-}
-
 static unsigned int packet_length(const struct sk_buff *skb)
 {
unsigned int length = skb->len - ETH_HLEN;
@@ -223,7 +218,6 @@ static struct vport_ops ovs_netdev_vport_ops = {
.type   = OVS_VPORT_TYPE_NETDEV,
.create = netdev_create,
.destroy= netdev_destroy,
-   .get_name   = ovs_netdev_get_name,
.send   = netdev_send,
 };
 
diff --git a/net/openvswitch/vport-netdev.h b/net/openvswitch/vport-netdev.h
index 1c52aed..684fb88 100644
--- a/net/openvswitch/vport-netdev.h
+++ b/net/openvswitch/vport-netdev.h
@@ -26,7 +26,6 @@
 
 struct vport *ovs_netdev_get_vport(struct net_device *dev);
 
-const char *ovs_netdev_get_name(const struct vport *);
 void ovs_netdev_detach_dev(struct vport *);
 
 int __init ovs_netdev_init(void);
diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index af23ba0..d14f594 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -113,7 +113,7 @@ struct vport *ovs_vport_locate(const struct net *net, const char *name)
struct vport *vport;
 
hlist_for_each_entry_rcu(vport, bucket, hash_node)
-   if (!strcmp(name, vport->ops->get_name(vport)) &&
+   if (!strcmp(name, ovs_vport_name(vport)) &&
net_eq(ovs_dp_get_net(vport->dp), net))
return vport;
 
@@ -226,7 +226,7 @@ struct vport *ovs_vport_add(const struct vport_parms *parms)
}
 
bucket = hash_bucket(ovs_dp_get_net(vport->dp),
-vport->ops->get_name(vport));
+ovs_vport_name(vport));
hlist_add_head_rcu(&vport->hash_node, bucket);
return vport;
}
diff --git a/net/openvswitch/vport.h b/net/openvswitch/vport.h
index e05ec68..1a689c2 100644
--- a/net/openvswitch/vport.h
+++ b/net/openvswitch/vport.h
@@ -237,6 +237,11 @@ static inline void ovs_skb_postpush_rcsum(struct sk_buff *skb,
skb->csum = csum_add(skb->csum, csum_partial(start, len, 0));
 }
 
+static inline const char *ovs_vport_name(struct vport *vport)
+{
+   return vport->dev ? vport->dev->name : vport->ops->get_name(vport);
+}
+
 int ovs_vport_ops_register(struct vport_ops *ops);
 void ovs_vport_ops_unregister(struct vport_ops *ops);
 
-- 
2.4.3
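
A self-contained sketch of the fallback in ovs_vport_name(), using hypothetical model_* types: converted vports report their net_device name, while vports without a dev still go through the get_name() op until it is removed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for struct net_device, vport_ops and vport. */
struct model_net_device { char name[16]; };   /* IFNAMSIZ is 16 on Linux */

struct model_vport;
struct model_vport_ops {
    const char *(*get_name)(const struct model_vport *vport);
};

struct model_vport {
    struct model_net_device *dev;          /* NULL for unconverted vports */
    const struct model_vport_ops *ops;
    const char *legacy_name;               /* read by the legacy op below */
};

/* Example legacy get_name() op for a vport without a net_device. */
static const char *model_legacy_get_name(const struct model_vport *vport)
{
    return vport->legacy_name;
}

/* Same shape as the ovs_vport_name() helper added to vport.h. */
static const char *model_vport_name(const struct model_vport *vport)
{
    return vport->dev ? vport->dev->name : vport->ops->get_name(vport);
}
```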


Re: [ovs-dev] [RFC v2 02/11] ovn: Add bridge mappings to ovn-controller.

2015-07-17 Thread Russell Bryant

On 07/16/2015 07:56 PM, Ben Pfaff wrote:
> On Thu, Jul 16, 2015 at 04:55:11PM -0700, Ben Pfaff wrote:
>> On Thu, Jul 16, 2015 at 06:06:10PM -0400, Russell Bryant wrote:
>>> Add a new OVN configuration entry in the Open_vSwitch database called
>>> "ovn-bridge-mappings".  This allows the configuration of mappings
>>> between a physical network name and an OVS bridge that provides
>>> connectivity to that network.
>>>
>>> For example, if you wanted to configure "physnet1" to map to "br-eth0"
>>> and "physnet2" to map to "br-eth1", the configuration would be:
>>>
>>>   $ ovs-vsctl set open . \
>>>   > external-ids:ovn-bridge-mappings=physnet1:br-eth0,physnet2:br-eth1
>>>
>>> In this patch, the configuration option is only parsed and validated
>>> to make sure the referenced bridges actually exist.  Later patches
>>> will make use of the bridge mappings.
>>>
>>> Signed-off-by: Russell Bryant 
>>
>> Do later patches add documentation?  That will be important.
>>
>> I wonder whether we should document the ovn configuration external-ids
>> in vswitch.xml, or whether that would indicate too intimate a
>> relationship between OVN and OVS.
> 
> Oh, also it doesn't compile:
> 
> ../ovn/controller/ovn-controller.c:118:13: error: undefined identifier 'create_patch_ports'
> ../ovn/controller/ovn-controller.c: In function ‘parse_bridge_mappings’:
> ../ovn/controller/ovn-controller.c:118:9: error: implicit declaration of function ‘create_patch_ports’ [-Werror=implicit-function-declaration]
> 

Oops ... I've done a bunch of rebasing, cherry-picks, etc ... I can fix
this up next time around.

-- 
Russell Bryant
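
For reference, a minimal C sketch of splitting an ovn-bridge-mappings value such as "physnet1:br-eth0,physnet2:br-eth1" into network/bridge pairs. parse_bridge_mappings_sketch() and struct bridge_mapping are hypothetical and are not the parse_bridge_mappings() from the patch; the 64-byte name limit is an arbitrary assumption, and the check that each referenced bridge exists is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_NAME 64   /* hypothetical limit for this sketch */

/* One "network:bridge" pair. */
struct bridge_mapping {
    char network[MAX_NAME];
    char bridge[MAX_NAME];
};

/* Copies s[0..len) into dst as a NUL-terminated string.
 * Returns 0 on success, -1 if the field is empty or too long. */
static int copy_field(char *dst, const char *s, size_t len)
{
    if (len == 0 || len >= MAX_NAME)
        return -1;
    memcpy(dst, s, len);
    dst[len] = '\0';
    return 0;
}

/* Parses "net1:br1,net2:br2,..." into 'out' (at most 'max' entries).
 * Returns the number of mappings, or -1 on malformed input. */
static int parse_bridge_mappings_sketch(const char *s,
                                        struct bridge_mapping *out,
                                        size_t max)
{
    size_t n = 0;

    while (*s) {
        const char *end = strchr(s, ',');
        size_t len = end ? (size_t)(end - s) : strlen(s);
        const char *colon = memchr(s, ':', len);

        if (!colon || n == max)
            return -1;
        if (copy_field(out[n].network, s, (size_t)(colon - s)) ||
            copy_field(out[n].bridge, colon + 1,
                       len - (size_t)(colon - s) - 1))
            return -1;
        n++;
        s += len;
        if (*s == ',')
            s++;
    }
    return (int)n;
}
```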

Re: [ovs-dev] ovs-ofctl mod-table commands supporting OF1.4 Eviction and Vacancy-Events

2015-07-17 Thread Ben Pfaff

On Fri, Jul 17, 2015 at 05:54:57PM +0530, Saloni Jain wrote:
> The main problem in the whole implementation which I am facing currently is 
> in encoding and decoding of table-mod config value.
> For table-config, as per the specification we can send only three values - 
> OFPTC14_EVICTION, OFPTC14_VACANCY_EVENTS and 0.

No, it's a bitmap.  You can use any combination of flags.
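
To illustrate the bitmap point, here is a small C sketch that encodes and decodes the table-mod config field. The flag values follow the OpenFlow 1.4 ofp_table_config enum (OFPTC_EVICTION = 1 << 2, OFPTC_VACANCY_EVENTS = 1 << 3); the TC_* names and struct table_config are hypothetical stand-ins, so please check them against the OVS OFPTC14_* definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values follow the OpenFlow 1.4 ofp_table_config enum;
 * OVS spells these OFPTC14_EVICTION and OFPTC14_VACANCY_EVENTS. */
enum {
    TC_EVICTION       = 1 << 2,
    TC_VACANCY_EVENTS = 1 << 3,
};

/* Hypothetical decoded form of a table-mod 'config' field. */
struct table_config {
    bool eviction;
    bool vacancy_events;
};

/* Any combination of flags is valid, including none (0). */
static uint32_t encode_table_config(struct table_config tc)
{
    uint32_t config = 0;

    if (tc.eviction)
        config |= TC_EVICTION;
    if (tc.vacancy_events)
        config |= TC_VACANCY_EVENTS;
    return config;   /* host order; convert with htonl() for the wire */
}

static struct table_config decode_table_config(uint32_t config)
{
    struct table_config tc = {
        .eviction = (config & TC_EVICTION) != 0,
        .vacancy_events = (config & TC_VACANCY_EVENTS) != 0,
    };
    return tc;
}
```

Setting both flags yields 12 (0x0c), and 0 means neither feature is requested.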

Re: [ovs-dev] [PATCH 2/2] netdev-dpdk: Retry tx/rx queue setup until we don't get any failure.

2015-07-17 Thread Stokes, Ian

Hi All,

If this solution is acceptable and pushed to master, would it be possible 
to have it included as part of the OVS 2.4 release branch? (As these are 
bug fixes and not new features). 

Otherwise OVS with DPDK will be limited in its deployment use cases.

Regards
Ian Stokes

> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Stokes, Ian
> Sent: Friday, July 17, 2015 12:08 PM
> To: Daniele Di Proietto; dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH 2/2] netdev-dpdk: Retry tx/rx queue setup
> until we don't get any failure.
> 
> > -Original Message-
> > From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Daniele Di
> > Proietto
> > Sent: Thursday, July 16, 2015 7:48 PM
> > To: dev@openvswitch.org
> > Subject: [ovs-dev] [PATCH 2/2] netdev-dpdk: Retry tx/rx queue setup
> > until we don't get any failure.
> >
> > It has been observed that some DPDK devices (e.g. Intel XL710) report a
> > high number of queues but make some of them available only for special
> > functions (SR-IOV).  Therefore the queues will be counted in
> > rte_eth_dev_info_get(), but rte_eth_tx_queue_setup() will fail.
> >
> > This commit works around the issue by retrying the device initialization
> > with a smaller number of queues if a queue fails to be set up.
> >
> > Reported-by: Ian Stokes 
> > Signed-off-by: Daniele Di Proietto 
> > ---
> >  lib/netdev-dpdk.c | 100 +++-------
> >  1 file changed, 73 insertions(+), 27 deletions(-)
> >
> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > index 5ae805e..3444bb1 100644
> > --- a/lib/netdev-dpdk.c
> > +++ b/lib/netdev-dpdk.c
> > @@ -423,52 +423,98 @@ dpdk_watchdog(void *dummy OVS_UNUSED)
> >  }
> >
> >  static int
> > +dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
> > +{
> > +int diag = 0;
> > +int i;
> > +
> > +/* A device may report more queues than it makes available (this has
> > + * been observed for Intel xl710, which reserves some of them for
> > + * SRIOV):  rte_eth_*_queue_setup will fail if a queue is not
> > + * available.  When this happens we can retry the configuration
> > + * and request less queues */
> > +while (n_rxq && n_txq) {
> > +if (diag) {
> > +VLOG_INFO("Retrying setup with (rxq:%d txq:%d)", n_rxq, n_txq);
> > +}
> > +
> > +diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, &port_conf);
> > +if (diag) {
> > +break;
> > +}
> > +
> > +for (i = 0; i < n_txq; i++) {
> > +diag = rte_eth_tx_queue_setup(dev->port_id, i, NIC_PORT_TX_Q_SIZE,
> > +  dev->socket_id, NULL);
> > +if (diag) {
> > +VLOG_INFO("Interface %s txq(%d) setup error: %s",
> > +  dev->up.name, i, rte_strerror(-diag));
> > +break;
> > +}
> > +}
> > +
> > +if (i != n_txq) {
> > +/* Retry with less tx queues */
> > +n_txq = i;
> > +continue;
> > +}
> > +
> > +for (i = 0; i < n_rxq; i++) {
> > +diag = rte_eth_rx_queue_setup(dev->port_id, i, NIC_PORT_RX_Q_SIZE,
> > +  dev->socket_id, NULL,
> > +  dev->dpdk_mp->mp);
> > +if (diag) {
> > +VLOG_INFO("Interface %s rxq(%d) setup error: %s",
> > +  dev->up.name, i, rte_strerror(-diag));
> > +break;
> > +}
> > +}
> > +
> > +if (i != n_rxq) {
> > +/* Retry with less rx queues */
> > +n_rxq = i;
> > +continue;
> > +}
> > +
> > +dev->up.n_rxq = n_rxq;
> > +dev->real_n_txq = n_txq;
> > +
> > +return 0;
> > +}
> > +
> > +return diag;
> > +}
> > +
> > +
> > +static int
> >  dpdk_eth_dev_init(struct netdev_dpdk *dev) OVS_REQUIRES(dpdk_mutex)
> >  {
> >  struct rte_pktmbuf_pool_private *mbp_priv;
> >  struct rte_eth_dev_info info;
> >  struct ether_addr eth_addr;
> >  int diag;
> > -int i;
> > +int n_rxq, n_txq;
> >
> >  if (dev->port_id < 0 || dev->port_id >= rte_eth_dev_count()) {
> >  return ENODEV;
> >  }
> >
> >  rte_eth_dev_info_get(dev->port_id, &info);
> > -dev->up.n_rxq = MIN(info.max_rx_queues, dev->up.n_rxq);
> > -dev->real_n_txq = MIN(info.max_tx_queues, dev->up.n_txq);
> >
> > -diag = rte_eth_dev_configure(dev->port_id, dev->up.n_rxq, dev->real_n_txq,
> > - &port_conf);
> > +n_rxq = MIN(info.max_rx_queues, dev->up.n_rxq);
> > +n_txq = MIN(info.max_tx_queues, dev->up.n_txq);
> > +
> > +diag = dpdk_eth_dev_queue_setup(dev, n_rxq, n_txq);
> >  if (diag) {
> > -VLOG_ERR("eth dev config error %d. rxq:%d txq:%d", diag, dev-
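
The retry loop in dpdk_eth_dev_queue_setup() can be exercised without hardware by using a fake device. In this sketch, struct fake_nic and the fake_*_setup() helpers are hypothetical replacements for the rte_eth_*_queue_setup() calls, and the per-iteration rte_eth_dev_configure() call is omitted: whenever queue i fails, the count is shrunk to i and setup is retried, until it succeeds or a count reaches zero.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical fake NIC that advertises more queues than it can set up,
 * like the XL710 case described in the patch. */
struct fake_nic {
    int usable_rxq;
    int usable_txq;
    int n_rxq;          /* written on success */
    int n_txq;
};

/* Stand-ins for rte_eth_{rx,tx}_queue_setup(): 0 on success, <0 on error. */
static int fake_rxq_setup(const struct fake_nic *nic, int q)
{
    return q < nic->usable_rxq ? 0 : -EINVAL;
}

static int fake_txq_setup(const struct fake_nic *nic, int q)
{
    return q < nic->usable_txq ? 0 : -EINVAL;
}

/* Same control flow as dpdk_eth_dev_queue_setup(): shrink the queue count
 * to the first index that failed and retry. */
static int queue_setup_with_retry(struct fake_nic *nic, int n_rxq, int n_txq)
{
    int diag = 0;
    int i;

    while (n_rxq && n_txq) {
        for (i = 0; i < n_txq; i++) {
            diag = fake_txq_setup(nic, i);
            if (diag)
                break;
        }
        if (i != n_txq) {
            n_txq = i;      /* retry with fewer tx queues */
            continue;
        }

        for (i = 0; i < n_rxq; i++) {
            diag = fake_rxq_setup(nic, i);
            if (diag)
                break;
        }
        if (i != n_rxq) {
            n_rxq = i;      /* retry with fewer rx queues */
            continue;
        }

        nic->n_rxq = n_rxq;
        nic->n_txq = n_txq;
        return 0;
    }
    return diag;
}
```

With 4 usable rx and 2 usable tx queues, requesting 8/8 converges on 4 rx and 2 tx queues after two retries; a device with no usable tx queue ends with a nonzero error.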

Re: [ovs-dev] [PATCH v4] flow: Split miniflow's map.

2015-07-17 Thread Ben Pfaff

On Thu, Jul 16, 2015 at 05:47:00PM -0700, Jarno Rajahalme wrote:
> Use two maps in miniflow to allow for expansion of struct flow past
> 512 bytes.  We now have one map for tunnel related fields, and another
> for the rest of the packet metadata and actual packet header fields.
> This split has the benefit that for non-tunneled packets the overhead
> should be minimal.
> 
> Some miniflow utilities now exist in two variants, new ones operating
> over all the data, and the old ones operating only on a single 64-bit
> map at a time.  The old ones require doubling of code but should
> execute faster, so those are used in the datapath and classifier's
> lookup path.
> 
> Signed-off-by: Jarno Rajahalme 

This version passes tests and does not cause any sparse warnings.  Thank
you!

I am a little surprised to see two named bitmaps instead of an array of
two elements.  Names are nice for some things, but other times it is
convenient to be able to use loops to iterate, and of course arrays
generalize better.
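
A sketch of the array-based alternative, assuming a hypothetical layout in which map[k] covers u64 offsets 64k to 64k+63 (the patch itself splits at FLOW_TNL_U64S into tnl_map and pkt_map, so this is illustrative only). With an array, counting the stored values or testing an offset becomes a single loop or index computation instead of code duplicated per named map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAP_UNITS 2   /* hypothetical: number of 64-bit map words */

/* Hypothetical array-based variant of the split miniflow map: bit (i % 64)
 * of map[i / 64] is set when u64 offset 'i' of struct flow is nonzero. */
struct miniflow_sketch {
    uint64_t map[MAP_UNITS];
};

/* Portable population count (counts set bits). */
static unsigned int count_ones(uint64_t x)
{
    unsigned int n = 0;

    while (x) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Number of u64 values stored in the miniflow: one loop over all words. */
static unsigned int miniflow_sketch_n_values(const struct miniflow_sketch *mf)
{
    unsigned int n = 0;

    for (int k = 0; k < MAP_UNITS; k++)
        n += count_ones(mf->map[k]);
    return n;
}

/* Whether u64 offset 'idx' is present in the miniflow. */
static bool miniflow_sketch_has(const struct miniflow_sketch *mf,
                                unsigned int idx)
{
    assert(idx < 64 * MAP_UNITS);
    return (mf->map[idx / 64] >> (idx % 64)) & 1;
}
```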

This change to dpif-netdev.c looks like an independent bug fix to me:

@@ -1892,10 +1913,11 @@ dpif_netdev_mask_from_nlattrs(const struct nlattr *key, uint32_t key_len,
 memset(mask, 0x0, sizeof *mask);
 
 for (id = 0; id < MFF_N_IDS; ++id) {
 /* Skip registers and metadata. */
 if (!(id >= MFF_REG0 && id < MFF_REG0 + FLOW_N_REGS)
+&& !(id >= MFF_XREG0 && id < MFF_XREG0 + FLOW_N_XREGS)
 && id != MFF_METADATA) {
 const struct mf_field *mf = mf_from_id(id);
 if (mf_are_prereqs_ok(mf, flow)) {
 mf_mask_field(mf, mask);
 }

Acked-by: Ben Pfaff 

Re: [ovs-dev] [PATCH v4] flow: Split miniflow's map.

2015-07-17 Thread Ben Pfaff

On Fri, Jul 17, 2015 at 08:51:47AM -0700, Ben Pfaff wrote:
> On Thu, Jul 16, 2015 at 05:47:00PM -0700, Jarno Rajahalme wrote:
> > Use two maps in miniflow to allow for expansion of struct flow past
> > 512 bytes.  We now have one map for tunnel related fields, and another
> > for the rest of the packet metadata and actual packet header fields.
> > This split has the benefit that for non-tunneled packets the overhead
> > should be minimal.
> > 
> > Some miniflow utilities now exist in two variants, new ones operating
> > over all the data, and the old ones operating only on a single 64-bit
> > map at a time.  The old ones require doubling of code but should
> > execute faster, so those are used in the datapath and classifier's
> > lookup path.
> > 
> > Signed-off-by: Jarno Rajahalme 
> 
> This version passes tests and does not cause any sparse warnings.  Thank
> you!
> 
> I am a little surprised to see two named bitmaps instead of an array of
> two elements.  Names are nice for some things, but other times it is
> convenient to be able to use loops to iterate, and of course arrays
> generalize better.
> 
> This change to dpif-netdev.c looks like an independent bug fix to me:
> 
> @@ -1892,10 +1913,11 @@ dpif_netdev_mask_from_nlattrs(const struct nlattr *key, uint32_t key_len,
>      memset(mask, 0x0, sizeof *mask);
>  
>      for (id = 0; id < MFF_N_IDS; ++id) {
>          /* Skip registers and metadata. */
>          if (!(id >= MFF_REG0 && id < MFF_REG0 + FLOW_N_REGS)
> +            && !(id >= MFF_XREG0 && id < MFF_XREG0 + FLOW_N_XREGS)
>              && id != MFF_METADATA) {
>              const struct mf_field *mf = mf_from_id(id);
>              if (mf_are_prereqs_ok(mf, flow)) {
>                  mf_mask_field(mf, mask);
>              }
> 
> Acked-by: Ben Pfaff 

Oh, here are some comment suggestions:

diff --git a/lib/classifier-private.h b/lib/classifier-private.h
index c4c6ce9..3a150ab 100644
--- a/lib/classifier-private.h
+++ b/lib/classifier-private.h
@@ -226,7 +226,11 @@ struct trie_node {
  * These are only used by the classifier, so place them here to allow
  * for better optimization. */
 
-/* TODO: Ensure that 'start' and 'end' are compile-time constants. */
+/* Initializes 'map->tnl_map' and 'map->pkt_map' with a subset of 'miniflow'
+ * that includes only the portions with u64-offset 'i' such that start <= i <
+ * end.  Does not copy any data from 'miniflow' to 'map'.
+ *
+ * TODO: Ensure that 'start' and 'end' are compile-time constants. */
 static inline unsigned int /* offset */
 miniflow_get_map_in_range(const struct miniflow *miniflow,
   uint8_t start, uint8_t end, struct miniflow *map)
diff --git a/lib/flow.h b/lib/flow.h
index 85a9792..96aa4aa 100644
--- a/lib/flow.h
+++ b/lib/flow.h
@@ -393,8 +393,8 @@ BUILD_ASSERT_DECL(FLOW_U64S - FLOW_TNL_U64S <= 64);
  * 0-bit indicates that the corresponding uint64_t is zero, each 1-bit that it
  * *may* be nonzero (see below how this applies to minimasks).
  *
- * The values indicated by 'tnl_map' and 'pkt_map' always follow the 'map' in
- * memory.  The user of the miniflow is responsible for always having enough
+ * The values indicated by 'tnl_map' and 'pkt_map' always follow the miniflow
+ * in memory.  The user of the miniflow is responsible for always having enough
  * storage after the struct miniflow corresponding to the number of 1-bits in
  * maps.
  *
@@ -409,7 +409,9 @@ BUILD_ASSERT_DECL(FLOW_U64S - FLOW_TNL_U64S <= 64);
 struct miniflow {
     uint64_t tnl_map;
     uint64_t pkt_map;
-    /* uint64_t values[];   Storage follows 'map' in memory. */
+    /* Followed by:
+     *     uint64_t values[n];
+     * where 'n' is miniflow_n_values(miniflow). */
 };
 BUILD_ASSERT_DECL(sizeof(struct miniflow) == 2 * sizeof(uint64_t));
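The comments above describe a layout in which the set bits of both maps index a trailing `values[]` array. Below is a minimal sketch of that indexing and of the range computation described for `miniflow_get_map_in_range()`. `struct miniflow_sketch`, the helper names, and `TNL_U64S == 4` are assumptions made for illustration (the real split point is `FLOW_TNL_U64S`). The sketch also assumes that the tunnel values are stored before the packet values and that the returned offset is the index of the first in-range value.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed split point for illustration; the real value is FLOW_TNL_U64S. */
#define TNL_U64S 4

/* Hypothetical mirror of the patched struct miniflow. */
struct miniflow_sketch {
    uint64_t tnl_map;   /* Bit i: flow u64 i (tunnel part) may be nonzero. */
    uint64_t pkt_map;   /* Bit i: flow u64 TNL_U64S + i may be nonzero. */
    uint64_t values[];  /* tnl_map's values first (assumed), then pkt_map's. */
};

static unsigned int
popcount64(uint64_t x)
{
    unsigned int n = 0;

    for (; x; x &= x - 1) {
        n++;
    }
    return n;
}

/* Returns a mask with bits lo...hi-1 set (requires lo <= hi). */
static uint64_t
range_mask(unsigned int lo, unsigned int hi)
{
    uint64_t below_hi = hi >= 64 ? UINT64_MAX : (UINT64_C(1) << hi) - 1;
    uint64_t below_lo = lo >= 64 ? UINT64_MAX : (UINT64_C(1) << lo) - 1;

    return below_hi & ~below_lo;
}

/* Allocates a miniflow with room for 'n' trailing values (caller frees). */
static struct miniflow_sketch *
miniflow_sketch_alloc(unsigned int n)
{
    return malloc(sizeof(struct miniflow_sketch) + n * sizeof(uint64_t));
}

static unsigned int
miniflow_sketch_n_values(const struct miniflow_sketch *mf)
{
    return popcount64(mf->tnl_map) + popcount64(mf->pkt_map);
}

/* Returns flow u64 'idx', or 0 if its map bit is clear. */
static uint64_t
miniflow_sketch_get(const struct miniflow_sketch *mf, unsigned int idx)
{
    uint64_t map = mf->tnl_map;
    unsigned int skip = 0;          /* Values stored before 'map's values. */

    assert(idx < TNL_U64S + 64);
    if (idx >= TNL_U64S) {
        idx -= TNL_U64S;
        map = mf->pkt_map;
        skip = popcount64(mf->tnl_map);
    }

    uint64_t bit = UINT64_C(1) << idx;
    return map & bit ? mf->values[skip + popcount64(map & (bit - 1))] : 0;
}

/* Sets 'out's maps to the bits of 'mf' for u64 offsets in [start, end) and
 * returns how many values precede 'start'.  Copies no values. */
static unsigned int
miniflow_sketch_get_map_in_range(const struct miniflow_sketch *mf,
                                 unsigned int start, unsigned int end,
                                 struct miniflow_sketch *out)
{
    unsigned int tnl_lo = start < TNL_U64S ? start : TNL_U64S;
    unsigned int tnl_hi = end < TNL_U64S ? end : TNL_U64S;
    unsigned int pkt_lo = start > TNL_U64S ? start - TNL_U64S : 0;
    unsigned int pkt_hi = end > TNL_U64S ? end - TNL_U64S : 0;

    out->tnl_map = mf->tnl_map & range_mask(tnl_lo, tnl_hi);
    out->pkt_map = mf->pkt_map & range_mask(pkt_lo, pkt_hi);
    return popcount64(mf->tnl_map & range_mask(0, tnl_lo))
           + popcount64(mf->pkt_map & range_mask(0, pkt_lo));
}
```

For a non-tunneled packet, `tnl_map` is zero, so every stored value belongs to `pkt_map` and `skip` is always 0.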
 

Re: [ovs-dev] [PATCH v3 1/3] tests: Check for core files before exiting.

2015-07-17 Thread Ben Pfaff

On Thu, Jul 16, 2015 at 05:22:25PM -0700, Jarno Rajahalme wrote:
> 
> > On Jul 16, 2015, at 4:33 PM, Ben Pfaff  wrote:
> > 
> > On Thu, Jul 16, 2015 at 03:15:52PM -0700, Jarno Rajahalme wrote:
> >> I've seen core files appear and then be automatically removed as the
> >> test case was successful.  Such success is highly doubtful, so fail
> >> the test cases if any core files exist at the end of the test.
> >> 
> >> Signed-off-by: Jarno Rajahalme 
> > 
> > I proposed a similar patch in May 2014:
> >http://openvswitch.org/pipermail/dev/2014-May/040497.html
> > but you didn't like it:
> >http://openvswitch.org/pipermail/dev/2014-May/040857.html
> 
> My comment at the time was that I did not see the result of the line
> 
> echo "$core: core dumped during test"
> anywhere, but now I see that this is simply due to the fact that the test 
> case failed on an earlier AT_CHECK and never got to checking the cores.

I made the same argument before too:
http://openvswitch.org/pipermail/dev/2014-June/041301.html
;-)

> So I see that this patch has the same limitation. Do you have any idea
> how to check and report for core files regardless of the success or
> failure of the test case? I think this would be important as I’ve seen
> cores in both cases. In success case we currently lose the fact that
> there even was a core dump, and this likely happens also in the
> failure case if we blindly run a --recheck and by chance succeed that
> time.
> 
> Right now I habitually run “find . -name core -print” from a shell
> after each “make check” that has any failures before a --recheck. I’d
> like to automate this somehow! And this doesn’t even catch the cores
> of successful test cases. The only reason I know they exist was due to
> running the find command multiple times while “make check” was
> running, and I saw some core files that had disappeared in later find
> runs.

Well, something like this would do it:

diff --git a/tests/atlocal.in b/tests/atlocal.in
index 5946a3c..5baa9ec 100644
--- a/tests/atlocal.in
+++ b/tests/atlocal.in
@@ -110,3 +110,11 @@ fi
 if test "$IS_WIN32" = "yes"; then
 HAVE_PYTHON="no"
 fi
+
+trap '
+if find "$at_suite_dir" -name core\* -print | grep .; then
+echo
+echo "WARNING: See above for list of core dumps produced by tests."
+echo
+fi
+' 0


Re: [ovs-dev] [PATCH] [OVN] Add QoS to NB schema

2015-07-17 Thread Ben Pfaff

[Adding back ovs-dev, hope that's OK]

Hmm, shutting down your QoS effort is not my goal.

What if we consider this patch as the start of an RFC series for the NB
part of QoS and then you work on the implications of pushing it down
through ovn-northd into SB and ovn-controller/OpenFlow?  That's where I
think the tricky parts are going to be and the NB schema details aren't
too important for now.

On Fri, Jul 17, 2015 at 07:43:43AM +0300, Gal Sagie wrote:
> Ok, np.
> 
> I really just wanted to contribute and help more, also inside OVN and not
> just in OpenStack.
> 
> (I felt this could be a nice feature to handle in parallel, especially
> since the features are already agreed upon in Neutron.)
> 
> On Fri, Jul 17, 2015 at 2:48 AM, Ben Pfaff  wrote:
> 
> > I guess I mean that the design has to be skeletal for now because we
> > don't know what Neutron is going to do, so we might as well wait for
> > Neutron to do what it's going to do and then use it.
> >
> > On Thu, Jul 16, 2015 at 11:26:14PM +0300, Gal Sagie wrote:
> > > So you would prefer that I not work on QoS at this time?
> > > Or did you mean something else?
> > >
> > > On Thu, Jul 16, 2015 at 11:14 PM, Ben Pfaff  wrote:
> > >
> > > > On Thu, Jul 16, 2015 at 09:47:19PM +0300, Gal Sagie wrote:
> > > > > If I understand your concern right, you worry that the NB schema is
> > > > > getting too big.
> > > > > I can't think of a better way to do this that is both useful and
> > > > > smaller.  If you feel that we don't want to support QoS at this
> > > > > point, that's fine, but this is another API that will be supported
> > > > > by the OpenStack reference implementation and not by OVN.  If we
> > > > > aim for high adoption of OVN, we need to make sure this gap is as
> > > > > small as possible (at least that's how I see it).
> > > >
> > > > OVSDB really shines when it comes to upgrades that add columns or
> > > > tables.  We do it routinely in OVS.  There's basically no reason not to
> > > > do it later.
> > > >
> > >
> > >
> > >
> > > --
> > > Best Regards ,
> > >
> > > The G.
> >
> 
> 
> 
> -- 
> Best Regards ,
> 
> The G.

[ovs-dev] [PATCH] rhel: fix ifup-ovs to delete ports first

2015-07-17 Thread Flavio Leitner

When ifdown isn't executed (the system didn't shut down properly),
the interfaces remain in the Open vSwitch database.  For the
internal ports or devices that are available when the openvswitch
service starts, that's not an issue.

However, ovs-vsctl won't do anything for devices created later
(Linux VLAN devices, for instance), since they are already in the
database.  That leaves an inconsistency behind, because those devices
are left out of the kernel's datapath.

ifup/ifdown operate only on configured interfaces, so this patch
fixes the issue by deleting the interface from the database before
attempting to configure it.

Signed-off-by: Flavio Leitner 
---
 rhel/etc_sysconfig_network-scripts_ifup-ovs | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/rhel/etc_sysconfig_network-scripts_ifup-ovs b/rhel/etc_sysconfig_network-scripts_ifup-ovs
index 05f70f6..478c5c3 100755
--- a/rhel/etc_sysconfig_network-scripts_ifup-ovs
+++ b/rhel/etc_sysconfig_network-scripts_ifup-ovs
@@ -117,7 +117,11 @@ case "$TYPE" in
OVSPort)
ifup_ovs_bridge
${OTHERSCRIPT} ${CONFIG} ${2}
-   ovs-vsctl -t ${TIMEOUT} -- --may-exist add-port "$OVS_BRIDGE" "$DEVICE" $OVS_OPTIONS ${OVS_EXTRA+-- $OVS_EXTRA}
+   # The port might be already in the database but not yet
+   # in the datapath.  So, remove the stale interface first.
+   ovs-vsctl -t ${TIMEOUT} \
+   -- --if-exists del-port "$OVS_BRIDGE" "$DEVICE" \
+   -- add-port "$OVS_BRIDGE" "$DEVICE" $OVS_OPTIONS ${OVS_EXTRA+-- $OVS_EXTRA}
OVSINTF=${DEVICE} /sbin/ifup "$OVS_BRIDGE"
;;
OVSIntPort)
-- 
2.1.0


Re: [ovs-dev] About one abort in vswitchd

2015-07-17 Thread Alex Wang

On Thu, Jul 16, 2015 at 3:07 PM, Alex Wang  wrote:

> Sorry for this very delayed reply,
>
> I think I found the issue, in branch 2.1.*
>
> Will send out a fix soon,
>
>
Sorry, it turns out my theory was wrong; things seem to be trickier.

Have you modified the source code?

Also, could you provide backtraces for all the other threads?

Thanks,
Alex Wang,

> At the same time, branches >= 2.3 do not have this issue, thanks to the
> use of ovs-rcu.  And since branch 2.3 is the LTS branch, I would really
> recommend that you switch to 2.3+.
>
> Thanks,
> Alex Wang,
>
> On Sun, Jul 12, 2015 at 6:31 AM, 马啸  wrote:
>
>> Hi,all
>>
>> Does anybody know what caused the problem and how to solve it?
>>
>> Looking forward to your reply.
>>
>>
>>
>>
>>
>>
>> At 2015-07-10 18:32:07, "马啸"  wrote:
>>
>> Hi,all
>>
>> The version is 2.1.2 (openvswitch-2.1.2-26.el6.x86_64).
>>
>> The information is below:
>>
>> 
>> =
>>
>> Core was generated by `ovs-vswitchd unix:/var/run/openvswitch/db.sock
>> -vconsole:emer -vsyslog:err -vfi'.
>>
>> Program terminated with signal 6, Aborted.
>>
>> #0  0x7f78d4073925 in ?? () from /lib64/libc.so.6
>>
>>
>> ……….
>>
>>
>> Thread 2 (Thread 0x7f78d2067700 (LWP 31449)):
>>
>> #0  0x7f78d465c40d in aio_suspend (list=0x7f78d20554b8,
>> nent=-771402644, timeout=0x0) at ../sysdeps/pthread/aio_suspend.c:172
>>
>> #1  0x004cea46 in async_append_wait (ap=0xc741a0) at
>> lib/async-append-aio.c:106
>>
>> #2  0x004ceaa8 in async_append_flush (ap=0xc741a0) at
>> lib/async-append-aio.c:161
>>
>> #3  0x004ab618 in vlog_valist (module=0x75c620, level=VLL_EMER,
>> message=0x4fbf88 "%s: assertion %s failed in %s()", args=0x7f78d20555c0) at
>> lib/vlog.c:914
>>
>> #4  0x004ab64b in vlog_abort_valist (module_=<value optimized out>,
>> message=0x4fbf88 "%s: assertion %s failed in %s()", args=0x7f78d20555c0)
>> at lib/vlog.c:991
>>
>> #5  0x004ab6e6 in vlog_abort (module=0x7f78d205546c, message=0x80
>> ) at lib/vlog.c:1006
>>
>> #6  0x004a7161 in ovs_assert_failure (where=<value optimized out>,
>> function=<value optimized out>, condition=<value optimized out>) at
>> lib/util.c:68
>>
>> #7  0x0042a903 in dpif_sflow_ref (ds_=<value optimized out>) at
>> ofproto/ofproto-dpif-sflow.c:342
>>
>> #8  0x0042ee8a in xlate_receive (backer=0xd9e420,
>> packet=0x7f7878000900, key=<value optimized out>,
>> key_len=<value optimized out>, flow=0x7f78d2066ad0, fitnessp=0x7f78d2055850,
>> ofproto=0x7f78d20668a0, ipfix=0x7f78d2066890, sflow=0x7f78d2066898,
>> netflow=0x0, odp_in_port=0x7f78d20668ac) at ofproto/ofproto-dpif-xlate.c:593
>>
>> #9  0x0042cf67 in handle_upcalls (handler=<value optimized out>,
>> upcalls=0x7f78d2066bd0) at ofproto/ofproto-dpif-upcall.c:960
>>
>> #10 0x0042dad9 in udpif_upcall_handler (arg=0xe7a0a8) at
>> ofproto/ofproto-dpif-upcall.c:680
>>
>> #11 0x7f78d3e2b9d1 in start_thread (arg=0x7f78d2067700) at
>> pthread_create.c:301
>>
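>> #12 0x7f78d4129b6d in epoll_pwait (epfd=<value optimized out>,
>> events=<value optimized out>, maxevents=<value optimized out>,
>> timeout=<value optimized out>, set=<value optimized out>)
>> at ../sysdeps/unix/sysv/linux/epoll_pwait.c:50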
>>
>> #13 0x in ?? ()
>>
>> (gdb) t 2
>>
>> [Switching to thread 2 (Thread 0x7f78d2067700 (LWP 31449))]#0
>> 0x7f78d465c40d in aio_suspend (list=0x7f78d20554b8, nent=-771402644,
>> timeout=0x0) at ../sysdeps/pthread/aio_suspend.c:172
>>
>> 172   AIO_MISC_WAIT (result, cntr, timeout, 1);
>>
>> (gdb) f 6
>>
>> #6  0x004a7161 in ovs_assert_failure (where=<value optimized out>,
>> function=<value optimized out>, condition=<value optimized out>) at
>> lib/util.c:68
>>
>> 68  VLOG_ABORT("%s: assertion %s failed in %s()",
>>
>> (gdb) f 7
>>
>> #7  0x0042a903 in dpif_sflow_ref (ds_=<value optimized out>) at
>> ofproto/ofproto-dpif-sflow.c:342
>>
>> 342 ovs_assert(orig > 0);
>>
>> (gdb) p orig
>>
>> $1 = 0
>>
>> (gdb)
>> 
>> =
>>
>> Thanks!
>>
>>
>>
>>
>> At 2015-07-10 11:54:00, "Jesse Gross"  wrote:
>> >This could be related to b953042214201e2693a485a8ba8b19f69e5bdf34
>> >("datapath: simplify sample action implementation"). I would check
>> >that you are using OVS 2.3.2 for anything related to sampling.
>> >
>> >On Thu, Jul 9, 2015 at 8:43 PM, Alex Wang  wrote:
>> >> Hey,
>> >>
>> >> Could you send the core dump info (I did not see any attachment)?  I
>> >> assume you mean the gdb printout showing what causes the crash.  Also,
>> >> could you provide the OVS version you are using?
>> >>
>> >> I'm trying to debug an IPFIX-related crash; they could be related.
>> >>
>> >> Thanks, 谢谢,
>> >> Alex Wang,
>> >>
>> >> On Thu, Jul 9, 2015 at 8:01 PM, 马啸  wrote:
>> >>
>> >>> Hi,all
>> >>>
>> >>>
>> >>>   I am an engineer at UnitedStack, an OpenStack provider.
>> >>>   We are using Open vSwitch as the software switch on our OpenStack
>> >>> compute and network nodes, and we enabled sFlow to monitor the traffic.
>> >>>   A crash happened.  The core-dump information is attached; could
>> >>> anybody help us solve the problem?
>> >>>
>>
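In the backtrace above, frame #7 fails `ovs_assert(orig > 0)` in `dpif_sflow_ref()` with `orig = 0`. This means a reference was taken on an sFlow object whose count had already dropped to zero, which suggests a race with that object's destruction. The sketch below is a generic C11 illustration of the pattern, not OVS's code. `struct refobj` and its functions are hypothetical, and `refobj_try_ref()` shows one common way to refuse to resurrect an object that is already being destroyed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object standing in for struct dpif_sflow. */
struct refobj {
    atomic_uint ref_cnt;
};

/* Takes a reference.  This is only legal while the caller (or someone it
 * trusts) already holds one, so the old count must be nonzero; this is the
 * kind of check that saw 'orig == 0' in the backtrace. */
static void
refobj_ref(struct refobj *obj)
{
    unsigned int orig = atomic_fetch_add(&obj->ref_cnt, 1);

    assert(orig > 0);
    (void) orig;                /* Avoid an unused warning under NDEBUG. */
}

/* Drops a reference.  Returns true if this was the last one, in which case
 * the caller is expected to free 'obj'. */
static bool
refobj_unref(struct refobj *obj)
{
    unsigned int orig = atomic_fetch_sub(&obj->ref_cnt, 1);

    assert(orig > 0);
    return orig == 1;
}

/* Takes a reference only if the count has not already reached zero.
 * Returns false, leaving the count unchanged, for a dying object. */
static bool
refobj_try_ref(struct refobj *obj)
{
    unsigned int count = atomic_load(&obj->ref_cnt);

    do {
        if (count == 0) {
            return false;
        }
    } while (!atomic_compare_exchange_weak(&obj->ref_cnt, &count, count + 1));
    return true;
}
```

With only `refobj_ref()`, a lookup that races with the final `refobj_unref()` trips the assertion. With `refobj_try_ref()`, the lookup fails cleanly instead.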

Re: [ovs-dev] [RFC net-next 22/22] openvswitch: Use regular GRE net_device instead of vport

2015-07-17 Thread Pravin Shelar

On Fri, Jul 17, 2015 at 3:58 AM, Thomas Graf  wrote:
> On 07/16/15 at 02:36pm, Pravin Shelar wrote:
>> On Thu, Jul 16, 2015 at 7:52 AM, Thomas Graf  wrote:
>> > I'm inclined to change this and use an in-kernel API as well to
>> > create the net_device just like VXLAN does in patch 21.
>> >
>> > Pravin, what do you think?
>>
>> About the VXLAN APIs: we also need a direct Netlink interface for
>> userspace to configure the VXLAN device.  This will allow us to remove
>> the VXLAN compat code from OVS vport-netdev.c in the future.
>
> Do you mean creating the tunnel devices from user space? This would
> break existing users of the OVS Netlink interface. How do you want
> to prevent that?
To handle the old interface, there is compat code in netdev-vport in
patch 22.

OVS userspace should be able to create any type of tunneling device
and then add it as a vport of type netdev, so that OVS has only two
types of vport, i.e. netdev and internal, rather than one vport type
per tunnel type.  This way we can keep the compat code simple, and all
enhancements can be made directly to the new interface.

[ovs-dev] [PATCH] type-props: Avoid a MSVC warning.

2015-07-17 Thread Gurucharan Shetty

Currently, MSVC warns about a macro invocation such as
TYPE_MAXIMUM(uint64_t), because part of the macro expands to
~(uint64_t)0 << 64:

C4293: '<<' : shift count negative or too big, undefined behavior.

This commit makes changes to the macro to prevent that warning.

Suggested-by: Ben Pfaff 
Signed-off-by: Gurucharan Shetty 
---
 lib/type-props.h |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/type-props.h b/lib/type-props.h
index 8c83ea6..3c908a7 100644
--- a/lib/type-props.h
+++ b/lib/type-props.h
@@ -23,10 +23,10 @@
 #define TYPE_IS_SIGNED(TYPE) ((TYPE) 1 > (TYPE) -1)
 #define TYPE_VALUE_BITS(TYPE) (sizeof(TYPE) * CHAR_BIT - TYPE_IS_SIGNED(TYPE))
 #define TYPE_MINIMUM(TYPE) (TYPE_IS_SIGNED(TYPE) \
-? ~(TYPE)0 << TYPE_VALUE_BITS(TYPE) \
+? ~(TYPE)0 << (sizeof(TYPE) * 8 - 1) \
 : 0)
 #define TYPE_MAXIMUM(TYPE) (TYPE_IS_SIGNED(TYPE) \
-? ~(~(TYPE)0 << TYPE_VALUE_BITS(TYPE)) \
+? ~(~(TYPE)0 << (sizeof(TYPE) * 8 - 1)) \
 : (TYPE)-1)
 
 /* Number of decimal digits required to format an integer of the given TYPE.
-- 
1.7.9.5
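One way to check the patched macros is to evaluate them for a few types. The sketch below copies the macro bodies from the patch and adds two hypothetical helpers that compute the shift counts involved. For `uint64_t`, the old signed branch shifted by `TYPE_VALUE_BITS(uint64_t)`, which is 64, the full width of the type. That is what C4293 flags, even though the branch is never selected for an unsigned type. The new branch shifts by 63. For signed types the macros shift a negative value (`~(TYPE)0` is -1), which ISO C leaves undefined. The assertions assume the two's-complement behavior that GCC documents for signed `<<`.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Macro bodies as patched in lib/type-props.h. */
#define TYPE_IS_SIGNED(TYPE) ((TYPE) 1 > (TYPE) -1)
#define TYPE_VALUE_BITS(TYPE) (sizeof(TYPE) * CHAR_BIT - TYPE_IS_SIGNED(TYPE))
#define TYPE_MINIMUM(TYPE) (TYPE_IS_SIGNED(TYPE) \
                            ? ~(TYPE)0 << (sizeof(TYPE) * 8 - 1) \
                            : 0)
#define TYPE_MAXIMUM(TYPE) (TYPE_IS_SIGNED(TYPE) \
                            ? ~(~(TYPE)0 << (sizeof(TYPE) * 8 - 1)) \
                            : (TYPE)-1)

/* Hypothetical helper: the shift count in the old signed branch for
 * uint64_t.  It equals the type's width, so that shift would be undefined
 * if it were ever evaluated. */
static unsigned int
old_shift_count_u64(void)
{
    return TYPE_VALUE_BITS(uint64_t);
}

/* Hypothetical helper: the shift count the patched macros use, which is
 * always one less than the width and therefore in range. */
static unsigned int
new_shift_count_u64(void)
{
    return sizeof(uint64_t) * 8 - 1;
}
```

The patch hard-codes 8 where `TYPE_VALUE_BITS` uses `CHAR_BIT`. On platforms with 8-bit bytes the two forms agree.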


Re: [ovs-dev] [PATCH 4/4] ovn: Fix extra token detection.

2015-07-17 Thread Joe Stringer

On 16 July 2015 at 13:52, Ben Pfaff  wrote:
> On Wed, Jul 15, 2015 at 10:18:29PM -0700, Joe Stringer wrote:
>> This code attempts to first check whether another error was detected for
>> the string it is parsing, then if it's not at the end of the tokens,
>> report an error. However, 'errorp' is always a valid pointer to a
>> 'char *', so the first check in this statement always evaluates false.
>>
>> Furthermore, this behaviour may be optimised out by modern compilers
>> due to the prior dereference in expr_parse(). Fix this to check the
>> actual value of *errorp.
>>
>> Found by MIT STACK analyzer.
>>
>> Signed-off-by: Joe Stringer 
>
> The tests should have caught this bug but I forgot to put in a test!
>
> Therefore please squash in the following:
>
> diff --git a/tests/ovn.at b/tests/ovn.at
> index 261e32a..d1696de 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -255,6 +255,8 @@ eth.src > 00:00:00:00:11:11/00:00:00:00:ff:ff => Only == and != operators may be
>  ip4.src == ::1 => 128-bit constant is not compatible with 32-bit field ip4.src.
>
>  1 == eth.type == 2 => Range expressions must have the form `x < field < y' or `x > field > y', with each `<' optionally replaced by `<=' or `>' by `>=').
> +
> +eth.dst[40] x => Extra tokens at end of input.
>  ]])
>  sed 's/ =>.*//' test-cases.txt > input.txt
>  sed 's/.* => //' test-cases.txt > expout
>
> Acked-by: Ben Pfaff 

Thanks, I rolled in your test and applied this series on top of master.
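The bug Joe describes is a general C pitfall: testing an out-parameter pointer instead of the value it points to. The sketch below is hypothetical and simplified, not OVN's parser code. `check_extra_tokens()` stands in for the end-of-input check, with `rest` as the unparsed remainder. The new test case `eth.dst[40] x` exercises exactly this path.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'd copy of 's' (strdup() is POSIX, not C99/C11). */
static char *
copy_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);

    if (copy) {
        memcpy(copy, s, len);
    }
    return copy;
}

/* Hypothetical end-of-parse check.  '*errorp' holds an earlier error or
 * NULL; 'rest' is the input left after parsing.  Reports trailing tokens
 * only if no earlier error was recorded. */
static void
check_extra_tokens(const char *rest, char **errorp)
{
    while (isspace((unsigned char) *rest)) {
        rest++;
    }

    /* The buggy form tested "!errorp".  'errorp' always points to a valid
     * 'char *', so that test is always false and the error is never set;
     * after an earlier '*errorp' dereference a compiler may even assume
     * 'errorp' is nonnull and drop the branch.  Test the pointee instead. */
    if (!*errorp && *rest != '\0') {
        *errorp = copy_string("Extra tokens at end of input.");
    }
}
```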

[ovs-dev] [PATCH] acinclude: Silence OVS_FIND_FIELD_IFELSE.

2015-07-17 Thread Joe Stringer

Signed-off-by: Joe Stringer 
---
 acinclude.m4 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/acinclude.m4 b/acinclude.m4
index 14907ab..4f1e66c 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -271,7 +271,7 @@ dnl translated to uppercase.
 AC_DEFUN([OVS_FIND_FIELD_IFELSE], [
   AC_MSG_CHECKING([whether $2 has member $3 in $1])
   if test -f $1; then
-awk '/$2.{/,/^}/' $1 2>/dev/null | grep '$3'
+awk '/$2.{/,/^}/' $1 2>/dev/null | grep -q '$3'
 status=$?
 case $status in
   0)
-- 
2.1.4


Re: [ovs-dev] [PATCH net-next 14/22] vxlan: Flow based tunneling

2015-07-17 Thread Alexei Starovoitov


On 7/17/15 5:55 AM, Thomas Graf wrote:

@@ -2373,6 +2470,12 @@ static void vxlan_setup(struct net_device *dev)
netif_keep_dst(dev);
dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;

+   /* If in flow based mode, keep the dst including encapsulation
+    * instructions for vxlan_xmit().
+    */
+   if (vxlan->flags & VXLAN_F_FLOW_BASED)
+           netif_keep_dst(dev);


Hmm, isn't this already done a few lines above? ;)

Re: [ovs-dev] [RFC v2 04/11] ovn: Add patch ports for ovn bridge mappings.

2015-07-17 Thread Russell Bryant

On 07/16/2015 08:07 PM, Ben Pfaff wrote:
> On Thu, Jul 16, 2015 at 06:06:12PM -0400, Russell Bryant wrote:
>> While parsing the OVN bridge mapping configuration, ensure that patch
>> ports exist between the OVN integration bridge and the physical
>> network bridge.  If they do not exist, create them automatically.
>>
>> Signed-off-by: Russell Bryant 
> 
> Now it compiles again.

:-)

I moved the misplaced hunk to this patch.

> This raises a philosophical issue.  Currently OVN requires something
> else in the system, that runs before ovn-controller starts, to create
> the integration bridge.  This seems reasonable enough, but
> ovn-controller could do it itself.  Similarly, OVN could require
> something else in the system to add the ports to the integration bridge
> before it starts up.  That would put a little more burden on startup
> scripts, but it would also be more flexible (the ports wouldn't have to
> be patch ports, if something else is appropriate, for example).  It
> would also mean that ovn-controller itself would need less
> configuration, although that would presumably get shifted somewhere else
> so it's net zero.
> 
> What is your opinion?

I think that the more we can make OVN "just work", the more successful
it will be.  With that said, some amount of optional flexibility is
nice.  Specifically:

I think it makes sense for ovn-controller to create the integration
bridge if it does not already exist.  Create it if you want to (or have
some reason to need to), but otherwise ovn-controller should create it.
That seems like a low-hanging "just works" capability.

Regarding this patch, to be honest, the choice of "bridge mappings" is
just borrowed from the existing OVS support in OpenStack.  What's
implemented here matches how that works.  It expects the bridge to be
created already, but automatically creates patch ports to/from the
integration bridge.  I don't feel experienced enough with OVS to really
feel confident in suggesting the proper balance between "just works" and
"useful flexibility".

In case this helps the discussion, one other thing needed that's not yet
implemented in this series is VLAN support.  We also need the ability
for the Neutron administrator to optionally specify a VLAN ID.  How to do
this is probably obvious to you but I haven't tried to build it yet.  I
was imagining ovn-controller creating additional ports between the
integration bridge and the network access bridge, one for each VLAN used
on that network.

-- 
Russell Bryant

Re: [ovs-dev] [v3 3/3] ovsdb: Add per transaction commit instruction counter

2015-07-17 Thread Andy Zhou

Pushed. Thanks for the review and catching the test failure!

On Thu, Jul 16, 2015 at 11:58 AM, Ben Pfaff  wrote:
> Tests pass for me now with v3, so please push these when you are
> satisfied with them.  Thank you!

Re: [ovs-dev] [PATCH] type-props: Avoid a MSVC warning.

2015-07-17 Thread Ben Pfaff

On Fri, Jul 17, 2015 at 09:26:23AM -0700, Gurucharan Shetty wrote:
> Currently, MSVC warns about a macro invocation such as
> TYPE_MAXIMUM(uint64_t), because part of the macro expands to
> ~(uint64_t)0 << 64:
> 
> C4293: '<<' : shift count negative or too big, undefined behavior.
> 
> This commit makes changes to the macro to prevent that warning.
> 
> Suggested-by: Ben Pfaff 
> Signed-off-by: Gurucharan Shetty 

Acked-by: Ben Pfaff 

Re: [ovs-dev] [PATCH] acinclude: Silence OVS_FIND_FIELD_IFELSE.

2015-07-17 Thread Ben Pfaff

On Fri, Jul 17, 2015 at 11:23:31AM -0700, Joe Stringer wrote:
> Signed-off-by: Joe Stringer 
> ---
>  acinclude.m4 | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/acinclude.m4 b/acinclude.m4
> index 14907ab..4f1e66c 100644
> --- a/acinclude.m4
> +++ b/acinclude.m4
> @@ -271,7 +271,7 @@ dnl translated to uppercase.
>  AC_DEFUN([OVS_FIND_FIELD_IFELSE], [
>AC_MSG_CHECKING([whether $2 has member $3 in $1])
>if test -f $1; then
> -awk '/$2.{/,/^}/' $1 2>/dev/null | grep '$3'
> +awk '/$2.{/,/^}/' $1 2>/dev/null | grep -q '$3'

The autoconf manual recommends avoiding -q, so can we redirect to
/dev/null instead?

Re: [ovs-dev] [RFC v2 04/11] ovn: Add patch ports for ovn bridge mappings.

2015-07-17 Thread Ben Pfaff

On Fri, Jul 17, 2015 at 03:20:00PM -0400, Russell Bryant wrote:
> On 07/16/2015 08:07 PM, Ben Pfaff wrote:
> > On Thu, Jul 16, 2015 at 06:06:12PM -0400, Russell Bryant wrote:
> >> While parsing the OVN bridge mapping configuration, ensure that patch
> >> ports exist between the OVN integration bridge and the physical
> >> network bridge.  If they do not exist, create them automatically.
> >>
> >> Signed-off-by: Russell Bryant 
> > 
> > Now it compiles again.
> 
> :-)
> 
> I moved the misplaced hunk to this patch.
> 
> > This raises a philosophical issue.  Currently OVN requires something
> > else in the system, that runs before ovn-controller starts, to create
> > the integration bridge.  This seems reasonable enough, but
> > ovn-controller could do it itself.  Similarly, OVN could require
> > something else in the system to add the ports to the integration bridge
> > before it starts up.  That would put a little more burden on startup
> > scripts, but it would also be more flexible (the ports wouldn't have to
> > be patch ports, if something else is appropriate, for example).  It
> > would also mean that ovn-controller itself would need less
> > configuration, although that would presumably get shifted somewhere else
> > so it's net zero.
> > 
> > What is your opinion?
> 
> I think that the more we can make OVN "just work", the more successful
> it will be.

Absolutely.

> With that said, some amount of optional flexibility is
> nice.  Specifically:
> 
> I think it makes sense for ovn-controller to create the integration
> bridge if it does not already exist.  Create it if you want to (or have
> some reason to need to), but otherwise ovn-controller should create it.
> That seems like a low-hanging "just works" capability.

I agree.

> Regarding this patch, to be honest, the choice of "bridge mappings" is
> just borrowed from the existing OVS support in OpenStack.  What's
> implemented here matches how that works.  It expects the bridge to be
> created already, but automatically creates patch ports to/from the
> integration bridge.  I don't feel experienced enough with OVS to really
> feel confident in suggesting the proper balance between "just works" and
> "useful flexibility".

I didn't know there was precedent here.  How is the configuration
conveyed to that existing plugin?  I mean, where does it get the
configuration from (presumably it's not from the same external-ids key
but if it is then so much the better).

Looking at an installation manual, it looks like it's configured through
essentially an old-school Windows INI file with a .conf extension.  I
don't know whether that's better or worse than the OVS DB.

> In case this helps the discussion, one other thing needed that's not yet
> implemented in this series is VLAN support.  We also need the ability
> for the Neutron administrator to optionally specify a VLAN ID.  How to do
> this is probably obvious to you but I haven't tried to build it yet.  I
> was imagining ovn-controller creating additional ports between the
> integration bridge and the network access bridge, one for each VLAN used
> on that network.

I think you just need to set the "tag" column on the patch port that is
added to the physical bridge to the desired VLAN ID.  Yes, if there's
more than one VLAN then you'd want multiple pairs of patch ports.

Re: [ovs-dev] [PATCH 2/4] db-ctl-base: make cmd_show_table private

2015-07-17 Thread Andy Zhou

> We have an issue here.  The 'show->recurse' flag is used to prevent a
> dependency loop in the user-defined 'struct cmd_show_table' array.  So,
> we cannot copy the element here.
>

You are right.  I will restructure this patch.

Given that the other patches in the series are acked, I will drop this
patch and push the other three.  I will repost my next attempt at
making cmd_show_table private in a separate patch.  Thanks for the
review.
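The reviewer's objection can be illustrated with a hypothetical, simplified model. It does not use db-ctl-base's actual structures. Each entry of a caller-defined table array may name a child entry, and a per-entry `recurse` flag marks entries currently being printed. Because the flag lives in the shared array, a cycle is detected when the walk returns to an entry that is already in progress. If the function worked on a copy of the element instead, setting the flag would mark only the copy, and the cycle would never be detected.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one entry of a caller-defined show table. */
struct show_table_sketch {
    const char *name;
    int child;          /* Index of the child entry, or -1 for none. */
    bool recurse;       /* True while this entry is being printed. */
};

/* Visits 'tables[i]' and its descendants and returns how many entries were
 * visited.  'show' points into the shared array, never at a copy, so the
 * 'recurse' flag set here is seen if a cycle leads back to this entry. */
static int
show_sketch(struct show_table_sketch tables[], int i)
{
    struct show_table_sketch *show = &tables[i];
    int n = 1;

    if (show->recurse) {
        return 0;                       /* Already on the current path. */
    }
    show->recurse = true;
    if (show->child >= 0) {
        n += show_sketch(tables, show->child);
    }
    show->recurse = false;
    return n;
}
```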

Re: [ovs-dev] [PATCH net-next 15/22] route: Extend flow representation with tunnel key

2015-07-17 Thread Julian Anastasov


Hello,

On Fri, 17 Jul 2015, Thomas Graf wrote:

> Add a new flowi_tunnel structure which is a subset of ip_tunnel_key to
> allow routes to match on tunnel metadata. For now, the tunnel id is
> added to flowi_tunnel which allows for routes to be bound to specific
> virtual tunnels.
> 
> Signed-off-by: Thomas Graf 
> ---

> +struct flowi_tunnel {
> + __be64  tun_id;
> +};
> +
>  struct flowi_common {
>   int flowic_oif;
>   int flowic_iif;
> @@ -30,6 +34,7 @@ struct flowi_common {
>  #define FLOWI_FLAG_ANYSRC0x01
>  #define FLOWI_FLAG_KNOWN_NH  0x02
>   __u32   flowic_secid;
> + struct flowi_tunnel flowic_tun_key;

A new input key for fib rules? It should be initialized
in several places, so that we do not provide random values:

- flowi4_init_output

- fib_compute_spec_dst

- __fib_validate_source

- ip_route_input_slow: with 0 if no tun_info, like below

>  };
>  
>  union flowi_uli {
> @@ -66,6 +71,7 @@ struct flowi4 {
>  #define flowi4_proto __fl_common.flowic_proto
>  #define flowi4_flags __fl_common.flowic_flags
>  #define flowi4_secid __fl_common.flowic_secid
> +#define flowi4_tun_key   __fl_common.flowic_tun_key

> @@ -1690,6 +1693,9 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
>  by fib_lookup.
>*/
>  
> +        tun_info = skb_tunnel_info(skb);
> +        if (tun_info && tun_info->mode == IP_TUNNEL_INFO_RX)
> +                fl4.flowi4_tun_key.tun_id = tun_info->key.tun_id;

        else
                fl4.flowi4_tun_key.tun_id = 0;

>   skb_dst_drop(skb);
>  
>   if (ipv4_is_multicast(saddr) || ipv4_is_lbcast(saddr))
> -- 
> 2.4.3

Regards

--
Julian Anastasov 
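Julian's point is that any new field in the flow key must be written on every path that builds the key. The minimal userspace sketch below uses hypothetical mirror types: `struct flowi4_sketch` and its companions are not the kernel's `struct flowi4`, and `tun_id` is a plain `uint64_t` rather than `__be64`. It shows two patterns: zeroing the whole key in an init helper, and the explicit `else` branch he suggests for `ip_route_input_slow()`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirrors of the structures in the patch. */
struct flowi_tunnel_sketch {
    uint64_t tun_id;                    /* __be64 in the kernel. */
};

struct flowi4_sketch {
    int oif;
    int iif;
    struct flowi_tunnel_sketch tun_key;
};

enum tun_mode_sketch { TUN_MODE_TX, TUN_MODE_RX };

struct tun_info_sketch {
    enum tun_mode_sketch mode;
    uint64_t tun_id;
};

/* Hypothetical init helper: zeroes every input key, the tunnel key
 * included, before setting the fields the caller cares about. */
static void
flowi4_init_sketch(struct flowi4_sketch *fl4, int oif)
{
    memset(fl4, 0, sizeof *fl4);
    fl4->oif = oif;
}

/* Copies the tunnel id only from RX tunnel metadata and otherwise writes
 * 0 explicitly, so no stale value reaches the rule lookup. */
static void
set_tun_key_sketch(struct flowi4_sketch *fl4,
                   const struct tun_info_sketch *tun_info)
{
    if (tun_info && tun_info->mode == TUN_MODE_RX) {
        fl4->tun_key.tun_id = tun_info->tun_id;
    } else {
        fl4->tun_key.tun_id = 0;
    }
}
```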

Re: [ovs-dev] [PATCH 1/2] Include datapath actions with sampled-packet upcall to userspace.

2015-07-17 Thread Pravin Shelar

On Thu, Jun 11, 2015 at 9:43 AM, Neil McKee  wrote:
> If the new optional attribute OVS_USERSPACE_ATTR_ACTIONS is added to an
> OVS_ACTION_ATTR_USERSPACE action, then include the datapath actions
> in the upcall.
>
> This directly associates the sampled packet with the path it takes
> through the virtual switch. Path information currently includes mangling,
> encapsulation and decapsulation actions for tunneling protocols GRE,
> VXLAN, Geneve, MPLS and QinQ, but this extension requires no further
> changes to accommodate datapath actions that may be added in the
> future.
>
> Adding path information enhances visibility into complex virtual
> networks.
>
> Signed-off-by: Neil McKee 

Pushed patch to master. Thanks for the patch.

Re: [ovs-dev] [RFC v2 04/11] ovn: Add patch ports for ovn bridge mappings.

2015-07-17 Thread Russell Bryant

On 07/17/2015 04:04 PM, Ben Pfaff wrote:
> On Fri, Jul 17, 2015 at 03:20:00PM -0400, Russell Bryant wrote:
>> On 07/16/2015 08:07 PM, Ben Pfaff wrote:
>>> On Thu, Jul 16, 2015 at 06:06:12PM -0400, Russell Bryant wrote:
>>>> While parsing the OVN bridge mapping configuration, ensure that patch
>>>> ports exist between the OVN integration bridge and the physical
>>>> network bridge.  If they do not exist, create them automatically.
>>>>
>>>> Signed-off-by: Russell Bryant
>>>
>>> Now it compiles again.
>>
>> :-)
>>
>> I moved the misplaced hunk to this patch.
>>
>>> This raises a philosophical issue.  Currently OVN requires something
>>> else in the system, that runs before ovn-controller starts, to create
>>> the integration bridge.  This seems reasonable enough, but
>>> ovn-controller could do it itself.  Similarly, OVN could require
>>> something else in the system to add the ports to the integration bridge
>>> before it starts up.  That would put a little more burden on startup
>>> scripts, but it would also be more flexible (the ports wouldn't have to
>>> be patch ports, if something else is appropriate, for example).  It
>>> would also mean that ovn-controller itself would need less
>>> configuration, although that would presumably get shifted somewhere else
>>> so it's net zero.
>>>
>>> What is your opinion?
>>
>> I think that the more we can make OVN "just work", the more successful
>> it will be.
> 
> Absolutely.
> 
>> With that said, some amount of optional flexibility is
>> nice.  Specifically:
>>
>> I think it makes sense for ovn-controller to create the integration
>> bridge if it does not already exist.  Create it if you want to (or have
>> some reason to need to), but otherwise ovn-controller should create it.
>>  That seems like a low hanging "just works" capability.
> 
> I agree.

Yay.  Consider that on my todo list then.

>> Regarding this patch, to be honest, the choice of "bridge mappings" is
>> just borrowed from the existing OVS support in OpenStack.  What's
>> implemented here matches how that works.  It expects the bridge to be
>> created already, but automatically creates patch ports to/from the
>> integration bridge.  I don't feel experienced enough with OVS to really
>> feel confident in suggesting the proper balance between "just works" and
>> "useful flexibility".
> 
> I didn't know there was precedent here.  How is the configuration
> conveyed to that existing plugin?  I mean, where does it get the
> configuration from (presumably it's not from the same external-ids key
> but if it is then so much the better).
> 
> Looking at an installation manual, it looks like it's configured through
> essentially an old-school Windows INI file with a .conf extension.  I
> don't know whether that's better or worse than the OVS DB.

Yes, it's in an ini conf file.  Similar style conf files are used for
all OpenStack services.

I actually think a conf file would be easier than ovsdb for
ovn-controller config.  That seems easier to manage from config
management tools (puppet, chef, ...).

>> In case this helps the discussion, one other thing needed that's not yet
>> implemented in this series is VLAN support.  We also need the ability
>> for the Neutron administrator to optionally specify a VLAN ID.  How to do
>> this is probably obvious to you but I haven't tried to build it yet.  I
>> was imagining ovn-controller creating additional ports between the
>> integration bridge and the network access bridge, one for each VLAN used
>> on that network.
> 
> I think you just need to set the "tag" column on the patch port that is
> added to the physical bridge to the desired VLAN ID.  Yes, if there's
> more than one VLAN then you'd want multiple pairs of patch ports.

Great, that's what I thought.  Thanks for confirming!

-- 
Russell Bryant

Re: [ovs-dev] kernel module testing

2015-07-17 Thread Jesse Gross

On Thu, Jul 16, 2015 at 12:04 PM, Kyle Mestery  wrote:
> On Thu, Jul 16, 2015 at 12:55 PM, Ben Pfaff  wrote:
>
>> On Tue, Jul 14, 2015 at 07:45:42AM +, Pritesh Kothari (pritkoth) wrote:
>> >
>> > > On Jul 13, 2015, at 9:40 PM, Ben Pfaff  wrote:
>> > >
>> > > On Tue, Jul 14, 2015 at 12:34:14AM +, Pritesh Kothari (pritkoth) wrote:
>> > >> How about automating this using travis and gerrit, so no commit gets in
>> > >> unless it passes sanity tests? This also simplifies review process as well.
>> > >
>> > > Travis doesn't test the kernel module, and as far as I know it can’t.
>> >
>> > weird, i saw one patch few days ago doing it [1], anyways I may be mistaken.
>>
>> It's fantastic if it does, but I don't think that it does:
>> https://github.com/travis-ci/travis-ci/issues/2291
>>
>> The link that you reference appears to be loading a kernel module inside
>> a User-Mode Linux instance that it runs in travis.  That's an approach I
>> hadn't considered; maybe it would work.
>>
>> > > I am the wrong person to evangelize Gerrit to:
>> > >http://benpfaff.org/writings/gerrit.html
>> >
>> > This seems to be all about web interface, any chance you happen to use the
>> > cli for gerrit mainly git review [2] -d [3] or -m [3] or gerritmander [4]?
>> > both of them are really good utilities and you never have to really leave
>> > your command line tools to use them.
>>
>> Thanks for the information.  Whenever I've brought up the issues on this
>> page previously with people who use Gerrit, they've shrugged and said
>> "Yeah, the UI and email sucks" but no one has ever actually pointed out
>> specific ways to work around them with the CLI.  The CLIs aren't exactly
>> promoted: https://www.gerritcodereview.com/ defines Gerrit by saying
>> "Gerrit provides web based code review and repository management".  Now,
>> if I have to deal with it, I'll know to go to the CLIs first.
>>
>
> You're right that gerrit's CLI sucks rocks. This is the precise reason why
> the OpenStack infra folks created gertty [1] which is a GREAT CLI interface
> for gerrit. I'd encourage you to give it a try. I've found between this and
> the custom gerrit dashboard creator [2] (also done by OpenStack infra
> folks), gerrit is incredibly usable and I enjoy working with it.

For what it's worth, I also think that something like Gerrit would be
useful given the number of platforms that OVS is running on. Right
now, it seems like we're doing the human-powered version, which is
Guru, Daniele, or Ben complaining when something breaks Windows, DPDK,
or 32-bit, respectively. It also effectively provides the features of
Patchwork in a way that is more maintainable.

I agree that the Gerrit UI sucks (I haven't tried the OpenStack
interface) and maybe there are alternatives, like Github's set of
tools. But I think the status quo that we have isn't all that great
either and I also would like to avoid having a collection of
independent tools that fall apart over time.

Re: [ovs-dev] [PATCH 2/2] Extend sFlow agent to report tunnel and MPLS structures

2015-07-17 Thread Ben Pfaff

On Thu, Jun 11, 2015 at 09:43:59AM -0700, Neil McKee wrote:
> Packets are still sampled at ingress only, so the egress
> tunnel and/or MPLS structures are only included when there is just 1 output
> port.  The actions are either provided by the datapath in the sample upcall
> or looked up in the userspace cache.  The former is preferred because it is
> more reliable and does not present any new demands or constraints on the
> userspace cache; however, the code falls back on the userspace lookup so that
> this solution can work with existing kernel datapath modules. If the lookup
> fails it is not critical: the compiled user-action-cookie is still available
> and provides the essential output port and output VLAN forwarding information
> just as before.
> 
> The openvswitch actions can express almost any tunneling/mangling so the only
> totally faithful representation would be to somehow encode the whole list of
> flow actions in the sFlow output.  However, the standard sFlow tunnel structures
> can express most common real-world scenarios, so in parsing the actions we
> look for those and skip the encoding if we see anything unusual. For example,
> a single set(tunnel()) or tnl_push() is interpreted, but if a second such
> action is encountered then the egress tunnel reporting is suppressed.
> 
> The sFlow standard allows "best effort" encoding so that if a field is not
> knowable or too onerous to look up then it can be left out. This is often
> the case for the layer-4 source port or even the src ip address of a tunnel.
> The assumption is that monitoring is enabled everywhere so a missing field
> can typically be seen at ingress to the next switch in the path.
> 
> This patch also adds unit tests to check the sFlow encoding of set(tunnel()),
> tnl_push() and push_mpls() actions.
> 
> The netlink attribute to request that actions be included in the upcall
> from the datapath is inserted for sFlow sampling only.  To make that option
> be explicit would require further changes to the printing and parsing of
> actions in lib/odp-util.c, and to scripts in the test suite.
> 
> Further enhancements to report on 802.1AD QinQ, 64-bit tunnel IDs, and NAT
> transformations can follow in future patches that make only incremental
> changes.
> 
> Signed-off-by: Neil McKee 

Neil, would you mind posting a new version of this patch rebased against
current master?  I'll review it next week.

Thanks,

Ben.
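The egress-reporting rule in Neil's description can be sketched in C as below. This is a minimal illustration under stated assumptions, not the code in the patch: the datapath actions are modeled as a flat array of hypothetical action kinds (`enum sketch_action`) instead of netlink attributes, and all names are made up. It reports egress structures only when there is exactly one output port, and reports a tunnel only when exactly one set(tunnel()) or tnl_push() appears.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified action kinds for illustration only.  The real
 * datapath actions are nested netlink attributes, not an enum array. */
enum sketch_action {
    SKETCH_ACT_OUTPUT,      /* output to a port */
    SKETCH_ACT_SET_TUNNEL,  /* set(tunnel(...)) */
    SKETCH_ACT_TNL_PUSH,    /* tnl_push(...) */
    SKETCH_ACT_PUSH_MPLS,   /* push_mpls(...) */
    SKETCH_ACT_OTHER        /* anything else */
};

struct sketch_egress {
    bool tunnel;  /* Encode an egress tunnel structure? */
    bool mpls;    /* Encode an egress MPLS structure? */
};

/* Decides which egress structures to report for one sampled packet. */
static struct sketch_egress
sketch_classify_actions(const enum sketch_action *acts, size_t n)
{
    struct sketch_egress e = { false, false };
    size_t n_output = 0, n_tunnel = 0;

    assert(acts != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        switch (acts[i]) {
        case SKETCH_ACT_OUTPUT:     n_output++; break;
        case SKETCH_ACT_SET_TUNNEL:
        case SKETCH_ACT_TNL_PUSH:   n_tunnel++; break;
        case SKETCH_ACT_PUSH_MPLS:  e.mpls = true; break;
        case SKETCH_ACT_OTHER:      break;
        }
    }

    /* Sampling happens at ingress, so egress structures are only
     * unambiguous when there is exactly one output port. */
    if (n_output != 1) {
        e.mpls = false;
        return e;
    }

    /* A single tunnel action is interpreted; a second one counts as
     * "unusual", so egress tunnel reporting is suppressed entirely. */
    e.tunnel = (n_tunnel == 1);
    return e;
}
```

Suppressing instead of guessing fits the "best effort" encoding described above: a field that is left out can presumably be recovered at ingress to the next switch, while a wrongly encoded one would be misleading.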

Re: [ovs-dev] [PATCH v3 1/3] tests: Check for core files before exiting.

2015-07-17 Thread Jarno Rajahalme


> On Jul 17, 2015, at 9:33 AM, Ben Pfaff  wrote:
> 
> On Thu, Jul 16, 2015 at 05:22:25PM -0700, Jarno Rajahalme wrote:
>> 
>>> On Jul 16, 2015, at 4:33 PM, Ben Pfaff  wrote:
>>> 
>>> On Thu, Jul 16, 2015 at 03:15:52PM -0700, Jarno Rajahalme wrote:
>>>> I've seen core files appear and then be automatically removed as the
>>>> test case was successful.  Such success is highly doubtful, so fail
>>>> the test cases if any core files exist at the end of the test.
>>>>
>>>> Signed-off-by: Jarno Rajahalme
>>> 
>>> I proposed a similar patch in May 2014:
>>>   http://openvswitch.org/pipermail/dev/2014-May/040497.html
>>> but you didn't like it:
>>>   http://openvswitch.org/pipermail/dev/2014-May/040857.html
>> 
>> My comment at the time was that I did not see the result of the line
>> 
>> echo "$core: core dumped during test"
>> anywhere, but now I see that this is simply due to the fact that the test 
>> case failed on an earlier AT_CHECK and never got to checking the cores.
> 
> I made the same argument before too:
>http://openvswitch.org/pipermail/dev/2014-June/041301.html 
> 
> ;-)
> 
>> So I see that this patch has the same limitation. Do you have any idea
>> how to check and report for core files regardless of the success or
>> failure of the test case? I think this would be important as I’ve seen
>> cores in both cases. In success case we currently lose the fact that
>> there even was a core dump, and this likely happens also in the
>> failure case if we blindly run a --recheck and by chance succeed that
>> time.
>> 
>> Right now I habitually run “find . -name core -print” from a shell
>> after each “make check” that has any failures before a --recheck. I’d
>> like to automate this somehow! And this doesn’t even catch the cores
>> of successful test cases. The only reason I know they exist was due to
>> running the find command multiple times while “make check” was
>> running, and I saw some core files that had disappeared in later find
>> runs.
> 
> Well, something like this would do it:
> 
> diff --git a/tests/atlocal.in b/tests/atlocal.in
> index 5946a3c..5baa9ec 100644
> --- a/tests/atlocal.in
> +++ b/tests/atlocal.in
> @@ -110,3 +110,11 @@ fi
> if test "$IS_WIN32" = "yes"; then
> HAVE_PYTHON="no"
> fi
> +
> +trap '
> +if find "$at_suite_dir" -name core\* -print | grep .; then
> +echo
> +echo "WARNING: See above for list of core dumps produced by tests."
> +echo
> +fi
> +' 0

I tested this by adding the new lines to atlocal.in and making the miniflow
tests artificially “core”:

diff --git a/tests/classifier.at b/tests/classifier.at
index 3520acd..68156ad 100644
--- a/tests/classifier.at
+++ b/tests/classifier.at
@@ -25,6 +25,7 @@ m4_foreach(
[minimask_combine]],
   [AT_SETUP([miniflow - m4_bpatsubst(testname, [-], [ ])])
AT_CHECK([ovstest test-classifier testname], [0], [], [])
+   touch core.foo.bar
AT_CLEANUP])])
 
 AT_BANNER([flow classifier lookup segmentation])

After this “make -k -j6 "TESTSUITEFLAGS=-k miniflow" check” still succeeds:

miniflow unit tests

 90: miniflow - miniflow ok
 91: miniflow - minimask_has_extra   ok
 92: miniflow - minimask_combine ok

I don’t know how the trap is supposed to work, so maybe I am missing something?

  Jarno



Re: [ovs-dev] [PATCH 2/2] Extend sFlow agent to report tunnel and MPLS structures

2015-07-17 Thread Neil McKee

Sure,  I'll try to get to that tonight.

Neil


--
Neil McKee
InMon Corp.
http://www.inmon.com

On Fri, Jul 17, 2015 at 4:31 PM, Ben Pfaff  wrote:

> Neil, would you mind posting a new version of this patch rebased against
> current master?  I'll review it next week.
>
> Thanks,
>
> Ben.
>

Re: [ovs-dev] [PATCH v3 1/3] tests: Check for core files before exiting.

2015-07-17 Thread Ben Pfaff

OK, try this then:

diff --git a/tests/daemon-py.at b/tests/daemon-py.at
index cafa8df..b0f1236 100644
--- a/tests/daemon-py.at
+++ b/tests/daemon-py.at
@@ -59,7 +59,7 @@ AT_CHECK(
 AT_CHECK([kill `cat pid`], [0], [], [ignore], [kill `cat parent`])
 OVS_WAIT_WHILE([kill -0 `cat parent` || kill -0 `cat newpid` || test -e pid],
   [kill `cat parent`])
-AT_CLEANUP
+AT_CLEANUP_IGNORE_CORES
 
 AT_SETUP([daemon --monitor restart exit code - Python])
 AT_SKIP_IF([test $HAVE_PYTHON = no])
diff --git a/tests/daemon.at b/tests/daemon.at
index 51d56c5..6dac76a 100644
--- a/tests/daemon.at
+++ b/tests/daemon.at
@@ -66,7 +66,7 @@ AT_CHECK(
 AT_CHECK([kill `cat pid`], [0], [], [ignore], [kill `cat parent`])
 OVS_WAIT_WHILE([kill -0 `cat parent` || kill -0 `cat newpid` || test -e pid],
   [kill `cat parent`])
-AT_CLEANUP
+AT_CLEANUP_IGNORE_CORES
 
 AT_SETUP([daemon --detach])
 AT_CAPTURE_FILE([pid])
@@ -89,7 +89,7 @@ else
 fi
 OVS_WAIT_WHILE([kill -0 `cat saved-pid`])
 AT_CHECK([test ! -e pid])
-AT_CLEANUP
+AT_CLEANUP_IGNORE_CORES
 
 AT_SETUP([daemon --detach --monitor])
 AT_SKIP_IF([test "$IS_WIN32" = "yes"])
@@ -139,7 +139,7 @@ OVS_WAIT_WHILE(
   [kill -0 `cat monitor` || kill -0 `cat newdaemon` || test -e daemon],
   [kill `cat monitor newdaemon`])
 m4_undefine([CHECK])
-AT_CLEANUP
+AT_CLEANUP_IGNORE_CORES
 
 AT_SETUP([daemon --detach startup errors])
 AT_CAPTURE_FILE([pid])
diff --git a/tests/library.at b/tests/library.at
index 9bd6d81..e5e02c5 100644
--- a/tests/library.at
+++ b/tests/library.at
@@ -195,7 +195,7 @@ AT_CHECK([sed 's/.*: //
   [assertion false failed in test_assert()
 ])
 
-AT_CLEANUP
+AT_CLEANUP_IGNORE_CORES
 
 AT_SETUP([snprintf])
 AT_CHECK([ovstest test-util snprintf])
diff --git a/tests/ovsdb-server.at b/tests/ovsdb-server.at
index 8fce70e..b147e06 100644
--- a/tests/ovsdb-server.at
+++ b/tests/ovsdb-server.at
@@ -284,7 +284,7 @@ AT_CHECK([ovs-appctl -t ovsdb-server ovsdb-server/list-dbs],
   [0], [constraints
 ordinals
 ])
-AT_CLEANUP
+AT_CLEANUP_IGNORE_CORES
 
 AT_SETUP([ovsdb-server/add-db and remove-db with --monitor])
 AT_KEYWORDS([ovsdb server positive])
@@ -315,7 +315,7 @@ OVS_WAIT_UNTIL(
 AT_CHECK([ovs-appctl -t ovsdb-server ovsdb-server/list-dbs],
   [0], [ordinals
 ])
-AT_CLEANUP
+AT_CLEANUP_IGNORE_CORES
 
 AT_SETUP([--remote=db: implementation])
 AT_KEYWORDS([ovsdb server positive])
@@ -465,7 +465,7 @@ OVS_WAIT_WHILE([kill -0 `cat old.pid`])
 OVS_WAIT_UNTIL(
   [test -s ovsdb-server.pid && test `cat ovsdb-server.pid` != `cat old.pid`])
 OVS_WAIT_UNTIL([test -S socket1])
-AT_CLEANUP
+AT_CLEANUP_IGNORE_CORES
 
 AT_SETUP([ovsdb-server/add-remote and remove-remote with --monitor])
 AT_KEYWORDS([ovsdb server positive])
@@ -500,7 +500,7 @@ OVS_WAIT_WHILE([kill -0 `cat old.pid`])
 OVS_WAIT_UNTIL(
   [test -s ovsdb-server.pid && test `cat ovsdb-server.pid` != `cat old.pid`])
 AT_CHECK([test ! -e socket1])
-AT_CLEANUP
+AT_CLEANUP_IGNORE_CORES
 
 AT_SETUP([SSL db: implementation])
 AT_KEYWORDS([ovsdb server positive ssl $5])
diff --git a/tests/testsuite.at b/tests/testsuite.at
index 92b788b..ec847c7 100644
--- a/tests/testsuite.at
+++ b/tests/testsuite.at
@@ -14,6 +14,16 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.])
 
+dnl This modifies Autotest so that AT_CLEANUP checks for core dumps
+dnl and fails the test if any of them are present.  At the same time,
+dnl some of our tests intentionally kill processes with signals that
+dnl can cause core dumps, so this introduces AT_CLEANUP_IGNORE_CORES
+dnl to ignore core dumps in that case.
+m4_rename([AT_CLEANUP], [AT_CLEANUP_IGNORE_CORES])
+m4_define([AT_CLEANUP], [dnl
+AT_CHECK([find . -name "core*" -print])
+AT_CLEANUP_IGNORE_CORES])
+
 m4_include([tests/ovs-macros.at])
 m4_include([tests/ovsdb-macros.at])
 m4_include([tests/ofproto-macros.at])
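This version moves the check into AT_CLEANUP itself. AT_CHECK with no expected-output argument requires empty stdout, so any file that `find . -name "core*" -print` lists makes the test fail. The earlier atlocal.in trap could only print a warning when the shell exited; it could not fail a test that had already passed. The C sketch below mirrors the scan with POSIX opendir()/readdir()/lstat(). It is only an illustration: `sketch_count_cores()` is a hypothetical helper, not part of the OVS test suite, and unlike `find` it counts only regular files and does not follow symlinks.

```c
#define _XOPEN_SOURCE 700  /* POSIX opendir()/lstat() under strict -std= modes */
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Recursively counts regular files under 'dir' whose name starts with
 * "core", printing each one, roughly like find . -name "core*" -print.
 * Returns -1 if 'dir' itself cannot be opened; unreadable subdirectories
 * are skipped. */
static int
sketch_count_cores(const char *dir)
{
    DIR *d = opendir(dir);
    struct dirent *de;
    int n = 0;

    if (!d) {
        return -1;
    }
    while ((de = readdir(d)) != NULL) {
        char path[4096];
        struct stat st;

        if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, "..")) {
            continue;
        }
        snprintf(path, sizeof path, "%s/%s", dir, de->d_name);
        if (lstat(path, &st) != 0) {
            continue;  /* Entry vanished or is unreadable: ignore it. */
        }
        if (S_ISDIR(st.st_mode)) {
            int sub = sketch_count_cores(path);
            n += sub > 0 ? sub : 0;
        } else if (S_ISREG(st.st_mode) && !strncmp(de->d_name, "core", 4)) {
            printf("%s: core dumped during test\n", path);
            n++;
        }
    }
    closedir(d);
    return n;
}
```

A nonzero count plays the same role as non-empty output from the `find` inside AT_CHECK: it marks the test as failed.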

[ovs-dev] [PATCH 1/2] db-ctl-base: remove the recurse member from struct cmd_show_table

2015-07-17 Thread Andy Zhou

The 'recurse' member is used at run time to suppress duplicate prints.
It is not an essential part of describing how the show command should
work.

This patch removes the 'recurse' member. Duplicate prints are now
suppressed by maintaining an 'sset' of tables that have been printed
at run time.
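The patch replaces a per-table `recurse` flag, stored in the shared table array, with an `sset` passed down the recursion. A table's name is added before its referenced rows are expanded and removed afterward, so the set holds the tables on the current path and is empty again when cmd_show() returns, which the patch asserts. Below is a self-contained C sketch of that pattern; a hypothetical fixed-size array stands in for OVS's `struct sset`, and a toy row type replaces the IDL rows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical miniature of a 'show' row: its table, a display name, and
 * up to two referenced rows. */
struct sketch_row {
    const char *table;
    const char *name;
    const struct sketch_row *children[2];
};

#define SKETCH_MAX_DEPTH 8

/* Stand-in for 'struct sset': tables currently being expanded. */
struct sketch_path {
    const char *tables[SKETCH_MAX_DEPTH];
    size_t n;
};

static bool
sketch_path_contains(const struct sketch_path *p, const char *table)
{
    for (size_t i = 0; i < p->n; i++) {
        if (!strcmp(p->tables[i], table)) {
            return true;
        }
    }
    return false;
}

/* Prints 'row', then expands its children unless its table is already
 * being expanded higher up.  Returns the number of lines printed. */
static int
sketch_show_row(const struct sketch_row *row, int level,
                struct sketch_path *shown)
{
    int lines = 1;

    printf("%*s%s %s\n", level * 4, "", row->table, row->name);
    if (sketch_path_contains(shown, row->table)) {
        return lines;  /* Like sset_find(): header only, no recursion. */
    }

    assert(shown->n < SKETCH_MAX_DEPTH);
    shown->tables[shown->n++] = row->table;  /* Like sset_add(). */
    for (size_t i = 0; i < 2; i++) {
        if (row->children[i]) {
            lines += sketch_show_row(row->children[i], level + 1, shown);
        }
    }
    shown->n--;  /* Like sset_find_and_delete_assert(). */
    return lines;
}
```

Because entries are removed on the way out, a table can still be expanded under a different parent (two ports sharing an interface both show it); only a reference back to a table that is already being expanded is cut short, after its header line is printed.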

Signed-off-by: Andy Zhou 

---
 lib/db-ctl-base.c | 20 ++--
 lib/db-ctl-base.h |  4 
 utilities/ovs-vsctl.c | 26 +-
 vtep/vtep-ctl.c   | 22 +++---
 4 files changed, 38 insertions(+), 34 deletions(-)

diff --git a/lib/db-ctl-base.c b/lib/db-ctl-base.c
index e3c0373..3e2d551 100644
--- a/lib/db-ctl-base.c
+++ b/lib/db-ctl-base.c
@@ -34,6 +34,7 @@
 #include "ovsdb-idl.h"
 #include "ovsdb-idl-provider.h"
 #include "shash.h"
+#include "sset.h"
 #include "string.h"
 #include "table.h"
 #include "util.h"
@@ -1648,9 +1649,12 @@ cmd_show_find_table_by_name(const char *name)
 return NULL;
 }
 
+/* 'shown' records the tables that have been displayed by the current
+ * command to avoid duplicated prints.
+ */
 static void
 cmd_show_row(struct ctl_context *ctx, const struct ovsdb_idl_row *row,
- int level)
+ int level, struct sset *shown)
 {
 struct cmd_show_table *show = cmd_show_find_table_by_row(row);
 size_t i;
@@ -1667,11 +1671,11 @@ cmd_show_row(struct ctl_context *ctx, const struct ovsdb_idl_row *row,
 }
 ds_put_char(&ctx->output, '\n');
 
-if (!show || show->recurse) {
+if (!show || sset_find(shown, show->table->name)) {
 return;
 }
 
-show->recurse = true;
+sset_add(shown, show->table->name);
 for (i = 0; i < ARRAY_SIZE(show->columns); i++) {
 const struct ovsdb_idl_column *column = show->columns[i];
 const struct ovsdb_datum *datum;
@@ -1696,7 +1700,7 @@ cmd_show_row(struct ctl_context *ctx, const struct ovsdb_idl_row *row,
  ref_show->table,
  &datum->keys[j].uuid);
 if (ref_row) {
-cmd_show_row(ctx, ref_row, level + 1);
+cmd_show_row(ctx, ref_row, level + 1, shown);
 }
 }
 continue;
@@ -1749,18 +1753,22 @@ cmd_show_row(struct ctl_context *ctx, const struct ovsdb_idl_row *row,
 ds_put_char(&ctx->output, '\n');
 }
 }
-show->recurse = false;
+sset_find_and_delete_assert(shown, show->table->name);
 }
 
 static void
 cmd_show(struct ctl_context *ctx)
 {
 const struct ovsdb_idl_row *row;
+struct sset shown = SSET_INITIALIZER(&shown);
 
 for (row = ovsdb_idl_first_row(ctx->idl, cmd_show_tables[0].table);
  row; row = ovsdb_idl_next_row(row)) {
-cmd_show_row(ctx, row, 0);
+cmd_show_row(ctx, row, 0, &shown);
 }
+
+ovs_assert(sset_is_empty(&shown));
+sset_destroy(&shown);
 }
 
 
diff --git a/lib/db-ctl-base.h b/lib/db-ctl-base.h
index 9220ece..aff242b 100644
--- a/lib/db-ctl-base.h
+++ b/lib/db-ctl-base.h
@@ -157,15 +157,11 @@ struct ctl_command *ctl_parse_commands(int argc, char *argv[],
  *
  * - 'columns[]' allows user to specify the print of additional columns
  *   in 'table'.
- *
- * - 'recurse' is used to avoid duplicate print.
- *
  * */
 struct cmd_show_table {
 const struct ovsdb_idl_table_class *table;
 const struct ovsdb_idl_column *name_column;
 const struct ovsdb_idl_column *columns[3]; /* Seems like a good number. */
-bool recurse;
 };
 
 /* This array defines the 'show' command output format.  User can check the
diff --git a/utilities/ovs-vsctl.c b/utilities/ovs-vsctl.c
index 4fb88b1..ce05c47 100644
--- a/utilities/ovs-vsctl.c
+++ b/utilities/ovs-vsctl.c
@@ -980,45 +980,45 @@ struct cmd_show_table cmd_show_tables[] = {
  NULL,
  {&ovsrec_open_vswitch_col_manager_options,
   &ovsrec_open_vswitch_col_bridges,
-  &ovsrec_open_vswitch_col_ovs_version},
- false},
+  &ovsrec_open_vswitch_col_ovs_version}
+},
 
 {&ovsrec_table_bridge,
  &ovsrec_bridge_col_name,
  {&ovsrec_bridge_col_controller,
   &ovsrec_bridge_col_fail_mode,
-  &ovsrec_bridge_col_ports},
- false},
+  &ovsrec_bridge_col_ports}
+},
 
 {&ovsrec_table_port,
  &ovsrec_port_col_name,
  {&ovsrec_port_col_tag,
   &ovsrec_port_col_trunks,
-  &ovsrec_port_col_interfaces},
- false},
+  &ovsrec_port_col_interfaces}
+},
 
 {&ovsrec_table_interface,
  &ovsrec_interface_col_name,
  {&ovsrec_interface_col_type,
   &ovsrec_interface_col_options,
-  &ovsrec_interface_col_error},
- false},
+  &ovsrec_interface_col_error}
+},
 
 {&ovsrec_table_controller,
  &ovsrec_controller_col_target,
  {&ovsrec_controller_col_is_connected,
   NULL,
-  NULL},
- false},
+  NULL}
+},
 
 {&ovs

[ovs-dev] [PATCH 2/2] db-ctl-base: make cmd_show_table private

2015-07-17 Thread Andy Zhou

Instead of requiring the user to declare a global variable, pass the
'show' table array via ctl_init().
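This patch replaces an `extern` array, which every client had to define under a fixed name, with a pointer handed to ctl_init(). Here is a minimal C sketch of that registration pattern, using hypothetical names and a trimmed-down descriptor. The array ends with an entry whose `table` is NULL, matching the `for (show = cmd_show_tables; show->table; show++)` loops in the patch. The sketch makes the stored pointer `static` and tolerates lookups before init; the posted patch declares `cmd_show_tables` at file scope without `static`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down descriptor; the real struct cmd_show_table
 * holds ovsdb_idl table and column class pointers. */
struct sketch_show_table {
    const char *table;        /* NULL terminates the array. */
    const char *name_column;
};

/* Module-private pointer, set once by the client at init time. */
static const struct sketch_show_table *sketch_show_tables;

static void
sketch_ctl_init(const struct sketch_show_table *show_tables)
{
    assert(show_tables != NULL);
    sketch_show_tables = show_tables;
}

/* Finds a descriptor by table name by walking to the NULL sentinel.
 * Returns NULL if not found or if init has not been called yet. */
static const struct sketch_show_table *
sketch_find_table_by_name(const char *name)
{
    const struct sketch_show_table *show;

    for (show = sketch_show_tables; show && show->table; show++) {
        if (!strcmp(show->table, name)) {
            return show;
        }
    }
    return NULL;
}
```

Passing the array through the init call keeps the library from depending on a symbol name chosen by each client, and lets the `const` qualifier document that the library only reads the table.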

Signed-off-by: Andy Zhou 
---
 lib/db-ctl-base.c | 20 +++-
 lib/db-ctl-base.h | 17 +++--
 utilities/ovs-vsctl.c |  4 ++--
 vtep/vtep-ctl.c   |  4 ++--
 4 files changed, 18 insertions(+), 27 deletions(-)

diff --git a/lib/db-ctl-base.c b/lib/db-ctl-base.c
index 3e2d551..dd3786f 100644
--- a/lib/db-ctl-base.c
+++ b/lib/db-ctl-base.c
@@ -52,7 +52,7 @@ VLOG_DEFINE_THIS_MODULE(db_ctl_base);
  * when ctl_init() is called.
  *
  * */
-extern struct cmd_show_table cmd_show_tables[];
+const struct cmd_show_table *cmd_show_tables;
 
 /* ctl_exit() is called by ctl_fatal(). User can optionally supply an exit
  * function ctl_exit_func() via ctl_init. If supplied, this function will
@@ -1605,7 +1605,7 @@ parse_command(int argc, char *argv[], struct shash *local_options,
 static void
 pre_cmd_show(struct ctl_context *ctx)
 {
-struct cmd_show_table *show;
+const struct cmd_show_table *show;
 
 for (show = cmd_show_tables; show->table; show++) {
 size_t i;
@@ -1623,10 +1623,10 @@ pre_cmd_show(struct ctl_context *ctx)
 }
 }
 
-static struct cmd_show_table *
+static const struct cmd_show_table *
 cmd_show_find_table_by_row(const struct ovsdb_idl_row *row)
 {
-struct cmd_show_table *show;
+const struct cmd_show_table *show;
 
 for (show = cmd_show_tables; show->table; show++) {
 if (show->table == row->table->class) {
@@ -1636,10 +1636,10 @@ cmd_show_find_table_by_row(const struct ovsdb_idl_row *row)
 return NULL;
 }
 
-static struct cmd_show_table *
+static const struct cmd_show_table *
 cmd_show_find_table_by_name(const char *name)
 {
-struct cmd_show_table *show;
+const struct cmd_show_table *show;
 
 for (show = cmd_show_tables; show->table; show++) {
 if (!strcmp(show->table->name, name)) {
@@ -1656,7 +1656,7 @@ static void
 cmd_show_row(struct ctl_context *ctx, const struct ovsdb_idl_row *row,
  int level, struct sset *shown)
 {
-struct cmd_show_table *show = cmd_show_find_table_by_row(row);
+const struct cmd_show_table *show = cmd_show_find_table_by_row(row);
 size_t i;
 
 ds_put_char_multiple(&ctx->output, ' ', level * 4);
@@ -1687,7 +1687,7 @@ cmd_show_row(struct ctl_context *ctx, const struct ovsdb_idl_row *row,
 datum = ovsdb_idl_read(row, column);
 if (column->type.key.type == OVSDB_TYPE_UUID &&
 column->type.key.u.uuid.refTableName) {
-struct cmd_show_table *ref_show;
+const struct cmd_show_table *ref_show;
 size_t j;
 
 ref_show = cmd_show_find_table_by_name(
@@ -1708,7 +1708,7 @@ cmd_show_row(struct ctl_context *ctx, const struct ovsdb_idl_row *row,
 } else if (ovsdb_type_is_map(&column->type) &&
column->type.value.type == OVSDB_TYPE_UUID &&
column->type.value.u.uuid.refTableName) {
-struct cmd_show_table *ref_show;
+const struct cmd_show_table *ref_show;
 size_t j;
 
 /* Prints the key to ref'ed table name map if the ref'ed table
@@ -2013,9 +2013,11 @@ ctl_register_commands(const struct ctl_command_syntax *commands)
 /* Registers the 'db_ctl_commands' to 'all_commands'. */
 void
 ctl_init(const struct ctl_table_class tables_[],
+ const struct cmd_show_table cmd_show_tables_[],
  void (*ctl_exit_func_)(int status))
 {
 tables = tables_;
+cmd_show_tables = cmd_show_tables_;
 ctl_exit_func = ctl_exit_func_;
 ctl_register_commands(db_ctl_commands);
 }
diff --git a/lib/db-ctl-base.h b/lib/db-ctl-base.h
index aff242b..00e86f8 100644
--- a/lib/db-ctl-base.h
+++ b/lib/db-ctl-base.h
@@ -33,8 +33,6 @@ struct table;
  * (structs, commands and functions).  To utilize this module, user must
  * define the following:
  *
- * - the 'cmd_show_tables'.  (See 'struct cmd_show_table' for more info).
- *
  * - the command syntaxes for each command.  (See 'struct ctl_command_syntax'
  *   for more info)  and regiters them using ctl_register_commands().
  *
@@ -47,8 +45,10 @@ struct table;
 #define ovs_fatal please_use_ctl_fatal_instead_of_ovs_fatal
 
 struct ctl_table_class;
+struct cmd_show_table;
 void ctl_init(const struct ctl_table_class *tables,
- void (*ctl_exit_func)(int status));
+  const struct cmd_show_table *cmd_show_tables,
+  void (*ctl_exit_func)(int status));
 char *ctl_default_db(void);
 OVS_NO_RETURN void ctl_fatal(const char *, ...) OVS_PRINTF_FORMAT(1, 2);
 
@@ -164,17 +164,6 @@ struct cmd_show_table {
 const struct ovsdb_idl_column *columns[3]; /* Seems like a good number. */
 };
 
-/* This array defines the 'show' command output format.  User can check the
- * definition in utilities/ovs-vsctl.c as reference.
- *
- * Particularly, if an element in 'columns[]' represents a reference to
- * another table, the referred table must also be defined a

Re: [ovs-dev] [PATCH v3 1/3] tests: Check for core files before exiting.

2015-07-17 Thread Jarno Rajahalme

Ben,

This works, the core file names can be found from the testsuite.log!

It would be super nice to have the cores reported in the “Test results”
section of the make check output, though:

## ------------- ##
## Test results. ##
## ------------- ##

ERROR: All 7 tests were run,
3 failed unexpectedly.

If the presence of core files was mentioned here, I would know not to
blindly issue a recheck. Or better yet, maybe “recheck” should
automatically fail if any of the failed tests produced a core file?

  Jarno

> On Jul 17, 2015, at 4:52 PM, Ben Pfaff  wrote:
> 
> OK, try this then:
> 
> diff --git a/tests/daemon-py.at b/tests/daemon-py.at
> index cafa8df..b0f1236 100644
> --- a/tests/daemon-py.at
> +++ b/tests/daemon-py.at
> @@ -59,7 +59,7 @@ AT_CHECK(
> AT_CHECK([kill `cat pid`], [0], [], [ignore], [kill `cat parent`])
> OVS_WAIT_WHILE([kill -0 `cat parent` || kill -0 `cat newpid` || test -e pid],
>   [kill `cat parent`])
> -AT_CLEANUP
> +AT_CLEANUP_IGNORE_CORES
> 
> AT_SETUP([daemon --monitor restart exit code - Python])
> AT_SKIP_IF([test $HAVE_PYTHON = no])
> diff --git a/tests/daemon.at b/tests/daemon.at
> index 51d56c5..6dac76a 100644
> --- a/tests/daemon.at
> +++ b/tests/daemon.at
> @@ -66,7 +66,7 @@ AT_CHECK(
> AT_CHECK([kill `cat pid`], [0], [], [ignore], [kill `cat parent`])
> OVS_WAIT_WHILE([kill -0 `cat parent` || kill -0 `cat newpid` || test -e pid],
>   [kill `cat parent`])
> -AT_CLEANUP
> +AT_CLEANUP_IGNORE_CORES
> 
> AT_SETUP([daemon --detach])
> AT_CAPTURE_FILE([pid])
> @@ -89,7 +89,7 @@ else
> fi
> OVS_WAIT_WHILE([kill -0 `cat saved-pid`])
> AT_CHECK([test ! -e pid])
> -AT_CLEANUP
> +AT_CLEANUP_IGNORE_CORES
> 
> AT_SETUP([daemon --detach --monitor])
> AT_SKIP_IF([test "$IS_WIN32" = "yes"])
> @@ -139,7 +139,7 @@ OVS_WAIT_WHILE(
>   [kill -0 `cat monitor` || kill -0 `cat newdaemon` || test -e daemon],
>   [kill `cat monitor newdaemon`])
> m4_undefine([CHECK])
> -AT_CLEANUP
> +AT_CLEANUP_IGNORE_CORES
> 
> AT_SETUP([daemon --detach startup errors])
> AT_CAPTURE_FILE([pid])
> diff --git a/tests/library.at b/tests/library.at
> index 9bd6d81..e5e02c5 100644
> --- a/tests/library.at
> +++ b/tests/library.at
> @@ -195,7 +195,7 @@ AT_CHECK([sed 's/.*: //
>   [assertion false failed in test_assert()
> ])
> 
> -AT_CLEANUP
> +AT_CLEANUP_IGNORE_CORES
> 
> AT_SETUP([snprintf])
> AT_CHECK([ovstest test-util snprintf])
> diff --git a/tests/ovsdb-server.at b/tests/ovsdb-server.at
> index 8fce70e..b147e06 100644
> --- a/tests/ovsdb-server.at
> +++ b/tests/ovsdb-server.at
> @@ -284,7 +284,7 @@ AT_CHECK([ovs-appctl -t ovsdb-server 
> ovsdb-server/list-dbs],
>   [0], [constraints
> ordinals
> ])
> -AT_CLEANUP
> +AT_CLEANUP_IGNORE_CORES
> 
> AT_SETUP([ovsdb-server/add-db and remove-db with --monitor])
> AT_KEYWORDS([ovsdb server positive])
> @@ -315,7 +315,7 @@ OVS_WAIT_UNTIL(
> AT_CHECK([ovs-appctl -t ovsdb-server ovsdb-server/list-dbs],
>   [0], [ordinals
> ])
> -AT_CLEANUP
> +AT_CLEANUP_IGNORE_CORES
> 
> AT_SETUP([--remote=db: implementation])
> AT_KEYWORDS([ovsdb server positive])
> @@ -465,7 +465,7 @@ OVS_WAIT_WHILE([kill -0 `cat old.pid`])
> OVS_WAIT_UNTIL(
>   [test -s ovsdb-server.pid && test `cat ovsdb-server.pid` != `cat old.pid`])
> OVS_WAIT_UNTIL([test -S socket1])
> -AT_CLEANUP
> +AT_CLEANUP_IGNORE_CORES
> 
> AT_SETUP([ovsdb-server/add-remote and remove-remote with --monitor])
> AT_KEYWORDS([ovsdb server positive])
> @@ -500,7 +500,7 @@ OVS_WAIT_WHILE([kill -0 `cat old.pid`])
> OVS_WAIT_UNTIL(
>   [test -s ovsdb-server.pid && test `cat ovsdb-server.pid` != `cat old.pid`])
> AT_CHECK([test ! -e socket1])
> -AT_CLEANUP
> +AT_CLEANUP_IGNORE_CORES
> 
> AT_SETUP([SSL db: implementation])
> AT_KEYWORDS([ovsdb server positive ssl $5])
> diff --git a/tests/testsuite.at b/tests/testsuite.at
> index 92b788b..ec847c7 100644
> --- a/tests/testsuite.at
> +++ b/tests/testsuite.at
> @@ -14,6 +14,16 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> See the License for the specific language governing permissions and
> limitations under the License.])
> 
> +dnl This modifies Autotest so that AT_CLEANUP checks for core dumps
> +dnl and fails the test if any of them are present.  At the same time,
> +dnl some of our tests intentionally kill processes with signals that
> +dnl can cause core dumps, so this introduces AT_CLEANUP_IGNORE_CORES
> +dnl to ignore core dumps in that case.
> +m4_rename([AT_CLEANUP], [AT_CLEANUP_IGNORE_CORES])
> +m4_define([AT_CLEANUP], [dnl
> +AT_CHECK([find . -name "core*" -print])
> +AT_CLEANUP_IGNORE_CORES])
> +
> m4_include([tests/ovs-macros.at])
> m4_include([tests/ovsdb-macros.at])
> m4_include([tests/ofproto-macros.at])
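Because `AT_CHECK` with no expected-output argument requires the command's standard output to be empty, the redefined `AT_CLEANUP` fails a test as soon as `find . -name "core*" -print` reports any path. The sketch below reproduces that check in C for illustration, assuming a POSIX system that provides `nftw()` and `fnmatch()`. `is_core_name()` and `count_core_files()` are hypothetical helpers; the test suite itself simply runs `find`.

```c
#define _XOPEN_SOURCE 500   /* Exposes nftw() and struct FTW. */
#include <assert.h>
#include <fnmatch.h>
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: true if BASENAME matches the glob "core*",
 * the same pattern the test passes to "find -name". */
static int
is_core_name(const char *basename)
{
    assert(basename != NULL);
    return fnmatch("core*", basename, 0) == 0;
}

/* Not reentrant: nftw() has no user-data argument, so the count
 * lives in a static variable. */
static int n_cores;

/* nftw() callback: prints and counts every entry whose basename matches. */
static int
visit(const char *path, const struct stat *sb, int typeflag,
      struct FTW *ftwbuf)
{
    (void) sb;
    (void) typeflag;
    if (is_core_name(path + ftwbuf->base)) {
        printf("%s\n", path);   /* Mimics find's -print. */
        n_cores++;
    }
    return 0;                   /* Keep walking. */
}

/* Hypothetical helper: walks DIR recursively like "find DIR -name 'core*'",
 * without following symlinks, and returns the number of matches, or -1 if
 * DIR cannot be walked. */
static int
count_core_files(const char *dir)
{
    n_cores = 0;
    if (nftw(dir, visit, 16, FTW_PHYS) == -1) {
        return -1;
    }
    return n_cores;
}
```

A non-zero count corresponds to `find` printing something, which is exactly what makes the `AT_CHECK` fail.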

[ovs-dev] [PATCH] Extend sFlow agent to report tunnel and MPLS structures

2015-07-17 Thread Neil McKee

Packets are still sampled at ingress only, so the egress tunnel and/or MPLS
structures are only included when there is exactly one output port. The
actions are either provided by the datapath in the sample upcall or looked
up in the userspace cache. The former is preferred because it is more
reliable and places no new demands or constraints on the userspace cache.
However, the code falls back to the userspace lookup so that this solution
also works with existing kernel datapath modules. If the lookup fails, that
is not critical: the compiled user-action-cookie is still available and
provides the essential output port and output VLAN forwarding information,
just as before.
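The order of preference described above (actions carried in the upcall, then the userspace cache, then only the user-action cookie) can be sketched in C as follows. Every name here (`struct sample_upcall`, `lookup_cached_actions()`, `enum actions_source`, `choose_actions()`) is hypothetical and only illustrates the decision; it is not the code in `ofproto-dpif-sflow.c` or `ofproto-dpif-upcall.c`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the parts of an upcall the sFlow code needs. */
struct sample_upcall {
    const void *actions;       /* Actions attached by the datapath, or NULL
                                * for datapaths that do not supply them. */
    unsigned int flow_id;      /* Key used for the userspace cache lookup. */
};

enum actions_source {
    ACTIONS_FROM_UPCALL,       /* Preferred: supplied by the datapath. */
    ACTIONS_FROM_CACHE,        /* Fallback: userspace cache lookup. */
    ACTIONS_NONE               /* Cookie only: output port and VLAN. */
};

/* Hypothetical cache lookup returning NULL on a miss.  This toy version
 * pretends that only even flow ids are cached. */
static const void *
lookup_cached_actions(unsigned int flow_id)
{
    static const char cached[] = "cached-actions";
    return flow_id % 2 == 0 ? cached : NULL;
}

/* Picks where to take the egress actions from, in order of preference. */
static enum actions_source
choose_actions(const struct sample_upcall *up, const void **actions)
{
    assert(up != NULL && actions != NULL);
    if (up->actions) {
        *actions = up->actions;
        return ACTIONS_FROM_UPCALL;
    }
    *actions = lookup_cached_actions(up->flow_id);
    if (*actions) {
        return ACTIONS_FROM_CACHE;
    }
    /* Not fatal: the user-action cookie still carries the output port
     * and output VLAN, so the sample is reported as before. */
    return ACTIONS_NONE;
}
```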

The openvswitch actions can express almost any tunneling or mangling, so the
only fully faithful representation would be to encode the whole list of flow
actions in the sFlow output. However, the standard sFlow tunnel structures
can express the most common real-world scenarios, so when parsing the
actions we look for those and skip the encoding if we see anything unusual.
For example, a single set(tunnel()) or tnl_push() is interpreted, but if a
second such action is encountered, egress tunnel reporting is suppressed.
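The "interpret one, suppress on the second" rule amounts to a single pass over the action list. The action kinds and `parse_egress_tunnel()` below are hypothetical simplifications that apply only this one rule; the actual patch walks Netlink-encoded `OVS_ACTION_ATTR_*` attributes and also bails out on other unusual actions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified action kinds. */
enum act_kind {
    ACT_OUTPUT,
    ACT_SET_TUNNEL,   /* set(tunnel(...)) */
    ACT_TNL_PUSH,     /* tnl_push(...) */
    ACT_PUSH_MPLS,
    ACT_OTHER
};

/* Returns true if at most one tunnel action appears, storing its index in
 * *TUNNEL_IDX (or -1 if there is none).  Returns false as soon as a second
 * set(tunnel()) or tnl_push() is seen, meaning egress tunnel reporting
 * should be suppressed for this sample. */
static bool
parse_egress_tunnel(const enum act_kind *acts, size_t n, int *tunnel_idx)
{
    assert(tunnel_idx != NULL);
    *tunnel_idx = -1;
    for (size_t i = 0; i < n; i++) {
        if (acts[i] == ACT_SET_TUNNEL || acts[i] == ACT_TNL_PUSH) {
            if (*tunnel_idx >= 0) {
                return false;        /* Second tunnel action: give up. */
            }
            *tunnel_idx = (int) i;
        }
    }
    return true;
}
```

Giving up entirely, rather than reporting just the first tunnel, avoids emitting a tunnel structure that describes only part of what the actions did.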

The sFlow standard allows "best effort" encoding, so a field that is not
knowable, or too onerous to look up, can be left out. This is often the case
for the layer-4 source port, or even the source IP address, of a tunnel. The
assumption is that monitoring is enabled everywhere, so a missing field can
typically be observed at ingress to the next switch in the path.
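One way to picture best-effort encoding is a record whose unknown fields are simply left at zero instead of causing the whole record to be dropped. This is a hypothetical sketch: the struct layouts, the use of zero as the "unknown" value, and `fill_tunnel_record()` are assumptions for illustration, not taken from the sFlow specification or from this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tunnel record, loosely modelled on an IPv4 5-tuple. */
struct tunnel_record {
    uint32_t src_ip;      /* 0 when unknown (assumption). */
    uint32_t dst_ip;
    uint16_t src_port;    /* 0 when unknown (assumption). */
    uint16_t dst_port;
    uint8_t  protocol;
};

/* Hypothetical input: what the agent managed to learn from the actions. */
struct tunnel_info {
    bool has_src_ip;
    uint32_t src_ip;
    uint32_t dst_ip;
    bool has_src_port;
    uint16_t src_port;
    uint16_t dst_port;
    uint8_t protocol;
};

/* Fills OUT on a best-effort basis: fields that could not be determined
 * stay zero, and the rest of the record is still reported. */
static void
fill_tunnel_record(const struct tunnel_info *in, struct tunnel_record *out)
{
    assert(in != NULL && out != NULL);
    memset(out, 0, sizeof *out);
    out->dst_ip = in->dst_ip;
    out->dst_port = in->dst_port;
    out->protocol = in->protocol;
    if (in->has_src_ip) {
        out->src_ip = in->src_ip;
    }
    if (in->has_src_port) {
        out->src_port = in->src_port;
    }
}
```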

This patch also adds unit tests to check the sFlow encoding of set(tunnel()),
tnl_push() and push_mpls() actions.

The netlink attribute that asks the datapath to include the actions in the
upcall is inserted for sFlow sampling only. Making that option explicit
would require further changes to the printing and parsing of actions in
lib/odp-util.c, and to scripts in the test suite.
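Judging from the `NL_A_UNSPEC`, optional policy entry for `OVS_USERSPACE_ATTR_ACTIONS` in the `lib/odp-util.c` hunk, the request appears to be a payload-less flag whose mere presence asks for the actions (an assumption). The sketch below encodes such a zero-length flag, only for sFlow cookies, in plain C. `struct nl_attr_hdr` mirrors the standard 4-byte Netlink attribute header (`nla_len`, `nla_type`); `HYPOTHETICAL_ATTR_ACTIONS`, `enum cookie_type` and `put_userspace_flags()` are hypothetical and do not reproduce the real attribute value or the helpers in `lib/netlink.c`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the layout of a Netlink attribute header (struct nlattr). */
struct nl_attr_hdr {
    uint16_t nla_len;     /* Header plus payload, in bytes. */
    uint16_t nla_type;
};

#define NL_ALIGN(len) (((len) + 3) & ~(size_t) 3)   /* 4-byte alignment. */
#define HYPOTHETICAL_ATTR_ACTIONS 99                /* Placeholder type. */

enum cookie_type { COOKIE_SFLOW, COOKIE_OTHER };    /* Hypothetical. */

/* Appends a zero-length "flag" attribute to BUF at offset *OFS, but only
 * for sFlow cookies.  Returns 0 on success, -1 if BUF is too small. */
static int
put_userspace_flags(uint8_t *buf, size_t size, size_t *ofs,
                    enum cookie_type type)
{
    struct nl_attr_hdr hdr;

    assert(buf != NULL && ofs != NULL);
    if (type != COOKIE_SFLOW) {
        return 0;                          /* Other samplers: no flag. */
    }
    if (*ofs + NL_ALIGN(sizeof hdr) > size) {
        return -1;
    }
    hdr.nla_len = (uint16_t) sizeof hdr;   /* No payload: presence is the
                                            * whole message. */
    hdr.nla_type = HYPOTHETICAL_ATTR_ACTIONS;
    memcpy(buf + *ofs, &hdr, sizeof hdr);
    *ofs += NL_ALIGN(sizeof hdr);
    return 0;
}
```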

Further enhancements to report on 802.1ad (QinQ), 64-bit tunnel IDs, and NAT
transformations can follow in future patches that make only incremental
changes.

Signed-off-by: Neil McKee 
---
 lib/dpif-netlink.c            |   2 +
 lib/dpif.h                    |   1 +
 lib/odp-util.c                |  25 +-
 lib/odp-util.h                |   1 +
 ofproto/ofproto-dpif-sflow.c  | 574 +-
 ofproto/ofproto-dpif-sflow.h  |  30 ++-
 ofproto/ofproto-dpif-upcall.c |  38 ++-
 ofproto/ofproto-dpif-xlate.c  |  16 +-
 tests/ofproto-dpif.at         | 264 ++-
 tests/test-sflow.c            |  32 +++
 10 files changed, 958 insertions(+), 25 deletions(-)

diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index 3650682..8884a9f 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -1969,6 +1969,7 @@ parse_odp_packet(const struct dpif_netlink *dpif, struct ofpbuf *buf,
 /* OVS_PACKET_CMD_ACTION only. */
 [OVS_PACKET_ATTR_USERDATA] = { .type = NL_A_UNSPEC, .optional = true },
 [OVS_PACKET_ATTR_EGRESS_TUN_KEY] = { .type = NL_A_NESTED, .optional = true },
+[OVS_PACKET_ATTR_ACTIONS] = { .type = NL_A_NESTED, .optional = true },
 };
 
 struct ovs_header *ovs_header;
@@ -2005,6 +2006,7 @@ parse_odp_packet(const struct dpif_netlink *dpif, struct ofpbuf *buf,
 dpif_flow_hash(&dpif->dpif, upcall->key, upcall->key_len, &upcall->ufid);
 upcall->userdata = a[OVS_PACKET_ATTR_USERDATA];
 upcall->out_tun_key = a[OVS_PACKET_ATTR_EGRESS_TUN_KEY];
+upcall->actions = a[OVS_PACKET_ATTR_ACTIONS];
 
 /* Allow overwriting the netlink attribute header without reallocating. */
 dp_packet_use_stub(&upcall->packet,
diff --git a/lib/dpif.h b/lib/dpif.h
index ba5d597..ea9caf8 100644
--- a/lib/dpif.h
+++ b/lib/dpif.h
@@ -784,6 +784,7 @@ struct dpif_upcall {
 /* DPIF_UC_ACTION only. */
 struct nlattr *userdata;/* Argument to OVS_ACTION_ATTR_USERSPACE. */
 struct nlattr *out_tun_key;/* Output tunnel key. */
+struct nlattr *actions;/* Argument to OVS_ACTION_ATTR_USERSPACE. */
 };
 
 /* A callback to process an upcall, currently implemented only by dpif-netdev.
diff --git a/lib/odp-util.c b/lib/odp-util.c
index 0e82b12..c798491 100644
--- a/lib/odp-util.c
+++ b/lib/odp-util.c
@@ -251,10 +251,13 @@ format_odp_userspace_action(struct ds *ds, const struct nlattr *attr)
   .optional = true },
 [OVS_USERSPACE_ATTR_EGRESS_TUN_PORT] = { .type = NL_A_U32,
  .optional = true },
+[OVS_USERSPACE_ATTR_ACTIONS] = { .type = NL_A_UNSPEC,
+ .optional = true },
 };
 struct nlattr *a[ARRAY_SIZE(ovs_userspace_policy)];
 const struct nlattr *userdata_attr;
 const struct nlattr *tunnel_out_port_attr;
+const struct nlattr *actions_attr;
 
 if (!nl_parse_nested(attr, ovs_userspace_policy, a, ARRAY_SIZE(a))) {

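On the receiving side, `parse_odp_packet()` now lists `OVS_PACKET_ATTR_ACTIONS` as an optional nested attribute and copies the parsed pointer into `upcall->actions`, which is NULL when the datapath did not send it; `format_odp_userspace_action()` similarly gains an optional `OVS_USERSPACE_ATTR_ACTIONS` entry. The sketch below shows that "optional attribute means NULL when absent" pattern in self-contained C. It is not OVS's `nl_parse_nested()`, which additionally validates each attribute against its policy; `parse_attrs()` and `MAX_TYPE` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as a Netlink attribute header (struct nlattr). */
struct nl_attr_hdr {
    uint16_t nla_len;         /* Header plus payload, in bytes. */
    uint16_t nla_type;
};

#define NL_ALIGN(len) (((len) + 3) & ~(size_t) 3)   /* 4-byte alignment. */
#define NL_TYPE_MASK 0x3fff   /* Strips the NLA_F_NESTED and
                               * NLA_F_NET_BYTEORDER bits. */
#define MAX_TYPE 8            /* Hypothetical highest attribute type. */

/* Points ATTRS[type] at the header of each attribute found in BUF and
 * leaves absent types NULL, so optional attributes can be tested with a
 * plain NULL check.  Returns 0 on success, -1 if an attribute is shorter
 * than its header or runs past the end of BUF. */
static int
parse_attrs(const uint8_t *buf, size_t len,
            const uint8_t *attrs[MAX_TYPE + 1])
{
    size_t ofs = 0;

    assert(buf != NULL || len == 0);
    for (int i = 0; i <= MAX_TYPE; i++) {
        attrs[i] = NULL;
    }
    while (ofs + sizeof(struct nl_attr_hdr) <= len) {
        struct nl_attr_hdr hdr;
        uint16_t type;

        memcpy(&hdr, buf + ofs, sizeof hdr);   /* Avoids unaligned reads. */
        if (hdr.nla_len < sizeof hdr || hdr.nla_len > len - ofs) {
            return -1;
        }
        type = hdr.nla_type & NL_TYPE_MASK;
        /* Unknown types are skipped; a later duplicate overwrites an
         * earlier one. */
        if (type <= MAX_TYPE) {
            attrs[type] = buf + ofs;
        }
        ofs += NL_ALIGN(hdr.nla_len);
    }
    return 0;
}
```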