Re: [ovs-dev] [BUG] broad-/multicast & SLB bonding -> FAIL

2013-01-23 Thread Markus Schuster
On Tuesday 22 January 2013 22:26:33 Ben Pfaff wrote:
> On Tue, Jan 22, 2013 at 07:39:01PM +0100, Markus Schuster wrote:
> > Open vSwitch sends out the multicast frames on ALL physical interfaces
> > belonging to the SLB bond. That causes a lot of confusion on the physical
> > switches that XCP hosts are connected to (VM MAC addresses jumping
> > between ports multiple times a second).
> 
> It shouldn't be doing that.  How did you determine that this is
> happening?

First of all a little overview of the components involved, so we're actually 
talking about the same thing:
http://www.picpaste.com/pics/overview-4xoHXl1v.1358932433.png
All XCP hosts are Blades, connected to the BladeCenter internal switches 
(Nortel L2-3 switching modules) which are connected to our switching backbone 
(via LACP aggregated links, but shouldn't matter for this issue). 

How did we determine this is happening: First we had a look in the switching 
table on Backbone Switch 1, seeing that the MAC address of the VM in question 
running on the XCP Blade jumps between the uplink ports to BC switch 1 and BC 
switch 2 like crazy. 
Next, we configured port mirroring on BC switches 1 & 2 to see all traffic of 
the BladeCenter internal port that connects the XCP Blade to the BC internal 
switches. We dumped traffic with tcpdump and had a look with Wireshark 
afterwards. What we saw was the exact same frames carrying the IP multicast 
packets on both BC switches at the very same time. 
Next, we connected directly to the XCP Blade, started tcpdump on the two 
physical interfaces forming the SLB bond and filtering for broad-/multicast 
traffic and the MAC of the VM in question. Again, we saw broad- and multicast 
traffic going out on both interfaces. 

That's the story, hope that helps you somehow. 

Regards,
Markus
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 1/2] datapath: Add pre_tunnel support

2013-01-23 Thread Jarno Rajahalme

On Jan 22, 2013, at 20:36 , ext Kyle Mestery wrote:

> Add support to the tunneling code for a "pre_tunnel"
> function. This allows the tunneling code to perform
> operations on the packet before the outer IP header is
> added.
> 
> A tunneling protocol such as LISP may require this, as it needs
> to remove the MAC header before applying the LISP header.
> 

It seems to me that the same can be achieved with your own version of
ovs_tnl_send() that first removes the MAC header and then calls ovs_tnl_send().
Your vport_ops table already has the hook for this, so there is no overhead in
doing this.


> Signed-off-by: Kyle Mestery 
> ---
> datapath/tunnel.c   | 5 +
> datapath/tunnel.h   | 7 +++
> datapath/vport-capwap.c | 1 +
> datapath/vport-gre.c| 2 ++
> datapath/vport-vxlan.c  | 1 +
> 5 files changed, 16 insertions(+)
> 
> diff --git a/datapath/tunnel.c b/datapath/tunnel.c
> index d03b708..f1dd8d3 100644
> --- a/datapath/tunnel.c
> +++ b/datapath/tunnel.c
> @@ -988,6 +988,11 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff 
> *skb)
>   if (unlikely(vlan_deaccel_tag(skb)))
>   goto next;
> 
> + /* Pre tunnel header work done by tunneling layer. */
> + if (tnl_vport->tnl_ops->pre_tunnel)
> + skb = tnl_vport->tnl_ops->pre_tunnel(vport, mutable,
> +  skb);
> +
>   skb_push(skb, tunnel_hlen);
>   skb_reset_network_header(skb);
>   skb_set_transport_header(skb, sizeof(struct iphdr));
> diff --git a/datapath/tunnel.h b/datapath/tunnel.h
> index b7de7a9..12e7f19 100644
> --- a/datapath/tunnel.h
> +++ b/datapath/tunnel.h
> @@ -135,6 +135,13 @@ struct tnl_ops {
>   int (*hdr_len)(const struct tnl_mutable_config *,
>  const struct ovs_key_ipv4_tunnel *);
>   /*
> +  * Some tunnels may need to perform actions on the packet before
> +  * appending the outer IP header of the tunneled packet.
> +  */
> + struct sk_buff *(*pre_tunnel)(const struct vport *,
> +   const struct tnl_mutable_config *,
> +   struct sk_buff *);
> + /*
>* Returns a linked list of SKBs with tunnel headers (multiple
>* packets may be generated in the event of fragmentation).  Space
>* will have already been allocated at the start of the packet equal
> diff --git a/datapath/vport-capwap.c b/datapath/vport-capwap.c
> index f45d349..9055b1d 100644
> --- a/datapath/vport-capwap.c
> +++ b/datapath/vport-capwap.c
> @@ -358,6 +358,7 @@ static const struct tnl_ops capwap_tnl_ops = {
>   .tunnel_type= TNL_T_PROTO_CAPWAP,
>   .ipproto= IPPROTO_UDP,
>   .hdr_len= capwap_hdr_len,
> + .pre_tunnel = NULL,
>   .build_header   = capwap_build_header,
> };
> 
> diff --git a/datapath/vport-gre.c b/datapath/vport-gre.c
> index 8ce8a35..80f805a 100644
> --- a/datapath/vport-gre.c
> +++ b/datapath/vport-gre.c
> @@ -422,6 +422,7 @@ static const struct tnl_ops gre_tnl_ops = {
>   .tunnel_type= TNL_T_PROTO_GRE,
>   .ipproto= IPPROTO_GRE,
>   .hdr_len= gre_hdr_len,
> + .pre_tunnel = NULL,
>   .build_header   = gre_build_header,
> };
> 
> @@ -439,6 +440,7 @@ static const struct tnl_ops gre64_tnl_ops = {
>   .tunnel_type= TNL_T_PROTO_GRE64,
>   .ipproto= IPPROTO_GRE,
>   .hdr_len= gre_hdr_len,
> + .pre_tunnel = NULL,
>   .build_header   = gre_build_header,
> };
> 
> diff --git a/datapath/vport-vxlan.c b/datapath/vport-vxlan.c
> index f72b95f..4922486 100644
> --- a/datapath/vport-vxlan.c
> +++ b/datapath/vport-vxlan.c
> @@ -329,6 +329,7 @@ static const struct tnl_ops ovs_vxlan_tnl_ops = {
>   .tunnel_type= TNL_T_PROTO_VXLAN,
>   .ipproto= IPPROTO_UDP,
>   .hdr_len= vxlan_hdr_len,
> + .pre_tunnel = NULL,
>   .build_header   = vxlan_build_header,
> };
> 
> -- 
> 1.7.11.7
> 
> ___
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 2/2] datapath: Add support for LISP tunneling

2013-01-23 Thread Jarno Rajahalme
Please find my comments below,

  Jarno

On Jan 22, 2013, at 20:36 , ext Kyle Mestery wrote:

> From: Lorand Jakab 
> 
...
> +Flows on br0 are configured as follows:
> +
> +priority=3,dl_dst=02:00:00:00:00:00,action=mod_dl_dst:,NORMAL
> +priority=2,in_port=1,dl_type=0x0806,action=NORMAL
> +
> priority=1,in_port=1,dl_type=0x0800,vlan_tci=0,nw_src=,action=output:3
> +priority=0,action=NORMAL

I'll be referring to this example below.

> diff --git a/datapath/vport-lisp.c b/datapath/vport-lisp.c
> new file mode 100644
> index 000..558e5aa
> --- /dev/null
> +++ b/datapath/vport-lisp.c
> @@ -0,0 +1,351 @@
> +/*
> + * Copyright (c) 2011 Nicira, Inc.
> + * Copyright (c) 2013 Cisco Systems, Inc.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of version 2 of the GNU General Public
> + * License as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
> + * 02110-1301, USA
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include 
> +#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,26)
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +
> +#include "datapath.h"
> +#include "tunnel.h"
> +#include "vport.h"
> +
> +#define LISP_DST_PORT 4341  /* Well known UDP port for LISP data packets. */
> +
> +struct lisp_net {
> + struct socket *lisp_rcv_socket;
> + int n_tunnels;
> +};
> +static struct lisp_net lisp_net;

How does this one shared global instance of lisp_net work with multiple 
networking
namespaces?  As it is, all namespaces seem to be sharing the same socket?

> +
> +
> +/*
> + *  LISP encapsulation header:
> + *
> + *  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + *  |N|L|E|V|I|flags|Nonce/Map-Version  |
> + *  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + *  | Instance ID/Locator Status Bits   |
> + *  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + *
> + */
> +
> +/**
> + * struct lisphdr - LISP header
> + * @nonce_present: Flag indicating the presence of a 24 bit nonce value.
> + * @lsb: Flag indicating the presence of Locator Status Bits (LSB).
> + * @echo_nonce: Flag indicating the use of the echo noncing mechanism.
> + * @map_version: Flag indicating the use of mapping versioning.
> + * @instance_id: Flag indicating the presence of a 24 bit Instance ID (IID).
> + * @rflags: 3 bits reserved for future flags.
> + * @nonce: 24 bit nonce value.
> + * @lsb_bits: 32 bit Locator Status Bits
> + */
> +struct lisphdr {
> +#ifdef __LITTLE_ENDIAN_BITFIELD
> + __u8 rflags:3;
> + __u8 instance_id:1;
> + __u8 map_version:1;
> + __u8 echo_nonce:1;
> + __u8 lsb:1;
> + __u8 nonce_present:1;
> +#else
> + __u8 nonce_present:1;
> + __u8 lsb:1;
> + __u8 echo_nonce:1;
> + __u8 map_version:1;
> + __u8 instance_id:1;
> + __u8 rflags:3;
> +#endif
> + union {
> + __u8 nonce[3];
> + __u8 map_version[3];
> + } u1;
> + union {
> + __be32 lsb_bits;
> + __be32 iid;
> + } u2;
> +};

The code below would be more readable if the flags names could not be
confused with the value. E.g., "instance_id" could be called "have_instance_id",
and "u2.iid" could be called "instance_id"?

> +
> +#define LISP_HLEN (sizeof(struct udphdr) + sizeof(struct lisphdr))
> +
> +static inline int lisp_hdr_len(const struct tnl_mutable_config *mutable,
> +const struct ovs_key_ipv4_tunnel *tun_key)
> +{
> + return LISP_HLEN;
> +}
> +
> +static inline struct lisphdr *lisp_hdr(const struct sk_buff *skb)
> +{
> + return (struct lisphdr *)(udp_hdr(skb) + 1);
> +}
> +
> +/* Compute source port for outgoing packet.
> + * Currently we use the flow hash.
> + */
> +static u16 get_src_port(struct sk_buff *skb)
> +{
> + int low;
> + int high;
> + unsigned int range;
> + u32 hash = OVS_CB(skb)->flow->hash;
> +
> + inet_get_local_port_range(&low, &high);
> + range = (high - low) + 1;
> + return (((u64) hash * range) >> 32) + low;
> +}
> +
> +static struct sk_buff *lisp_pre_tunnel(const struct vport *vport,
> +const struct tnl_mutable_config *mutable,
> +struct sk_buff *skb)
> +{
> + /* Pop off "inner" Ethernet header */
> + skb_pull(skb, ETH_HLEN);
> + return skb;
> +}

Here it would 

Re: [ovs-dev] [BUG] broad-/multicast & SLB bonding -> FAIL

2013-01-23 Thread Ben Pfaff
On Wed, Jan 23, 2013 at 10:33:49AM +0100, Markus Schuster wrote:
> On Tuesday 22 January 2013 22:26:33 Ben Pfaff wrote:
> > On Tue, Jan 22, 2013 at 07:39:01PM +0100, Markus Schuster wrote:
> > > Open vSwitch sends out the multicast frames on ALL physical interfaces
> > > belonging to the SLB bond. That causes a lot of confusion on the physical
> > > switches that XCP hosts are connected to (VM MAC addresses jumping
> > > between ports multiple times a second).
> > 
> > It shouldn't be doing that.  How did you determine that this is
> > happening?
> 
> First of all a little overview of the components involved, so we're actually 
> talking about the same thing:
> http://www.picpaste.com/pics/overview-4xoHXl1v.1358932433.png
> All XCP hosts are Blades, connected to the BladeCenter internal switches 
> (Nortel L2-3 switching modules) which are connected to our switching backbone 
> (via LACP aggregated links, but shouldn't matter for this issue). 

It looks like you have each interface on an SLB bond connected to two
different switches.  That's not a supported configuration and won't work
reliably.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [BUG] broad-/multicast & SLB bonding -> FAIL

2013-01-23 Thread Markus Schuster
On Wednesday 23 January 2013 17:49:09 Ben Pfaff wrote:
> It looks like you have each interface on an SLB bond connected to two
> different switches.  

Correct. 


> That's not a supported configuration and won't work reliably.

OK, any explanation why? Or to say it the other way: I can't see any  
technical reason why it should work on the same switch but not on two 
different? The MAC will jump between the ports the very same way and the VM 
will suffer from the same connectivity issues, no matter what. 

btw: XCP/XenServer used classical Linux bonding (SLB implemented on top) 
before - connecting the bond slaves to different switches was a supported 
configuration and recommended for redundancy reasons. 

I hope you can give me some technical explanation so I understand your point. 

Regards,
Markus
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 2/2] datapath: Add support for LISP tunneling

2013-01-23 Thread Lori Jakab
Hi Jarno,

Thanks for reviewing! Replies inline:

On 01/23/13 16:32, Jarno Rajahalme wrote:
> Please find my comments below,
> 
>   Jarno
> 
> On Jan 22, 2013, at 20:36 , ext Kyle Mestery wrote:
> 
>> From: Lorand Jakab 
>>
> ...
>> +Flows on br0 are configured as follows:
>> +
>> +priority=3,dl_dst=02:00:00:00:00:00,action=mod_dl_dst:,NORMAL
>> +priority=2,in_port=1,dl_type=0x0806,action=NORMAL
>> +
>> priority=1,in_port=1,dl_type=0x0800,vlan_tci=0,nw_src=,action=output:3
>> +priority=0,action=NORMAL
> 
> I'll be referring to this example below.
> 
>> diff --git a/datapath/vport-lisp.c b/datapath/vport-lisp.c
>> new file mode 100644
>> index 000..558e5aa
>> --- /dev/null
>> +++ b/datapath/vport-lisp.c
>> @@ -0,0 +1,351 @@
>> +/*
>> + * Copyright (c) 2011 Nicira, Inc.
>> + * Copyright (c) 2013 Cisco Systems, Inc.
>> + *
>> + * This program is free software; you can redistribute it and/or
>> + * modify it under the terms of version 2 of the GNU General Public
>> + * License as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful, but
>> + * WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
>> + * General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, write to the Free Software
>> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
>> + * 02110-1301, USA
>> + */
>> +
>> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>> +
>> +#include 
>> +#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,26)
>> +
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#include "datapath.h"
>> +#include "tunnel.h"
>> +#include "vport.h"
>> +
>> +#define LISP_DST_PORT 4341  /* Well known UDP port for LISP data packets. */
>> +
>> +struct lisp_net {
>> +struct socket *lisp_rcv_socket;
>> +int n_tunnels;
>> +};
>> +static struct lisp_net lisp_net;
> 
> How does this one shared global instance of lisp_net work with multiple 
> networking
> namespaces?  As it is, all namespaces seem to be sharing the same socket?

Yes they are. Would it be better to use the same approach as the capwap
tunnel, which uses the struct ovs_net->vport_net to store struct capwap_net?

> 
>> +
>> +
>> +/*
>> + *  LISP encapsulation header:
>> + *
>> + *  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> + *  |N|L|E|V|I|flags|Nonce/Map-Version  |
>> + *  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> + *  | Instance ID/Locator Status Bits   |
>> + *  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>> + *
>> + */
>> +
>> +/**
>> + * struct lisphdr - LISP header
>> + * @nonce_present: Flag indicating the presence of a 24 bit nonce value.
>> + * @lsb: Flag indicating the presence of Locator Status Bits (LSB).
>> + * @echo_nonce: Flag indicating the use of the echo noncing mechanism.
>> + * @map_version: Flag indicating the use of mapping versioning.
>> + * @instance_id: Flag indicating the presence of a 24 bit Instance ID (IID).
>> + * @rflags: 3 bits reserved for future flags.
>> + * @nonce: 24 bit nonce value.
>> + * @lsb_bits: 32 bit Locator Status Bits
>> + */
>> +struct lisphdr {
>> +#ifdef __LITTLE_ENDIAN_BITFIELD
>> +__u8 rflags:3;
>> +__u8 instance_id:1;
>> +__u8 map_version:1;
>> +__u8 echo_nonce:1;
>> +__u8 lsb:1;
>> +__u8 nonce_present:1;
>> +#else
>> +__u8 nonce_present:1;
>> +__u8 lsb:1;
>> +__u8 echo_nonce:1;
>> +__u8 map_version:1;
>> +__u8 instance_id:1;
>> +__u8 rflags:3;
>> +#endif
>> +union {
>> +__u8 nonce[3];
>> +__u8 map_version[3];
>> +} u1;
>> +union {
>> +__be32 lsb_bits;
>> +__be32 iid;
>> +} u2;
>> +};
> 
> The code below would be more readable if the flags names could not be
> confused with the value. E.g., "instance_id" could be called 
> "have_instance_id",
> and "u2.iid" could be called "instance_id"?

Agreed, poor naming choice on my part. Will fix.

> 
>> +
>> +#define LISP_HLEN (sizeof(struct udphdr) + sizeof(struct lisphdr))
>> +
>> +static inline int lisp_hdr_len(const struct tnl_mutable_config *mutable,
>> +   const struct ovs_key_ipv4_tunnel *tun_key)
>> +{
>> +return LISP_HLEN;
>> +}
>> +
>> +static inline struct lisphdr *lisp_hdr(const struct sk_buff *skb)
>> +{
>> +return (struct lisphdr *)(udp_hdr(skb) + 1);
>> +}
>> +
>> +/* Compute source port for outgoing packet.
>> + * Currently we use the flow hash.
>> + */
>> +static u16 get_src_port(struct sk_buff *skb)
>> +{
>> +int low;
>> +int high;
>> +unsigned int range;
>> +u32 hash = OVS_CB(skb)->flow->hash;
>> +
>> +inet_get_local_port_range(&l

Re: [ovs-dev] [PATCH 1/2] datapath: Add pre_tunnel support

2013-01-23 Thread Lori Jakab
On 01/23/13 16:01, Jarno Rajahalme wrote:
> 
> On Jan 22, 2013, at 20:36 , ext Kyle Mestery wrote:
> 
>> Add support to the tunneling code for a "pre_tunnel"
>> function. This allows the tunneling code to perform
>> operations on the packet before the outer IP header is
>> added.
>>
>> A tunneling protocol such as LISP may require this, as it needs
>> to remove the MAC header before applying the LISP header.
>>
> 
> It seems to me that the same can be achieved with your own version of
> ovs_tnl_send() that first removes the MAC header and then calls 
> ovs_tnl_send().
> Your vport_ops table already has the hook for this, so there is no overhead in
> doing this.

Indeed, we missed this. Will use the ovs_tnl_send hook in the next version.

Thanks for reviewing!

-Lori

> 
> 
>> Signed-off-by: Kyle Mestery 
>> ---
>> datapath/tunnel.c   | 5 +
>> datapath/tunnel.h   | 7 +++
>> datapath/vport-capwap.c | 1 +
>> datapath/vport-gre.c| 2 ++
>> datapath/vport-vxlan.c  | 1 +
>> 5 files changed, 16 insertions(+)
>>
>> diff --git a/datapath/tunnel.c b/datapath/tunnel.c
>> index d03b708..f1dd8d3 100644
>> --- a/datapath/tunnel.c
>> +++ b/datapath/tunnel.c
>> @@ -988,6 +988,11 @@ int ovs_tnl_send(struct vport *vport, struct sk_buff 
>> *skb)
>>  if (unlikely(vlan_deaccel_tag(skb)))
>>  goto next;
>>
>> +/* Pre tunnel header work done by tunneling layer. */
>> +if (tnl_vport->tnl_ops->pre_tunnel)
>> +skb = tnl_vport->tnl_ops->pre_tunnel(vport, mutable,
>> + skb);
>> +
>>  skb_push(skb, tunnel_hlen);
>>  skb_reset_network_header(skb);
>>  skb_set_transport_header(skb, sizeof(struct iphdr));
>> diff --git a/datapath/tunnel.h b/datapath/tunnel.h
>> index b7de7a9..12e7f19 100644
>> --- a/datapath/tunnel.h
>> +++ b/datapath/tunnel.h
>> @@ -135,6 +135,13 @@ struct tnl_ops {
>>  int (*hdr_len)(const struct tnl_mutable_config *,
>> const struct ovs_key_ipv4_tunnel *);
>>  /*
>> + * Some tunnels may need to perform actions on the packet before
>> + * appending the outer IP header of the tunneled packet.
>> + */
>> +struct sk_buff *(*pre_tunnel)(const struct vport *,
>> +  const struct tnl_mutable_config *,
>> +  struct sk_buff *);
>> +/*
>>   * Returns a linked list of SKBs with tunnel headers (multiple
>>   * packets may be generated in the event of fragmentation).  Space
>>   * will have already been allocated at the start of the packet equal
>> diff --git a/datapath/vport-capwap.c b/datapath/vport-capwap.c
>> index f45d349..9055b1d 100644
>> --- a/datapath/vport-capwap.c
>> +++ b/datapath/vport-capwap.c
>> @@ -358,6 +358,7 @@ static const struct tnl_ops capwap_tnl_ops = {
>>  .tunnel_type= TNL_T_PROTO_CAPWAP,
>>  .ipproto= IPPROTO_UDP,
>>  .hdr_len= capwap_hdr_len,
>> +.pre_tunnel = NULL,
>>  .build_header   = capwap_build_header,
>> };
>>
>> diff --git a/datapath/vport-gre.c b/datapath/vport-gre.c
>> index 8ce8a35..80f805a 100644
>> --- a/datapath/vport-gre.c
>> +++ b/datapath/vport-gre.c
>> @@ -422,6 +422,7 @@ static const struct tnl_ops gre_tnl_ops = {
>>  .tunnel_type= TNL_T_PROTO_GRE,
>>  .ipproto= IPPROTO_GRE,
>>  .hdr_len= gre_hdr_len,
>> +.pre_tunnel = NULL,
>>  .build_header   = gre_build_header,
>> };
>>
>> @@ -439,6 +440,7 @@ static const struct tnl_ops gre64_tnl_ops = {
>>  .tunnel_type= TNL_T_PROTO_GRE64,
>>  .ipproto= IPPROTO_GRE,
>>  .hdr_len= gre_hdr_len,
>> +.pre_tunnel = NULL,
>>  .build_header   = gre_build_header,
>> };
>>
>> diff --git a/datapath/vport-vxlan.c b/datapath/vport-vxlan.c
>> index f72b95f..4922486 100644
>> --- a/datapath/vport-vxlan.c
>> +++ b/datapath/vport-vxlan.c
>> @@ -329,6 +329,7 @@ static const struct tnl_ops ovs_vxlan_tnl_ops = {
>>  .tunnel_type= TNL_T_PROTO_VXLAN,
>>  .ipproto= IPPROTO_UDP,
>>  .hdr_len= vxlan_hdr_len,
>> +.pre_tunnel = NULL,
>>  .build_header   = vxlan_build_header,
>> };
>>
>> -- 
>> 1.7.11.7
>>
>> ___
>> dev mailing list
>> dev@openvswitch.org
>> http://openvswitch.org/mailman/listinfo/dev
> 
> ___
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev
> 
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH] RFC: Pass more packet and flow key info to userspace.

2013-01-23 Thread Jesse Gross
On Tue, Jan 22, 2013 at 9:48 PM, Jarno Rajahalme
 wrote:
> Add OVS_PACKET_ATTR_KEY_INFO to relieve userspace from re-computing
> data already computed within the kernel datapath.  In the typical
> case of an upcall with perfect key fitness between kernel and
> userspace this eliminates flow_extract() and flow_hash() calls in
> handle_miss_upcalls().
>
> Additional bookkeeping within the kernel datapath is minimal.
> Kernel flow insertion also saves one flow key hash computation.
>
> Removed setting the packet's l7 pointer for ICMP packets, as this was
> never used.
>
> Signed-off-by: Jarno Rajahalme 
> ---
>
> This likely requires some discussion, but it took a while for me to
> understand why each packet miss upcall would require flow_extract()
> right after the flow key has been obtained from odp attributes.

Do you have any performance numbers to share?  Since this is an
optimization it's important to understand if the benefit is worth the
extra complexity.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH] ofproto: Optimise OpenFlow flow expiry

2013-01-23 Thread Ben Pfaff
On Thu, Jan 17, 2013 at 09:42:29AM +0900, Simon Horman wrote:
> On Wed, Jan 16, 2013 at 01:59:21PM -0800, Ben Pfaff wrote:
> > On Tue, Jan 15, 2013 at 01:20:57PM +0900, Simon Horman wrote:
> > > Optimise OpenFlow flow expiry by placing expirable flows on a list.
> > > This optimises scanning of flows for expiry in two ways:
> > > 
> > > * Empirically list traversal appears faster than the code it replaces.
> > > 
> > >   With 1,000,000 flows present an otherwise idle system I observed CPU
> > >   utilisation of around 20% with the existing code but around 10% with
> > >   this new code.
> > > 
> > > * Flows that will never expire are not traversed.
> > > 
> > >   This addresses a case seen in the field.
> > 
> > This version looks better.  I still have a few comments, but before
> > that, may I ask a little bit about the situation in which the
> > performance improvement was observed?  In this situation, about how
> > many of the 1,000,000 flows were actually expirable, that is, had
> > either a hard timeout or an idle timeout?  That is, is the performance
> > improvement due more to the first or the second bullet you list above?
> > If none of the flows were expirable, then I guess it was the second;
> > if all of them were, then I guess it was the first; and otherwise it
> > is something in between.
> > 
> > Basically I'm wondering if we should do something to make flow table
> > traversal faster, independent of expiration.
> 
> Hi Ben,
> 
> the primary aim of this patch was to address a performance issue that
> was noticed when inserting 100,000 flows none of which were expirable.
> I have been told this is representative of an expected use-case.
> 
> During my testing I used 1,000,000 flows instead of 100,000 in order to
> make the CPU utilisation more pronounced and easier to observe.
> 
> In the course of my testing I tested 1,000,000 flows none of which were
> expirable and in that case CPU utilisation was dramatically reduced to
> approximately 0. This seems to be a good outcome for the use-case
> originally reported.
> 
> In the course of testing I also tested 1,000,000 flows all of which
> were expirable. This was primarily to see if there were any regressions.
> In the course of this test I noticed that there seemed to be some
> reduction in CPU utilisation. However this was just a side effect and
> not an aim of my work. I should have placed it as the second bullet
> in my commit message and noted that it was a side effect.

OK, I made a few stylistic and comment changes to the patch, and I've
updated the commit message to reflect what you said above.  I'm happy
with it now, but I don't want to apply it without your approval since I
changed the meaning of your commit message.  Please review?

--8<--cut here-->8--

>From 5eee5262555b30c03676975c31543e19e893b1b7 Mon Sep 17 00:00:00 2001
From: Simon Horman 
Date: Tue, 15 Jan 2013 13:20:57 +0900
Subject: [PATCH] ofproto: Optimise OpenFlow flow expiry

Optimise OpenFlow flow expiry by placing expirable flows on a list.
This optimises scanning of flows for expiry in two ways:

* Flows that will never expire are not traversed.

  This addresses a case seen in the field.  With 1,000,000 flows that
  are not expirable, this dramatically reduces CPU utilization to
  approximately zero.

* Empirically list traversal appears faster than the code it replaces.

  With 1,000,000 expirable flows present an otherwise idle system I
  observed CPU utilisation of around 20% with the existing code but
  around 10% with this new code.

Signed-off-by: Simon Horman 
Signed-off-by: Ben Pfaff 
---
 ofproto/ofproto-dpif.c |   13 -
 ofproto/ofproto-provider.h |8 
 ofproto/ofproto.c  |   10 ++
 3 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index 2c216fe..69da618 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -3726,8 +3726,7 @@ expire(struct dpif_backer *backer)
 update_stats(backer);
 
 HMAP_FOR_EACH (ofproto, all_ofproto_dpifs_node, &all_ofproto_dpifs) {
-struct rule_dpif *rule, *next_rule;
-struct oftable *table;
+struct rule *rule, *next_rule;
 int dp_max_idle;
 
 if (ofproto->backer != backer) {
@@ -3742,13 +3741,9 @@ expire(struct dpif_backer *backer)
 
 /* Expire OpenFlow flows whose idle_timeout or hard_timeout
  * has passed. */
-OFPROTO_FOR_EACH_TABLE (table, &ofproto->up) {
-struct cls_cursor cursor;
-
-cls_cursor_init(&cursor, &table->cls, NULL);
-CLS_CURSOR_FOR_EACH_SAFE (rule, next_rule, up.cr, &cursor) {
-rule_expire(rule);
-}
+LIST_FOR_EACH_SAFE (rule, next_rule, expirable,
+&ofproto->up.expirable) {
+rule_expire(rule_dpif_cast(rule));
 }
 
 /* All outstanding data in existing flows has

Re: [ovs-dev] [PATCH 0/2] Add support for LISP into Open vSwitch

2013-01-23 Thread Jesse Gross
On Tue, Jan 22, 2013 at 10:36 AM, Kyle Mestery  wrote:
> The following two patches provide support for the LISP tunneling protocol into
> Open vSwitch. See the latest IETF draft for LISP here:
>
> http://tools.ietf.org/html/draft-ietf-lisp-24
>
> Kyle Mestery (1):
>   Add support to the tunneling code for a "pre_tunnel" function.
> This allows the tunneling code to perform operations on the
> packet before the outer IP header is added.
>
> Lorand Jakab (1):
>   Add support for LISP tunneling

Hi Kyle,

I'm curious if you can share your long term plan for LISP.  In
particular there are three areas that I was wondering about:
 * L3 support.  Obviously OVS is very Ethernet centric at this point,
resulting in the need to play games with MAC addresses.
 * Additional LISP data plane support.  LISP encodes more control
information in the protocol itself compared to the existing OVS tunnel
implementations, which basically only have the tunnel ID.  It looks
like this implementation generates nonces on transmit but otherwise
doesn't try to handle the other pieces.
 * LISP control plane components.

What do you guys see as the ideal end result?
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [patch_ports (rebased) 0/4]

2013-01-23 Thread Ethan Jackson
Ben asked that I repost the entire series rebased against master.
The first patch of the series should be looked at again as it's
changed significantly.  The third and fourth patches haven't been
reviewed yet.

Ben Pfaff (1):
  netdev-dummy: Test LACP negotiation.

Ethan Jackson (3):
  ofproto-dpif: Scope revalidation state to dpif_backers.
  ofproto-dpif: Refresh stats before ovs-appctl dpif/dump-flows.
  ofproto-dpif: Implement patch ports in userspace.

 FAQ|   25 ++-
 NEWS   |1 +
 lib/netdev-provider.h  |1 +
 lib/netdev-vport.c |  280 ---
 lib/netdev-vport.h |   10 +-
 lib/netdev.c   |8 +
 lib/netdev.h   |1 +
 ofproto/ofproto-dpif.c |  348 +++
 tests/lacp.at  |  430 
 tests/ofproto-dpif.at  |   57 +++
 10 files changed, 914 insertions(+), 247 deletions(-)

-- 
1.7.9.5

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [patch_ports (rebased) 2/4] ofproto-dpif: Refresh stats before ovs-appctl dpif/dump-flows.

2013-01-23 Thread Ethan Jackson
As a matter of convenience, this patch refreshes the statistics
when ovs-appctl dpif/dump-flows is called.  Hopefully this will
prevent users from being confused because statistics they were
expecting to see haven't made it from the datapath into the switch
yet.

Signed-off-by: Ethan Jackson 
---
 ofproto/ofproto-dpif.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index ff342a0..5c5d261 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -7634,6 +7634,8 @@ ofproto_unixctl_dpif_dump_flows(struct unixctl_conn *conn,
 return;
 }
 
+update_stats(ofproto->backer);
+
 HMAP_FOR_EACH (subfacet, hmap_node, &ofproto->subfacets) {
 struct odputil_keybuf keybuf;
 struct ofpbuf key;
-- 
1.7.9.5

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [patch_ports (rebased) 1/4] ofproto-dpif: Scope revalidation state to dpif_backers.

2013-01-23 Thread Ethan Jackson
Before this patch, the "need_revalidate" flag and the
"revalidate_set" tag_set where maintained separately for each
ofproto.  This won't work in future patches when a flow table
change in one ofproto can require revalidation in another entirely
separate ofproto.  For this reason, this patch scopes these flags
to the dpif_backers.

Signed-off-by: Ethan Jackson 
---
 ofproto/ofproto-dpif.c |  173 +---
 1 file changed, 92 insertions(+), 81 deletions(-)

diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index 2c216fe..ff342a0 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -619,6 +619,10 @@ struct dpif_backer {
 struct dpif *dpif;
 struct timer next_expiration;
 struct hmap odp_to_ofport_map; /* ODP port to ofport mapping. */
+
+/* Facet revalidation flags applying to facets which use this backer. */
+enum revalidate_reason need_revalidate; /* Revalidate every facet. */
+struct tag_set revalidate_set; /* Revalidate only matching facets. */
 };
 
 /* All existing ofproto_backer instances, indexed by ofproto->up.type. */
@@ -655,8 +659,6 @@ struct ofproto_dpif {
 
 /* Revalidation. */
 struct table_dpif tables[N_TABLES];
-enum revalidate_reason need_revalidate;
-struct tag_set revalidate_set;
 
 /* Support for debugging async flow mods. */
 struct list completions;
@@ -824,6 +826,40 @@ type_run(const char *type)
 
 dpif_run(backer->dpif);
 
+if (backer->need_revalidate
+|| !tag_set_is_empty(&backer->revalidate_set)) {
+struct ofproto_dpif *ofproto;
+
+switch (backer->need_revalidate) {
+case REV_RECONFIGURE:   COVERAGE_INC(rev_reconfigure);   break;
+case REV_STP:   COVERAGE_INC(rev_stp);   break;
+case REV_PORT_TOGGLED:  COVERAGE_INC(rev_port_toggled);  break;
+case REV_FLOW_TABLE:COVERAGE_INC(rev_flow_table);break;
+case REV_INCONSISTENCY: COVERAGE_INC(rev_inconsistency); break;
+}
+
+HMAP_FOR_EACH (ofproto, all_ofproto_dpifs_node, &all_ofproto_dpifs) {
+struct facet *facet;
+
+if (ofproto->backer != backer) {
+continue;
+}
+
+/* Clear the revalidation flags. */
+tag_set_init(&backer->revalidate_set);
+backer->need_revalidate = 0;
+
+HMAP_FOR_EACH (facet, hmap_node, &ofproto->facets) {
+if (backer->need_revalidate
+|| tag_set_intersects(&backer->revalidate_set,
+  facet->tags)) {
+facet_revalidate(facet);
+}
+}
+}
+
+}
+
 if (timer_expired(&backer->next_expiration)) {
 int delay = expire(backer);
 timer_set_duration(&backer->next_expiration, delay);
@@ -1030,6 +1066,8 @@ open_dpif_backer(const char *type, struct dpif_backer 
**backerp)
 backer->refcount = 1;
 hmap_init(&backer->odp_to_ofport_map);
 timer_set_duration(&backer->next_expiration, 1000);
+backer->need_revalidate = 0;
+tag_set_init(&backer->revalidate_set);
 *backerp = backer;
 
 dpif_flow_flush(backer->dpif);
@@ -1107,8 +1145,6 @@ construct(struct ofproto *ofproto_)
 table->other_table = NULL;
 table->basis = random_uint32();
 }
-ofproto->need_revalidate = 0;
-tag_set_init(&ofproto->revalidate_set);
 
 list_init(&ofproto->completions);
 
@@ -1317,44 +1353,18 @@ run(struct ofproto *ofproto_)
 }
 
 stp_run(ofproto);
-mac_learning_run(ofproto->ml, &ofproto->revalidate_set);
-
-/* Now revalidate if there's anything to do. */
-if (ofproto->need_revalidate
-|| !tag_set_is_empty(&ofproto->revalidate_set)) {
-struct tag_set revalidate_set = ofproto->revalidate_set;
-bool revalidate_all = ofproto->need_revalidate;
-struct facet *facet;
-
-switch (ofproto->need_revalidate) {
-case REV_RECONFIGURE:   COVERAGE_INC(rev_reconfigure);   break;
-case REV_STP:   COVERAGE_INC(rev_stp);   break;
-case REV_PORT_TOGGLED:  COVERAGE_INC(rev_port_toggled);  break;
-case REV_FLOW_TABLE:COVERAGE_INC(rev_flow_table);break;
-case REV_INCONSISTENCY: COVERAGE_INC(rev_inconsistency); break;
-}
-
-/* Clear the revalidation flags. */
-tag_set_init(&ofproto->revalidate_set);
-ofproto->need_revalidate = 0;
-
-HMAP_FOR_EACH (facet, hmap_node, &ofproto->facets) {
-if (revalidate_all
-|| tag_set_intersects(&revalidate_set, facet->tags)) {
-facet_revalidate(facet);
-}
-}
-}
+mac_learning_run(ofproto->ml, &ofproto->backer->revalidate_set);
 
 /* Check the consistency of a random facet, to aid debugging. */
-if (!hmap_is_empty(&ofproto->facets) && !ofproto->need_revalidate) {
+if (!hmap_is_empty(&ofproto-

[ovs-dev] [patch_ports (rebased) 4/4] netdev-dummy: Test LACP negotiation.

2013-01-23 Thread Ethan Jackson
From: Ben Pfaff 

Signed-off-by: Ben Pfaff 
Signed-off-by: Ethan Jackson 
---
 tests/lacp.at |  430 +
 1 file changed, 430 insertions(+)

diff --git a/tests/lacp.at b/tests/lacp.at
index 7d96143..870beb0 100644
--- a/tests/lacp.at
+++ b/tests/lacp.at
@@ -117,3 +117,433 @@ slave p2: disabled
 ])
 OVS_VSWITCHD_STOP
 AT_CLEANUP
+
+AT_SETUP([lacp - negotiation])
+# Create bond0 on br0 with interfaces p0 and p1
+#and bond1 on br1 with interfaces p2 and p3
+# with p0 patched to p2 and p1 patched to p3.
+OVS_VSWITCHD_START(
+  [add-bond br0 bond0 p0 p1 bond_mode=balance-tcp lacp=active \
+other-config:lacp-time=fast \
+other-config:bond-rebalance-interval=0 -- \
+   set interface p0 type=patch options:peer=p2 ofport_request=1 -- \
+   set interface p1 type=patch options:peer=p3 ofport_request=2 -- \
+   add-br br1 -- \
+   set bridge br1 other-config:hwaddr=aa:66:aa:66:00:00 -- \
+   set bridge br1 datapath-type=dummy other-config:datapath-id=1234 \
+  fail-mode=secure -- \
+   add-bond br1 bond1 p2 p3 bond_mode=balance-tcp lacp=active \
+other-config:lacp-time=fast \
+other-config:bond-rebalance-interval=0 -- \
+   set interface p2 type=patch options:peer=p0 ofport_request=3 -- \
+   set interface p3 type=patch options:peer=p1 ofport_request=4 --])
+
+AT_CHECK([ovs-appctl netdev-dummy/set-admin-state up], 0, [OK
+])
+
+# Wait for up to 5 (simulated) seconds, until LACP negotiation finishes.
+i=0
+while :; do
+ovs-appctl lacp/show bond0 > bond0
+AT_CAPTURE_FILE([bond0])
+ovs-appctl lacp/show bond1 > bond1
+AT_CAPTURE_FILE([bond1])
+if grep negotiated bond0 && grep negotiated bond1; then
+if grep expired bond0 || grep expired bond1; then
+:
+else
+break
+fi
+fi
+i=`expr $i + 1`
+if test $i = 50; then
+AT_FAIL_IF([:])
+fi
+ovs-appctl time/warp 100
+done
+
+# Now check the correctly negotiated configuration.
+AT_CHECK(
+  [ovs-appctl lacp/show bond0
+ovs-appctl lacp/show bond1
+ovs-appctl bond/show bond0
+ovs-appctl bond/show bond1], [0], [stdout])
+AT_CHECK([sed '/active slave/d' stdout], [0], [dnl
+ bond0 
+   status: active negotiated
+   sys_id: aa:55:aa:55:00:00
+   sys_priority: 65534
+   aggregation key: 2
+   lacp_time: fast
+
+slave: p0: current attached
+   port_id: 1
+   port_priority: 65535
+   may_enable: true
+
+   actor sys_id: aa:55:aa:55:00:00
+   actor sys_priority: 65534
+   actor port_id: 1
+   actor port_priority: 65535
+   actor key: 2
+   actor state: activity timeout aggregation synchronized collecting 
distributing
+
+   partner sys_id: aa:66:aa:66:00:00
+   partner sys_priority: 65534
+   partner port_id: 3
+   partner port_priority: 65535
+   partner key: 3
+   partner state: activity timeout aggregation synchronized collecting 
distributing
+
+slave: p1: current attached
+   port_id: 2
+   port_priority: 65535
+   may_enable: true
+
+   actor sys_id: aa:55:aa:55:00:00
+   actor sys_priority: 65534
+   actor port_id: 2
+   actor port_priority: 65535
+   actor key: 2
+   actor state: activity timeout aggregation synchronized collecting 
distributing
+
+   partner sys_id: aa:66:aa:66:00:00
+   partner sys_priority: 65534
+   partner port_id: 4
+   partner port_priority: 65535
+   partner key: 3
+   partner state: activity timeout aggregation synchronized collecting 
distributing
+ bond1 
+   status: active negotiated
+   sys_id: aa:66:aa:66:00:00
+   sys_priority: 65534
+   aggregation key: 3
+   lacp_time: fast
+
+slave: p2: current attached
+   port_id: 3
+   port_priority: 65535
+   may_enable: true
+
+   actor sys_id: aa:66:aa:66:00:00
+   actor sys_priority: 65534
+   actor port_id: 3
+   actor port_priority: 65535
+   actor key: 3
+   actor state: activity timeout aggregation synchronized collecting 
distributing
+
+   partner sys_id: aa:55:aa:55:00:00
+   partner sys_priority: 65534
+   partner port_id: 1
+   partner port_priority: 65535
+   partner key: 2
+   partner state: activity timeout aggregation synchronized collecting 
distributing
+
+slave: p3: current attached
+   port_id: 4
+   port_priority: 65535
+   may_enable: true
+
+   actor sys_id: aa:66:aa:66:00:00
+   actor sys_priority: 65534
+   actor port_id: 4
+   actor port_priority: 65535
+   actor key: 3
+   actor state: activity timeout aggregation synchronized collecting 
distributing
+
+   partner sys_id: aa:55:aa:55:00:00
+   partner sys_priority: 65534
+   partner port_id: 2
+   partner port_priority: 65535
+   partner key: 2
+   partner state: activ

[ovs-dev] [patch_ports (rebased) 3/4] ofproto-dpif: Implement patch ports in userspace.

2013-01-23 Thread Ethan Jackson
This commit moves responsibility for implementing patch ports from
the datapath to ofproto-dpif.  There are two main reasons to do
this.

The first is a matter of design:  ofproto-dpif both has more
information than the datapath, and is better suited to handle the
complexity required to implement patch ports.

The second is performance.  My setup is a virtual machine with two
basic learning bridges connected by patch ports.  I used
ovs-benchmark to ping the virtual router IP residing outside the
VM.  Over a 60 second run, "ovs-benchmark rate" improves from
14618.1 to 19311.9 transactions per second, or a 32% improvement.
Similarly, "ovs-benchmark latency" improves from 6ms to 4ms.

Signed-off-by: Ethan Jackson 
---
 FAQ|   25 +++--
 NEWS   |1 +
 lib/netdev-provider.h  |1 +
 lib/netdev-vport.c |  280 +---
 lib/netdev-vport.h |   10 +-
 lib/netdev.c   |8 ++
 lib/netdev.h   |1 +
 ofproto/ofproto-dpif.c |  173 ++
 tests/ofproto-dpif.at  |   57 ++
 9 files changed, 390 insertions(+), 166 deletions(-)

diff --git a/FAQ b/FAQ
index ab1c1cc..a466ca4 100644
--- a/FAQ
+++ b/FAQ
@@ -172,17 +172,24 @@ A: The kernel module in upstream Linux 3.3 and later does 
not include
  vSwitch distribution instead of the upstream Linux kernel
  module.
 
-   - Patch virtual ports, that is, interfaces with type "patch".
- You can use Linux "veth" devices as a substitute.
-
- We don't have any plans to add patch ports upstream.
-
 Q: What features are not available when using the userspace datapath?
 
-A: Tunnel and patch virtual ports are not supported, as described in the
-   previous answer.  It is also not possible to use queue-related
-   actions.  On Linux kernels before 2.6.39, maximum-sized VLAN packets
-   may not be transmitted.
+A: The kernel module in upstream Linux 3.3 and later does not include
+   tunnel virtual ports, that is, interfaces with type "gre",
+   "ipsec_gre", "gre64", "ipsec_gre64", "vxlan", or "capwap".  It is
+   possible to create tunnels in Linux and attach them to Open vSwitch
+   as system devices.  However, they cannot be dynamically created
+   through the OVSDB protocol or set the tunnel ids as a flow action.
+
+   Work is in progress in adding tunnel virtual ports to the upstream
+   Linux version of the Open vSwitch kernel module.  For now, if you
+   need these features, use the kernel module from the Open vSwitch
+   distribution instead of the upstream Linux kernel module.
+
+   The upstream kernel module does not include patch ports, but this
+   only matters for Open vSwitch 1.9 and earlier, because Open vSwitch
+   1.10 and later implement patch ports without using this kernel
+   feature.
 
 
 Terminology
diff --git a/NEWS b/NEWS
index 6cf09ba..62488d0 100644
--- a/NEWS
+++ b/NEWS
@@ -25,6 +25,7 @@ post-v1.9.0
 retire that meaning of ANY in favor of the OpenFlow 1.1 meaning.
 - Inheritance of the Don't Fragment bit in IP tunnels (df_inherit) is
   no longer supported.
+- Patch ports are implemented in userspace.
 
 
 v1.9.0 - xx xxx 
diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
index 9db950c..c7de2c2 100644
--- a/lib/netdev-provider.h
+++ b/lib/netdev-provider.h
@@ -601,6 +601,7 @@ const struct netdev_class *netdev_lookup_provider(const 
char *type);
 extern const struct netdev_class netdev_linux_class;
 extern const struct netdev_class netdev_internal_class;
 extern const struct netdev_class netdev_tap_class;
+extern const struct netdev_class netdev_patch_class;
 #ifdef __FreeBSD__
 extern const struct netdev_class netdev_bsd_class;
 #endif
diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
index 60437b9..4054ee0 100644
--- a/lib/netdev-vport.c
+++ b/lib/netdev-vport.c
@@ -29,6 +29,7 @@
 #include "byte-order.h"
 #include "daemon.h"
 #include "dirs.h"
+#include "dpif.h"
 #include "dpif-linux.h"
 #include "hash.h"
 #include "hmap.h"
@@ -56,21 +57,21 @@ VLOG_DEFINE_THIS_MODULE(netdev_vport);
 
 struct netdev_dev_vport {
 struct netdev_dev netdev_dev;
-struct ofpbuf *options;
 unsigned int change_seq;
 uint8_t etheraddr[ETH_ADDR_LEN];
+
+/* Tunnels. */
+struct ofpbuf *options;
 struct netdev_tunnel_config tnl_cfg;
+
+/* Patch Ports. */
+struct netdev_stats stats;
+char *peer;
 };
 
 struct vport_class {
 enum ovs_vport_type type;
 struct netdev_class netdev_class;
-int (*parse_config)(const char *name, const char *type,
-const struct smap *args, struct ofpbuf *options,
-struct netdev_tunnel_config *tnl_cfg);
-int (*unparse_config)(const char *name, const char *type,
-  const struct nlattr *options, size_t options_len,
-  struct smap *args);
 };
 
 static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
@@ 

Re: [ovs-dev] [BUG] broad-/multicast & SLB bonding -> FAIL

2013-01-23 Thread Ben Pfaff
On Wed, Jan 23, 2013 at 06:18:52PM +0100, Markus Schuster wrote:
> On Wednesday 23 January 2013 17:49:09 Ben Pfaff wrote:
> > It looks like you have each interface on an SLB bond connected to two
> > different switches.  
> 
> Correct. 
> 
> 
> > That's not a supported configuration and won't work reliably.
> 
> OK, any explanation why? Or to say it the other way: I can't see any  
> technical reason why it should work on the same switch but not on two 
> different? The MAC will jump between the ports the very same way and the VM 
> will suffer from the same connectivity issues, no matter what. 

Jesse, can you remind me why?  I can never remember.  This time, I'll
add it to the FAQ.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [patch_ports (rebased) 1/4] ofproto-dpif: Scope revalidation state to dpif_backers.

2013-01-23 Thread Ethan Jackson
>  /* Check the consistency of a random facet, to aid debugging. */
> -if (!hmap_is_empty(&ofproto->facets) && !ofproto->need_revalidate) {
> +if (!hmap_is_empty(&ofproto->facets) && 
> ofproto->backer->need_revalidate) {

Reid Price pointed out off list that I lost a "!" when making this
change.  I've folded the fix into the patch.

Ethan


>  struct facet *facet;
>
>  facet = CONTAINER_OF(hmap_random_node(&ofproto->facets),
>   struct facet, hmap_node);
> -if (!tag_set_intersects(&ofproto->revalidate_set, facet->tags)) {
> +if (!tag_set_intersects(&ofproto->backer->revalidate_set,
> +facet->tags)) {
>  if (!facet_check_consistency(facet)) {
> -ofproto->need_revalidate = REV_INCONSISTENCY;
> +ofproto->backer->need_revalidate = REV_INCONSISTENCY;
>  }
>  }
>  }
> @@ -1396,7 +1406,7 @@ wait(struct ofproto *ofproto_)
>  if (ofproto->sflow) {
>  dpif_sflow_wait(ofproto->sflow);
>  }
> -if (!tag_set_is_empty(&ofproto->revalidate_set)) {
> +if (!tag_set_is_empty(&ofproto->backer->revalidate_set)) {
>  poll_immediate_wake();
>  }
>  HMAP_FOR_EACH (ofport, up.hmap_node, &ofproto->up.ports) {
> @@ -1410,7 +1420,7 @@ wait(struct ofproto *ofproto_)
>  }
>  mac_learning_wait(ofproto->ml);
>  stp_wait(ofproto);
> -if (ofproto->need_revalidate) {
> +if (ofproto->backer->need_revalidate) {
>  /* Shouldn't happen, but if it does just go around again. */
>  VLOG_DBG_RL(&rl, "need revalidate in ofproto_wait_cb()");
>  poll_immediate_wake();
> @@ -1511,7 +1521,7 @@ port_construct(struct ofport *port_)
>  struct dpif_port dpif_port;
>  int error;
>
> -ofproto->need_revalidate = REV_RECONFIGURE;
> +ofproto->backer->need_revalidate = REV_RECONFIGURE;
>  port->bundle = NULL;
>  port->cfm = NULL;
>  port->tag = tag_create_random();
> @@ -1567,7 +1577,7 @@ port_destruct(struct ofport *port_)
>
>  sset_find_and_delete(&ofproto->ports, devname);
>  hmap_remove(&ofproto->backer->odp_to_ofport_map, &port->odp_port_node);
> -ofproto->need_revalidate = REV_RECONFIGURE;
> +ofproto->backer->need_revalidate = REV_RECONFIGURE;
>  bundle_remove(port_);
>  set_cfm(port_, NULL);
>  if (ofproto->sflow) {
> @@ -1598,7 +1608,7 @@ port_reconfigured(struct ofport *port_, enum 
> ofputil_port_config old_config)
>  if (changed & (OFPUTIL_PC_NO_RECV | OFPUTIL_PC_NO_RECV_STP |
> OFPUTIL_PC_NO_FWD | OFPUTIL_PC_NO_FLOOD |
> OFPUTIL_PC_NO_PACKET_IN)) {
> -ofproto->need_revalidate = REV_RECONFIGURE;
> +ofproto->backer->need_revalidate = REV_RECONFIGURE;
>
>  if (changed & OFPUTIL_PC_NO_FLOOD && port->bundle) {
>  bundle_update(port->bundle);
> @@ -1621,13 +1631,13 @@ set_sflow(struct ofproto *ofproto_,
>  HMAP_FOR_EACH (ofport, up.hmap_node, &ofproto->up.ports) {
>  dpif_sflow_add_port(ds, &ofport->up, ofport->odp_port);
>  }
> -ofproto->need_revalidate = REV_RECONFIGURE;
> +ofproto->backer->need_revalidate = REV_RECONFIGURE;
>  }
>  dpif_sflow_set_options(ds, sflow_options);
>  } else {
>  if (ds) {
>  dpif_sflow_destroy(ds);
> -ofproto->need_revalidate = REV_RECONFIGURE;
> +ofproto->backer->need_revalidate = REV_RECONFIGURE;
>  ofproto->sflow = NULL;
>  }
>  }
> @@ -1647,7 +1657,7 @@ set_cfm(struct ofport *ofport_, const struct 
> cfm_settings *s)
>  struct ofproto_dpif *ofproto;
>
>  ofproto = ofproto_dpif_cast(ofport->up.ofproto);
> -ofproto->need_revalidate = REV_RECONFIGURE;
> +ofproto->backer->need_revalidate = REV_RECONFIGURE;
>  ofport->cfm = cfm_create(netdev_get_name(ofport->up.netdev));
>  }
>
> @@ -1735,7 +1745,7 @@ set_stp(struct ofproto *ofproto_, const struct 
> ofproto_stp_settings *s)
>
>  /* Only revalidate flows if the configuration changed. */
>  if (!s != !ofproto->stp) {
> -ofproto->need_revalidate = REV_RECONFIGURE;
> +ofproto->backer->need_revalidate = REV_RECONFIGURE;
>  }
>
>  if (s) {
> @@ -1803,12 +1813,13 @@ update_stp_port_state(struct ofport_dpif *ofport)
>  if (stp_learn_in_state(ofport->stp_state)
>  != stp_learn_in_state(state)) {
>  /* xxx Learning action flows should also be flushed. */
> -mac_learning_flush(ofproto->ml, &ofproto->revalidate_set);
> +mac_learning_flush(ofproto->ml,
> +   &ofproto->backer->revalidate_set);
>  }
>  fwd_change = stp_forward_in_state(ofport->stp_state)
>  != stp_forward_in_state(state);
>
> -ofproto->need_revalidate = REV_ST

[ovs-dev] [PATCH] bridge: Modify equal_pathnames function

2013-01-23 Thread Pavithra Ramesh
Modify equal_pathnames to return true when one string is
substring of the other(when a value smaller than SIZE_MAX
is supplied as stop_len)
---
 vswitchd/bridge.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
index 82c3bff..792a131 100644
--- a/vswitchd/bridge.c
+++ b/vswitchd/bridge.c
@@ -2751,6 +2751,10 @@ equal_pathnames(const char *a, const char *b, size_t 
b_stoplen)
 b++;
 }
 }
+if (*b == '\0') {
+/* b is a substring of a */
+   return true;
+}
 return false;
 }
 
-- 
1.7.0.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH] bridge: Modify equal_pathnames function

2013-01-23 Thread Ben Pfaff
On Wed, Jan 23, 2013 at 01:22:03PM -0800, Pavithra Ramesh wrote:
> Modify equal_pathnames to return true when one string is
> substring of the other(when a value smaller than SIZE_MAX
> is supplied as stop_len)

I think this change is correct only when b_stoplen == strlen(b).
That's the case for the current caller but it's supposed to be more
general.

How about this:

diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
index 82c3bff..dd3099f 100644
--- a/vswitchd/bridge.c
+++ b/vswitchd/bridge.c
@@ -2740,8 +2740,12 @@ static bool
 equal_pathnames(const char *a, const char *b, size_t b_stoplen)
 {
 const char *b_start = b;
-while (b - b_start < b_stoplen && *a == *b) {
-if (*a == '/') {
+for (;;) {
+if (b - b_start >= b_stoplen) {
+return true;
+} else if (*a != *b) {
+return false;
+} else if (*a == '/') {
 a += strspn(a, "/");
 b += strspn(b, "/");
 } else if (*a == '\0') {
@@ -2751,7 +2755,6 @@ equal_pathnames(const char *a, const char *b, size_t 
b_stoplen)
 b++;
 }
 }
-return false;
 }
 
 static void
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [BUG] broad-/multicast & SLB bonding -> FAIL

2013-01-23 Thread Jesse Gross
On Wed, Jan 23, 2013 at 12:56 PM, Ben Pfaff  wrote:
> On Wed, Jan 23, 2013 at 06:18:52PM +0100, Markus Schuster wrote:
>> On Wednesday 23 January 2013 17:49:09 Ben Pfaff wrote:
>> > It looks like you have each interface on an SLB bond connected to two
>> > different switches.
>>
>> Correct.
>>
>>
>> > That's not a supported configuration and won't work reliably.
>>
>> OK, any explanation why? Or to say it the other way: I can't see any
>> technical reason why it should work on the same switch but not on two
>> different? The MAC will jump between the ports the very same way and the VM
>> will suffer from the same connectivity issues, no matter what.
>
> Jesse, can you remind me why?  I can never remember.  This time, I'll
> add it to the FAQ.

It does seem risky given the other problems inherent in SLB bonding
but I can't actually think of a problem off the top of my head.  It's
been at least 3 years since I worked on that code though, so I don't
really remember if I said something before.

The behavior described seems odd though and not really related to the
upstream switch configuration.  Were there any modifications made to
the flow table?
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH] bridge: Modify equal_pathnames function

2013-01-23 Thread Pavithra Ramesh
Hi Ben,

Thanks for taking a look. This patch looks good.

Thanks,
Pavithra

- Original Message -
From: "Ben Pfaff" 
To: "Pavithra Ramesh" 
Cc: dev@openvswitch.org
Sent: Wednesday, January 23, 2013 1:29:49 PM
Subject: Re: [ovs-dev] [PATCH] bridge: Modify equal_pathnames function

On Wed, Jan 23, 2013 at 01:22:03PM -0800, Pavithra Ramesh wrote:
> Modify equal_pathnames to return true when one string is
> substring of the other(when a value smaller than SIZE_MAX
> is supplied as stop_len)

I think this change is correct only when b_stoplen == strlen(b).
That's the case for the current caller but it's supposed to be more
general.

How about this:

diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
index 82c3bff..dd3099f 100644
--- a/vswitchd/bridge.c
+++ b/vswitchd/bridge.c
@@ -2740,8 +2740,12 @@ static bool
 equal_pathnames(const char *a, const char *b, size_t b_stoplen)
 {
 const char *b_start = b;
-while (b - b_start < b_stoplen && *a == *b) {
-if (*a == '/') {
+for (;;) {
+if (b - b_start >= b_stoplen) {
+return true;
+} else if (*a != *b) {
+return false;
+} else if (*a == '/') {
 a += strspn(a, "/");
 b += strspn(b, "/");
 } else if (*a == '\0') {
@@ -2751,7 +2755,6 @@ equal_pathnames(const char *a, const char *b, size_t 
b_stoplen)
 b++;
 }
 }
-return false;
 }
 
 static void
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [patch_ports (rebased) 3/4] ofproto-dpif: Implement patch ports in userspace.

2013-01-23 Thread Ed Maste
On 23 January 2013 15:51, Ethan Jackson  wrote:
> diff --git a/FAQ b/FAQ
> index ab1c1cc..a466ca4 100644
> --- a/FAQ
> +++ b/FAQ
> @@ -172,17 +172,24 @@ A: The kernel module in upstream Linux 3.3 and later 
> does not include
>   vSwitch distribution instead of the upstream Linux kernel
>   module.
>
> -   - Patch virtual ports, that is, interfaces with type "patch".
> - You can use Linux "veth" devices as a substitute.
> -
> - We don't have any plans to add patch ports upstream.
> -
>  Q: What features are not available when using the userspace datapath?
>
> -A: Tunnel and patch virtual ports are not supported, as described in the
> -   previous answer.  It is also not possible to use queue-related
> -   actions.  On Linux kernels before 2.6.39, maximum-sized VLAN packets
> -   may not be transmitted.
> +A: The kernel module in upstream Linux 3.3 and later does not include
> +   tunnel virtual ports, that is, interfaces with type "gre",
> +   "ipsec_gre", "gre64", "ipsec_gre64", "vxlan", or "capwap".  It is
> +   possible to create tunnels in Linux and attach them to Open vSwitch
> +   as system devices.  However, they cannot be dynamically created
> +   through the OVSDB protocol or set the tunnel ids as a flow action.
...

This new answer seems to be exclusively about Linux kernel modules,
which seems surprising for a question about the userspace datapath.
On FreeBSD (which uses only the userspace datapath) the new answer has
no meaning.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [BUG] broad-/multicast & SLB bonding -> FAIL

2013-01-23 Thread Markus Schuster
On Wednesday 23 January 2013 22:31:45 Jesse Gross wrote:
> The behavior described seems odd though and not really related to the
> upstream switch configuration.  Were there any modifications made to
> the flow table?

Not that I'm aware of - meaning we haven't modified it but I can't guarantee 
for XCP - it might do some magic. Any way to double check that?

Regards,
Markus
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH] bridge: Modify equal_pathnames function

2013-01-23 Thread Ben Pfaff
I applied it to master, thanks.

On Wed, Jan 23, 2013 at 02:00:14PM -0800, Pavithra Ramesh wrote:
> Hi Ben,
> 
> Thanks for taking a look. This patch looks good.
> 
> Thanks,
> Pavithra
> 
> - Original Message -
> From: "Ben Pfaff" 
> To: "Pavithra Ramesh" 
> Cc: dev@openvswitch.org
> Sent: Wednesday, January 23, 2013 1:29:49 PM
> Subject: Re: [ovs-dev] [PATCH] bridge: Modify equal_pathnames function
> 
> On Wed, Jan 23, 2013 at 01:22:03PM -0800, Pavithra Ramesh wrote:
> > Modify equal_pathnames to return true when one string is
> > substring of the other(when a value smaller than SIZE_MAX
> > is supplied as stop_len)
> 
> I think this change is correct only when b_stoplen == strlen(b).
> That's the case for the current caller but it's supposed to be more
> general.
> 
> How about this:
> 
> diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
> index 82c3bff..dd3099f 100644
> --- a/vswitchd/bridge.c
> +++ b/vswitchd/bridge.c
> @@ -2740,8 +2740,12 @@ static bool
>  equal_pathnames(const char *a, const char *b, size_t b_stoplen)
>  {
>  const char *b_start = b;
> -while (b - b_start < b_stoplen && *a == *b) {
> -if (*a == '/') {
> +for (;;) {
> +if (b - b_start >= b_stoplen) {
> +return true;
> +} else if (*a != *b) {
> +return false;
> +} else if (*a == '/') {
>  a += strspn(a, "/");
>  b += strspn(b, "/");
>  } else if (*a == '\0') {
> @@ -2751,7 +2755,6 @@ equal_pathnames(const char *a, const char *b, size_t 
> b_stoplen)
>  b++;
>  }
>  }
> -return false;
>  }
>  
>  static void
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [patch_ports (rebased) 3/4] ofproto-dpif: Implement patch ports in userspace.

2013-01-23 Thread Ben Pfaff
On Wed, Jan 23, 2013 at 05:10:54PM -0500, Ed Maste wrote:
> On 23 January 2013 15:51, Ethan Jackson  wrote:
> > diff --git a/FAQ b/FAQ
> > index ab1c1cc..a466ca4 100644
> > --- a/FAQ
> > +++ b/FAQ
> > @@ -172,17 +172,24 @@ A: The kernel module in upstream Linux 3.3 and later 
> > does not include
> >   vSwitch distribution instead of the upstream Linux kernel
> >   module.
> >
> > -   - Patch virtual ports, that is, interfaces with type "patch".
> > - You can use Linux "veth" devices as a substitute.
> > -
> > - We don't have any plans to add patch ports upstream.
> > -
> >  Q: What features are not available when using the userspace datapath?
> >
> > -A: Tunnel and patch virtual ports are not supported, as described in the
> > -   previous answer.  It is also not possible to use queue-related
> > -   actions.  On Linux kernels before 2.6.39, maximum-sized VLAN packets
> > -   may not be transmitted.
> > +A: The kernel module in upstream Linux 3.3 and later does not include
> > +   tunnel virtual ports, that is, interfaces with type "gre",
> > +   "ipsec_gre", "gre64", "ipsec_gre64", "vxlan", or "capwap".  It is
> > +   possible to create tunnels in Linux and attach them to Open vSwitch
> > +   as system devices.  However, they cannot be dynamically created
> > +   through the OVSDB protocol or set the tunnel ids as a flow action.
> ...
> 
> This new answer seems to be exclusively about Linux kernel modules,
> which seems surprising for a question about the userspace datapath.
> On FreeBSD (which uses only the userspace datapath) the new answer has
> no meaning.

I think something got mixed up.  Here's the change that I expected to
see here:

--
diff --git a/FAQ b/FAQ
index ab1c1cc..bd4d37e 100644
--- a/FAQ
+++ b/FAQ
@@ -158,28 +158,25 @@ Q: What features are not available in the Open vSwitch 
kernel datapath
that ships as part of the upstream Linux kernel?
 
 A: The kernel module in upstream Linux 3.3 and later does not include
-   the following features:
-
-   - Tunnel virtual ports, that is, interfaces with type "gre",
- "ipsec_gre", "capwap".  It is possible to create tunnels in
- Linux and attach them to Open vSwitch as system devices.
- However, they cannot be dynamically created through the OVSDB
- protocol or set the tunnel ids as a flow action.
-
- Work is in progress in adding these features to the upstream
- Linux version of the Open vSwitch kernel module.  For now, if
- you need these features, use the kernel module from the Open
- vSwitch distribution instead of the upstream Linux kernel
- module.
-
-   - Patch virtual ports, that is, interfaces with type "patch".
- You can use Linux "veth" devices as a substitute.
-
- We don't have any plans to add patch ports upstream.
+   tunnel virtual ports, that is, interfaces with type "gre",
+   "ipsec_gre", "gre64", "ipsec_gre64", "vxlan", or "capwap".  It is
+   possible to create tunnels in Linux and attach them to Open vSwitch
+   as system devices.  However, they cannot be dynamically created
+   through the OVSDB protocol or set the tunnel ids as a flow action.
+
+   Work is in progress in adding tunnel virtual ports to the upstream
+   Linux version of the Open vSwitch kernel module.  For now, if you
+   need these features, use the kernel module from the Open vSwitch
+   distribution instead of the upstream Linux kernel module.
+
+   The upstream kernel module does not include patch ports, but this
+   only matters for Open vSwitch 1.9 and earlier, because Open vSwitch
+   1.10 and later implement patch ports without using this kernel
+   feature.
 
 Q: What features are not available when using the userspace datapath?
 
-A: Tunnel and patch virtual ports are not supported, as described in the
+A: Tunnel virtual ports are not supported, as described in the
previous answer.  It is also not possible to use queue-related
actions.  On Linux kernels before 2.6.39, maximum-sized VLAN packets
may not be transmitted.
--

which yields a FAQ excerpt that looks like: 

--
Q: What features are not available in the Open vSwitch kernel datapath
   that ships as part of the upstream Linux kernel?

A: The kernel module in upstream Linux 3.3 and later does not include
   tunnel virtual ports, that is, interfaces with type "gre",
   "ipsec_gre", "gre64", "ipsec_gre64", "vxlan", or "capwap".  It is
   possible to create tunnels in Linux and attach them to Open vSwitch
   as system devices.  However, they cannot be dynamically created
   through the OVSDB protocol or set the tunnel ids as a flow action.

   Work is in progress in adding tunnel virtual ports to the upstream
   Linux version of the Open vSwitch kernel mo

Re: [ovs-dev] [PATCH] FAQ: Add QoS section.

2013-01-23 Thread Ben Pfaff
I found one typo in testing:

diff --git a/FAQ b/FAQ
index ab4338e..72a1479 100644
--- a/FAQ
+++ b/FAQ
@@ -472,7 +472,7 @@ A: Suppose that you want to set up bridge br0 connected to 
physical
   add-port br0 vif1.0 -- set interface vif1.0 ofport_request=5 -- \
   add-port br0 vif2.0 -- set interface vif2.0 ofport_request=6 -- \
   set port eth0 qos=@newqos -- \
-  --id=@newqos create qos type=linus-htb \
+  --id=@newqos create qos type=linux-htb \
other-config:max-rate=10 \
   queues:123=@vif10queue \
   queues:234=@vif20queue -- \

With that fixed, I applied this to master.

On Tue, Jan 22, 2013 at 10:20:28AM -0800, Ben Pfaff wrote:
> Thanks.  I think I ought to test it before I commit it, so it probably
> won't go in before afternoon.
> 
> On Tue, Jan 22, 2013 at 10:17:16AM -0800, Justin Pettit wrote:
> > Looks good.  Thanks!
> > 
> > --Justin
> > 
> > 
> > On Jan 22, 2013, at 10:15 AM, Ben Pfaff  wrote:
> > 
> > > On Tue, Jan 22, 2013 at 10:06:12AM -0800, Justin Pettit wrote:
> > >> I hope you had this section queued up for review, and not that you
> > >> whipped it up in the minute since you responded to that QoS question
> > >> on ovs-discuss.  I already feel inadequate enough.
> > > 
> > > It took me about half an hour to write it this morning.  I hope that it
> > > will save me more than half an hour in 2013.
> > > 
> > >>> +A: Suppose that you want to set up bridge br0 connected to physical
> > >>> +   Ethernet port eth0 (a 1 Gbps device) and virtual machine interface
> > >> 
> > >> s/interface/interfaces/
> > > 
> > > Thanks, fixed.
> > > 
> > >>> +   vif1.0 and vif2.0, and that you want to limit traffic from vif1.0
> > >>> +   to eth0 to 10 Mbps and from vif2.0 to eth0 to 20 Mbps.  Then, you
> > >>> +   could configure the bridge this way:
> > >>> +
> > >>> +   ovs-vsctl -- \
> > >>> +   add-br br0 -- \
> > >>> +  add-port br0 eth0 -- \
> > >>> +  add-port br0 vif1.0 -- set interface vif1.0 ofport_request=1 
> > >>> -- \
> > >>> +  add-port br0 vif2.0 -- set interface vif2.0 ofport_request=2 
> > >>> -- \
> > >> 
> > >> I haven't run the commands either, but I think these ports you
> > >> requested may have been given to br0 and eth0 already, since we try to
> > >> allocate from 1.  (If you update these ones, there are a couple of
> > >> places later on that also reference these OpenFlow port numbers.)
> > > 
> > > I would think that explicit requests should take precedence over
> > > defaults, but I think the actual implementation may be weak in that
> > > regard.
> > > 
> > > I changed them to 5 and 6.  (I'm trying to use numbers that are not
> > > close to 1.0 or 2.0 (or 10 or 20) to keep readers from thinking that the
> > > actual numbers are significant.)
> > > 
> > > Here's a revised version, still not tested.
> > > 
> > > --8<--cut here-->8--
> > > 
> > > From: Ben Pfaff 
> > > Date: Tue, 22 Jan 2013 10:14:13 -0800
> > > Subject: [PATCH] FAQ: Add QoS section.
> > > 
> > > Signed-off-by: Ben Pfaff 
> > > ---
> > > FAQ |  104 
> > > +++
> > > 1 file changed, 104 insertions(+)
> > > 
> > > diff --git a/FAQ b/FAQ
> > > index ab1c1cc..ab4338e 100644
> > > --- a/FAQ
> > > +++ b/FAQ
> > > @@ -455,6 +455,110 @@ A: In version 1.9.0, OVS switched to using a single 
> > > datapath that is
> > >commands provide similar functionality that is scoped by the bridge.
> > > 
> > > 
> > > +Quality of Service (QoS)
> > > +
> > > +
> > > +Q: How do I configure Quality of Service (QoS)?
> > > +
> > > +A: Suppose that you want to set up bridge br0 connected to physical
> > > +   Ethernet port eth0 (a 1 Gbps device) and virtual machine interfaces
> > > +   vif1.0 and vif2.0, and that you want to limit traffic from vif1.0
> > > +   to eth0 to 10 Mbps and from vif2.0 to eth0 to 20 Mbps.  Then, you
> > > +   could configure the bridge this way:
> > > +
> > > +   ovs-vsctl -- \
> > > +   add-br br0 -- \
> > > +add-port br0 eth0 -- \
> > > +add-port br0 vif1.0 -- set interface vif1.0 ofport_request=5 -- \
> > > +add-port br0 vif2.0 -- set interface vif2.0 ofport_request=6 -- \
> > > +set port eth0 qos=@newqos -- \
> > > +--id=@newqos create qos type=linus-htb \
> > > +   other-config:max-rate=10 \
> > > +queues:123=@vif10queue \
> > > +queues:234=@vif20queue -- \
> > > +   --id=@vif10queue create queue other-config:max-rate=1000 
> > > -- \
> > > +   --id=@vif20queue create queue other-config:max-rate=2000
> > > +
> > > +   At this point, bridge br0 is configured with the ports and eth0 is
> > > +   configured with the queues that you need for QoS, but nothing is
> > > +   actually directing packets from vif1.0 or vif2.0 to the queues that
> > > +   we have set up for 

Re: [ovs-dev] [patch_ports (rebased) 1/4] ofproto-dpif: Scope revalidation state to dpif_backers.

2013-01-23 Thread Ben Pfaff
On Wed, Jan 23, 2013 at 12:51:10PM -0800, Ethan Jackson wrote:
> Before this patch, the "need_revalidate" flag and the
> "revalidate_set" tag_set where maintained separately for each
> ofproto.  This won't work in future patches when a flow table
> change in one ofproto can require revalidation in another entirely
> separate ofproto.  For this reason, this patch scopes these flags
> to the dpif_backers.
> 
> Signed-off-by: Ethan Jackson 

The revalidation logic in type_run() still looks wrong.  The previous
version of the code:

1. Saves copies of revalidate_set and need_revalidate.
2. Clears revalidate_set and need_revalidate.
3. Uses the copies to revalidate.

The new version appears to:

1. Clear revalidate_set and need_revalidate.
2. Use the cleared original versions to revalidate.

Otherwise this looks good (modulo what Reid noticed), thanks.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [patch_ports (rebased) 2/4] ofproto-dpif: Refresh stats before ovs-appctl dpif/dump-flows.

2013-01-23 Thread Ben Pfaff
On Wed, Jan 23, 2013 at 12:51:11PM -0800, Ethan Jackson wrote:
> As a matter of convenience, this patch refreshes the statistics
> when ovs-appctl dpif/dump-flows is called.  Hopefully this will
> prevent users from being confused because statistics they were
> expecting to see haven't made it from the datapath into the switch
> yet.
> 
> Signed-off-by: Ethan Jackson 

Still looks good, thanks.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [patch_ports (rebased) 3/4] ofproto-dpif: Implement patch ports in userspace.

2013-01-23 Thread Ed Maste
On 23 January 2013 18:40, Ben Pfaff  wrote:
> I think something got mixed up.  Here's the change that I expected to
> see here:
>
> --
> diff --git a/FAQ b/FAQ
> index ab1c1cc..bd4d37e 100644
> --- a/FAQ
> +++ b/FAQ
> @@ -158,28 +158,25 @@ Q: What features are not available in the Open vSwitch 
> kernel datapath
> that ships as part of the upstream Linux kernel?
>
>  A: The kernel module in upstream Linux 3.3 and later does not include
> -   the following features:
> -
> -   - Tunnel virtual ports, that is, interfaces with type "gre",
> - "ipsec_gre", "capwap".  It is possible to create tunnels in
> - Linux and attach them to Open vSwitch as system devices.
> - However, they cannot be dynamically created through the OVSDB
> - protocol or set the tunnel ids as a flow action.
> -
> - Work is in progress in adding these features to the upstream
> - Linux version of the Open vSwitch kernel module.  For now, if
> - you need these features, use the kernel module from the Open
> - vSwitch distribution instead of the upstream Linux kernel
> - module.
> -
> -   - Patch virtual ports, that is, interfaces with type "patch".
> - You can use Linux "veth" devices as a substitute.
> -
> - We don't have any plans to add patch ports upstream.
> +   tunnel virtual ports, that is, interfaces with type "gre",
> +   "ipsec_gre", "gre64", "ipsec_gre64", "vxlan", or "capwap".  It is
> +   possible to create tunnels in Linux and attach them to Open vSwitch
> +   as system devices.  However, they cannot be dynamically created
> +   through the OVSDB protocol or set the tunnel ids as a flow action.
> +
> +   Work is in progress in adding tunnel virtual ports to the upstream
> +   Linux version of the Open vSwitch kernel module.  For now, if you
> +   need these features, use the kernel module from the Open vSwitch
> +   distribution instead of the upstream Linux kernel module.
> +
> +   The upstream kernel module does not include patch ports, but this
> +   only matters for Open vSwitch 1.9 and earlier, because Open vSwitch
> +   1.10 and later implement patch ports without using this kernel
> +   feature.
>
>  Q: What features are not available when using the userspace datapath?
>
> -A: Tunnel and patch virtual ports are not supported, as described in the
> +A: Tunnel virtual ports are not supported, as described in the
> previous answer.  It is also not possible to use queue-related
> actions.  On Linux kernels before 2.6.39, maximum-sized VLAN packets
> may not be transmitted.

Aha, that makes much more sense, thanks.

> I can believe that we still need clarification here (and elsewhere?)
> in the FAQ to make sure that everything applies properly to FreeBSD.
> I'd welcome improvements.

I'll see if I can come up with a suitable patch for the FAQ.  I went
over INSTALL in the course of getting the FreeBSD patch ready but
didn't really spend any time on the FAQ.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [BUG] broad-/multicast & SLB bonding -> FAIL

2013-01-23 Thread Jesse Gross
On Wed, Jan 23, 2013 at 2:42 PM, Markus Schuster
 wrote:
> On Wednesday 23 January 2013 22:31:45 Jesse Gross wrote:
>> The behavior described seems odd though and not really related to the
>> upstream switch configuration.  Were there any modifications made to
>> the flow table?
>
> Not that I'm aware of - meaning we haven't modified it but I can't guarantee
> for XCP - it might do some magic. Any way to double check that?

I don't think it should but you can double check using:
ovs-ofctl dump-flows 

There should be one flow with an action of NORMAL.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [patch_ports (rebased) 3/4] ofproto-dpif: Implement patch ports in userspace.

2013-01-23 Thread Ben Pfaff
On Wed, Jan 23, 2013 at 12:51:12PM -0800, Ethan Jackson wrote:
> This commit moves responsibility for implementing patch ports from
> the datapath to ofproto-dpif.  There are two main reasons to do
> this.
> 
> The first is a matter of design:  ofproto-dpif both has more
> information than the datapath, and is better suited to handle the
> complexity required to implement patch ports.
> 
> The second is performance.  My setup is a virtual machine with two
> basic learning bridges connected by patch ports.  I used
> ovs-benchmark to ping the virtual router IP residing outside the
> VM.  Over a 60 second run, "ovs-benchmark rate" improves from
> 14618.1 to 19311.9 transactions per second, or a 32% improvement.
> Similarly, "ovs-benchmark latency" improves from 6ms to 4ms.
> 
> Signed-off-by: Ethan Jackson 

set_patch_config() checks that the peer name is less than IFNAMSIZ
bytes but I don't know why we care.

compose_output_action__() seems to treat patch ports almost like
resubmits but a little too faithfully: it saves and restores the
in_port but not other flow information like the VLAN.

Otherwise this looks good and the performance boost should be awesome.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [patch_ports (rebased) 4/4] netdev-dummy: Test LACP negotiation.

2013-01-23 Thread Ben Pfaff
On Wed, Jan 23, 2013 at 12:51:13PM -0800, Ethan Jackson wrote:
> From: Ben Pfaff 
> 
> Signed-off-by: Ben Pfaff 
> Signed-off-by: Ethan Jackson 

As long as it passes reliably for you I'm happy with this, thanks.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [patch_ports (rebased) 1/4] ofproto-dpif: Scope revalidation state to dpif_backers.

2013-01-23 Thread Ethan Jackson
> 1. Saves copies of revalidate_set and need_revalidate.
> 2. Clears revalidate_set and need_revalidate.
> 3. Uses the copies to revalidate.
>
> The new version appears to:
>
> 1. Clear revalidate_set and need_revalidate.
> 2. Use the cleared original versions to revalidate.

Oops, I've folded a fix in, thanks for the review.

Ethan
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [patch_ports (rebased) 3/4] ofproto-dpif: Implement patch ports in userspace.

2013-01-23 Thread Ethan Jackson
> set_patch_config() checks that the peer name is less than IFNAMSIZ
> bytes but I don't know why we care.

Good point, this is leftover from when patch ports were linux specific.  Which
has me thinking, at some point we'll need to start building patch ports (and
possibly tunnels) on other platforms.  Soon, they will have no linux specific
code.

An incremental follows.

---
 lib/netdev-vport.c |5 -
 ofproto/ofproto-dpif.c |4 ++--
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
index 4054ee0..cd6ae54 100644
--- a/lib/netdev-vport.c
+++ b/lib/netdev-vport.c
@@ -820,11 +820,6 @@ set_patch_config(struct netdev_dev *dev_, const struct 
smap *args)
 return EINVAL;
 }
 
-if (strlen(peer) >= IFNAMSIZ) {
-VLOG_ERR("%s: patch 'peer' arg too long", name);
-return EINVAL;
-}
-
 if (!strcmp(name, peer)) {
 VLOG_ERR("%s: patch peer must not be self", name);
 return EINVAL;
diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index 2de2964..d5bd0b4 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -5521,7 +5521,7 @@ compose_output_action__(struct action_xlate_ctx *ctx, 
uint16_t ofp_port,
 
 if (netdev_vport_is_patch(ofport->up.netdev)) {
 struct ofport_dpif *peer = ofport_get_peer(ofport);
-uint16_t old_in_port = ctx->flow.in_port;
+struct flow old_flow = ctx->flow;
 const struct ofproto_dpif *peer_ofproto;
 
 if (!peer) {
@@ -5538,7 +5538,7 @@ compose_output_action__(struct action_xlate_ctx *ctx, 
uint16_t ofp_port,
 ctx->ofproto = ofproto_dpif_cast(peer->up.ofproto);
 ctx->flow.in_port = peer->up.ofp_port;
 xlate_table_action(ctx, ctx->flow.in_port, 0, true);
-ctx->flow.in_port = old_in_port;
+ctx->flow = old_flow;
 ctx->ofproto = ofproto_dpif_cast(ofport->up.ofproto);
 
 if (ctx->resubmit_stats) {
-- 
1.7.9.5

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [patch_ports (rebased) 3/4] ofproto-dpif: Implement patch ports in userspace.

2013-01-23 Thread Ethan Jackson
> --
> Q: What features are not available in the Open vSwitch kernel datapath
>that ships as part of the upstream Linux kernel?
>
> A: The kernel module in upstream Linux 3.3 and later does not include
>tunnel virtual ports, that is, interfaces with type "gre",
>"ipsec_gre", "gre64", "ipsec_gre64", "vxlan", or "capwap".  It is
>possible to create tunnels in Linux and attach them to Open vSwitch
>as system devices.  However, they cannot be dynamically created
>through the OVSDB protocol or set the tunnel ids as a flow action.
>
>Work is in progress in adding tunnel virtual ports to the upstream
>Linux version of the Open vSwitch kernel module.  For now, if you
>need these features, use the kernel module from the Open vSwitch
>distribution instead of the upstream Linux kernel module.
>
>The upstream kernel module does not include patch ports, but this
>only matters for Open vSwitch 1.9 and earlier, because Open vSwitch
>1.10 and later implement patch ports without using this kernel
>feature.
>
> Q: What features are not available when using the userspace datapath?
>
> A: Tunnel virtual ports are not supported, as described in the
>previous answer.  It is also not possible to use queue-related
>actions.  On Linux kernels before 2.6.39, maximum-sized VLAN packets
>may not be transmitted.
> --

Also, sorry about the confusion.  I've changed the FAQ to be precisely this.

Ethan
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 01/16] User-Space MPLS actions and matches

2013-01-23 Thread Simon Horman
On Tue, Jan 22, 2013 at 10:47:17AM -0800, Ben Pfaff wrote:
> On Fri, Jan 18, 2013 at 09:27:55AM +0900, Simon Horman wrote:
> > This patch implements use-space datapath and non-datapath code
> > to match and use the datapath API set out in Leo Alterman's patch
> > "user-space datapath: Add basic MPLS support to kernel".
> 
> I'm resuming my review starting from packets.c.
> 
> eth_mpls_depth() seems poorly documented to me.  The comment makes an
> incorrect claim about packet->l2, which the function does not use, and
> does not mention packet->l2_5, which the function does use.

Thanks.

I will unify the comment for eth_mpls_depth() with that of eth_push_vlan().
My proposed text is:

/* Return depth of mpls stack.
 *
 * 'packet->l2_5' should initially point to 'packet''s outer-most MPLS header
 * or may be NULL if there are no MPLS headers. */

> In get_ethtype_ptr(), the expression in the "return" looks wrong:
> if (eh->eth_type == htons(ETH_TYPE_VLAN)) {
> return (ovs_be16 *)((char *)(packet->l2_5 ? packet->l2_5 : 
> packet->l3)) - 2;
> because casts have higher precedence than subtraction, so I believe this
> will subtract 2*sizeof(uint16) == 4 bytes from the beginning of the L2.5
> or L3 header, which is wrong.  Do any tests cover this?

Thanks, nice catch.

It turns out that you are correct. And that I overlooked exercising this.

I have corrected the problem by moving the final ')' to after the 2 as
follows:

 return (ovs_be16 *)((char *)(packet->l2_5 ? packet->l2_5 : packet->l3) 
- 2);

And I have verified both that the existing code is broken and the new code
is not by adding a test to ofproto-dpif.at which uses a push_mpls() action
on a packet with a VLAN tag. It is in keeping with other mpls tests that
this series adds to that file.

> I believe that the correct name for an MPLS header is a "label", not a
> "tag".  It would be nice to get this right, although certainly it's not
> a huge deal.

Thanks. Referring to wikipedia[1] it seems that an MPLS label is one
component of a MPLS label stack entry (LSE) which is in turn a component of
the MPLS label stack.

I'll do a search and replace pass over the entire patch set to clean things up.

[1] http://en.wikipedia.org/wiki/Multiprotocol_Label_Switching#MPLS_operation

> set_mpls_lse_values() has ;; at the end of one line.

Fixed.

> push_mpls_lse() isn't documented as putting the new MPLS label before or
> after the existing labels, but it appears to me that it puts the new
> label after the existing labels.  If I'm right about that, it's
> surprising; one would ordinarily expect a new header to be an outermost
> one, not an innermost one.

I believe that it pushes the new MPLS stack entry onto the MPLS stack
before any existing entries.

The memmove() moves everything before l2_5, that is, everything before the
current MPLS label stack. And then (memcpy) puts the new MPLS label stack entry
immediately after the data moved by memove(), still before the existing
MPLS label stack entries.

I have updated the comment above push_mpls_lse() to state its intentions
more clearly:

/* Push an MPLS stack entry onto the MPLS stack before any existing entries
 * and adjust packet->l2 and packet->l2.5 accordingly. */

I also note that push_mpls_lse() makes the following assumptions.

* sizeof *mh == MPLS_HLEN
  This is true.
* packet->l2 == packet->data
  Is this always true? I am unsure.
* packet->l2_5 is set to a value after packet->l2.
  This is not true in at least the following case:
  The packet arrives via flow_compose() with an MPLS ethernet type,
  in which case packet->l2_5 is NULL.

I plan to resolve the problem raised in the last point as follows.
As far as I can see push_mpls() is the only caller of push_mpls_lse().

diff --git a/lib/packets.c b/lib/packets.c
index 190803e..5d9a44e 100644
--- a/lib/packets.c
+++ b/lib/packets.c
@@ -360,6 +361,9 @@ push_mpls(struct ofpbuf *packet, ovs_be16 ethtype, ovs_be32 
lse)
 if (!eth_type_mpls(eth_type)) {
 /* Set ethtype and mpls label stack entry. */
 set_ethertype(packet, ethtype);
+}
+/* l2_5 may not already be set if this is the first mpls_push. */
+if (!packet->l2_5) {
 packet->l2_5 = packet->l3;
 }
 

I have added a test to tests/ofproto-dpif.at to exercise both the order
of MPLS label stack entry insertion and the l2_5 NULL problem described
immediately above. It was in the course of adding the test that I found
that problem. The test performs an mpls_push action on an MPLS frame.

I have also added a test to tests/ofproto-dpif.at to exercise this and the
NULL l2_5 problem described below.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [patch_ports (rebased) 3/4] ofproto-dpif: Implement patch ports in userspace.

2013-01-23 Thread Ben Pfaff
On Wed, Jan 23, 2013 at 06:05:29PM -0800, Ethan Jackson wrote:
> > set_patch_config() checks that the peer name is less than IFNAMSIZ
> > bytes but I don't know why we care.
> 
> Good point, this is leftover from when patch ports were linux specific.  Which
> has me thinking, at some point we'll need to start building patch ports (and
> possibly tunnels) on other platforms.  Soon, they will have no linux specific
> code.
> 
> An incremental follows.

Thanks, that looks good.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 01/16] User-Space MPLS actions and matches

2013-01-23 Thread Andre Khan

Hi Simon,

Regarding this part in your mail:

   >> push_mpls_lse() isn't documented as putting the new MPLS label before or
   >> after the existing labels, but it appears to me that it puts the new
   >> label after the existing labels.  If I'm right about that, it's
   >> surprising; one would ordinarily expect a new header to be an outermost
   >> one, not an innermost one.

   >I believe that it pushes the new MPLS stack entry onto the MPLS stack
   >before any existing entries.

The newly pushed MPLS label is the "outermost" label (which is what I believe 
Ben alluded to). 
Typically this label is for the benefit of the MPLS Nexthop to which it has 
local significance.   

I just started to scan this list, so forgive me if I lack the full context of 
this thread and missed something... besides it is getting late here :)

Thanks,
-Andre

PS: I would suggest that that location of the label be signified with the words 
"inner/outer" rather than "before/after" as it removes any ambiguity... not to 
mention that is the MPLS speak :) 


- Original Message -
From: "Simon Horman" 
To: "Ben Pfaff" 
Cc: dev@openvswitch.org, "Isaku Yamahata" , "Ravi K" 

Sent: Wednesday, January 23, 2013 9:43:58 PM
Subject: Re: [ovs-dev] [PATCH 01/16] User-Space MPLS actions and matches

On Tue, Jan 22, 2013 at 10:47:17AM -0800, Ben Pfaff wrote:
> On Fri, Jan 18, 2013 at 09:27:55AM +0900, Simon Horman wrote:
> > This patch implements use-space datapath and non-datapath code
> > to match and use the datapath API set out in Leo Alterman's patch
> > "user-space datapath: Add basic MPLS support to kernel".
> 
> I'm resuming my review starting from packets.c.
> 
> eth_mpls_depth() seems poorly documented to me.  The comment makes an
> incorrect claim about packet->l2, which the function does not use, and
> does not mention packet->l2_5, which the function does use.

Thanks.

I will unify the comment for eth_mpls_depth() with that of eth_push_vlan().
My proposed text is:

/* Return depth of mpls stack.
 *
 * 'packet->l2_5' should initially point to 'packet''s outer-most MPLS header
 * or may be NULL if there are no MPLS headers. */

> In get_ethtype_ptr(), the expression in the "return" looks wrong:
> if (eh->eth_type == htons(ETH_TYPE_VLAN)) {
> return (ovs_be16 *)((char *)(packet->l2_5 ? packet->l2_5 : 
> packet->l3)) - 2;
> because casts have higher precedence than subtraction, so I believe this
> will subtract 2*sizeof(uint16) == 4 bytes from the beginning of the L2.5
> or L3 header, which is wrong.  Do any tests cover this?

Thanks, nice catch.

It turns out that you are correct. And that I overlooked exercising this.

I have corrected the problem by moving the final ')' to after the 2 as
follows:

 return (ovs_be16 *)((char *)(packet->l2_5 ? packet->l2_5 : packet->l3) 
- 2);

And I have verified both that the existing code is broken and the new code
is not by adding a test to ofproto-dpif.at which uses a push_mpls() action
on a packet with a VLAN tag. It is in keeping with other mpls tests that
this series adds to that file.

> I believe that the correct name for an MPLS header is a "label", not a
> "tag".  It would be nice to get this right, although certainly it's not
> a huge deal.

Thanks. Referring to wikipedia[1] it seems that an MPLS label is one
component of a MPLS label stack entry (LSE) which is in turn a component of
the MPLS label stack.

I'll do a search and replace pass over the entire patch set to clean things up.

[1] http://en.wikipedia.org/wiki/Multiprotocol_Label_Switching#MPLS_operation

> set_mpls_lse_values() has ;; at the end of one line.

Fixed.

> push_mpls_lse() isn't documented as putting the new MPLS label before or
> after the existing labels, but it appears to me that it puts the new
> label after the existing labels.  If I'm right about that, it's
> surprising; one would ordinarily expect a new header to be an outermost
> one, not an innermost one.

I believe that it pushes the new MPLS stack entry onto the MPLS stack
before any existing entries.

The memmove() moves everything before l2_5, that is, everything before the
current MPLS label stack. And then (memcpy) puts the new MPLS label stack entry
immediately after the data moved by memove(), still before the existing
MPLS label stack entries.

I have updated the comment above push_mpls_lse() to state its intentions
more clearly:

/* Push an MPLS stack entry onto the MPLS stack before any existing entries
 * and adjust packet->l2 and packet->l2.5 accordingly. */

I also note that push_mpls_lse() makes the following assumptions.

* sizeof *mh == MPLS_HLEN
  This is true.
* packet->l2 == packet->data
  Is this always true? I am unsure.
* packet->l2_5 is set to a value after packet->l2.
  This is not true in at least the following case:
  The packet arrives via flow_compose() with an MPLS ethernet type,
  in which case packet->l2_5 is NULL.

I plan to resolve the problem raised in the last point as follows.
As far as I 

Re: [ovs-dev] [PATCH 01/16] User-Space MPLS actions and matches

2013-01-23 Thread Simon Horman
On Tue, Jan 22, 2013 at 03:41:37PM -0800, Ben Pfaff wrote:
> On Tue, Jan 22, 2013 at 10:47:17AM -0800, Ben Pfaff wrote:
> > On Fri, Jan 18, 2013 at 09:27:55AM +0900, Simon Horman wrote:
> > > This patch implements use-space datapath and non-datapath code
> > > to match and use the datapath API set out in Leo Alterman's patch
> > > "user-space datapath: Add basic MPLS support to kernel".
> > 
> > I'm still reading, I'll continue later.
> 
> Here I go.
> 
> I noticed just now that the code added to execute_set_action() in
> dpif-netdev.c is indented four spaces too far.
> 
> I get one build failure:
> cc1: warnings being treated as errors
> ../lib/ofp-actions.c: In function ‘ofpact_from_nxast’:
> ../lib/ofp-actions.c:407: error: initialization from incompatible pointer 
> type
> on this line:
> struct nx_action_push_mpls *nxapm = (struct nx_action_push *)a;

Sorry for letting that slip through.

Somehow -Werror wasn't enabled. It is now.

> In packets.c, a lot of the new functions check that the packet is at
> least sizeof(struct eth_header) bytes long.  I think that we should
> simply make this a requirement and not bother checking for it.  There
> shouldn't be any code passing in packets that are not at least 14
> bytes long.  (Or if you know of any, let's fix them.)

Sure, I have made it so.
I'm not aware of any cases where this requirement is not met.

> push_mpls() is unnecessarily confusing because it has variables named
> 'ethtype' and 'eth_type'.  I'd drop the latter entirely and rewrite
> if (!eth_type_mpls(eth_type)) {
> as
> if (!eth_type_mpls(get_ethertype(packet))) {

Agreed, it was not 5 minutes ago when this most recently confused me.

> pop_mpls() has the same variable name confusion, please fix it there
> too.

Done.

> I think that we should try to maintain an invariant that l2_5 is
> nonnull if and only if the ethertype is an mpls ethertype.  Then we
> could simplify some of the code here by checking only for l2_5 !=
> NULL and not bothering with ethertypes.  Actually it looks like we
> already maintain this invariant.

Actually, I believe that is not true. As explained in my previous email
I believe that in the case where compose_flow() calls push_mpls()
for an MPLS frame then l2_5 is not set although the ethtype is an MPLS
ethtype. In my previous email I proposed a fix by having push_mpls()
ensure that l2_5 is set.

That said, I think it is easy enough to switch things around so that
invariant is true and make the simplification you suggest. I will see about
doing so.


> eth_mpls_depth() should take a "const" parameter.

Done.

> struct mpls_eth_header isn't used anywhere.  Do you think it is
> useful?

No, I don't think it is useful. I have removed it.

> IP_DSCP doesn't properly parenthesize its argument.  Also it isn't
> used anywhere.

Thanks, I have removed it.

> IP6_VERSION and IP6_VER aren't used anywhere.

Also removed.

> Why does compose_output_action__() save and restore mpls_lse and
> mpls_depth?  It saves nw_tos and vlan_tci because the output process
> might need to temporarily change those, for output ports with
> configured priorities and for VLAN splinters, respectively, but I
> don't know of anything similar for MPLS.
> 

This is most likely because I didn't understand the motivation for saving
nw_tos and vlan_tci, and decided to follow the same pattern for MPLS.
I will remove the offending code.

> Most of the functions called by do_xlate_actions() have "execute" or
> "xlate" in their names, so perhaps compose_mpls_push_action() and
> compose_mpls_pop_action() should follow this pattern.

Sure, I will s/compose_/execute_/

> In compose_mpls_push_action(), for the case where there's already an
> MPLS header, do you know why we're getting the LSE from base_flow?
> I'd think we could just keep the current one.

Yes, true. I will remove the following line:

ctx->flow.mpls_lse = ctx->base_flow.mpls_lse;

> For the other case,
> this:
> +ctx->flow.mpls_lse &= ~htonl(MPLS_LABEL_MASK | MPLS_TC_MASK |
> + MPLS_TTL_MASK | MPLS_BOS_MASK);
> seems equivalent to just "ctx->flow.mpls_lse = htonl(0);".  But later
> on the code writes to mpls_lse without any intervening read, so maybe
> we can just delete the &= statement.  

Also true. I will delete the &= statement.

> I'm not sure why compose_mpls_pop_action() emits
> OVS_ACTION_ATTR_POP_MPLS instead of just manipulating mpls_depth and
> other flow fields.

Good point. I'll see about changing things as you suggest.

> I think I'm done reviewing this patch for now.

Thanks. I'll try and address all the points you raised here and elsewhere
and get a fresh revision to the mailing list soon.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 01/16] User-Space MPLS actions and matches

2013-01-23 Thread Simon Horman
On Wed, Jan 23, 2013 at 10:31:52PM -0800, Andre Khan wrote:
> 
> Hi Simon,
> 
> Regarding this part in your mail:
> 
>>> push_mpls_lse() isn't documented as putting the new MPLS label before or
>>> after the existing labels, but it appears to me that it puts the new
>>> label after the existing labels.  If I'm right about that, it's
>>> surprising; one would ordinarily expect a new header to be an outermost
>>> one, not an innermost one.
> 
>>I believe that it pushes the new MPLS stack entry onto the MPLS stack
>>before any existing entries.
> 
> The newly pushed MPLS label is the "outermost" label (which is what I believe 
> Ben alluded to). 
> Typically this label is for the benefit of the MPLS Nexthop to which it has 
> local significance.   
> 
> I just started to scan this list, so forgive me if I lack the full context of 
> this thread and missed something... besides it is getting late here :)
> 
> Thanks,
> -Andre
> 
> PS: I would suggest that that location of the label be signified with the 
> words "inner/outer" rather than "before/after" as it removes any ambiguity... 
> not to mention that is the MPLS speak :) 

Thanks. I believe that the behaviour of the code is consistent with
what you describe.

How about this updated comment for push_mpls_lse().

/* Push an new MPLS stack entry onto the MPLS stack
 * and adjust packet->l2 and packet->l2.5 accordingly.
 * The new entry will be the outermost entry on the stack. */
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 01/16] User-Space MPLS actions and matches

2013-01-23 Thread Andre Khan

Great! Thanks for clearing that up for me.

-Andre


- Original Message -
From: "Simon Horman" 
To: "Andre Khan" 
Cc: dev@openvswitch.org, "Isaku Yamahata" , "Ravi K" 
, "Ben Pfaff" 
Sent: Wednesday, January 23, 2013 11:00:11 PM
Subject: Re: [ovs-dev] [PATCH 01/16] User-Space MPLS actions and matches

On Wed, Jan 23, 2013 at 10:31:52PM -0800, Andre Khan wrote:
> 
> Hi Simon,
> 
> Regarding this part in your mail:
> 
>>> push_mpls_lse() isn't documented as putting the new MPLS label before or
>>> after the existing labels, but it appears to me that it puts the new
>>> label after the existing labels.  If I'm right about that, it's
>>> surprising; one would ordinarily expect a new header to be an outermost
>>> one, not an innermost one.
> 
>>I believe that it pushes the new MPLS stack entry onto the MPLS stack
>>before any existing entries.
> 
> The newly pushed MPLS label is the "outermost" label (which is what I believe 
> Ben alluded to). 
> Typically this label is for the benefit of the MPLS Nexthop to which it has 
> local significance.   
> 
> I just started to scan this list, so forgive me if I lack the full context of 
> this thread and missed something... besides it is getting late here :)
> 
> Thanks,
> -Andre
> 
> PS: I would suggest that that location of the label be signified with the 
> words "inner/outer" rather than "before/after" as it removes any ambiguity... 
> not to mention that is the MPLS speak :) 

Thanks. I believe that the behaviour of the code is consistent with
what you describe.

How about this updated comment for push_mpls_lse().

/* Push an new MPLS stack entry onto the MPLS stack
 * and adjust packet->l2 and packet->l2.5 accordingly.
 * The new entry will be the outermost entry on the stack. */
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev