[ovs-dev] [PATCH v2.29] datapath: Add basic MPLS support to kernel

2013-05-17 Thread Simon Horman
Allow datapath to recognize and extract MPLS labels into flow keys
and execute actions which push, pop, and set labels on packets.

Based heavily on work by Leo Alterman and Ravi K.

Cc: Ravi K 
Cc: Leo Alterman 
Reviewed-by: Isaku Yamahata 
Signed-off-by: Simon Horman 

---

This is the remaining patch of the series "MPLS actions and matches".
To aid review it is available in git at:

git://github.com/horms/openvswitch.git devel/mpls-v2.28

TODO:
* Finalise kernel network core side of GSO. That will likely require
  minor updates to this patch.

v2.29
* Break include/ and lib/ portions of the patch out into a
  separate patch "datapath: Add basic MPLS support to kernel"
* Update for new MPLS GSO scheme
  - skb->protocol is set to the new ethertype of the packet
on MPLS push and pop
  - When pushing the first MPLS LSE onto a previously non-MPLS
packet set skb->inner_protocol to the original ethertype.
  - skb->inner_protocol may be used by the network stack
for GSO of the inner-packet.
* Drop const from ethertype parameter of set_ethertype.
  This appears to be a legacy of this parameter being a pointer.
* Pass the ethertype patrameter of pop_mpls as a value rather
  than a pointer.

v2.28
* Kernel Datapath changes as suggested by Jarno Rajahalme
  + Correct the logic introduced in v2.27 to set the network_header
to after the MPLS label stack in the case of an MPLS packet.
- Increment stack_len offset so that label stacks of depth greater
  than two do not cause an infinite loop.
- Correct offset passed to check_header to include skb->mac len

v2.27
* Kernel Datapath changes as suggested by Jarno Rajahalme and Jesse Gross:
  + Previously the mac_len and network_header of an skb corresponded
to the end of the L2 header.  To support GSO, just before transmission,
do_output, with the results as follows:

Input: non-MPLS skb: Output: network header and mac_len correspond
 to the beginning of the L3 headers
Input: MPLS: Output: network header and mac_len correspond to the
 end of the L2 headers.

This is somewhat confusing.

  + The new scheme is as follows:
- The mac_len always corresponds to the end of the L2 header.
- The network header always corresponds to the beginning of the
  L3 header.

  + Note that in the case of MPLS output the end of the L2 headers and the
  beginning of the L3 headers will differ.

* Remove unused declaration of skb_cb_mpls_stack()

v2.26
* Rebase on master
* Kernel Datapath changes as suggested by Jarno Rajahalme
  - Use skb_network_header() instead of skb_mac_header() to locate
the ethertype to set in set_ethertype() as the latter will
be wrong in the presence of VLAN tags. This resolves
a regression introduced in v2.24.
  - Enhance comment in do_output()
  - do_execute_actions(): Do not alter mpls_stack_depth if
a MPLS push or pop action fail. This is achieved by altering
mpls_stack_depth at the end of push_mpls() and pop_mpls().

v2.25
* Rebase on master
* Pass big-endian value as the last argument of eth_types_set() in
  validate_and_copy_actions__()
* Use revised GSO support as provided by the patch series
  "[PATCH 0/2] Small Modifications to GSO to allow segmentation of MPLS"
  - Set skb->mac_len to the length of the l2 header + MPLS stack length
  - Update skb->network_header accordingly
  - Set skb->encapsulated_features

v2.24
* Use skb_mac_header() in set_ethertype()
* Set skb->encapsulation in set_ethertype() to support MPLS GSO.
  Also add a note about the other requirements for MPLS GSO.
  MPLS GSO support will be posted as a patch net-next (Linux mainline)
  "MPLS: Add limited GSO support"
* Do not add ETH_TYPE_MIN, it is no longer used

v2.23
* As suggested by Jesse Gross:
  - Verify the current ethernet type when validating sample actions
both for the taken and not-taken path if the sample action.
  - Document that the OVS_KEY_ATTR_MPLS attribute accepts a list of
struct ovs_key_mpls but that an implementation may restrict
the length it accepts.
  - Restrict the array length of the OVS_KEY_ATTR_MPLS to one.
+ Don't add ovs_flow_verify_key_len as it was added to
  handle attributes whose values are arrays but there are
  no attributes with values that are arrays (of length greater than one).

v2.22
* As suggested by Jesse Gross:
  - Fix sparse warning in validate_and_copy_actions()
I have no idea why sparse doesn't show this up this on my system.
  - Remove call to skb_cow_head() from push_mpls() as it
is already covered by a call to make_writable()
  - Check (key_type > OVS_KEY_ATTR_MAX) in ovs_flow_verify_key_len()
  - Disallow set actions on l2.5+ data and MPLS push and pop actions
after an MPLS pop action as there is no verification that the packet
is actually of the new ethernet type. This may later be supported
using recirculation or by other means.
  - Do not add spurious 

Re: [ovs-dev] [PATCH net-next v2] net: Loosen constraints for recalculating checksum in skb_segment()

2013-05-17 Thread Eric Dumazet
On Fri, 2013-05-17 at 15:49 +0900, Simon Horman wrote:
> In the case where the ability to offload checksums changes
> then it may be necessary for software checksumming of an skb to occur.
> 
> An example of such a case is where a non-GRE packet is received but
> is to be encapsulated and transmitted as GRE.
> 
> Another example relates to my proposed support for for GSO packets
> that are non-MPLS when received but MPLS when transmitted.
> 
> The cost of this change is that csum may be checked when it previously
> was not. In the case where csum is true this is pure overhead. In the
> case where csum is false it leads to software checksumming, which
> I believe also leads to correct checksums in transmitted packets for
> the cases described above.
> 
> This patch relies on the return value of can_checksum_protocol()
> being correct and in turn the return value of skb_network_protocol(),
> used to provide the protocol parameter of can_checksum_protocol(),
> being correct. It also relies on the features passed to skb_segment()
> and in turn to can_checksum_protocol() being correct.
> 
> I believe that this problem does has not been observed for VLANs because it
> appears that all drivers except xgbe set vlan_features such that that the
> checksum offload support for VLAN packets is greater than or equal to that
> of non-VLAN packets.
> 
> I wonder if the code in xgbe may be an oversight and the hardware does
> support checksumming of VLAN packets.  If so it may be worth updating the
> vlan_features of the driver as this patch will force such checksums to be
> performed in software rather than hardware.
> 
> Signed-off-by: Simon Horman 
> 
> ---
> 
> Tested using the bnx2 driver.
> 
> v2
> Rewrite changlog after further analysis
> ---
>  net/core/skbuff.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index af9185d..3a43be4 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -2853,7 +2853,7 @@ struct sk_buff *skb_segment(struct sk_buff *skb, 
> netdev_features_t features)
>doffset + tnl_hlen);
>  
>   if (fskb != skb_shinfo(skb)->frag_list)
> - continue;
> + goto csum;
>  
>   if (!sg) {
>   nskb->ip_summed = CHECKSUM_NONE;
> @@ -2917,6 +2917,7 @@ skip_fraglist:
>   nskb->len += nskb->data_len;
>   nskb->truesize += nskb->data_len;
>  
> +csum:
>   if (!csum) {
>   nskb->csum = skb_checksum(nskb, doffset,
> nskb->len - doffset, 0);


I had hard time to understand this patch, but I think I now get it.

Changelog seems to be very specific, while the change looks like a more
generic problem about skbs having a frag_list ? Oh well...

skb_segment() is probably too complex for my understanding.

Could you please change the label name ?

perform_csum_check for example, and maybe it would be easier :)



___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH net-next v3] MPLS: Add limited GSO support

2013-05-17 Thread Eric Dumazet
On Fri, 2013-05-17 at 15:50 +0900, Simon Horman wrote:

> @@ -509,6 +511,8 @@ struct sk_buff {
>   __u32   reserved_tailroom;
>   };
>  
> + __be16  inner_protocol;
> + /* 16/48 bit hole */
>   sk_buff_data_t  inner_transport_header;
>   sk_buff_data_t  inner_network_header;
>   sk_buff_data_t  inner_mac_header;

We are reaching the point where sk_buff is so big that per packet cost
is killing us, and guys want linux kernel bypass.

sizeof(sk_buff) = 0xf8 -> 0x100

sizeof(skbuff_fclone_cache) = 0x200 -> 0x240 

So TCP stack performance is going to be hurt by this change, as the
atomic_t containing fclone_ref will be in a separate cache line.

__copy_skb_header() needs to be smarter and perform a bulk copy using
long words

Maybe we could use 16 bits instead of 32 for the inner_*_headers ?



___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [xlate 2/4] ofproto-dpif: Move odp_actions from subfacet to facet.

2013-05-17 Thread Ben Pfaff
On Tue, May 14, 2013 at 07:29:34PM -0700, Ethan Jackson wrote:
> Upon close inspection, it appears that it's not possible for
> actions to differ between subfacets belonging to a given facet.
> Given this fact, it makes sense to move datapath actions from
> subfacets to their parent facets.  It's both conceptually more
> straightforward, and necessary for future threading and megaflow
> work.
> 
> Justin Pettit fixed a serious learning bug.

Would you describe it a little?  It's a big patch so the fix isn't
necessarily obvious (I haven't read on yet).  Is it something that we
should break out separately so that we can backport it?

> Co-authored-by: Justin Pettit 
> Signed-off-by: Ethan Jackson 

I agree with Justin's comment.

subfacet_create() refers to subfacet_make_actions() in a comment, but
there is no longer such a function.

handle_flow_miss_with_facet() previously translated the actions
separately for each packet, if it was necessary due to slow-pathing.
The new version doesn't do that.  Are you sure that every packet for a
facet always has the same datapath actions, even if the facet is
slow-pathed?  We have a comment in facet_check_consistency() that says
that they can vary:

/* The actions for slow-path flows may legitimately vary from one
 * packet to the next.  We're done. */

The new odp_actions_stub in struct facet makes facets huge!  This
should not be necessary.  I'd use a much smaller stub there, if we
need one at all.  I would consider an approach like the following.

- Put a pointer, a length, and a small fixed-size buffer that
  we think will cover the common case (maybe 32 bytes?) into
  struct facet.  This avoids not just a huge stub there but
  the pretty-large struct ofpbuf (currently 52 bytes on
  32-bit) too.

struct nlattr *odp_actions;
size_t odp_actions_len; /* or unsigned int */
uint64_t odp_actions_stub[32 / 8];

- Translate the actions into a larger (1 kB or 4 kB) on-stack
  stub in facet_create().

- If the translated actions fit into facet->odp_actions_stub,
  use it and point facet->odp_actions to it, otherwise
  xmemdup() an allocated region.

- When we're done with the facet, if facet->odp_actions !=
  facet->odp_actions_stub, free(facet->odp_actions).

Finally, I think we might actually have a fairly long-standing bug due
to a bad assumption.  We assume that a subfacet, that is, a given ODP
key, can map to exactly one facet (see "This shouldn't happen."
comment in subfacet_create()) but now I think that's wrong.  For
example, suppose as a thought experiment that the kernel reported no
fields at all for any packet.  Then every packet would have the same
ODP key, and so all of them would have the same subfacet, but
obviously we'd want more than one facet.  I think that essentially the
same thing, but more subtle, can come up whenever the kernel supports
less-specific matching than userspace.  One simple solution seems to
be to just never create facets or subfacets when ODP_FIT_TOO_LITTLE
comes up.  That would simplify other code, too, since we could get rid
of SLOW_MATCH, subfacet_slow_reason(), etc.

I'd like to review this again after revision.

Thanks,

Ben.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [xlate 3/4] ofproto-dpif: Rename action_xlate_ctx.

2013-05-17 Thread Ben Pfaff
On Tue, May 14, 2013 at 07:29:35PM -0700, Ethan Jackson wrote:
> This patch changes the name of action_xlate_ctx to xlate_ctx. Aside
> from being a bit snappier, it fits more cleanly with structures
> added in future patches.
> 
> Signed-off-by: Ethan Jackson 

Sure, fine with me.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [xlate 4/4] ofproto-dpif: Revamp xlate_actions() interface.

2013-05-17 Thread Ben Pfaff
On Tue, May 14, 2013 at 07:29:36PM -0700, Ethan Jackson wrote:
> This patch implements a new interface to xlate_actions which, aside
> from being simpler and more intuitive, achieves several goals.  It
> pulls all of xlate_actions() results into a single structure which
> will be easier to cache and pass around in future.  And it shields
> xlate_ctx from the rest of the code, making it possible to factor
> xlate_actions() in the future (hopefully).
> 
> Signed-off-by: Ethan Jackson 

I'm concerned about putting a 512+ byte structure into a facet.  I
think the size of xlate_out should therefore be reduced.

I'd like to look at this series again when you've revised it.

Thanks,

Ben.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v8 1/5] Add execute_actions

2013-05-17 Thread Jesse Gross
On Wed, May 15, 2013 at 1:31 AM, Simon Horman  wrote:
> This moves generic action execution code out of lib/dpif-netedev.c
> and into a new file, lib/execute-actions.c.
>
> This is in preparation for using execute_set_action()
> in lib/odp-util.c to handle recirculation/
>
> Signed-off-by: Simon Horman 

Here's a combined set of comments for all of the patches in the
execute-actions code:
 * The convention used other places in the code when passing around
void pointers and casting them is to have the variable named with an
underscore be the void *.
 * I would extract the userdata for userspace actions in
execute_actions() and pass it directly to the userspace function
pointer.
 * I don't think there is a lot of value in the asserts in
execute_actions(). I think it's valid to pass NULL for 'dp' and 'key'
in these situations if you want. The function pointers can't be NULL
but that will stop the program just as quickly as an assert anyways.
 * The later patches pass around a huge number of arguments. We should
find some way of encapsulating them in a struct. It might even be
possible to store them directly in the flow. It deviates slightly from
other behavior because we don't update the flow when we change the
packet but it would clean a lot of other things up.
 * We already do some action execution in userspace when we send
packets to the controller. It would be good to see if we can start
using this code there as well.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH net-next v3] MPLS: Add limited GSO support

2013-05-17 Thread Jesse Gross
On Thu, May 16, 2013 at 11:50 PM, Simon Horman  wrote:
> * Set skb->mac_len and skb->network_header to correspond to the
>   end of the L3 header, including the MPLS label stack.

I believe that the treatment of skb->mac_len is different from what
you are proposing in the OVS patch. I think this is because there is a
call to skb_reset_mac_len() in __skb_gso_segment() but it is somewhat
confusing. The GSO code is moving away from being able to reset all
header pointers and reparse the packet (obviously that's not possible
in the case of MPLS) so maybe we can just remove that call.

> A new NETIF_F_GRE_GSO feature is added for devices which support
> hardware MPLS GSO offload.  Currently no devices support this
> and MPLS GSO always falls back to software.

I assume that this is supposed to refer to MPLS and not GRE.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [oftest v3 1/5] netdev-dummy: Limit receive queue length to 100 packets.

2013-05-17 Thread Ben Pfaff
It doesn't seem like a good idea to allow the queue length to grow without
bound, even for a test-only device.

Signed-off-by: Ben Pfaff 
---
 lib/netdev-dummy.c |   15 +--
 1 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c
index 908fef2..62d8a1b 100644
--- a/lib/netdev-dummy.c
+++ b/lib/netdev-dummy.c
@@ -54,10 +54,15 @@ struct netdev_dummy {
 struct list rxes;   /* List of child "netdev_rx_dummy"s. */
 };
 
+/* Max 'recv_queue_len' in struct netdev_dummy. */
+#define NETDEV_DUMMY_MAX_QUEUE 100
+
 struct netdev_rx_dummy {
 struct netdev_rx up;
 struct list node;   /* In netdev_dummy's "rxes" list. */
 struct list recv_queue;
+int recv_queue_len; /* list_size(&recv_queue). */
+bool listening;
 };
 
 static struct shash dummy_netdevs = SHASH_INITIALIZER(&dummy_netdevs);
@@ -159,6 +164,7 @@ netdev_dummy_rx_open(struct netdev *netdev_, struct 
netdev_rx **rxp)
 netdev_rx_init(&rx->up, &netdev->up, &netdev_rx_dummy_class);
 list_push_back(&netdev->rxes, &rx->node);
 list_init(&rx->recv_queue);
+rx->recv_queue_len = 0;
 
 *rxp = &rx->up;
 return 0;
@@ -176,6 +182,7 @@ netdev_rx_dummy_recv(struct netdev_rx *rx_, void *buffer, 
size_t size)
 }
 
 packet = ofpbuf_from_list(list_pop_front(&rx->recv_queue));
+rx->recv_queue_len--;
 if (packet->size > size) {
 return -EMSGSIZE;
 }
@@ -210,6 +217,7 @@ netdev_rx_dummy_drain(struct netdev_rx *rx_)
 {
 struct netdev_rx_dummy *rx = netdev_rx_dummy_cast(rx_);
 ofpbuf_list_delete(&rx->recv_queue);
+rx->recv_queue_len = 0;
 return 0;
 }
 
@@ -463,8 +471,11 @@ netdev_dummy_receive(struct unixctl_conn *conn,
 
 n_listeners = 0;
 LIST_FOR_EACH (rx, node, &dummy_dev->rxes) {
-struct ofpbuf *copy = ofpbuf_clone(packet);
-list_push_back(&rx->recv_queue, ©->list_node);
+if (rx->recv_queue_len < NETDEV_DUMMY_MAX_QUEUE) {
+struct ofpbuf *copy = ofpbuf_clone(packet);
+list_push_back(&rx->recv_queue, ©->list_node);
+rx->recv_queue_len++;
+}
 n_listeners++;
 }
 ofpbuf_delete(packet);
-- 
1.7.2.5

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [oftest v3 3/5] netdev-dummy: Drop "nobody listened" reply from netdev-dummy/receive.

2013-05-17 Thread Ben Pfaff
Ethan pointed out that this wasn't very useful.

Signed-off-by: Ben Pfaff 
---
 lib/netdev-dummy.c|   16 +++-
 tests/ofproto-dpif.at |   24 
 2 files changed, 11 insertions(+), 29 deletions(-)

diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c
index ce84a0c..8c182a3 100644
--- a/lib/netdev-dummy.c
+++ b/lib/netdev-dummy.c
@@ -441,24 +441,19 @@ eth_from_packet_or_flow(const char *s)
 return packet;
 }
 
-static int
+static void
 netdev_dummy_queue_packet(struct netdev_dummy *dummy,
   const void *data, size_t size)
 {
 struct netdev_rx_dummy *rx;
-int n_listeners;
 
-n_listeners = 0;
 LIST_FOR_EACH (rx, node, &dummy->rxes) {
 if (rx->recv_queue_len < NETDEV_DUMMY_MAX_QUEUE) {
 struct ofpbuf *copy = ofpbuf_clone_data(data, size);
 list_push_back(&rx->recv_queue, ©->list_node);
 rx->recv_queue_len++;
-n_listeners++;
 }
 }
-
-return n_listeners;
 }
 
 static void
@@ -488,16 +483,11 @@ netdev_dummy_receive(struct unixctl_conn *conn,
 dummy_dev->stats.rx_packets++;
 dummy_dev->stats.rx_bytes += packet->size;
 
-n_listeners += netdev_dummy_queue_packet(dummy_dev,
- packet->data, packet->size);
+netdev_dummy_queue_packet(dummy_dev, packet->data, packet->size);
 ofpbuf_delete(packet);
 }
 
-if (!n_listeners) {
-unixctl_command_reply(conn, "packets queued but nobody listened");
-} else {
-unixctl_command_reply(conn, "success");
-}
+unixctl_command_reply(conn, NULL);
 }
 
 static void
diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index 1fdbac3..ba8c975 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -1730,8 +1730,7 @@ AT_CHECK([ovs-ofctl dump-flows br0 | ofctl_strip], [0],
 # flow stats updates are mainly what implements the fin_timeout
 # feature, we warp forward a couple of times to ensure that flow stats
 # run before re-checking the flow table.)
-AT_CHECK([ovs-appctl netdev-dummy/receive br0 
0021853763af0026b98cb0f90800453c2e2440004006465dac11370dac11370b828b0016751e267ba00216d01736020405b40402080a2d25085f01030307],
 [0], [success
-])
+AT_CHECK([ovs-appctl netdev-dummy/receive br0 
0021853763af0026b98cb0f90800453c2e2440004006465dac11370dac11370b828b0016751e267ba00216d01736020405b40402080a2d25085f01030307])
 AT_CHECK([ovs-appctl time/warp 1000 && ovs-appctl time/warp 1000], [0], [warped
 warped
 ])
@@ -1740,8 +1739,7 @@ AT_CHECK([ovs-ofctl dump-flows br0 | ofctl_strip], [0],
  n_packets=1, n_bytes=74, idle_timeout=60, actions=fin_timeout(idle_timeout=5)
 ])
 # Check that a TCP FIN packet does change the timeout.
-AT_CHECK([ovs-appctl netdev-dummy/receive br0 
0021853763af0026b98cb0f90800451000342e3e40004006463bac11370dac11370b828b0016751e319dfc96399b801100717ae80101080a2d250a9408579588],
 [0], [success
-])
+AT_CHECK([ovs-appctl netdev-dummy/receive br0 
0021853763af0026b98cb0f90800451000342e3e40004006463bac11370dac11370b828b0016751e319dfc96399b801100717ae80101080a2d250a9408579588])
 AT_CHECK([ovs-appctl time/warp 1000 && ovs-appctl time/warp 1000], [0], [warped
 warped
 ])
@@ -1803,12 +1801,9 @@ OVS_VSWITCHD_START([add-br br1 -- \
 ADD_OF_PORTS([br0], [1], [2])
 ADD_OF_PORTS([br1], [3])
 
-AT_CHECK([ovs-appctl netdev-dummy/receive p1 
'in_port(1),eth(src=50:54:00:00:00:05,dst=50:54:00:00:00:07),eth_type(0x0800),ipv4(src=192.168.0.1,dst=192.168.0.2,proto=1,tos=0,ttl=64,frag=no),icmp(type=8,code=0)'],
 [0], [success
-])
-AT_CHECK([ovs-appctl netdev-dummy/receive p2 
'in_port(2),eth(src=50:54:00:00:00:07,dst=50:54:00:00:00:05),eth_type(0x0800),ipv4(src=192.168.0.2,dst=192.168.0.1,proto=1,tos=0,ttl=64,frag=no),icmp(type=0,code=0)'],
 [0], [success
-])
-AT_CHECK([ovs-appctl netdev-dummy/receive p3 
'in_port(3),eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0a),eth_type(0x0800),ipv4(src=10.0.0.2,dst=10.0.0.1,proto=1,tos=0,ttl=64,frag=no),icmp(type=8,code=0)'],
 [0], [success
-])
+AT_CHECK([ovs-appctl netdev-dummy/receive p1 
'in_port(1),eth(src=50:54:00:00:00:05,dst=50:54:00:00:00:07),eth_type(0x0800),ipv4(src=192.168.0.1,dst=192.168.0.2,proto=1,tos=0,ttl=64,frag=no),icmp(type=8,code=0)'])
+AT_CHECK([ovs-appctl netdev-dummy/receive p2 
'in_port(2),eth(src=50:54:00:00:00:07,dst=50:54:00:00:00:05),eth_type(0x0800),ipv4(src=192.168.0.2,dst=192.168.0.1,proto=1,tos=0,ttl=64,frag=no),icmp(type=0,code=0)'])
+AT_CHECK([ovs-appctl netdev-dummy/receive p3 
'in_port(3),eth(src=50:54:00:00:00:09,dst=50:54:00:00:00:0a),eth_type(0x0800),ipv4(src=10.0.0.2,dst=10.0.0.1,proto=1,tos=0,ttl=64,frag=no),icmp(type=8,code=0)'])
 
 AT_CHECK([ovs-appctl dpif/dump-flows br0 | sort | STRIP_USED], [0], [dnl
 
in_port(1),eth(src=50:54:00:00:00:05,dst=50:54:00:00:00:07),eth_type(0x0800),ipv4(src=192.168.0.1,dst=192.168.0.2,proto=1,tos=0,ttl=64,frag=no),icmp(type=8,code=0),
 packets:0, by

[ovs-dev] [oftest v3 2/5] netdev-dummy: Factor some netdev_dummy_receive() code out into new function.

2013-05-17 Thread Ben Pfaff
An upcoming patch will add another user.

Signed-off-by: Ben Pfaff 
---
 lib/netdev-dummy.c |   32 ++--
 1 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c
index 62d8a1b..ce84a0c 100644
--- a/lib/netdev-dummy.c
+++ b/lib/netdev-dummy.c
@@ -441,6 +441,26 @@ eth_from_packet_or_flow(const char *s)
 return packet;
 }
 
+static int
+netdev_dummy_queue_packet(struct netdev_dummy *dummy,
+  const void *data, size_t size)
+{
+struct netdev_rx_dummy *rx;
+int n_listeners;
+
+n_listeners = 0;
+LIST_FOR_EACH (rx, node, &dummy->rxes) {
+if (rx->recv_queue_len < NETDEV_DUMMY_MAX_QUEUE) {
+struct ofpbuf *copy = ofpbuf_clone_data(data, size);
+list_push_back(&rx->recv_queue, ©->list_node);
+rx->recv_queue_len++;
+n_listeners++;
+}
+}
+
+return n_listeners;
+}
+
 static void
 netdev_dummy_receive(struct unixctl_conn *conn,
  int argc, const char *argv[], void *aux OVS_UNUSED)
@@ -457,7 +477,6 @@ netdev_dummy_receive(struct unixctl_conn *conn,
 
 n_listeners = 0;
 for (i = 2; i < argc; i++) {
-struct netdev_rx_dummy *rx;
 struct ofpbuf *packet;
 
 packet = eth_from_packet_or_flow(argv[i]);
@@ -469,15 +488,8 @@ netdev_dummy_receive(struct unixctl_conn *conn,
 dummy_dev->stats.rx_packets++;
 dummy_dev->stats.rx_bytes += packet->size;
 
-n_listeners = 0;
-LIST_FOR_EACH (rx, node, &dummy_dev->rxes) {
-if (rx->recv_queue_len < NETDEV_DUMMY_MAX_QUEUE) {
-struct ofpbuf *copy = ofpbuf_clone(packet);
-list_push_back(&rx->recv_queue, ©->list_node);
-rx->recv_queue_len++;
-}
-n_listeners++;
-}
+n_listeners += netdev_dummy_queue_packet(dummy_dev,
+ packet->data, packet->size);
 ofpbuf_delete(packet);
 }
 
-- 
1.7.2.5

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [oftest v3 0/5] add support for OFTest

2013-05-17 Thread Ben Pfaff
This is the latest version of the oftest series.  The first version
was posted Oct. 25, 2012.  The second version was posted Feb. 1, 2013.
The only difference between v2 and v3 is considerable work rebasing
due to the restructuring of the netdev library interface.

Ben Pfaff (5):
  netdev-dummy: Limit receive queue length to 100 packets.
  netdev-dummy: Factor some netdev_dummy_receive() code out into new
function.
  netdev-dummy: Drop "nobody listened" reply from netdev-dummy/receive.
  netdev-dummy: Add "pstream" option for connecting a dummy to a
process.
  tests: Add support for running OFTest.

 Makefile.am   |1 +
 NEWS  |2 +
 README-OFTest |   77 ++
 lib/netdev-dummy.c|  267 +---
 tests/automake.mk |6 +
 tests/ofproto-dpif.at |   24 ++---
 tests/run-oftest  |   94 +
 7 files changed, 438 insertions(+), 33 deletions(-)
 create mode 100644 README-OFTest
 create mode 100755 tests/run-oftest

-- 
1.7.2.5

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [oftest v3 4/5] netdev-dummy: Add "pstream" option for connecting a dummy to a process.

2013-05-17 Thread Ben Pfaff
Signed-off-by: Ben Pfaff 
---
 lib/netdev-dummy.c |  244 +---
 1 files changed, 232 insertions(+), 12 deletions(-)

diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c
index 8c182a3..a617218 100644
--- a/lib/netdev-dummy.c
+++ b/lib/netdev-dummy.c
@@ -31,6 +31,8 @@
 #include "poll-loop.h"
 #include "shash.h"
 #include "sset.h"
+#include "stream.h"
+#include "unaligned.h"
 #include "unixctl.h"
 #include "vlog.h"
 
@@ -42,6 +44,12 @@ VLOG_DEFINE_THIS_MODULE(netdev_dummy);
 #define FREE_BSD 0
 #endif
 
+struct dummy_stream {
+struct stream *stream;
+struct ofpbuf rxbuf;
+struct list txq;
+};
+
 struct netdev_dummy {
 struct netdev up;
 uint8_t hwaddr[ETH_ADDR_LEN];
@@ -51,6 +59,10 @@ struct netdev_dummy {
 unsigned int change_seq;
 int ifindex;
 
+struct pstream *pstream;
+struct dummy_stream *streams;
+size_t n_streams;
+
 struct list rxes;   /* List of child "netdev_rx_dummy"s. */
 };
 
@@ -73,6 +85,9 @@ static unixctl_cb_func netdev_dummy_set_admin_state;
 static int netdev_dummy_create(const struct netdev_class *, const char *,
struct netdev **);
 static void netdev_dummy_poll_notify(struct netdev_dummy *);
+static void netdev_dummy_queue_packet(struct netdev_dummy *, struct ofpbuf *);
+
+static void dummy_stream_close(struct dummy_stream *);
 
 static bool
 is_dummy_class(const struct netdev_class *class)
@@ -94,6 +109,140 @@ netdev_rx_dummy_cast(const struct netdev_rx *rx)
 return CONTAINER_OF(rx, struct netdev_rx_dummy, up);
 }
 
+static void
+netdev_dummy_run(void)
+{
+struct shash_node *node;
+
+SHASH_FOR_EACH (node, &dummy_netdevs) {
+struct netdev_dummy *dev = node->data;
+size_t i;
+
+if (dev->pstream) {
+struct stream *new_stream;
+int error;
+
+error = pstream_accept(dev->pstream, &new_stream);
+if (!error) {
+struct dummy_stream *s;
+
+dev->streams = xrealloc(dev->streams,
+((dev->n_streams + 1)
+ * sizeof *dev->streams));
+s = &dev->streams[dev->n_streams++];
+s->stream = new_stream;
+ofpbuf_init(&s->rxbuf, 2048);
+list_init(&s->txq);
+} else if (error != EAGAIN) {
+VLOG_WARN("%s: accept failed (%s)",
+  pstream_get_name(dev->pstream), strerror(error));
+pstream_close(dev->pstream);
+dev->pstream = NULL;
+}
+}
+
+for (i = 0; i < dev->n_streams; i++) {
+struct dummy_stream *s = &dev->streams[i];
+int error = 0;
+size_t n;
+
+stream_run(s->stream);
+
+if (!list_is_empty(&s->txq)) {
+struct ofpbuf *txbuf;
+int retval;
+
+txbuf = ofpbuf_from_list(list_front(&s->txq));
+retval = stream_send(s->stream, txbuf->data, txbuf->size);
+if (retval > 0) {
+ofpbuf_pull(txbuf, retval);
+if (!txbuf->size) {
+list_remove(&txbuf->list_node);
+ofpbuf_delete(txbuf);
+}
+} else if (retval != -EAGAIN) {
+error = -retval;
+}
+}
+
+if (!error) {
+if (s->rxbuf.size < 2) {
+n = 2 - s->rxbuf.size;
+} else {
+uint16_t frame_len;
+
+frame_len = ntohs(get_unaligned_be16(s->rxbuf.data));
+if (frame_len < ETH_HEADER_LEN) {
+error = EPROTO;
+n = 0;
+} else {
+n = (2 + frame_len) - s->rxbuf.size;
+}
+}
+}
+if (!error) {
+int retval;
+
+ofpbuf_prealloc_tailroom(&s->rxbuf, n);
+retval = stream_recv(s->stream, ofpbuf_tail(&s->rxbuf), n);
+if (retval > 0) {
+s->rxbuf.size += retval;
+if (retval == n && s->rxbuf.size > 2) {
+ofpbuf_pull(&s->rxbuf, 2);
+netdev_dummy_queue_packet(dev,
+  ofpbuf_clone(&s->rxbuf));
+ofpbuf_clear(&s->rxbuf);
+}
+} else if (retval != -EAGAIN) {
+error = (retval < 0 ? -retval
+ : s->rxbuf.size ? EPROTO
+ : EOF);
+}
+}
+
+if (error) {
+VLOG_DBG("%s: closing connection (%s)",
+ stream_get_name(s->stream),
+  

[ovs-dev] [oftest v3 5/5] tests: Add support for running OFTest.

2013-05-17 Thread Ben Pfaff
Signed-off-by: Ben Pfaff 
---
 Makefile.am   |1 +
 NEWS  |2 +
 README-OFTest |   77 +++
 tests/automake.mk |6 +++
 tests/run-oftest  |   94 +
 5 files changed, 180 insertions(+), 0 deletions(-)
 create mode 100644 README-OFTest
 create mode 100755 tests/run-oftest

diff --git a/Makefile.am b/Makefile.am
index 36beb6c..193b19e 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -55,6 +55,7 @@ EXTRA_DIST = \
NOTICE \
OPENFLOW-1.1+ \
PORTING \
+   README-OFTest \
README-gcov \
README-lisp \
REPORTING-BUGS \
diff --git a/NEWS b/NEWS
index 4cb4499..7340898 100644
--- a/NEWS
+++ b/NEWS
@@ -4,6 +4,8 @@ post-v1.11.0
   * New support for matching outer source and destination IP address
 of tunneled packets, for tunnel ports configured with the newly
added "remote_ip=flow" and "local_ip=flow" options.
+- New "check-oftest" Makefile target for running OFTest against Open
+  vSwitch.  See README-OFTest for details.
 
 
 v1.11.0 - xx xxx 
diff --git a/README-OFTest b/README-OFTest
new file mode 100644
index 000..92936a7
--- /dev/null
+++ b/README-OFTest
@@ -0,0 +1,77 @@
+How to Use OFTest With Open vSwitch
+===
+
+This document describes how to use the OFTest OpenFlow protocol
+testing suite with Open vSwitch in "dummy mode".  In this mode of
+testing, no packets travel across physical or virtual networks.
+Instead, Unix domain sockets stand in as simulated networks.  This
+simulation is imperfect, but it is much easier to set up, does not
+require extra physical or virtual hardware, and does not require
+supervisor privileges.
+
+Prerequisites
+-
+
+First, build Open vSwitch according to the instructions in INSTALL.
+You need not install it.
+
+Second, obtain a copy of OFTest and install its prerequisites.  You
+need a copy of OFTest that includes this commit available at
+https://github.com/blp/oftest/commits/master:
+
+commit 406614846c5eae01f0eb460a9f107e7ed604924f
+Author: Ben Pfaff 
+
+make ovs-dummy platform work again
+
+Commit e1b8da9 (dataplane: single-threaded rewrite) changed the
+DataPlanePort required interface but it did not update the ovs-dummy
+implementation to match.  This commit makes the platform work again.
+
+Testing OVS in dummy mode does not require root privilege, so you may
+ignore that requirement.
+
+Optionally, add the top-level OFTest directory (containing the "oft"
+program) to your $PATH.  This slightly simplifies running OFTest later.
+
+Running OFTest
+--
+
+To run OFTest in dummy mode, run the following command from your Open
+vSwitch build directory:
+
+make check-oftest OFT=
+
+where  is the absolute path to the "oft" program in
+OFTest.
+
+If you added "oft" to your $PATH, you may omit the OFT variable
+assignment:
+
+make check-oftest
+
+By default, "check-oftest" passes "oft" just enough options to enable
+dummy mode.  You can use OFTFLAGS to pass additional options.  For
+example, to run just the basic.Echo test instead of all tests (the
+default) and enable verbose logging:
+
+make check-oftest OFT= OFTFLAGS='--verbose -T basic.Echo'
+
+Interpreting OFTest Results
+---
+
+Please interpret OFTest results cautiously.  Open vSwitch can fail a
+given test in OFTest for many reasons, including bugs in Open vSwitch,
+bugs in OFTest, bugs in the "dummy mode" integration, and differing
+interpretations of the OpenFlow standard and other standards.
+
+Open vSwitch has not been validated against OFTest.  Please do report
+test failures that you believe to represent bugs in Open vSwitch.
+Include the precise versions of Open vSwitch and OFTest in your bug
+report, plus any other information needed to reproduce the problem.
+
+Contact 
+---
+
+b...@openvswitch.org
+http://openvswitch.org/
diff --git a/tests/automake.mk b/tests/automake.mk
index 4442eb5..15df623 100644
--- a/tests/automake.mk
+++ b/tests/automake.mk
@@ -145,6 +145,12 @@ check-valgrind: all tests/atconfig tests/atlocal 
$(TESTSUITE) \
@echo 'Valgrind output can be found in tests/testsuite.dir/*/valgrind.*'
@echo 
'--'
 
+# OFTest support.
+
+check-oftest: all
+   srcdir='$(srcdir)' $(SHELL) $(srcdir)/tests/run-oftest
+EXTRA_DIST += tests/run-oftest
+
 clean-local:
test ! -f '$(TESTSUITE)' || $(SHELL) '$(TESTSUITE)' -C tests --clean
 
diff --git a/tests/run-oftest b/tests/run-oftest
new file mode 100755
index 000..d12a22f
--- /dev/null
+++ b/tests/run-oftest
@@ -0,0 +1,94 @@
+#! /bin/sh
+
+set -e
+
+run () {
+echo "$@"
+"$@" || exit 1
+}
+
+# Put built tools early in $PATH.
+builddir=`pwd`
+if test ! -e vswitchd/ovs-vswitchd; then
+echo >&2 'not in build directo

[ovs-dev] [PATCH] netdev-dummy: Remove FreeBSD dependency.

2013-05-17 Thread Ben Pfaff
There's no particular reason that netdev_dummy_register() has to care about
the particular OS, except that the tests like to use the special Linux-only
tunnel vport types.  But that can be done better, I think, by just always
registering them from netdev_dummy_register() and making that function
idempotent, so that calling it twice under Linux has no additional effect.
This commit implements that solution.

CC: Ed Maste 
Signed-off-by: Ben Pfaff 
---
 lib/netdev-dummy.c |   10 +-
 lib/netdev-vport.c |8 ++--
 2 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c
index 908fef2..14b286b 100644
--- a/lib/netdev-dummy.c
+++ b/lib/netdev-dummy.c
@@ -36,12 +36,6 @@
 
 VLOG_DEFINE_THIS_MODULE(netdev_dummy);
 
-#ifdef __FreeBSD__
-#define FREE_BSD 1
-#else
-#define FREE_BSD 0
-#endif
-
 struct netdev_dummy {
 struct netdev up;
 uint8_t hwaddr[ETH_ADDR_LEN];
@@ -553,7 +547,5 @@ netdev_dummy_register(bool override)
 }
 netdev_register_provider(&dummy_class);
 
-if (FREE_BSD) {
-netdev_vport_tunnel_register();
-}
+netdev_vport_tunnel_register();
 }
diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
index 38fc996..6eee8a7 100644
--- a/lib/netdev-vport.c
+++ b/lib/netdev-vport.c
@@ -685,11 +685,15 @@ netdev_vport_tunnel_register(void)
 TUNNEL_CLASS("vxlan", "vxlan_system"),
 TUNNEL_CLASS("lisp", "lisp_system")
 };
+static bool inited;
 
 int i;
 
-for (i = 0; i < ARRAY_SIZE(vport_classes); i++) {
-netdev_register_provider(&vport_classes[i].netdev_class);
+if (!inited) {
+inited = true;
+for (i = 0; i < ARRAY_SIZE(vport_classes); i++) {
+netdev_register_provider(&vport_classes[i].netdev_class);
+}
 }
 }
 
-- 
1.7.2.5

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v2.29] datapath: Add basic MPLS support to kernel

2013-05-17 Thread Jesse Gross
On Fri, May 17, 2013 at 12:06 AM, Simon Horman  wrote:
> diff --git a/datapath/actions.c b/datapath/actions.c
> index 0dac658..ac4423a 100644
> --- a/datapath/actions.c
> +++ b/datapath/actions.c
> +/* The end of the mac header.
> + *
> + * For non-MPLS skbs this will correspond to the network header.
> + * For MPLS skbs it will be berfore the network_header as the MPLS
> + * label stack lies between the end of the mac header and the network
> + * header. That is, for MPLS skbs the end of the mac header
> + * is the top of the MPLS label stack.
> + */
> +static inline unsigned char *skb_mac_header_end(const struct sk_buff *skb)
> +{
> +   return skb_mac_header(skb) + skb->mac_len;
> +}

This should either be moved into skbuff.h or given a name that makes
it clear that it is a local function.

> +static void set_ethertype(struct sk_buff *skb, __be16 ethertype)
> +{
> +   __be16 *skb_ethertype = get_ethertype(skb);
> +   if (*skb_ethertype == ethertype)
> +   return;
> +   if (get_ip_summed(skb) == OVS_CSUM_COMPLETE) {
> +   __be16 diff[] = { ~*skb_ethertype, ethertype };
> +   skb->csum = ~csum_partial((char *)diff, sizeof(diff),
> + ~skb->csum);
> +   }

Inside of OVS the Ethernet header is not covered by CHECKSUM_COMPLETE.

> +static int push_mpls(struct sk_buff *skb,
> +const struct ovs_action_push_mpls *mpls)
> +{
> +   __be32 *new_mpls_lse;
> +   int err;
> +
> +   err = make_writable(skb, skb->mac_len + MPLS_HLEN);
> +   if (unlikely(err))
> +   return err;
> +
> +   skb_push(skb, MPLS_HLEN);

What happens if there isn't enough headroom?

> +static int pop_mpls(struct sk_buff *skb, const __be16 ethertype)
> +{
> +   int err;
> +
> +   err = make_writable(skb, skb->mac_len + MPLS_HLEN);
> +   if (unlikely(err))
> +   return err;
> +
> +   if (get_ip_summed(skb) == OVS_CSUM_COMPLETE)
> +   skb->csum = csum_sub(skb->csum,
> +csum_partial(skb_mac_header_end(skb),
> + MPLS_HLEN, 0));
> +
> +   memmove(skb_mac_header(skb) + MPLS_HLEN, skb_mac_header(skb),
> +   skb->mac_len);
> +
> +   skb_pull(skb, MPLS_HLEN);

You could use __skb_pull() here.

>  /* remove VLAN header from packet and update csum accordingly. */
>  static int __pop_vlan_tci(struct sk_buff *skb, __be16 *current_tci)
>  {
> @@ -115,6 +229,9 @@ static int push_vlan(struct sk_buff *skb, const struct 
> ovs_action_push_vlan *vla
> if (!__vlan_put_tag(skb, current_tag))
> return -ENOMEM;
>
> +   /* update mac_len for skb_mac_header_end() */
> +   skb_reset_mac_len(skb);

Doesn't this make us forget the start of the MPLS label stack?

> -/* Execute a list of actions against 'skb'. */
> +/* Execute a list of actions against 'skb'.
> + *
> + * The stack depth is only tracked in the case of a non-MPLS packet
> + * that becomes MPLS via an MPLS push action. The stack depth
> + * is passed to do_output() in order to allow it to prepare the
> + * skb for possible GSO segmentation. */

I don't think this comment applies any more.

> diff --git a/datapath/datapath.c b/datapath/datapath.c
> index 42af315..3a1c203 100644
> --- a/datapath/datapath.c
> +++ b/datapath/datapath.c

It's not clear to that using the depth is actually sufficient to
capture all possible combinations because the more common case is that
actions are a list, not nested. For example, consider the following
(invalid) action list on an IP input packet:

push_mpls, sample(pop_mpls), pop_mpls

I don't believe that this will be rejected since by the time we get to
the second pop_mpls we will have forgotten about the sample action.

> @@ -811,6 +869,35 @@ static int validate_and_copy_actions(const struct nlattr 
> *attr,
> return -EINVAL;
> break;
>
> +   case OVS_ACTION_ATTR_PUSH_MPLS: {
> +   const struct ovs_action_push_mpls *mpls = nla_data(a);
> +   if (!eth_p_mpls(mpls->mpls_ethertype))
> +   return -EINVAL;
> +   eth_types_set(eth_types, depth, mpls->mpls_ethertype);
> +   break;
> +   }
> +
> +   case OVS_ACTION_ATTR_POP_MPLS: {
> +   size_t i;
> +
> +   for (i = 0; i < eth_types->depth; i++)
> +   if (eth_types->types[i] != htons(ETH_P_IP))
> +   return -EINVAL;

I think the check here should be for MPLS, not IP.
> diff --git a/datapath/flow.c b/datapath/flow.c
> index 3ce926e..2a86f90 100644
> --- a/datapath/flow.c
> +++ b/datapath/flow.c
> @@ -848,6 +880,9 @@ const int ovs_key_lens[OVS_KEY_ATTR_MAX + 1] = {
> [OVS_KEY_ATTR_AR

Re: [ovs-dev] [PATCH] netdev-dummy: Remove FreeBSD dependency.

2013-05-17 Thread Ed Maste
On 17 May 2013 18:57, Ben Pfaff  wrote:
> There's no particular reason that netdev_dummy_register() has to care about
> the particular OS, except that the tests like to use the special Linux-only
> tunnel vport types.  But that can be done better, I think, by just always
> registering them from netdev_dummy_register() and making that function
> idempotent, so that calling it twice under Linux has no additional effect.
> This commit implements that solution.
>
> CC: Ed Maste 
> Signed-off-by: Ben Pfaff 

Acked-by: Ed Maste 

Looks good to me, and unit tests continue to pass on FreeBSD.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev