Re: [ovs-dev] [PATCH v4 05/14] Support userdata in NXT_PACKET_IN2.
On Fri, Feb 19, 2016 at 02:35:40PM -0800, Jarno Rajahalme wrote: > With the comments below: > > Acked-by: Jarno Rajahalme Thanks for the review. > > +/* Properties for NXAST_CONTROLLER2. */ > > +enum nx_action_controller2_prop_type { > > +NXAC2PT_MAX_LEN,/* ovs_be16 max length to send controller. > > */ > > +NXAC2PT_CONTROLLER_ID, /* ovs_be16 controller ID of destination. > > */ > > +NXAC2PT_REASON, /* uint8_t reason (OFPR_*). */ > > +NXAC2PT_USERDATA, /* Data to copy into NXPINT_USERDATA. */ > > +}; > > Should document the default values here. OK, I somehow managed to fit them in now. > > diff --git a/lib/ofp-print.c b/lib/ofp-print.c > > index 74f0de6..39e1c8b 100644 > > --- a/lib/ofp-print.c > > +++ b/lib/ofp-print.c > > @@ -141,6 +141,17 @@ ofp_print_packet_in(struct ds *string, const struct > > ofp_header *oh, > > } > > ds_put_char(string, '\n'); > > > > +if (pin.userdata_len) { > > +ds_put_cstr(string, " userdata="); > > +for (size_t i = 0; i < pin.userdata_len; i++) { > > +if (i) { > > +ds_put_char(string, '.'); > > +} > > +ds_put_format(string, "%02x", pin.userdata[i]); > > +} > > Could use format_hex_arg() here? OK, done. Thanks, Ben. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v4 07/14] Implement serializing the state of packet traversal in "continuations".
With small comments below: Acked-by: Jarno Rajahalme > On Feb 19, 2016, at 12:34 AM, Ben Pfaff wrote: > > One purpose of OpenFlow packet-in messages is to allow a controller to > interpose on the path of a packet through the flow tables. If, for > example, the controller needs to modify a packet in some way that the > switch doesn't directly support, the controller should be able to > program the switch to send it the packet, then modify the packet and > send it back to the switch to continue through the flow table. > > That's the theory. In practice, this doesn't work with any but the > simplest flow tables. Packet-in messages simply don't include enough > context to allow the flow table traversal to continue. For example: > >* Via "resubmit" actions, an Open vSwitch packet can have an > effective "call stack", but a packet-in can't describe it, and > so it would be lost. > >* Via "patch ports", an Open vSwitch packet can traverse multiple > OpenFlow logical switches. A packet-in can't describe or resume > this context. > Is there any context regarding this that needs to be described? >* A packet-in can't preserve the stack used by NXAST_PUSH and > NXAST_POP actions. > >* A packet-in can't preserve the OpenFlow 1.1+ action set. > >* A packet-in can't preserve the state of Open vSwitch mirroring > or connection tracking. > > This commit introduces a solution called "continuations". A continuation > is the state of a packet's traversal through OpenFlow flow tables. A > "controller" action with the "pause" flag, which is newly implemented in > this comit, generates a continuation and sends it to the OpenFlow “commit" > controller in a packet-in asynchronous message (only NXT_PACKET_IN2 > supports continuations, so the controller must configure them with > NXT_SET_PACKET_IN_FORMAT). The controller processes the packet-in, > possibly modifying some of its data, and sends it back to the switch with > an NXT_RESUME request, which causes flow table traversal to continue. In > principle, a single packet can be paused and resumed multiple times. > > Another way to look at it is: > >- "pause" is an extension of the existing OFPAT_CONTROLLER > action. It sends the packet to the controller, with full > pipeline context (some of which is switch implementation > dependent, and may thus vary from switch to switch). > >- A continuation is an extension of OFPT_PACKET_IN, allowing for > implementation dependent metadata. > >- NXT_RESUME is an extension of OFPT_PACKET_OUT, with the > semantics that the pipeline processing is continued with the > original translation context from where it was left at the time > it was paused. > > Signed-off-by: Ben Pfaff > Acked-by: Jarno Rajahalme > --- > NEWS | 5 +- > include/openflow/nicira-ext.h | 96 - > lib/learning-switch.c | 3 +- > lib/meta-flow.c | 9 +- > lib/meta-flow.h | 3 +- > lib/ofp-actions.c | 28 ++- > lib/ofp-actions.h | 5 + > lib/ofp-errors.h | 16 +- > lib/ofp-msgs.h| 4 + > lib/ofp-print.c | 78 +-- > lib/ofp-util.c| 470 -- > lib/ofp-util.h| 57 - > lib/rconn.c | 3 +- > ofproto/connmgr.c | 23 ++- > ofproto/connmgr.h | 2 +- > ofproto/fail-open.c | 16 +- > ofproto/ofproto-dpif-xlate.c | 199 ++ > ofproto/ofproto-dpif-xlate.h | 4 + > ofproto/ofproto-dpif.c| 34 +++ > ofproto/ofproto-provider.h| 3 + > ofproto/ofproto.c | 24 +++ > ovn/TODO | 57 - > ovn/controller/pinctrl.c | 4 +- > tests/ofp-actions.at | 13 +- > tests/ofp-print.at| 12 ++ > tests/ofproto-dpif.at | 172 > tests/ofproto-macros.at | 35 +++- > utilities/ovs-ofctl.8.in | 11 +- > utilities/ovs-ofctl.c | 109 +++--- > 29 files changed, 1239 insertions(+), 256 deletions(-) > > diff --git a/NEWS b/NEWS > index 9ab6cae..ba4b7f7 100644 > --- a/NEWS > +++ b/NEWS > @@ -6,7 +6,10 @@ Post-v2.5.0 > * OpenFlow 1.1+ OFPT_QUEUE_GET_CONFIG_REQUEST now supports OFPP_ANY. > * OpenFlow 1.4+ OFPMP_QUEUE_DESC is now supported. > * New property-based packet-in message format NXT_PACKET_IN2 with support > - for arbitrary user-provided data. > + for arbitrary user-provided data and for serializing flow table > + traversal into a continuation for later resumption. > + * New extension message NXT_SET_ASYNC_CONFIG2 to allow OpenFlow 1.4-like > + control over asynchronous messages in earlier versions of OpenFlow. >- ovs-ofctl: > * queue-get-config command now allows a queue ID to be specified. >- DPDK: > diff --git a/include/openflow
Re: [ovs-dev] [PATCH v4 08/14] pinctrl: Fix header guard.
Acked-by: Jarno Rajahalme > On Feb 19, 2016, at 12:34 AM, Ben Pfaff wrote: > > Signed-off-by: Ben Pfaff > --- > ovn/controller/pinctrl.h | 7 +++ > 1 file changed, 3 insertions(+), 4 deletions(-) > > diff --git a/ovn/controller/pinctrl.h b/ovn/controller/pinctrl.h > index 65d5dfe..fd279ff 100644 > --- a/ovn/controller/pinctrl.h > +++ b/ovn/controller/pinctrl.h > @@ -1,4 +1,3 @@ > - > /* Copyright (c) 2015 Nicira, Inc. > * > * Licensed under the Apache License, Version 2.0 (the "License"); > @@ -14,8 +13,8 @@ > * limitations under the License. > */ > > -#ifndef DHCP_H > -#define DHCP_H 1 > +#ifndef PINCTRL_H > +#define PINCTRL_H 1 > > #include > > @@ -31,4 +30,4 @@ void pinctrl_run(struct controller_ctx *ctx, > void pinctrl_wait(void); > void pinctrl_destroy(void); > > -#endif /* ovn/dhcp.h */ > +#endif /* ovn/pinctrl.h */ > -- > 2.1.3 > > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v4 06/14] ofp-prop: Add support for putting and parsing nested properties.
On Fri, Feb 19, 2016 at 02:47:59PM -0800, Jarno Rajahalme wrote: > With one comment below: > > Acked-by: Jarno Rajahalme Thanks for the review. > > On Feb 19, 2016, at 12:34 AM, Ben Pfaff wrote: > > > > It hadn't occurred to me before that any special support was actually > > necessary or useful for nested properties, but the functions introduced in > > this commit are nice wrappers to deal with the extra 4-byte padding that > > ensures that the nested properties begin on 8-byte boundaries just like > > the outer properties. > > > > Signed-off-by: Ben Pfaff ... > > +enum ofperr > > +ofpprop_parse_nested(const struct ofpbuf *property, struct ofpbuf *nested) > > +{ > > +size_t nested_offset = ROUND_UP(ofpbuf_headersize(property), 8); > > +if (property->size < nested_offset) { > > +return OFPERR_OFPBPC_BAD_LEN; > > +} > > + > > +ofpbuf_use_const(nested, property->data, property->size); > > +ofpbuf_pull(nested, nested_offset); > > +return 0; > > +} ... > > +/* Appends a header for a property of type 'type' to 'msg', followed by > > padding > > + * suitable for putting nested properties into the property; that is, > > padding > > + * to an 8-byte alignment. > > + * > > + * This otherwise works like ofpprop_start(). > > + * > > + * There's no need for ofpprop_end_nested(), because ofpprop_end() works > > fine > > + * for this case. */ > > +size_t > > +ofpprop_start_nested(struct ofpbuf *msg, uint64_t type) > > +{ > > +size_t start_ofs = ofpprop_start(msg, type); > > +ofpbuf_put_zeros(msg, 4); > > Somehow I find the ‘4’ here asymmetric to the > ROUND_UP(ofpbuf_headersize(…)..) in ofpprop_parse_nested(). Maybe it > would be better to simply pull ‘8’ there (with the comment that both > the outer property and the padding are bypassed? Hmm, 8 would be wrong sometimes there, because experimenter property headers are 12 bytes. Do you like this better: ofpbuf_padto(msg, ROUND_UP(msg->size, 8)); ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH 2/2] datapath: Remove OVS_FRAGMENT_BACKPORT
This macro is not required as we drop support for unsupported kernel versions. Signed-off-by: Pravin B Shelar --- acinclude.m4 | 9 - datapath/compat.h| 5 - datapath/linux/compat/include/linux/netfilter_ipv6.h | 7 --- datapath/linux/compat/include/net/inet_frag.h| 2 -- datapath/linux/compat/include/net/inetpeer.h | 5 ++--- datapath/linux/compat/include/net/ip.h | 20 ++-- datapath/linux/compat/include/net/ip6_route.h| 9 - datapath/linux/compat/include/net/ipv6.h | 2 +- .../include/net/netfilter/ipv6/nf_defrag_ipv6.h | 11 --- datapath/linux/compat/inet_fragment.c| 4 ++-- datapath/linux/compat/ip6_output.c | 4 ++-- datapath/linux/compat/ip_fragment.c | 4 ++-- datapath/linux/compat/reassembly.c | 6 +++--- 13 files changed, 14 insertions(+), 74 deletions(-) diff --git a/acinclude.m4 b/acinclude.m4 index 0ae6a81..78743d9 100644 --- a/acinclude.m4 +++ b/acinclude.m4 @@ -551,15 +551,6 @@ AC_DEFUN([OVS_CHECK_LINUX_COMPAT], [ OVS_GREP_IFELSE([$KSRC/include/linux/utsrelease.h], [el6], [OVS_DEFINE([HAVE_RHEL6_PER_CPU])]) - dnl Conntrack support, and therefore, IP fragment handling backport, should - dnl only be enabled on kernels 3.10+. In future when OVS drops support for - dnl kernels older than 3.10, this macro could be removed from the codebase. - if test "$version" = 4; then -OVS_DEFINE([OVS_FRAGMENT_BACKPORT]) - elif test "$version" = 3 && test "$patchlevel" -ge 10; then -OVS_DEFINE([OVS_FRAGMENT_BACKPORT]) - fi - if cmp -s datapath/linux/kcompat.h.new \ datapath/linux/kcompat.h >/dev/null 2>&1; then rm datapath/linux/kcompat.h.new diff --git a/datapath/compat.h b/datapath/compat.h index 8bf779f..816f754 100644 --- a/datapath/compat.h +++ b/datapath/compat.h @@ -35,7 +35,6 @@ #define GROUP_ID(grp) 0 #endif -#ifdef OVS_FRAGMENT_BACKPORT #ifdef HAVE_NF_IPV6_OPS_FRAGMENT static inline int __init ip6_output_init(void) { return 0; } static inline void ip6_output_exit(void) { } @@ -74,9 +73,5 @@ static inline void compat_exit(void) nf_ct_frag6_cleanup(); rpl_ipfrag_fini(); } -#else -static inline int __init compat_init(void) { return 0; } -static inline void compat_exit(void) { } -#endif #endif /* compat.h */ diff --git a/datapath/linux/compat/include/linux/netfilter_ipv6.h b/datapath/linux/compat/include/linux/netfilter_ipv6.h index 3939e14..8d896fb 100644 --- a/datapath/linux/compat/include/linux/netfilter_ipv6.h +++ b/datapath/linux/compat/include/linux/netfilter_ipv6.h @@ -18,7 +18,6 @@ struct ovs_nf_ipv6_ops { }; #define nf_ipv6_ops ovs_nf_ipv6_ops -#if defined(OVS_FRAGMENT_BACKPORT) static struct ovs_nf_ipv6_ops ovs_ipv6_ops = { .fragment = ip6_fragment, }; @@ -27,12 +26,6 @@ static inline struct ovs_nf_ipv6_ops *ovs_nf_get_ipv6_ops(void) { return &ovs_ipv6_ops; } -#else /* !OVS_FRAGMENT_BACKPORT */ -static inline const struct ovs_nf_ipv6_ops *ovs_nf_get_ipv6_ops(void) -{ - return NULL; -} -#endif #define nf_get_ipv6_ops ovs_nf_get_ipv6_ops #endif /* HAVE_NF_IPV6_OPS_FRAGMENT */ diff --git a/datapath/linux/compat/include/net/inet_frag.h b/datapath/linux/compat/include/net/inet_frag.h index 606e952..aa9a019 100644 --- a/datapath/linux/compat/include/net/inet_frag.h +++ b/datapath/linux/compat/include/net/inet_frag.h @@ -13,7 +13,6 @@ } while (0) #endif -#ifdef OVS_FRAGMENT_BACKPORT #ifdef HAVE_INET_FRAGS_LAST_IN #define q_flags(q) (q->last_in) #define qp_flags(qp) (qp->q.last_in) @@ -80,6 +79,5 @@ void rpl_inet_frags_exit_net(struct netns_frags *nf, struct inet_frags *f); void rpl_inet_frag_destroy(struct inet_frag_queue *q, struct inet_frags *f); #define inet_frag_destroy(q, f, work) rpl_inet_frag_destroy(q, f) #endif /* !HAVE_CORRECT_MRU_HANDLING */ -#endif /* OVS_FRAGMENT_BACKPORT */ #endif /* inet_frag.h */ diff --git a/datapath/linux/compat/include/net/inetpeer.h b/datapath/linux/compat/include/net/inetpeer.h index c086f3b..c5f5eb1 100644 --- a/datapath/linux/compat/include/net/inetpeer.h +++ b/datapath/linux/compat/include/net/inetpeer.h @@ -3,8 +3,7 @@ #include_next -#if defined(OVS_FRAGMENT_BACKPORT) && \ -!defined(HAVE_INETPEER_VIF_SUPPORT) +#ifndef HAVE_INETPEER_VIF_SUPPORT static inline struct inet_peer *rpl_inet_getpeer_v4(struct inet_peer_base *base, __be32 v4daddr, int vif, int create) @@ -12,6 +11,6 @@ static inline struct inet_peer *rpl_inet_getpeer_v4(struct inet_peer_base *base, return inet_getpeer_v4(base, v4daddr, create); } #define inet_getpeer_v4 rpl_inet_getpeer_v4 -#endif /* OVS_FRAGMENT_BACKPORT */ +#endif /* HAVE_INETPEER_VIF_SUPPORT */ #e
[ovs-dev] [PATCH 1/2] datapath: Drop support for kernel older than 3.10
Currently OVS out of tree datapath supports a large number of kernel versions. From 2.6.32 to 4.3 and various distribution-specific kernels. But at this point major features are only available on more recent kernels. For example, stateful services are only available starting in kernel 3.10 and STT is available on starting with 3.5. Since these features are becoming essential to many OVS deployments, and the effort of maintaining the backports is high. We have decided to drop support for older kernel. Following patch drops supports for kernel older than 3.10. Signed-off-by: Pravin B Shelar --- acinclude.m4 | 15 ++- datapath/actions.c| 28 ++ datapath/compat.h | 23 + datapath/conntrack.c | 5 +- datapath/conntrack.h | 2 +- datapath/datapath.c | 7 +- datapath/datapath.h | 1 - datapath/flow.c | 3 +- datapath/flow_table.c | 10 +- datapath/linux/compat/geneve.c| 10 +- datapath/linux/compat/gso.c | 3 +- datapath/linux/compat/include/linux/netdevice.h | 8 +- datapath/linux/compat/include/net/net_namespace.h | 51 --- datapath/linux/compat/include/net/vxlan.h | 4 +- datapath/linux/compat/ip_gre.c| 4 +- datapath/linux/compat/ip_tunnel.c | 2 +- datapath/linux/compat/ip_tunnels_core.c | 9 +- datapath/linux/compat/lisp.c | 6 +- datapath/linux/compat/vxlan.c | 9 +- datapath/vlan.h | 65 - datapath/vport-geneve.c | 8 +- datapath/vport-internal_dev.c | 107 +- datapath/vport-netdev.c | 38 +--- datapath/vport-netdev.h | 1 - datapath/vport.c | 32 +-- datapath/vport.h | 35 ++- 26 files changed, 125 insertions(+), 361 deletions(-) delete mode 100644 datapath/vlan.h diff --git a/acinclude.m4 b/acinclude.m4 index 11c7787..0ae6a81 100644 --- a/acinclude.m4 +++ b/acinclude.m4 @@ -139,14 +139,10 @@ AC_DEFUN([OVS_CHECK_LINUX], [ else AC_ERROR([Linux kernel in $KBUILD is version $kversion, but version newer than 4.3.x is not supported (please refer to the FAQ for advice)]) fi -elif test "$version" = 3; then +elif test "$version" = 3 && test "$patchlevel" -ge 10; then : # Linux 3.x else - if test "$version" -le 1 || test "$patchlevel" -le 5 || test "$sublevel" -le 31; then - AC_ERROR([Linux kernel in $KBUILD is version $kversion, but version 2.6.32 or later is required]) - else - : # Linux 2.6.x - fi + AC_ERROR([Linux kernel in $KBUILD is version $kversion, but version 3.10 or later is required]) fi if (test ! -e "$KBUILD"/include/linux/version.h && \ test ! -e "$KBUILD"/include/generated/uapi/linux/version.h)|| \ @@ -307,7 +303,7 @@ AC_DEFUN([OVS_DEFINE], [ dnl OVS_CHECK_LINUX_COMPAT dnl -dnl Runs various Autoconf checks on the Linux 2.6 kernel source in +dnl Runs various Autoconf checks on the Linux kernel source in dnl the directory in $KBUILD. AC_DEFUN([OVS_CHECK_LINUX_COMPAT], [ rm -f datapath/linux/kcompat.h.new @@ -381,7 +377,10 @@ AC_DEFUN([OVS_CHECK_LINUX_COMPAT], [ OVS_GREP_IFELSE([$KSRC/include/linux/netdevice.h], [can_checksum_protocol]) OVS_GREP_IFELSE([$KSRC/include/linux/netdevice.h], [ndo_get_iflink]) OVS_GREP_IFELSE([$KSRC/include/linux/netdevice.h], [netdev_features_t]) - OVS_GREP_IFELSE([$KSRC/include/linux/netdevice.h], [pcpu_sw_netstats]) + dnl Ubuntu kernel 3.13 has defined this struct but not used for netdev->tstats. + dnl So check type of tstats. + OVS_GREP_IFELSE([$KSRC/include/linux/netdevice.h], [pcpu_sw_netstats.*tstats], + [OVS_DEFINE([HAVE_PCPU_SW_NETSTATS])]) OVS_GREP_IFELSE([$KSRC/include/linux/netdevice.h], [netdev_rx_handler_register]) OVS_GREP_IFELSE([$KSRC/include/linux/netdevice.h], [net_device_extended]) OVS_GREP_IFELSE([$KSRC/include/linux/netdevice.h], [rx_handler_func_t.*pskb], diff --git a/datapath/actions.c b/datapath/actions.c index 20413c9..dcf8591 100644 --- a/datapath/actions.c +++ b/datapath/actions.c @@ -41,7 +41,6 @@ #include "datapath.h" #include "conntrack.h" #include "gso.h" -#include "vlan.h" #include "vport.h" static int do_execute_actions(struct datapath *dp, struct sk_buff *skb, @@ -68,9 +67,7 @@ struct ovs_frag_data { u8 l2_data[MAX_L2_LEN]; }; -#if LINUX_VERSION_CODE >= KERNEL_VERSION(3,10,0) static DEFINE_PER_CPU(struct ovs_frag_data, ovs_frag_data_storage); -#endif
Re: [ovs-dev] [PATCH v4 07/14] Implement serializing the state of packet traversal in "continuations".
On Fri, Feb 19, 2016 at 03:43:35PM -0800, Jarno Rajahalme wrote: > With small comments below: > > Acked-by: Jarno Rajahalme Thanks for the review! I fixed the typos you pointed out. > > + * The controller might change the pipeline configuration concurrently with > > + * steps 2 through 4. For example, it might add or remove OpenFlow flows. > > If > > + * that happens, then the packet will experience a mix of processing from > > the > > + * two configurations, that is, the initial processing (before > > + * NXAST_CONTROLLER2) uses the initial flow table, and the later processing > > + * (after NXT_RESUME) uses the later flow table. > > Maybe it should be noted here that if the layout of data that is > pushed/popped to/from the stack changes then the continuation of the > packet processing might have unpredictable behavior. But maybe this is > true for pipeline “shape” changes in general. I think it's more general, so I just added a sentence: This means that the * controller needs to take care to avoid incompatible pipeline changes while * processing continuations. > > + * A packet-in that includes a continuation always includes the entire > > packet > > + * and is never buffered. > > Does this need to be the case? Does not not contradict the > stateful/stateless comment above? Good point. I changed this to read: * A stateless implementation of continuations may ignore the "controller" * action max_len, always sending the whole packet, because the full packet is * required to continue traversal. Thanks, Ben. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2 1/5] datapath-windows: Added recirculation support.
Hi Sorin, Thanks for the patch. It needs to be updated to make it thread-safe. The DeferredActionQueue needs a RW lock for synchronization. I also ran into memory leak while testing out the patch (the kernel couldn¹t be uninstalled without a reboot). If am able to get to it, I will send out a patch that addresses some of the issues that I found while testing it with my Conntrack patch. Thanks, Sairam On 2/11/16, 1:59 PM, "Sorin Vinturis" wrote: >Recirculation support for the OVS extension. > >Tested with PING using MPLS packets. > >Signed-off-by: Sorin Vinturis >Co-authored-by: Alin Gabriel Serdean >Reported-by: Sorin Vinturis >Reported-at: >https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_openvswitc >h_ovs-2Dissues_issues_104&d=BQIGaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNt >Xt-uEs&r=Dcruz40PROJ40ROzSpxyQSLw6fcrOWpJgEcEmNR3JEQ&m=CREZI27LygkjkrqW7Lm >dwo9sAA-aCidqVPqBJcTGKMU&s=xtUP8mawJ_-UjWP27ySmpPMABxUknGA-V4pRZ5SSrmk&e= >--- >v2: Initialize flow key before using it. >--- > datapath-windows/automake.mk | 3 + > datapath-windows/ovsext/Actions.c | 215 >++ > datapath-windows/ovsext/Actions.h | 54 > datapath-windows/ovsext/DpInternal.h | 2 +- > datapath-windows/ovsext/Flow.c| 76 +-- > datapath-windows/ovsext/Flow.h| 5 +- > datapath-windows/ovsext/Netlink/Netlink.h | 11 ++ > datapath-windows/ovsext/PacketIO.c| 16 ++- > datapath-windows/ovsext/PacketIO.h| 10 -- > datapath-windows/ovsext/Recirc.c | 165 +++ > datapath-windows/ovsext/Recirc.h | 82 > datapath-windows/ovsext/Tunnel.c | 15 ++- > datapath-windows/ovsext/User.c| 13 +- > datapath-windows/ovsext/ovsext.vcxproj| 3 + > 14 files changed, 602 insertions(+), 68 deletions(-) > create mode 100644 datapath-windows/ovsext/Actions.h > create mode 100644 datapath-windows/ovsext/Recirc.c > create mode 100644 datapath-windows/ovsext/Recirc.h > >diff --git a/datapath-windows/automake.mk b/datapath-windows/automake.mk >index f29f548..04fc97f 100644 >--- a/datapath-windows/automake.mk >+++ b/datapath-windows/automake.mk >@@ -9,6 +9,7 @@ EXTRA_DIST += \ > datapath-windows/misc/uninstall.cmd \ > datapath-windows/ovsext.sln \ > datapath-windows/ovsext/Actions.c \ >+ datapath-windows/ovsext/Actions.h \ > datapath-windows/ovsext/Atomic.h \ > datapath-windows/ovsext/BufferMgmt.c \ > datapath-windows/ovsext/BufferMgmt.h \ >@@ -45,6 +46,8 @@ EXTRA_DIST += \ > datapath-windows/ovsext/PacketIO.h \ > datapath-windows/ovsext/PacketParser.c \ > datapath-windows/ovsext/PacketParser.h \ >+ datapath-windows/ovsext/Recirc.c \ >+ datapath-windows/ovsext/Recirc.h \ > datapath-windows/ovsext/Stt.c \ > datapath-windows/ovsext/Stt.h \ > datapath-windows/ovsext/Switch.c \ >diff --git a/datapath-windows/ovsext/Actions.c >b/datapath-windows/ovsext/Actions.c >index 5a04541..d3f18f2 100644 >--- a/datapath-windows/ovsext/Actions.c >+++ b/datapath-windows/ovsext/Actions.c >@@ -14,8 +14,7 @@ > * limitations under the License. > */ > >-#include "precomp.h" >- >+#include "Actions.h" > #include "Debug.h" > #include "Event.h" > #include "Flow.h" >@@ -24,6 +23,7 @@ > #include "NetProto.h" > #include "Offload.h" > #include "PacketIO.h" >+#include "Recirc.h" > #include "Stt.h" > #include "Switch.h" > #include "User.h" >@@ -35,6 +35,8 @@ > #endif > #define OVS_DBG_MOD OVS_DBG_ACTION > >+#define OVS_DEST_PORTS_ARRAY_MIN_SIZE 2 >+ > typedef struct _OVS_ACTION_STATS { > UINT64 rxGre; > UINT64 txGre; >@@ -66,7 +68,6 @@ OVS_ACTION_STATS ovsActionStats; > * exercised while adding new members to the structure - only add ones >that get > * used across multiple stages in the pipeline/get used in multiple >functions. > */ >-#define OVS_DEST_PORTS_ARRAY_MIN_SIZE 2 > typedef struct OvsForwardingContext { > POVS_SWITCH_CONTEXT switchContext; > /* The NBL currently used in the pipeline. */ >@@ -99,7 +100,7 @@ typedef struct OvsForwardingContext { > */ > OvsIPv4TunnelKey tunKey; > >- /* >+/* > * Tunneling - Tx: > * To store the output port, when it is a tunneled port. We don't >foresee > * multiple tunneled ports as outport for any given NBL. >@@ -117,7 +118,6 @@ typedef struct OvsForwardingContext { > OVS_PACKET_HDR_INFO layers; > } OvsForwardingContext; > >- > /* > * >-- > * OvsInitForwardingCtx -- >@@ -564,10 +564,10 @@ OvsCompleteNBLForwardingCtx(OvsForwardingContext >*ovsFwdCtx, > static __inline NDIS_STATUS > OvsDoFlowLookupOutput(OvsForwardingContext *ovsFwdCtx) > { >-OvsFlowKey key; >-OvsFlow *flow; >-UINT64 hash; >-NDIS_STATUS status; >+OvsFlowKey key = { 0 }; >+OvsFlow *flow = NULL; >+UINT64 hash = 0; >+NDIS_STATUS status = NDIS_
Re: [ovs-dev] [PATCH v4 09/14] ofp-actions: Introduce macro for padding struct members.
On Fri, Feb 19, 2016 at 03:00:47PM -0800, Joe Stringer wrote: > On 19 February 2016 at 00:34, Ben Pfaff wrote: > > An upcoming commit will add another case where it's desirable to ensure > > that a variable-length array is aligned on an 8-byte boundary. This macro > > makes that a little easier. > > > > Signed-off-by: Ben Pfaff > > CC: Joe Stringer > > MSVC seems happy, so I'm happy: That's the rule I live by too1! > Acked-by: Joe Stringer Thanks for the review. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v4 09/14] ofp-actions: Introduce macro for padding struct members.
I’ve reviewed the series unto this point. Maybe someone else will review the OVN patches. I did note, however, that the last patch does not help the test failure: Before the last patch: # put_arp -actions=push:NXM_NX_REG1[],push:NXM_OF_ETH_SRC[],push:NXM_NX_ARP_SHA[],push:NXM_OF_ARP_SPA[],pop:NXM_NX_REG1[],pop:NXM_OF_ETH_SRC[],push:NXM_NX_REG0[],set_field:0xbd9c9810->reg0,controller(reason=packet_out),pop:NXM_NX_REG0[],pop:NXM_OF_ETH_SRC[],pop:NXM_NX_REG1[], prereqs=eth.type == 0x806 && eth.type == 0x806 +actions=push:NXM_NX_REG0[],push:NXM_OF_ETH_SRC[],push:NXM_NX_ARP_SHA[],push:NXM_OF_ARP_SPA[],pop:NXM_NX_REG0[],pop:NXM_OF_ETH_SRC[],controller(userdata=01.00.00.00.00.00.00.00),pop:NXM_OF_ETH_SRC[],pop:NXM_NX_REG0[], prereqs=eth.type == 0x806 && eth.type == 0x806 After the last patch: # put_arp -actions=push:NXM_NX_REG1[],push:NXM_OF_ETH_SRC[],push:NXM_NX_ARP_SHA[],push:NXM_OF_ARP_SPA[],pop:NXM_NX_REG1[],pop:NXM_OF_ETH_SRC[],push:NXM_NX_REG0[],set_field:0xbd9c9810->reg0,controller(reason=packet_out),pop:NXM_NX_REG0[],pop:NXM_OF_ETH_SRC[],pop:NXM_NX_REG1[], prereqs=eth.type == 0x806 && eth.type == 0x806 +actions=push:NXM_NX_REG0[],push:NXM_OF_ETH_SRC[],push:NXM_NX_ARP_SHA[],push:NXM_OF_ARP_SPA[],pop:NXM_NX_REG0[],pop:NXM_OF_ETH_SRC[],controller(userdata=01.00.00.00.00.00.00.00),pop:NXM_OF_ETH_SRC[],pop:NXM_NX_REG0[], prereqs=eth.type == 0x806 && eth.type == 0x806 I.e., the same. I hope I did not ack the bug causing this.. Jarno > On Feb 19, 2016, at 3:00 PM, Joe Stringer wrote: > > On 19 February 2016 at 00:34, Ben Pfaff wrote: >> An upcoming commit will add another case where it's desirable to ensure >> that a variable-length array is aligned on an 8-byte boundary. This macro >> makes that a little easier. >> >> Signed-off-by: Ben Pfaff >> CC: Joe Stringer > > MSVC seems happy, so I'm happy: > https://ci.appveyor.com/project/joestringer/openvswitch/build/1.0.24 > > Acked-by: Joe Stringer > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v4 06/14] ofp-prop: Add support for putting and parsing nested properties.
> On Feb 19, 2016, at 3:56 PM, Ben Pfaff wrote: > > On Fri, Feb 19, 2016 at 02:47:59PM -0800, Jarno Rajahalme wrote: >> With one comment below: >> >> Acked-by: Jarno Rajahalme > > Thanks for the review. > >>> On Feb 19, 2016, at 12:34 AM, Ben Pfaff wrote: >>> >>> It hadn't occurred to me before that any special support was actually >>> necessary or useful for nested properties, but the functions introduced in >>> this commit are nice wrappers to deal with the extra 4-byte padding that >>> ensures that the nested properties begin on 8-byte boundaries just like >>> the outer properties. >>> >>> Signed-off-by: Ben Pfaff > > ... > >>> +enum ofperr >>> +ofpprop_parse_nested(const struct ofpbuf *property, struct ofpbuf *nested) >>> +{ >>> +size_t nested_offset = ROUND_UP(ofpbuf_headersize(property), 8); >>> +if (property->size < nested_offset) { >>> +return OFPERR_OFPBPC_BAD_LEN; >>> +} >>> + >>> +ofpbuf_use_const(nested, property->data, property->size); >>> +ofpbuf_pull(nested, nested_offset); >>> +return 0; >>> +} > > ... > >>> +/* Appends a header for a property of type 'type' to 'msg', followed by >>> padding >>> + * suitable for putting nested properties into the property; that is, >>> padding >>> + * to an 8-byte alignment. >>> + * >>> + * This otherwise works like ofpprop_start(). >>> + * >>> + * There's no need for ofpprop_end_nested(), because ofpprop_end() works >>> fine >>> + * for this case. */ >>> +size_t >>> +ofpprop_start_nested(struct ofpbuf *msg, uint64_t type) >>> +{ >>> +size_t start_ofs = ofpprop_start(msg, type); >>> +ofpbuf_put_zeros(msg, 4); >> >> Somehow I find the ‘4’ here asymmetric to the >> ROUND_UP(ofpbuf_headersize(…)..) in ofpprop_parse_nested(). Maybe it >> would be better to simply pull ‘8’ there (with the comment that both >> the outer property and the padding are bypassed? > > Hmm, 8 would be wrong sometimes there, because experimenter property > headers are 12 bytes. > > Do you like this better: >ofpbuf_padto(msg, ROUND_UP(msg->size, 8)); Yes, thanks! Jarno ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v4 09/14] ofp-actions: Introduce macro for padding struct members.
Oh, that's a completely different test failure. I'm sure it's less mysterious. Probably I introduced it somewhere in revision today. I see it now too, I'll fix it. The test failure I had a problem with is: 1751: ovn -- 3 HVs, 3 LS, 3 lports/LS, 1 LR On Fri, Feb 19, 2016 at 04:17:42PM -0800, Jarno Rajahalme wrote: > I’ve reviewed the series unto this point. Maybe someone else will review the > OVN patches. I did note, however, that the last patch does not help the test > failure: > > Before the last patch: > > # put_arp > -actions=push:NXM_NX_REG1[],push:NXM_OF_ETH_SRC[],push:NXM_NX_ARP_SHA[],push:NXM_OF_ARP_SPA[],pop:NXM_NX_REG1[],pop:NXM_OF_ETH_SRC[],push:NXM_NX_REG0[],set_field:0xbd9c9810->reg0,controller(reason=packet_out),pop:NXM_NX_REG0[],pop:NXM_OF_ETH_SRC[],pop:NXM_NX_REG1[], > prereqs=eth.type == 0x806 && eth.type == 0x806 > +actions=push:NXM_NX_REG0[],push:NXM_OF_ETH_SRC[],push:NXM_NX_ARP_SHA[],push:NXM_OF_ARP_SPA[],pop:NXM_NX_REG0[],pop:NXM_OF_ETH_SRC[],controller(userdata=01.00.00.00.00.00.00.00),pop:NXM_OF_ETH_SRC[],pop:NXM_NX_REG0[], > prereqs=eth.type == 0x806 && eth.type == 0x806 > > After the last patch: > > # put_arp > -actions=push:NXM_NX_REG1[],push:NXM_OF_ETH_SRC[],push:NXM_NX_ARP_SHA[],push:NXM_OF_ARP_SPA[],pop:NXM_NX_REG1[],pop:NXM_OF_ETH_SRC[],push:NXM_NX_REG0[],set_field:0xbd9c9810->reg0,controller(reason=packet_out),pop:NXM_NX_REG0[],pop:NXM_OF_ETH_SRC[],pop:NXM_NX_REG1[], > prereqs=eth.type == 0x806 && eth.type == 0x806 > +actions=push:NXM_NX_REG0[],push:NXM_OF_ETH_SRC[],push:NXM_NX_ARP_SHA[],push:NXM_OF_ARP_SPA[],pop:NXM_NX_REG0[],pop:NXM_OF_ETH_SRC[],controller(userdata=01.00.00.00.00.00.00.00),pop:NXM_OF_ETH_SRC[],pop:NXM_NX_REG0[], > prereqs=eth.type == 0x806 && eth.type == 0x806 > > I.e., the same. > > I hope I did not ack the bug causing this.. > > Jarno > > > On Feb 19, 2016, at 3:00 PM, Joe Stringer wrote: > > > > On 19 February 2016 at 00:34, Ben Pfaff wrote: > >> An upcoming commit will add another case where it's desirable to ensure > >> that a variable-length array is aligned on an 8-byte boundary. This macro > >> makes that a little easier. > >> > >> Signed-off-by: Ben Pfaff > >> CC: Joe Stringer > > > > MSVC seems happy, so I'm happy: > > https://ci.appveyor.com/project/joestringer/openvswitch/build/1.0.24 > > > > Acked-by: Joe Stringer > > ___ > > dev mailing list > > dev@openvswitch.org > > http://openvswitch.org/mailman/listinfo/dev > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v4 09/14] ofp-actions: Introduce macro for padding struct members.
On Fri, Feb 19, 2016 at 04:24:30PM -0800, Ben Pfaff wrote: > Oh, that's a completely different test failure. I'm sure it's less > mysterious. Probably I introduced it somewhere in revision today. > > I see it now too, I'll fix it. It was really that I forgot to update the test after changing how the action works. Since I've pushed a bunch of patches, I'll send a v5 for the series. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v5 0/5] Implement continuation (aka closure) support
This series is also available at: https://github.com/blp/ovs-reviews/tree/l3-4 g...@github.com:blp/ovs-reviews.git l3-4 This is the fifth revision of my "closures" series. The first version was: http://openvswitch.org/pipermail/dev/2016-January/064607.html The second version, unfortunately not marked v2, was: https://patchwork.ozlabs.org/patch/574946/ and the v1->v2 changes were described at: http://openvswitch.org/pipermail/dev/2016-January/064875.html Changes v2->v3: - Patches 1 through 6 are new. - There's a terminology change starting in patch 6. Now the process of pausing translation is called "freezing", and there are two users of freezing, which are recirculation and closures. - I added the ability to provide arbitrary user data on NXAST_PAUSE which is passed to the controller in a property on NXT_CLOSURE. - I made all the changes described in: http://openvswitch.org/pipermail/dev/2016-February/065984.html - I added a few more tests to ofp-print.at and ofp-actions.at. - Probably other refinements I've forgotten. Changes v3->v4: - Several patches were applied and thus dropped from the series. - I added the previous L3 ARP patches at the end, revised (see below). - I had a major change of heart on the approach here. Until now, I had an idea that "closures" were fundamentally different from OpenFlow "packet-ins" and thus deserved separate but parallel infrastructure. Now that I started to use them for ARP, I realized that there in fact isn't that much difference. Thus, this revision merges closures into packet-ins and drops the "closure" naming. Instead, what was previously a closure is just a packet-in that includes a "continuation", that is, the state that can be used to resume the pipeline. - The ARP series from before is tacked on the end here, but it is revised to use packet-ins instead of adding a specialized OpenFlow action for ARP. - The ARP patches (patches 8 and later) are not as mature as the rest of the series. In particular, I haven't been able to figure out why the final patch is necessary; it may indicate a bug in my test for the ARP support. In testing, I've definitely figured out that debugging is harder than it should be. Changes v4->v5: - Several patches were applied and thus dropped from the series. - I fixed a bad test in the second-to-last patch (thanks Jarno). - I fixed a host endianness dependency in struct action_header, which affected patches 1 and 4. Ben Pfaff (5): actions: Implement OVN "arp" action. ovn: Use callback function instead of simap for logical port number map. ovn-controller: Add data structure for indexing lports, multicast groups. ovn: Implement basic ARP support for L3 logical routers. [RFC] lflow: Disable egress table optimization. lib/ofp-actions.h | 30 ++-- lib/packets.c | 38 +++-- lib/packets.h | 3 +- ovn/TODO| 55 ++- ovn/controller/automake.mk | 2 + ovn/controller/lflow.c | 251 +++ ovn/controller/lflow.h | 11 +- ovn/controller/lport.c | 157 ovn/controller/lport.h | 67 + ovn/controller/ovn-controller.c | 47 +++--- ovn/controller/pinctrl.c| 321 ovn/controller/pinctrl.h| 9 +- ovn/lib/actions.c | 235 - ovn/lib/actions.h | 39 - ovn/lib/expr.c | 141 +- ovn/lib/expr.h | 16 +- ovn/northd/ovn-northd.8.xml | 112 -- ovn/northd/ovn-northd.c | 105 + ovn/ovn-architecture.7.xml | 78 +++--- ovn/ovn-sb.ovsschema| 15 +- ovn/ovn-sb.xml | 158 ++-- ovn/utilities/ovn-sbctl.c | 4 + tests/ovn.at| 190 ++-- tests/test-ovn.c| 20 ++- 24 files changed, 1675 insertions(+), 429 deletions(-) create mode 100644 ovn/controller/lport.c create mode 100644 ovn/controller/lport.h -- 2.1.3 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v5 1/5] actions: Implement OVN "arp" action.
An upcoming commit will use this as a building block in adding ARP support to the OVN L3 logical router implementation. Signed-off-by: Ben Pfaff --- lib/ofp-actions.h | 30 +++ lib/packets.c | 38 + lib/packets.h | 3 +- ovn/controller/ovn-controller.c | 4 +- ovn/controller/pinctrl.c| 174 ovn/controller/pinctrl.h| 5 +- ovn/lib/actions.c | 57 - ovn/lib/actions.h | 18 - ovn/ovn-sb.xml | 21 +++-- tests/ovn.at| 3 + 10 files changed, 277 insertions(+), 76 deletions(-) diff --git a/lib/ofp-actions.h b/lib/ofp-actions.h index 24143d3..55058ef 100644 --- a/lib/ofp-actions.h +++ b/lib/ofp-actions.h @@ -254,20 +254,22 @@ struct ofpact_output { * * Used for NXAST_CONTROLLER. */ struct ofpact_controller { -struct ofpact ofpact; -uint16_t max_len; /* Maximum length to send to controller. */ -uint16_t controller_id; /* Controller ID to send packet-in. */ -enum ofp_packet_in_reason reason; /* Reason to put in packet-in. */ - -/* If true, this action freezes packet traversal of the OpenFlow tables and - * adds a continuation to the packet-in message, that a controller can use - * to resume that traversal. */ -bool pause; - -/* Arbitrary data to include in the packet-in message (currently, only in - * NXT_PACKET_IN2). */ -uint16_t userdata_len; -uint8_t userdata[]; +OFPACT_PADDED_MEMBERS( +struct ofpact ofpact; +uint16_t max_len; /* Max length to send to controller. */ +uint16_t controller_id; /* Controller ID to send packet-in. */ +enum ofp_packet_in_reason reason; /* Reason to put in packet-in. */ + +/* If true, this action freezes packet traversal of the OpenFlow + * tables and adds a continuation to the packet-in message, that + * a controller can use to resume that traversal. */ +bool pause; + +/* Arbitrary data to include in the packet-in message (currently, + * only in NXT_PACKET_IN2). */ +uint16_t userdata_len; +); +uint8_t userdata[0]; }; /* OFPACT_ENQUEUE. diff --git a/lib/packets.c b/lib/packets.c index d82341d..b693f78 100644 --- a/lib/packets.c +++ b/lib/packets.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2009, 2010, 2011, 2012, 2013, 2014, 2015 Nicira, Inc. + * Copyright (c) 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016 Nicira, Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. @@ -1191,35 +1191,45 @@ packet_format_tcp_flags(struct ds *s, uint16_t tcp_flags) * 'arp_op', 'arp_sha', 'arp_tha', 'arp_spa', and 'arp_tpa'. The outer * Ethernet frame is initialized with Ethernet source 'arp_sha' and destination * 'arp_tha', except that destination ff:ff:ff:ff:ff:ff is used instead if - * 'broadcast' is true. */ + * 'broadcast' is true. Points the L3 header to the ARP header. */ void compose_arp(struct dp_packet *b, uint16_t arp_op, const struct eth_addr arp_sha, const struct eth_addr arp_tha, bool broadcast, ovs_be32 arp_spa, ovs_be32 arp_tpa) { -struct eth_header *eth; -struct arp_eth_header *arp; +compose_arp__(b); + +struct eth_header *eth = dp_packet_l2(b); +eth->eth_dst = broadcast ? eth_addr_broadcast : arp_tha; +eth->eth_src = arp_sha; + +struct arp_eth_header *arp = dp_packet_l3(b); +arp->ar_op = htons(arp_op); +arp->ar_sha = arp_sha; +arp->ar_tha = arp_tha; +put_16aligned_be32(&arp->ar_spa, arp_spa); +put_16aligned_be32(&arp->ar_tpa, arp_tpa); +} +/* Clears 'b' and replaces its contents by an ARP frame. Sets the fields in + * the Ethernet and ARP headers that are fixed for ARP frames to those fixed + * values, and zeroes the other fields. Points the L3 header to the ARP + * header. */ +void +compose_arp__(struct dp_packet *b) +{ dp_packet_clear(b); dp_packet_prealloc_tailroom(b, ARP_PACKET_SIZE); dp_packet_reserve(b, 2 + VLAN_HEADER_LEN); -eth = dp_packet_put_uninit(b, sizeof *eth); -eth->eth_dst = broadcast ? eth_addr_broadcast : arp_tha; -eth->eth_src = arp_sha; +struct eth_header *eth = dp_packet_put_zeros(b, sizeof *eth); eth->eth_type = htons(ETH_TYPE_ARP); -arp = dp_packet_put_uninit(b, sizeof *arp); +struct arp_eth_header *arp = dp_packet_put_zeros(b, sizeof *arp); arp->ar_hrd = htons(ARP_HRD_ETHERNET); arp->ar_pro = htons(ARP_PRO_IP); arp->ar_hln = sizeof arp->ar_sha; arp->ar_pln = sizeof arp->ar_spa; -arp->ar_op = htons(arp_op); -arp->ar_sha = arp_sha; -arp->ar_tha = arp_tha; - -put_16aligned_be32(&arp->ar_spa, arp_spa); -put_16aligned_be32(&arp->ar_tpa, arp_tpa); dp_packet_reset_offsets(b); dp_packet_set_l3(b, arp); diff --git a/li
[ovs-dev] [PATCH v5 3/5] ovn-controller: Add data structure for indexing lports, multicast groups.
This was more or less implemented inside lflow.c until now, but some upcoming code that shouldn't be in that file needs to use it too. This also adds a second index on lports, so that lports can be looked up based on the logical datapath tunnel key and the logical port tunnel key. An upcoming commit will add a user for this new index. Signed-off-by: Ben Pfaff --- ovn/controller/automake.mk | 2 + ovn/controller/lflow.c | 181 ++-- ovn/controller/lflow.h | 10 ++- ovn/controller/lport.c | 157 ++ ovn/controller/lport.h | 67 +++ ovn/controller/ovn-controller.c | 36 6 files changed, 299 insertions(+), 154 deletions(-) create mode 100644 ovn/controller/lport.c create mode 100644 ovn/controller/lport.h diff --git a/ovn/controller/automake.mk b/ovn/controller/automake.mk index cadfa9c..cf57bbd 100644 --- a/ovn/controller/automake.mk +++ b/ovn/controller/automake.mk @@ -8,6 +8,8 @@ ovn_controller_ovn_controller_SOURCES = \ ovn/controller/encaps.h \ ovn/controller/lflow.c \ ovn/controller/lflow.h \ + ovn/controller/lport.c \ + ovn/controller/lport.h \ ovn/controller/ofctrl.c \ ovn/controller/ofctrl.h \ ovn/controller/pinctrl.c \ diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c index 912c609..bd08ad6 100644 --- a/ovn/controller/lflow.c +++ b/ovn/controller/lflow.c @@ -1,4 +1,4 @@ -/* Copyright (c) 2015 Nicira, Inc. +/* Copyright (c) 2015, 2016 Nicira, Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. @@ -15,12 +15,13 @@ #include #include "lflow.h" +#include "lport.h" #include "dynamic-string.h" #include "ofctrl.h" #include "ofp-actions.h" #include "ofpbuf.h" #include "openvswitch/vlog.h" -#include "ovn/controller/ovn-controller.h" +#include "ovn-controller.h" #include "ovn/lib/actions.h" #include "ovn/lib/expr.h" #include "ovn/lib/ovn-sb-idl.h" @@ -43,8 +44,8 @@ add_logical_register(struct shash *symtab, enum mf_field_id id) expr_symtab_add_field(symtab, name, id, NULL, false); } -static void -symtab_init(void) +void +lflow_init(void) { shash_init(&symtab); @@ -155,161 +156,62 @@ symtab_init(void) expr_symtab_add_field(&symtab, "sctp.dst", MFF_SCTP_DST, "sctp", false); } -/* Logical datapaths and logical port numbers. */ - -enum ldp_type { -LDP_TYPE_ROUTER, -LDP_TYPE_SWITCH, -}; - -/* A logical datapath. - * - * 'ports' maps 'logical_port' names to 'tunnel_key' values in the OVN_SB - * Port_Binding table within the logical datapath. */ -struct logical_datapath { -struct hmap_node hmap_node; /* Indexed on 'uuid'. */ -struct uuid uuid; /* UUID from Datapath_Binding row. */ -uint32_t tunnel_key;/* 'tunnel_key' from Datapath_Binding row. */ -struct simap ports; /* Logical port name to port number. */ -enum ldp_type type; /* Type of logical datapath */ +struct lookup_port_aux { +const struct lport_index *lports; +const struct mcgroup_index *mcgroups; +const struct sbrec_datapath_binding *dp; }; -/* Contains "struct logical_datapath"s. */ -static struct hmap logical_datapaths = HMAP_INITIALIZER(&logical_datapaths); - -/* Finds and returns the logical_datapath for 'binding', or NULL if no such - * logical_datapath exists. */ -static struct logical_datapath * -ldp_lookup(const struct sbrec_datapath_binding *binding) -{ -struct logical_datapath *ldp; -HMAP_FOR_EACH_IN_BUCKET (ldp, hmap_node, uuid_hash(&binding->header_.uuid), - &logical_datapaths) { -if (uuid_equals(&ldp->uuid, &binding->header_.uuid)) { -return ldp; -} -} -return NULL; -} - -/* Creates a new logical_datapath for the given 'binding'. */ -static struct logical_datapath * -ldp_create(const struct sbrec_datapath_binding *binding) -{ -struct logical_datapath *ldp; - -ldp = xmalloc(sizeof *ldp); -hmap_insert(&logical_datapaths, &ldp->hmap_node, -uuid_hash(&binding->header_.uuid)); -ldp->uuid = binding->header_.uuid; -ldp->tunnel_key = binding->tunnel_key; -const char *ls = smap_get(&binding->external_ids, "logical-switch"); -ldp->type = ls ? LDP_TYPE_SWITCH : LDP_TYPE_ROUTER; -simap_init(&ldp->ports); -return ldp; -} - -static struct logical_datapath * -ldp_lookup_or_create(const struct sbrec_datapath_binding *binding) -{ -struct logical_datapath *ldp = ldp_lookup(binding); -return ldp ? ldp : ldp_create(binding); -} - -static void -ldp_free(struct logical_datapath *ldp) +static bool +lookup_port_cb(const void *aux_, const char *port_name, unsigned int *portp) { -simap_destroy(&ldp->ports); -hmap_remove(&logical_datapaths, &ldp->hmap_node); -free(ldp); -} +const struct lookup_port_aux *aux = aux_
[ovs-dev] [PATCH v5 5/5] [RFC] lflow: Disable egress table optimization.
I don't understand why, but without this change, the test in the previous commit does not pass. Signed-off-by: Ben Pfaff --- ovn/controller/lflow.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c index f16c5c9..a18c760 100644 --- a/ovn/controller/lflow.c +++ b/ovn/controller/lflow.c @@ -209,7 +209,7 @@ add_logical_flows(struct controller_ctx *ctx, const struct lport_index *lports, if (!ldp) { continue; } -if (!ingress && is_switch(ldp)) { +if (0 && !ingress && is_switch(ldp)) { /* For a logical switch datapath, local_datapaths tells us if there * are any local ports for this datapath. If not, processing * logical flows for the egress pipeline of this datapath is -- 2.1.3 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v5 4/5] ovn: Implement basic ARP support for L3 logical routers.
This is sufficient support that an L3 logical router can now transmit packets to VMs (and other destinations) without having to know the IP-to-MAC binding in advance. The details are carefully documented in all of the appropriate places. There are several important caveats that need to be fixed before this can be taken seriously in production. These are documented in ovn/TODO. The most important of these are renewal, expiration, and limiting the size of the ARP table. Signed-off-by: Ben Pfaff --- ovn/TODO| 55 ++- ovn/controller/lflow.c | 86 +++-- ovn/controller/lflow.h | 1 + ovn/controller/ovn-controller.c | 9 +- ovn/controller/pinctrl.c| 197 +- ovn/controller/pinctrl.h| 8 +- ovn/lib/actions.c | 203 +--- ovn/lib/actions.h | 11 +++ ovn/lib/expr.c | 53 +++ ovn/lib/expr.h | 3 + ovn/northd/ovn-northd.8.xml | 112 +- ovn/northd/ovn-northd.c | 105 - ovn/ovn-architecture.7.xml | 78 ++- ovn/ovn-sb.ovsschema| 15 ++- ovn/ovn-sb.xml | 137 ++- ovn/utilities/ovn-sbctl.c | 4 + tests/ovn.at| 187 +--- tests/test-ovn.c| 1 + 18 files changed, 1053 insertions(+), 212 deletions(-) diff --git a/ovn/TODO b/ovn/TODO index b08a4f4..93ef4a3 100644 --- a/ovn/TODO +++ b/ovn/TODO @@ -4,18 +4,11 @@ ** New OVN logical actions -*** arp - -Generates an ARP packet based on the current IPv4 packet and allows it -to be processed as part of the current pipeline (and then pop back to -processing the original IPv4 packet). +*** rate_limit TCP/IP stacks typically limit the rate at which ARPs are sent, e.g. to one per second for a given target. We might need to do this too. -We probably need to buffer the packet that generated the ARP. I don't -know where to do that. - *** icmp4 { action... } Generates an ICMPv4 packet based on the current IPv4 packet and @@ -60,37 +53,13 @@ the "arp" action, and an action for generating ** Dynamic IP to MAC bindings -Some bindings from IP address to MAC will undoubtedly need to be -discovered dynamically through ARP requests. It's straightforward -enough for a logical L3 router to generate ARP requests and forward -them to the appropriate switch. - -It's more difficult to figure out where the reply should be processed -and stored. It might seem at first that a first-cut implementation -could just keep track of the binding on the hypervisor that needs to -know, but that can't happen easily because the VM that sends the reply -might not be on the same HV as the VM that needs the answer (that is, -the VM that sent the packet that needs the binding to be resolved) and -there isn't an easy way for it to know which HV needs the answer. - -Thus, the HV that processes the ARP reply (which is unknown when the -ARP is sent) has to tell all the HVs the binding. The most obvious -place for this in the OVN_Southbound database. - -Details need to be worked out, including: - -*** OVN_Southbound schema changes. +OVN has basic support for establishing IP to MAC bindings dynamically, +using ARP. -Possibly bindings could be added to the Port_Binding table by adding -or modifying columns. Another possibility is that another table -should be added. +*** Ratelimiting. -*** Logical_Flow representation - -It would be really nice to maintain the general-purpose nature of -logical flows, but these bindings might have to include some -hard-coded special cases, especially when it comes to the relationship -with populating the bindings into the OVN_Southbound table. +From casual observation, Linux appears to generate at most one ARP per +second per destination. *** Tracking queries @@ -104,16 +73,12 @@ into the database. Something needs to make sure that bindings remain valid and expire those that become stale. -** MTU handling (fragmentation on output) - -** Ratelimiting. +*** Table size limiting. -*** ARP. +The table of MAC bindings must not be allowed to grow unreasonably +large. -*** ICMP error generation, TCP reset, UDP unreachable, protocol unreachable, ... - -As a point of comparison, Linux doesn't ratelimit TCP resets but I -think it does everything else. +** MTU handling (fragmentation on output) * ovn-controller diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c index bd08ad6..f16c5c9 100644 --- a/ovn/controller/lflow.c +++ b/ovn/controller/lflow.c @@ -191,15 +191,13 @@ is_switch(const struct sbrec_datapath_binding *ldp) } -/* Translates logical flows in the Logical_Flow table in the OVN_SB database - * into OpenFlow flows. See ovn-architecture(7) for more information. */ -void -lflow_run(struct controller_ctx *ctx,
[ovs-dev] [PATCH v5 2/5] ovn: Use callback function instead of simap for logical port number map.
An simap is convenient but it isn't very flexible. If the client wants to keep extra data with each node then it has to build a second parallel data structure. A callback function is kind of a pain for the clients from the point of view of having to write it and deal with auxiliary data, etc., but it allows the storage to be more flexible. An upcoming commit will make further use of this capability. Signed-off-by: Ben Pfaff --- ovn/controller/lflow.c | 18 +-- ovn/lib/actions.c | 3 +- ovn/lib/actions.h | 10 +++--- ovn/lib/expr.c | 88 +++--- ovn/lib/expr.h | 13 ++-- tests/test-ovn.c | 19 +-- 6 files changed, 105 insertions(+), 46 deletions(-) diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c index d53213c..912c609 100644 --- a/ovn/controller/lflow.c +++ b/ovn/controller/lflow.c @@ -271,6 +271,18 @@ lflow_init(void) symtab_init(); } +static bool +lookup_port_cb(const void *ldp_, const char *port_name, unsigned int *portp) +{ +const struct logical_datapath *ldp = ldp_; +const struct simap_node *node = simap_find(&ldp->ports, port_name); +if (!node) { +return false; +} +*portp = node->data; +return true; +} + /* Translates logical flows in the Logical_Flow table in the OVN_SB database * into OpenFlow flows. See ovn-architecture(7) for more information. */ void @@ -346,7 +358,8 @@ lflow_run(struct controller_ctx *ctx, struct hmap *flow_table, ofpbuf_use_stub(&ofpacts, ofpacts_stub, sizeof ofpacts_stub); struct action_params ap = { .symtab = &symtab, -.ports = &ldp->ports, +.lookup_port = lookup_port_cb, +.aux = ldp, .ct_zones = ct_zones, .n_tables = LOG_PIPELINE_LEN, @@ -387,7 +400,8 @@ lflow_run(struct controller_ctx *ctx, struct hmap *flow_table, expr = expr_simplify(expr); expr = expr_normalize(expr); -uint32_t n_conjs = expr_to_matches(expr, &ldp->ports, &matches); +uint32_t n_conjs = expr_to_matches(expr, lookup_port_cb, ldp, + &matches); expr_destroy(expr); /* Prepare the OpenFlow matches for adding to the flow table. */ diff --git a/ovn/lib/actions.c b/ovn/lib/actions.c index f9f7ef7..7fc4d9c 100644 --- a/ovn/lib/actions.c +++ b/ovn/lib/actions.c @@ -108,7 +108,8 @@ parse_set_action(struct action_context *ctx) struct expr *prereqs; char *error; -error = expr_parse_assignment(ctx->lexer, ctx->ap->symtab, ctx->ap->ports, +error = expr_parse_assignment(ctx->lexer, ctx->ap->symtab, + ctx->ap->lookup_port, ctx->ap->aux, ctx->ofpacts, &prereqs); if (error) { action_error(ctx, "%s", error); diff --git a/ovn/lib/actions.h b/ovn/lib/actions.h index 5b2367c..a41088e 100644 --- a/ovn/lib/actions.h +++ b/ovn/lib/actions.h @@ -17,6 +17,7 @@ #ifndef OVN_ACTIONS_H #define OVN_ACTIONS_H 1 +#include #include #include "compiler.h" #include "util.h" @@ -47,10 +48,11 @@ struct action_params { * expr_parse()). */ const struct shash *symtab; - /* 'ports' must be a map from strings (presumably names of ports) to - * integers (as one would provide to expr_to_matches()). Strings used in - * the actions that are not in 'ports' are translated to zero. */ -const struct simap *ports; +/* Looks up logical port 'port_name'. If found, stores its port number in + * '*portp' and returns true; otherwise, returns false. */ +bool (*lookup_port)(const void *aux, const char *port_name, +unsigned int *portp); +const void *aux; /* A map from a port name to its connection tracking zone. */ const struct simap *ct_zones; diff --git a/ovn/lib/expr.c b/ovn/lib/expr.c index e196922..44fde84 100644 --- a/ovn/lib/expr.c +++ b/ovn/lib/expr.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2015 Nicira, Inc. + * Copyright (c) 2015, 2016 Nicira, Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. @@ -2287,17 +2287,18 @@ expr_match_add(struct hmap *matches, struct expr_match *match) } static bool -constrain_match(const struct expr *expr, const struct simap *ports, -struct match *m) +constrain_match(const struct expr *expr, +bool (*lookup_port)(const void *aux, const char *port_name, +unsigned int *portp), +const void *aux, struct match *m) { ovs_assert(expr->type == EXPR_T_CMP); if (expr->cmp.symbol->width) { mf_mask_subfield(expr->cmp.symbol->field, &expr->cmp.value, &expr->cmp.mask, m); } else { -const struct simap_node *node; -node = ports ? simap_find(ports, expr->cmp.stri
Re: [ovs-dev] [PATCH 3/3] hmap: Add extra build-time iteration checks for types derived from hmap.
On Mon, Feb 08, 2016 at 06:54:54PM -0800, Andy Zhou wrote: > On Mon, Feb 8, 2016 at 4:52 PM, Ben Pfaff wrote: > > > Some of our data structures derived from hmap use the same member names. > > This means it's possible to confuse them in iteration, e.g. to iterate a > > shash with SIMAP_FOR_EACH. Of course this will crash at runtime, but it > > seems even better to catch it at compile time. > > > > An alternative would be to use unique member names, e.g. shash_map and > > simap_map instead of just map. I like short names, though. > > > > It's kind of nasty that we need support from the hmap code to do this. > > An alternative would be to insert the build assertions as statements before > > the for loop. But that would cause nasty surprises if someone forgets the > > {} around a block of statements; even though the OVS coding style requires > > them in all cases, I suspect that programmers doing debugging, etc. tend > > to omit them sometimes. > > > > It's not actually necessary to have multiple variants of these macros, > > e.g. one can write a C99-compliant HMAP_FOR_EACH that accepts 3 or 4 or > > more arguments. But such a macro is harder to read, so I don't know > > whether this is a good tradeoff. > > > > Signed-off-by: Ben Pfaff > > > I find the the comments for *_INIT macro to be very helpful. Thanks. > > Acked-by: Andy Zhou Thanks. I pushed this to master. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] tests : fix tests #46 and #48 failing on some filesystems
On Tue, Feb 16, 2016 at 08:38:52PM +, Zoltán Balogh wrote: > I agree, it's better to use the fixed value and not to depend on the 'stat'. > That's why the 'if' condition checks if 'stat' is available. But your > suggested patch is simpler and better. I think I like simple. Thanks for all the patches. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] Finally get that job with a Degree
96 p{margin:10px 0;padding:0;} table{border-collapse:collapse;} h1,h2,h3,h4,h5,h6{display:block;margin:0;padding:0;} img,a img{border:0;height:auto;outline:none;text-decoration:none;} body,#bodyTable,#bodyCell{height:100%;margin:0;padding:0;width:100%;} #outlook a{padding:0;} img{-ms-interpolation-mode:bicubic;} table{mso-table-lspace:0pt;mso-table-rspace:0pt;} ReadMsgBody{width:100%;} ExternalClass{width:100%;} p,a,li,td,blockquote{mso-line-height-rule:exactly;} a[href^=tel],a[href^=sms]{color:inherit;cursor:default;text-decoration:none;} p,a,li,td,body,table,blockquote{-ms-text-size-adjust:100%;-webkit-text-size-adjust:100%;} ExternalClass,.ExternalClass p,.ExternalClass td,.ExternalClass div,.ExternalClass span,.ExternalClass font{line-height:100%;} a[x-apple-data-detectors]{color:inherit !important;text-decoration:none !important;font-size:inherit !important;font-family:inherit !important;font-weight:inherit !important;line-height:inherit !important;} templateContainer{max-width:600px !important;} a.mcnButton{display:block;} mcnImage{vertical-align:bottom;} mcnTextContent{word-break:break-word;} mcnTextContent img{height:auto !important;} mcnDividerBlock{table-layout:fixed !important;} body,#bodyTable{background-color:#FAFAFA;} #bodyCell{border-top:0;} h1{color:#202020;font-family:Helvetica;font-size:26px;font-style:normal;font-weight:bold;line-height:125%;letter-spacing:normal;text-align:left;} h2{color:#202020;font-family:Helvetica;font-size:22px;font-style:normal;font-weight:bold;line-height:125%;letter-spacing:normal;text-align:left;} h3{color:#202020;font-family:Helvetica;font-size:20px;font-style:normal;font-weight:bold;line-height:125%;letter-spacing:normal;text-align:left;} h4{color:#202020;font-family:Helvetica;font-size:18px;font-style:normal;font-weight:bold;line-height:125%;letter-spacing:normal;text-align:left;} #templatePreheader{background-color:#FAFAFA;border-top:0;border-bottom:0;padding-top:9px;padding-bottom:9px;} #templatePreheader .mcnTextContent,#templatePreheader .mcnTextContent p{color:#656565;font-family:Helvetica;font-size:12px;line-height:150%;text-align:left;} #templatePreheader .mcnTextContent a,#templatePreheader .mcnTextContent p a{color:#656565;font-weight:normal;text-decoration:underline;} #templateHeader{background-color:#FF;border-top:0;border-bottom:0;padding-top:9px;padding-bottom:0;} #templateHeader .mcnTextContent,#templateHeader .mcnTextContent p{color:#202020;font-family:Helvetica;font-size:16px;line-height:150%;text-align:left;} #templateHeader .mcnTextContent a,#templateHeader .mcnTextContent p a{color:#2BAADF;font-weight:normal;text-decoration:underline;} #templateUpperBody{background-color:#FF;border-top:0;border-bottom:0;padding-top:0;padding-bottom:0;} #templateUpperBody .mcnTextContent,#templateUpperBody .mcnTextContent p{color:#202020;font-family:Helvetica;font-size:16px;line-height:150%;text-align:left;} #templateUpperBody .mcnTextContent a,#templateUpperBody .mcnTextContent p a{color:#2BAADF;font-weight:normal;text-decoration:underline;} #templateColumns{background-color:#FF;border-top:0;border-bottom:0;padding-top:0;padding-bottom:0;} #templateColumns .columnContainer .mcnTextContent,#templateColumns .columnContainer .mcnTextContent p{color:#202020;font-family:Helvetica;font-size:16px;line-height:150%;text-align:left;} #templateColumns .columnContainer .mcnTextContent a,#templateColumns .columnContainer .mcnTextContent p a{color:#2BAADF;font-weight:normal;text-decoration:underline;} #templateLowerBody{background-color:#FF;border-top:0;border-bottom:2px solid #EAEAEA;padding-top:0;padding-bottom:9px;} #templateLowerBody .mcnTextContent,#templateLowerBody .mcnTextContent p{color:#202020;font-family:Helvetica;font-size:16px;line-height:150%;text-align:left;} #templateLowerBody .mcnTextContent a,#templateLowperBody .mcnTextContent p a{color:#2BAADF;font-weight:normal;text-decoration:underline;} #templateFooter{background-color:#FAFAFA;border-top:0;border-bottom:0;padding-top:9px;padding-bottom:9px;} #templateFooter .mcnTextContent,#templateFooter .mcnTextContent p{color:#656565;font-family:Helvetica;font-size:12px;line-height:150%;text-align:center;} #templateFooter .mcnTextContent a,#templateFooter .mcnTextContent p a{color:#656565;font-weight:normal;text-decoration:underline;} @media only screen and (min-width:768px){.templateContainer{width:600px !important;} }@media only screen and (max-width: 480px){body,table,td,p,a,li,blockquote{-webkit-text-size-adjust:none !important;} }@media only screen and (max-width: 480px){body{width:100% !important;min-width:100% !important;} }@media only screen and (max-width: 480px){#bodyCell{padding-top:10px !important;} }@media only screen and (max-width: 480px){.columnWrapper{max-width:100% !important;width:100% !important;} }@media only screen and (max-width: 480px){.mcnImage{width:100% !important;} }@media only screen and (max-width: 480px){.mcnCaptionTopContent
[ovs-dev] miniflow patches
Hi Jarno. Among the oldest patches in patchwork are a couple from Simon that add some mew miniflow related functions: https://patchwork.ozlabs.org/patch/570523/ https://patchwork.ozlabs.org/patch/570522/ I think you're the best one to review these, will you have a look? Thanks, Ben. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 3/3] lib/netdev-dpdk: increase ring name length for dpdkr ports
On Mon, Jan 25, 2016 at 10:43:16AM -0500, Aaron Conole wrote: > Mauricio Vasquez B writes: > > A ring name length of 10 characters is not enough for dpdkr ports > > starting from dpdkr10, then it is increased to RTE_RING_NAMESIZE > > characters. > > > > Signed-off-by: Mauricio Vasquez B > > > > --- > > lib/netdev-dpdk.c | 6 +++--- > > 1 file changed, 3 insertions(+), 3 deletions(-) > > > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c > > index ac81f2f..7caa70c 100644 > > --- a/lib/netdev-dpdk.c > > +++ b/lib/netdev-dpdk.c > > @@ -1939,7 +1939,7 @@ dpdk_ring_create(const char dev_name[], unsigned int > > port_no, > > unsigned int *eth_port_id) > > { > > struct dpdk_ring *ivshmem; > > -char ring_name[10]; > > +char ring_name[RTE_RING_NAMESIZE]; > > int err; > > > > ivshmem = dpdk_rte_mzalloc(sizeof *ivshmem); > > @@ -1948,7 +1948,7 @@ dpdk_ring_create(const char dev_name[], unsigned int > > port_no, > > } > > > > /* XXX: Add support for multiquque ring. */ > > -err = snprintf(ring_name, 10, "%s_tx", dev_name); > > +err = snprintf(ring_name, sizeof(ring_name), "%s_tx", dev_name); > > if (err < 0) { > > return -err; > > } > > @@ -1961,7 +1961,7 @@ dpdk_ring_create(const char dev_name[], unsigned int > > port_no, > > return ENOMEM; > > } > > > > -err = snprintf(ring_name, 10, "%s_rx", dev_name); > > +err = snprintf(ring_name, sizeof(ring_name), "%s_rx", dev_name); > > if (err < 0) { > > return -err; > > } > > LGTM. > > Acked-by: Aaron Conole Hi Daniele, this is still in patchwork, do you want to apply it? Thanks, Ben. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2 1/2] lib/netdev-dpdk: make device name parsing more robust
Daniele, are you the right person to review and apply this pair of patches? Thanks, Ben. On Mon, Jan 25, 2016 at 02:55:57PM -0500, Mauricio Vasquez B wrote: > Current implementation of dpdk_dev_parse_name does not perform a robust > error handling, port names as "dpdkr", "dpdkr1x", "dpdkr 5" are considered > valid. > > With this path only positive port numbers in decimal notation are considered > valid. > > Signed-off-by: Mauricio Vasquez B > --- > v2: > - replace strtol by strtoul > - more strict parsing, ports as "dpdr 5" are now not valid. > Thanks to Aaron Conole! > lib/netdev-dpdk.c | 30 -- > 1 file changed, 28 insertions(+), 2 deletions(-) > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c > index de7e488..5d09230 100644 > --- a/lib/netdev-dpdk.c > +++ b/lib/netdev-dpdk.c > @@ -18,7 +18,9 @@ > > #include > #include > +#include > #include > +#include > #include > #include > #include > @@ -187,7 +189,7 @@ struct dpdk_ring { > /* For the client rings */ > struct rte_ring *cring_tx; > struct rte_ring *cring_rx; > -int user_port_id; /* User given port no, parsed from port name */ > +unsigned int user_port_id; /* User given port no, parsed from port name > */ > int eth_port_id; /* ethernet device port id */ > struct ovs_list list_node OVS_GUARDED_BY(dpdk_mutex); > }; > @@ -636,18 +638,42 @@ unlock: > return err; > } > > +/* dev_name must be the prefix followed by a positive decimal number. > + * (no leading + or - signs are allowed) */ > static int > dpdk_dev_parse_name(const char dev_name[], const char prefix[], > unsigned int *port_no) > { > const char *cport; > +unsigned long port; > +char *endptr; > > if (strncmp(dev_name, prefix, strlen(prefix))) { > return ENODEV; > } > > +errno = 0; > cport = dev_name + strlen(prefix); > -*port_no = strtol(cport, NULL, 0); /* string must be null terminated */ > + > +if(!isdigit(cport[0])) { > +return ENODEV; > +} > + > +port = strtoul(cport, &endptr, 10); > + > +if(errno != 0) { > +return errno; > +} > + > +if(endptr == NULL || *endptr != '\0' || endptr == cport) { > +return ENODEV; > +} > + > +if(port > UINT_MAX) { > +return ENODEV; > +} > + > +*port_no = port; > return 0; > } > > -- > 1.9.1 > > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCHv1 1/4] ovn-northd: Support Logical_Port.addresses to store multiple ips in each set
On Fri, Jan 29, 2016 at 09:27:33PM +0530, Numan Siddique wrote: > If a logical port has two ipv4 addresses and one ipv6 address > it will be stored as ["MAC IPv41 IPv42 IPv61"] instead of > ["MAC IPv41", "MAC IPv42", "MAC IPv61"]. > > Signed-off-by: Numan Siddique This appears at first to add a lot of duplicate code in packets.c. Can, perhaps, the existing parsing functions be written in terms of the new functions? It would be better to avoid having a lot of cut-and-paste code if we can. extract_lport_addresses() appears to just silently ignore errors. It would be better to log them (rate-limited) so that administrators have a chance to catch misconfiguration. The description in ovn-nb.xml implies that IPv4 and IPv6 addresses can be mixed together, but the implementation in extract_lport_addresses() requires IPv4 addresses to precede IPv6 addresses. It would be better to allow the freer form. The code appears to parse IPv6 addresses and then completely ignore them. Code of the form if (p) { free(p); } can be simply written free(p); extract_lport_addresses() does a lot of malloc()s. At the very least, the malloc of struct lport_addresses itself is unnecessary: the caller can simply provide one on the stack, by pointer. Thanks, Ben. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCHv1 2/4] ovn: Add port_security proposal
On Fri, Jan 29, 2016 at 09:28:21PM +0530, Numan Siddique wrote: > From: Ben Pfaff > > Signed-off-by: Numan Siddique It's kind of odd to add this separate from implementing it. Usually, we add code and its documentation in the same commit. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 3/4] ovn-northd: Add l3 port security for IPv4 and ARP
On Fri, Jan 29, 2016 at 09:28:54PM +0530, Numan Siddique wrote: > For every port security defined for a logical port, add following lflows > in "ls_in_port_sec" and "ls_out_port_sec" stage >- A priority 90 flow to allow ipv4 traffic for known ip addresses > and (broadcast ip - for ingress, mainly for dhcp) >- A priority 80 flow to drop all ipv4 traffic. >- For ingress, a priority 90 flow to allow arp traffic for known > ip addresses and priority 80 flow to drop all arp traffic >- A priority 90 flow to allow ipv6 traffic for all ipv6 addresses if > port security has ipv6 address(es) defined > (next patch will address ipv6) >- A priority 80 flow to drop all ipv6 traffic. >- A priority 50 flow to allow all traffic on that port with the matching > eth address > > Eg. if the port security is "00:00:00:00:00:01 10.0.0.2" > > priority=90, match=(inport == "portname" && eth.src == 00:00:00:00:00:01 > && arp && arp.sha == 00:00:00:00:00:01 && (arp.spa == 10.0.0.2)), > action=(next;) > > priority=90, match=(inport == "portname" && eth.src == 00:00:00:00:00:01 > && ip4 && ((ip4.src == 0.0.0.0 && ip4.dst == 255.255.255.255) || > ip4.src == 10.0.0.3)), action=(next;) > > priority=80, match=(inport == "portname" && eth.src == 00:00:00:00:00:01 > && (arp || ip4)), action=(drop;) > > priority=80, match=(inport == "portname" && eth.src == 00:00:00:00:00:01 > && ip6), action=(drop;) > > priority=50, match=(inport == "portname" && eth.src == 00:00:00:00:00:01), > action=(next;) > > Signed-off-by: Numan Siddique Please update ovn-northd.8.xml to describe the new flows. Thanks for writing a test. I know that they're difficult to write. (I intend to work on this at some point.) ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCHv1 4/4] ovn-northd: Add l3 port security for IPv6
On Fri, Jan 29, 2016 at 09:29:22PM +0530, Numan Siddique wrote: > For each lport, adds a priority 90 lflow in ls_in_port_sec and ls_out_port_sec > stages to allow ipv6 traffic for > - known ipv6 addresses > - link local address of the lport > - ip6 packet with ip6.src = :: and > - ip6.dst=ff00::/8 > > Signed-off-by: Numan Siddique Any particular reason this is a separate patch? Please update ovn-northd.8.xml to describe the new flows. Thanks, Ben. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH RFC v2] netdev-dpdk: vhost-user: Fix sending packets to queues not enabled by guest.
On 20.02.2016 00:03, Daniele Di Proietto wrote: > Would it be safe to assume that the enabled queues are sequential? > In this case we could just play with 'real_n_txq' instead of keeping > a mapping and the patch would be simpler. I'm not sure if that's a > reasonable assumption though. I thought about that. IMHO, if we can support this virtio features, we must support them, because we can't force guest's applications to not disable random queues. > Also, on my system, with qemu-2.5.0, vring_state_changed() is called > after new_device(), leading to the following scenario: > > 2016-02-19T19:04:13.617Z|1|dpdk(vhost_thread2)|DBG|TX queue mapping > for /home/daniele/ovs/_run/run/dv0 > 2016-02-19T19:04:13.618Z|2|dpdk(vhost_thread2)|DBG| 0 --> 0 > 2016-02-19T19:04:13.618Z|3|dpdk(vhost_thread2)|INFO|State of queue 0 ( > tx_qid 0 ) of vhost device '/home/daniele/ovs/_run/run/dv0' 0 changed to > 'enabled' > 2016-02-19T19:04:13.618Z|4|dpdk(vhost_thread2)|DBG|TX queue mapping > for /home/daniele/ovs/_run/run/dv0 > 2016-02-19T19:04:13.618Z|5|dpdk(vhost_thread2)|DBG| 0 --> 0 > 2016-02-19T19:04:13.618Z|6|dpdk(vhost_thread2)|INFO|State of queue 2 ( > tx_qid 1 ) of vhost device '/home/daniele/ovs/_run/run/dv0' 0 changed to > 'disabled' > 2016-02-19T19:04:13.619Z|7|dpdk(vhost_thread2)|DBG|TX queue mapping > for /home/daniele/ovs/_run/run/dv1 > 2016-02-19T19:04:13.619Z|8|dpdk(vhost_thread2)|DBG| 0 --> 0 > 2016-02-19T19:04:13.619Z|9|dpdk(vhost_thread2)|INFO|State of queue 0 ( > tx_qid 0 ) of vhost device '/home/daniele/ovs/_run/run/dv1' 1 changed to > 'enabled' > 2016-02-19T19:04:13.619Z|00010|dpdk(vhost_thread2)|DBG|TX queue mapping > for /home/daniele/ovs/_run/run/dv1 > 2016-02-19T19:04:13.619Z|00011|dpdk(vhost_thread2)|DBG| 0 --> 0 > 2016-02-19T19:04:13.619Z|00012|dpdk(vhost_thread2)|INFO|State of queue 2 ( > tx_qid 1 ) of vhost device '/home/daniele/ovs/_run/run/dv1' 1 changed to > 'disabled' > 2016-02-19T19:04:13.619Z|00013|dpdk(vhost_thread2)|INFO|vHost Device > '/home/daniele/ovs/_run/run/dv0' 0 has been added > 2016-02-19T19:04:13.620Z|00014|dpdk(vhost_thread2)|INFO|vHost Device > '/home/daniele/ovs/_run/run/dv1' 1 has been added > > > tx_qid 1 of both devices is still mapped on -1, causing the packets to > be lost Thanks for pointing that. There is an inconsistency between dpdk_tx_queue-s and real virtqueues. I'll fix that. > One more comment inline. > > Thanks, > > Daniele > > On 11/02/2016 22:21, "Ilya Maximets" wrote: > >> Currently virtio driver in guest operating system have to be configured >> to use exactly same number of queues. If number of queues will be less, >> some packets will get stuck in queues unused by guest and will not be >> received. >> >> Fix that by using new 'vring_state_changed' callback, which is >> available for vhost-user since DPDK 2.2. >> Implementation uses additional mapping from configured tx queues to >> enabled by virtio driver. This requires mandatory locking of TX queues >> in __netdev_dpdk_vhost_send(), but this locking was almost always anyway >> because of calling set_multiq with n_txq = 'ovs_numa_get_n_cores() + 1'. >> >> Fixes: 4573fbd38fa1 ("netdev-dpdk: Add vhost-user multiqueue support") >> Signed-off-by: Ilya Maximets >> Reviewed-by: Aaron Conole >> Acked-by: Flavio Leitner >> --- >> lib/netdev-dpdk.c | 105 >> +++--- >> 1 file changed, 92 insertions(+), 13 deletions(-) >> >> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c >> index e4f789b..c50b1dd 100644 >> --- a/lib/netdev-dpdk.c >> +++ b/lib/netdev-dpdk.c >> @@ -173,6 +173,8 @@ struct dpdk_tx_queue { >> * from concurrent access. It is >> used only >> * if the queue is shared among >> different >> * pmd threads (see >> 'txq_needs_locking'). */ >> +int map; /* Mapping of configured vhost-user >> queues >> +* to enabled by guest. */ >> uint64_t tsc; >> struct rte_mbuf *burst_pkts[MAX_TX_QUEUE_LEN]; >> }; >> @@ -559,6 +561,7 @@ netdev_dpdk_alloc_txq(struct netdev_dpdk *netdev, >> unsigned int n_txqs) >> unsigned i; >> >> netdev->tx_q = dpdk_rte_mzalloc(n_txqs * sizeof *netdev->tx_q); >> + >> for (i = 0; i < n_txqs; i++) { >> int numa_id = ovs_numa_get_numa_id(i); >> >> @@ -572,6 +575,9 @@ netdev_dpdk_alloc_txq(struct netdev_dpdk *netdev, >> unsigned int n_txqs) >> /* Queues are shared among CPUs. Always flush */ >> netdev->tx_q[i].flush_tx = true; >> } >> + >> +/* Initialize map for vhost devices. */ >> +netdev->tx_q[i].map = -1; >> rte_spinlock_init(&netdev->tx_q[i].tx_lock); >> } >> } >> @@ -1117,17 +1123,16 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, >> int qid, >> unsigned int total_pkts = cnt; >> uint64_t start = 0; >> >> -if (OVS_UNLIKELY
Re: [ovs-dev] [PATCH] rhel: Add 'rpm-fedora' and 'rpm-fedora-kmod' targets
On Fri, Feb 19, 2016 at 02:30:31PM -0500, Lance Richardson wrote: > Add make targets for Fedora and RHEL7 RPMs, update INSTALL.Fedora.md > to document their use > > Added distribution tarball and rpm build directory to .gitignore. > > Signed-off-by: Lance Richardson This seems reasonable to me. Russell or Flavio (or someone else who knows RPM and Red Hat better than me), you want to review this? Thanks, Ben. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH/RFC] openvswitch: loosen restriction of output of MPLS to tunnel vports
On Tue, Feb 16, 2016 at 02:53:39PM -0800, Jesse Gross wrote: > On Fri, Feb 12, 2016 at 11:25 AM, Simon Horman > wrote: > > If an skb was not MPLS initially then it may be GSO and in that case if it > > became MPLS then GSO can't be performed because both MPLS and tunnels make > > use of the inner_protocol field of struct skbuff in order to allow GSO to > > be performed in the inner packet. > > > > On the other hand if an skb was MPLS initially then it will not be GSO, > > as there is no support for GRO for MPLS. Thus in this case it is safe > > to allow output of MPLS on tunnel vports. > > > > Signed-off-by: Simon Horman > > I don't think that any tunnel implementations expose support for MPLS > offloads as part of their features. In that case, if we have an MPLS > GSO packet (regardless of how it came to be), I think it will be > broken apart in software before encapsulation. At that point, it > should be safe for the tunnel to overwrite any fields MPLS was > previously using for offloading. As a result, I believe we can allow > all combinations of MPLS with tunnels. (Note that historically this > wasn't true, the change is a result of lightweight tunnels.) Hi Jesse, wow, that does sound very promising. I would certainly be in favour of allowing MPLS with tunnels. I am wondering if you could point me in the general direction of the changes you mention above. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v9 7/7] userspace: add non-tap (l3) support to GRE vports
On Tue, Feb 16, 2016 at 11:15:20PM +0100, Jiri Benc wrote: > Sorry for the late answer, was busy with a conference and internal > meetings in the past two weeks. > > On Tue, 2 Feb 2016 17:15:15 -0800, Jesse Gross wrote: > > I think this sounds like a good idea if we can find a way to do it > > cleanly. From OVS's perspective, the main thing that we need is a way > > to indicate the first header that we expect to see. We used to have > > this in struct tnl_ptk_info proto but that is no longer exposed to > > OVS. We also want to make sure that a device that is configured in > > this mode behaves in a logical way when not connected to OVS - i.e. it > > knows whether to emit ARP for L2 ports but not L3. I suppose now that > > lightweight tunneling is here both interfaces are common and therefore > > the problem is the same in each case, which is a good thing. > > There's only one way to solve this cleanly in the kernel. The L2 vs. L3 > mode has to be selected while creating the tunnel interface and cannot > be changed afterwards (only by deleting and recreating the interface). > The reason is that the L3 interface needs to be of ARHRD_NONE type > instead of ARPHRD_ETHER. With additional flags set by the kernel > (IFF_NOARP in particular), this works as expected out of ovs, e.g. for > route based encapsulation. > > It's not possible to mix those two in a single interface. E.g. for > VXLAN-GPE, it's either Ethernet header is encapsulated or not for a > given interface (and thus for a given vport), and never both. If we did > that, such interface wouldn't work standalone, outside of ovs. > > I don't think it's a problem. The information whether Ethernet header > info is provided in the flow key or not can be directly deduced from the > net_device type. It's quite generic this way: if it's ARPHRD_ETHER, > there's Ethernet header, if it's ARPHRD_NONE, no L2 header is > available. In the future, it's easy to add different L2 transports if > desired in the same way. > > The user has to request the L2 or L3 mode when creating the VXLAN-GPE > interface. This will be the same for L3 Geneve, and likely for GRE, too > (I'll have to check the current implementation of that one). So yes, > we'll need a way to distinguish this when creating the vport. Either a > new vport type, or an L3 flag. It makes sense, actually: the vports are > very different, e.g. different flow rules are needed for L2 and L3 > tunnels (for L3, push_eth when switching to Ethernet will be needed to > be configured at least). > > Of course, currently, the kernel datapath allows only ARPHRD_ETHER > interfaces to be added to the ovs bridge and this will need to be > changed. I was hoping that my idea would work; thanks for saving me the effort of implementing it only to find out that it has the problem you describe. I think that a mode switch is possible, and perhaps it is the best way forwards. But I was hoping to arrange things so that L3 and L2 GRE vports could be used simultaneously, which is why I went for different vport types: unfortunately this complicated the user-space code. At this point I see three options. Jesse, do you have a preference? 1. Use a vport mode for L3 GRE as Jiri suggests. This seems like it may lead to the cleanest implementation. We could later move away from this approach if there is a need for L3 and L2 GRE to co-exist. 2. Add a vport type for L3 GRE as this patch does (or otherwise). This seems a bit more complex than 1, with the caveat that I'm yet to implement 1. But its also more flexible as it allows L3 and L2 GRE to co-exist. 3. Ok, I have no 3rd option other than "something else". > > Jiri (cc'ed) is working on GPE and NSH support to VXLAN at the moment. > > I think this is very closely related and complementary as it also > > depends on sending non-Ethernet frames to OVS. He might have some > > ideas on how to handle this. > > Jiri > > -- > Jiri Benc ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 12/14] expr: Generalize wording of error message in expand_symbol().
On Mon, Dec 21, 2015 at 03:25:42PM -0800, Justin Pettit wrote: > > > On Dec 8, 2015, at 5:08 PM, Ben Pfaff wrote: > > > > The existing wording was very specific to the actual operation being > > performed. While this is nice for users, it becomes difficult to maintain > > as more and more operations are added. This commit makes the wording less > > specific, because a third operation will start using this function in an > > upcoming commit. > > > > Signed-off-by: Ben Pfaff > > Acked-by: Justin Pettit Thanks, I applied this patch. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 13/14] ovn: Reorganize action parsing test.
On Mon, Dec 21, 2015 at 03:26:51PM -0800, Justin Pettit wrote: > > > On Dec 8, 2015, at 5:08 PM, Ben Pfaff wrote: > > > > It seems easier to understand if all of the tests for a given action > > are grouped together. > > > > Signed-off-by: Ben Pfaff > > Acked-by: Justin Pettit Thanks, applied. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v4 04/14] ofp-util: Rename struct ofputil_packet_in member 'len' to 'packet_len'.
An upcoming commit will introduce another member that has a length, and it seems weird that bare 'len' would be one or the other. Signed-off-by: Ben Pfaff --- lib/learning-switch.c| 2 +- lib/ofp-print.c | 10 +- lib/ofp-util.c | 42 +- lib/ofp-util.h | 2 +- ofproto/fail-open.c | 2 +- ofproto/ofproto-dpif-xlate.c | 2 +- 6 files changed, 30 insertions(+), 30 deletions(-) diff --git a/lib/learning-switch.c b/lib/learning-switch.c index 0027100..8f194b3 100644 --- a/lib/learning-switch.c +++ b/lib/learning-switch.c @@ -540,7 +540,7 @@ process_packet_in(struct lswitch *sw, const struct ofp_header *oh) } /* Extract flow data from 'pi' into 'flow'. */ -dp_packet_use_const(&pkt, pi.packet, pi.len); +dp_packet_use_const(&pkt, pi.packet, pi.packet_len); flow_extract(&pkt, &flow); flow.in_port.ofp_port = pi.flow_metadata.flow.in_port.ofp_port; flow.tunnel.tun_id = pi.flow_metadata.flow.tunnel.tun_id; diff --git a/lib/ofp-print.c b/lib/ofp-print.c index 3195376..74f0de6 100644 --- a/lib/ofp-print.c +++ b/lib/ofp-print.c @@ -127,27 +127,27 @@ ofp_print_packet_in(struct ds *string, const struct ofp_header *oh, reasonbuf, sizeof reasonbuf)); -ds_put_format(string, " data_len=%"PRIuSIZE, pin.len); +ds_put_format(string, " data_len=%"PRIuSIZE, pin.packet_len); if (buffer_id == UINT32_MAX) { ds_put_format(string, " (unbuffered)"); -if (total_len != pin.len) { +if (total_len != pin.packet_len) { ds_put_format(string, " (***total_len != data_len***)"); } } else { ds_put_format(string, " buffer=0x%08"PRIx32, buffer_id); -if (total_len < pin.len) { +if (total_len < pin.packet_len) { ds_put_format(string, " (***total_len < data_len***)"); } } ds_put_char(string, '\n'); if (verbosity > 0) { -char *packet = ofp_packet_to_string(pin.packet, pin.len); +char *packet = ofp_packet_to_string(pin.packet, pin.packet_len); ds_put_cstr(string, packet); free(packet); } if (verbosity > 2) { -ds_put_hex_dump(string, pin.packet, pin.len, 0, false); +ds_put_hex_dump(string, pin.packet, pin.packet_len, 0, false); } } diff --git a/lib/ofp-util.c b/lib/ofp-util.c index d9dc0ad..38ee117 100644 --- a/lib/ofp-util.c +++ b/lib/ofp-util.c @@ -3325,7 +3325,7 @@ decode_nx_packet_in2(const struct ofp_header *oh, switch (type) { case NXPINT_PACKET: pin->packet = payload.msg; -pin->len = ofpbuf_msgsize(&payload); +pin->packet_len = ofpbuf_msgsize(&payload); break; case NXPINT_FULL_LEN: { @@ -3368,12 +3368,12 @@ decode_nx_packet_in2(const struct ofp_header *oh, } } -if (!pin->len) { +if (!pin->packet_len) { VLOG_WARN_RL(&bad_ofmsg_rl, "NXT_PACKET_IN2 lacks packet"); return OFPERR_OFPBRC_BAD_LEN; } else if (!*total_len) { -*total_len = pin->len; -} else if (*total_len < pin->len) { +*total_len = pin->packet_len; +} else if (*total_len < pin->packet_len) { VLOG_WARN_RL(&bad_ofmsg_rl, "NXT_PACKET_IN2 claimed full_len < len"); return OFPERR_OFPBRC_BAD_LEN; } @@ -3422,14 +3422,14 @@ ofputil_decode_packet_in(const struct ofp_header *oh, } pin->packet = b.data; -pin->len = b.size; +pin->packet_len = b.size; } else if (raw == OFPRAW_OFPT10_PACKET_IN) { const struct ofp10_packet_in *opi; opi = ofpbuf_pull(&b, offsetof(struct ofp10_packet_in, data)); pin->packet = CONST_CAST(uint8_t *, opi->data); -pin->len = b.size; +pin->packet_len = b.size; match_init_catchall(&pin->flow_metadata); match_set_in_port(&pin->flow_metadata, @@ -3445,7 +3445,7 @@ ofputil_decode_packet_in(const struct ofp_header *oh, opi = ofpbuf_pull(&b, sizeof *opi); pin->packet = b.data; -pin->len = b.size; +pin->packet_len = b.size; *buffer_id = ntohl(opi->buffer_id); error = ofputil_port_from_ofp11(opi->in_port, &in_port); @@ -3480,7 +3480,7 @@ ofputil_decode_packet_in(const struct ofp_header *oh, *total_len = ntohs(npi->total_len); pin->packet = b.data; -pin->len = b.size; +pin->packet_len = b.size; } else if (raw == OFPRAW_NXT_PACKET_IN2) { return decode_nx_packet_in2(oh, pin, total_len, buffer_id); } else { @@ -3525,9 +3525,9 @@ ofputil_encode_ofp10_packet_in(const struct ofputil_packet_in *pin, struct ofpbuf *msg; msg = ofpraw_alloc_xid(OFPRAW_OFPT10_PACKET_IN, OFP10_VERSION, - htonl(0), pin->len); +
[ovs-dev] [PATCH v4 02/14] ofp-util: Remove 'const' from struct ofputil_packet_in's 'packet' member.
It's not const in all cases so it doesn't entirely make sense to mark it const here. Signed-off-by: Ben Pfaff --- lib/ofp-util.c| 2 +- lib/ofp-util.h| 2 +- ofproto/connmgr.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/lib/ofp-util.c b/lib/ofp-util.c index 23f7eda..d057915 100644 --- a/lib/ofp-util.c +++ b/lib/ofp-util.c @@ -3341,7 +3341,7 @@ ofputil_decode_packet_in(const struct ofp_header *oh, opi = ofpbuf_pull(&b, offsetof(struct ofp10_packet_in, data)); -pin->packet = opi->data; +pin->packet = CONST_CAST(uint8_t *, opi->data); pin->len = b.size; match_init_catchall(&pin->flow_metadata); diff --git a/lib/ofp-util.h b/lib/ofp-util.h index 866e1bc..19bfc4b 100644 --- a/lib/ofp-util.h +++ b/lib/ofp-util.h @@ -426,7 +426,7 @@ struct ofputil_packet_in { * On decoding, the 'len' bytes in 'packet' might only be the first part of * the original packet. ofputil_decode_packet_in() reports the full * original length of the packet using its 'total_len' output parameter. */ -const void *packet; /* The packet. */ +void *packet; /* The packet. */ size_t len; /* Length of 'packet' in bytes. */ /* Input port and other metadata for packet. */ diff --git a/ofproto/connmgr.c b/ofproto/connmgr.c index d4f64b2..cc947f0 100644 --- a/ofproto/connmgr.c +++ b/ofproto/connmgr.c @@ -2247,6 +2247,6 @@ ofmonitor_wait(struct connmgr *mgr) void ofproto_async_msg_free(struct ofproto_async_msg *am) { -free(CONST_CAST(void *, am->pin.up.packet)); +free(am->pin.up.packet); free(am); } -- 2.1.3 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v4 01/14] ofpbuf: New function ofpbuf_const_initializer().
A number of times I've looked at code and thought that it would be easier to understand if I could write an initializer instead of ofpbuf_use_const(). This commit adds a function for that purpose and adapts a lot of code to use it, in the places where I thought it made the code better. In theory this could improve code generation since the new function can be inlined whereas ofpbuf_use_const() isn't. But I guess that's probably insignificant; the intent of this change is code readability. Signed-off-by: Ben Pfaff --- lib/dpif-netlink.c | 72 -- lib/learning-switch.c| 6 +- lib/netdev-windows.c | 18 +-- lib/netlink-socket.c | 9 +- lib/nx-match.c | 15 +- lib/ofp-actions.c| 79 --- lib/ofp-errors.c | 8 +- lib/ofp-msgs.c | 4 +- lib/ofp-print.c | 67 +++-- lib/ofp-util.c | 318 +++ lib/ofpbuf.h | 26 +++- ofproto/ofproto-dpif-xlate.c | 8 +- ofproto/ofproto.c| 35 +++-- ovn/controller/pinctrl.c | 9 ++ utilities/ovs-ofctl.c| 8 +- 15 files changed, 258 insertions(+), 424 deletions(-) diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c index 5e48393..f1eaa51 100644 --- a/lib/dpif-netlink.c +++ b/lib/dpif-netlink.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015 Nicira, Inc. + * Copyright (c) 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016 Nicira, Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. @@ -1972,18 +1972,12 @@ parse_odp_packet(const struct dpif_netlink *dpif, struct ofpbuf *buf, [OVS_PACKET_ATTR_MRU] = { .type = NL_A_U16, .optional = true } }; -struct ovs_header *ovs_header; -struct nlattr *a[ARRAY_SIZE(ovs_packet_policy)]; -struct nlmsghdr *nlmsg; -struct genlmsghdr *genl; -struct ofpbuf b; -int type; - -ofpbuf_use_const(&b, buf->data, buf->size); +struct ofpbuf b = ofpbuf_const_initializer(buf->data, buf->size); +struct nlmsghdr *nlmsg = ofpbuf_try_pull(&b, sizeof *nlmsg); +struct genlmsghdr *genl = ofpbuf_try_pull(&b, sizeof *genl); +struct ovs_header *ovs_header = ofpbuf_try_pull(&b, sizeof *ovs_header); -nlmsg = ofpbuf_try_pull(&b, sizeof *nlmsg); -genl = ofpbuf_try_pull(&b, sizeof *genl); -ovs_header = ofpbuf_try_pull(&b, sizeof *ovs_header); +struct nlattr *a[ARRAY_SIZE(ovs_packet_policy)]; if (!nlmsg || !genl || !ovs_header || nlmsg->nlmsg_type != ovs_packet_family || !nl_policy_parse(&b, 0, ovs_packet_policy, a, @@ -1991,9 +1985,9 @@ parse_odp_packet(const struct dpif_netlink *dpif, struct ofpbuf *buf, return EINVAL; } -type = (genl->cmd == OVS_PACKET_CMD_MISS ? DPIF_UC_MISS -: genl->cmd == OVS_PACKET_CMD_ACTION ? DPIF_UC_ACTION -: -1); +int type = (genl->cmd == OVS_PACKET_CMD_MISS ? DPIF_UC_MISS +: genl->cmd == OVS_PACKET_CMD_ACTION ? DPIF_UC_ACTION +: -1); if (type < 0) { return EINVAL; } @@ -2469,18 +2463,14 @@ dpif_netlink_vport_from_ofpbuf(struct dpif_netlink_vport *vport, [OVS_VPORT_ATTR_OPTIONS] = { .type = NL_A_NESTED, .optional = true }, }; -struct nlattr *a[ARRAY_SIZE(ovs_vport_policy)]; -struct ovs_header *ovs_header; -struct nlmsghdr *nlmsg; -struct genlmsghdr *genl; -struct ofpbuf b; - dpif_netlink_vport_init(vport); -ofpbuf_use_const(&b, buf->data, buf->size); -nlmsg = ofpbuf_try_pull(&b, sizeof *nlmsg); -genl = ofpbuf_try_pull(&b, sizeof *genl); -ovs_header = ofpbuf_try_pull(&b, sizeof *ovs_header); +struct ofpbuf b = ofpbuf_const_initializer(buf->data, buf->size); +struct nlmsghdr *nlmsg = ofpbuf_try_pull(&b, sizeof *nlmsg); +struct genlmsghdr *genl = ofpbuf_try_pull(&b, sizeof *genl); +struct ovs_header *ovs_header = ofpbuf_try_pull(&b, sizeof *ovs_header); + +struct nlattr *a[ARRAY_SIZE(ovs_vport_policy)]; if (!nlmsg || !genl || !ovs_header || nlmsg->nlmsg_type != ovs_vport_family || !nl_policy_parse(&b, 0, ovs_vport_policy, a, @@ -2637,18 +2627,14 @@ dpif_netlink_dp_from_ofpbuf(struct dpif_netlink_dp *dp, const struct ofpbuf *buf .optional = true }, }; -struct nlattr *a[ARRAY_SIZE(ovs_datapath_policy)]; -struct ovs_header *ovs_header; -struct nlmsghdr *nlmsg; -struct genlmsghdr *genl; -struct ofpbuf b; - dpif_netlink_dp_init(dp); -ofpbuf_use_const(&b, buf->data, buf->size); -nlmsg = ofpbuf_try_pull(&b, sizeof *nlmsg); -genl = ofpbuf_try_pull(&b, sizeof *genl); -ovs_header = ofpbuf_try_pull(&b, sizeof *ovs_header); +struct ofpbuf b = ofpbuf_const_initializer(buf->data, buf->size); +struct nlmsghdr *nlmsg = ofpbuf_try
[ovs-dev] [PATCH v4 00/14] Implement continuation (aka closure) support
This series is also available at: https://github.com/blp/ovs-reviews/tree/l3-4 g...@github.com:blp/ovs-reviews.git l3-4 This is the fourth revision of my "closures" series. The first version was: http://openvswitch.org/pipermail/dev/2016-January/064607.html The second version, unfortunately not marked v2, was: https://patchwork.ozlabs.org/patch/574946/ and the v1->v2 changes were described at: http://openvswitch.org/pipermail/dev/2016-January/064875.html Changes v2->v3: - Patches 1 through 6 are new. - There's a terminology change starting in patch 6. Now the process of pausing translation is called "freezing", and there are two users of freezing, which are recirculation and closures. - I added the ability to provide arbitrary user data on NXAST_PAUSE which is passed to the controller in a property on NXT_CLOSURE. - I made all the changes described in: http://openvswitch.org/pipermail/dev/2016-February/065984.html - I added a few more tests to ofp-print.at and ofp-actions.at. - Probably other refinements I've forgotten. Changes v3->v4: - Several patches were applied and thus dropped from the series. - I added the previous L3 ARP patches at the end, revised (see below). - I had a major change of heart on the approach here. Until now, I had an idea that "closures" were fundamentally different from OpenFlow "packet-ins" and thus deserved separate but parallel infrastructure. Now that I started to use them for ARP, I realized that there in fact isn't that much difference. Thus, this revision merges closures into packet-ins and drops the "closure" naming. Instead, what was previously a closure is just a packet-in that includes a "continuation", that is, the state that can be used to resume the pipeline. - The ARP series from before is tacked on the end here, but it is revised to use packet-ins instead of adding a specialized OpenFlow action for ARP. - The ARP patches (patches 8 and later) are not as mature as the rest of the series. In particular, I haven't been able to figure out why the final patch is necessary; it may indicate a bug in my test for the ARP support. In testing, I've definitely figured out that debugging is harder than it should be. Ben Pfaff (14): ofpbuf: New function ofpbuf_const_initializer(). ofp-util: Remove 'const' from struct ofputil_packet_in's 'packet' member. Implement new packet-in format NXT_PACKET_IN2. ofp-util: Rename struct ofputil_packet_in member 'len' to 'packet_len'. Support userdata in NXT_PACKET_IN2. ofp-prop: Add support for putting and parsing nested properties. Implement serializing the state of packet traversal in "continuations". pinctrl: Fix header guard. ofp-actions: Introduce macro for padding struct members. actions: Implement OVN "arp" action. ovn: Use callback function instead of simap for logical port number map. ovn-controller: Add data structure for indexing lports, multicast groups. ovn: Implement basic ARP support for L3 logical routers. [RFC] lflow: Disable egress table optimization. NEWS| 5 + include/openflow/nicira-ext.h | 136 +- lib/dpif-netlink.c | 72 ++- lib/learning-switch.c | 11 +- lib/meta-flow.c | 9 +- lib/meta-flow.h | 3 +- lib/netdev-windows.c| 18 +- lib/netlink-socket.c| 9 +- lib/nx-match.c | 15 +- lib/ofp-actions.c | 269 --- lib/ofp-actions.h | 67 +-- lib/ofp-errors.c| 8 +- lib/ofp-errors.h| 16 +- lib/ofp-msgs.c | 4 +- lib/ofp-msgs.h | 7 + lib/ofp-print.c | 150 --- lib/ofp-prop.c | 49 ++ lib/ofp-prop.h | 4 + lib/ofp-util.c | 957 lib/ofp-util.h | 65 ++- lib/ofpbuf.h| 26 +- lib/packets.c | 38 +- lib/packets.h | 3 +- lib/rconn.c | 3 +- ofproto/connmgr.c | 26 +- ofproto/connmgr.h | 2 +- ofproto/fail-open.c | 16 +- ofproto/ofproto-dpif-xlate.c| 211 +++-- ofproto/ofproto-dpif-xlate.h| 4 + ofproto/ofproto-dpif.c | 35 ++ ofproto/ofproto-provider.h | 3 + ofproto/ofproto.c | 61 ++- ovn/TODO| 112 + ovn/controller/automake.mk | 2 + ovn/controller/lflow.c | 251 +-- ovn/controller/lflow.h | 11 +- ovn/controller/lport.c | 157 +++ ovn/controller/lport.h | 67 +++ ovn/controller/ovn-controller.c | 47 +- ovn/controller/pinctrl.c| 323 -- ovn/controller/pinctrl.h| 14 +- ovn/lib/actions.c
[ovs-dev] [PATCH v4 03/14] Implement new packet-in format NXT_PACKET_IN2.
Signed-off-by: Ben Pfaff --- NEWS | 1 + include/openflow/nicira-ext.h | 43 - lib/ofp-msgs.h| 3 + lib/ofp-util.c| 214 ++ ofproto/connmgr.c | 4 +- ofproto/ofproto.c | 2 +- tests/ofp-print.at| 22 - tests/ofproto-dpif.at | 174 +- tests/ofproto.at | 53 --- tests/tunnel-push-pop.at | 2 +- utilities/ovs-ofctl.8.in | 49 +++--- utilities/ovs-ofctl.c | 70 +++--- 12 files changed, 444 insertions(+), 193 deletions(-) diff --git a/NEWS b/NEWS index 18fca10..696168b 100644 --- a/NEWS +++ b/NEWS @@ -5,6 +5,7 @@ Post-v2.5.0 - OpenFlow: * OpenFlow 1.1+ OFPT_QUEUE_GET_CONFIG_REQUEST now supports OFPP_ANY. * OpenFlow 1.4+ OFPMP_QUEUE_DESC is now supported. + * New property-based packet-in message format NXT_PACKET_IN2. - ovs-ofctl: * queue-get-config command now allows a queue ID to be specified. - DPDK: diff --git a/include/openflow/nicira-ext.h b/include/openflow/nicira-ext.h index dad8707..bcc8758 100644 --- a/include/openflow/nicira-ext.h +++ b/include/openflow/nicira-ext.h @@ -1,5 +1,5 @@ /* - * Copyright (c) 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015 Nicira, Inc. + * Copyright (c) 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016 Nicira, Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. @@ -179,11 +179,25 @@ struct nx_flow_mod_table_id { OFP_ASSERT(sizeof(struct nx_flow_mod_table_id) == 8); enum nx_packet_in_format { -NXPIF_OPENFLOW10 = 0, /* Standard OpenFlow 1.0 compatible. */ -NXPIF_NXM = 1 /* Nicira Extended. */ +NXPIF_STANDARD = 0, /* OFPT_PACKET_IN for this OpenFlow version. */ +NXPIF_NXT_PACKET_IN = 1,/* NXT_PACKET_IN (since OVS v1.1). */ +NXPIF_NXT_PACKET_IN2 = 2, /* NXT_PACKET_IN2 (since OVS v2.6). */ }; -/* NXT_SET_PACKET_IN_FORMAT request. */ +/* NXT_SET_PACKET_IN_FORMAT request. + * + * For any given OpenFlow version, Open vSwitch supports multiple formats for + * "packet-in" messages. The default is always the standard format for the + * OpenFlow version in question, but NXT_SET_PACKET_IN_FORMAT can be used to + * set an alternative format. + * + * From OVS v1.1 to OVS v2.5, this request was only honored for OpenFlow 1.0. + * Requests to set format NXPIF_NXT_PACKET_IN were accepted for OF1.1+ but they + * had no effect. (Requests to set formats other than NXPIF_STANDARD or + * NXPIF_NXT_PACKET_IN2 were rejected with OFPBRC_EPERM.) + * + * From OVS v2.6 onward, this request is honored for all OpenFlow versions. + */ struct nx_set_packet_in_format { ovs_be32 format;/* One of NXPIF_*. */ }; @@ -246,6 +260,27 @@ struct nx_packet_in { }; OFP_ASSERT(sizeof(struct nx_packet_in) == 24); +/* NXT_PACKET_IN2. + * + * NXT_PACKET_IN2 is conceptually similar to OFPT_PACKET_IN but it is expressed + * as an extensible set of properties instead of using a fixed structure. + * + * Added in Open vSwitch 2.6. */ +enum nx_packet_in2_prop_type { +/* Packet. */ +NXPINT_PACKET, /* Raw packet data. */ +NXPINT_FULL_LEN,/* ovs_be32: Full packet len, if truncated. */ +NXPINT_BUFFER_ID, /* ovs_be32: Buffer ID, if buffered. */ + +/* Information about the flow that triggered the packet-in. */ +NXPINT_TABLE_ID,/* uint8_t: Table ID. */ +NXPINT_COOKIE, /* ovs_be64: Flow cookie (all-1s if none). */ + +/* Other. */ +NXPINT_REASON, /* uint8_t, one of OFPR_*. */ +NXPINT_METADATA,/* NXM or OXM for metadata fields. */ +}; + /* Configures the "role" of the sending controller. The default role is: * *- Other (NX_ROLE_OTHER), which allows the controller access to all diff --git a/lib/ofp-msgs.h b/lib/ofp-msgs.h index a15efb6..8ce1f35 100644 --- a/lib/ofp-msgs.h +++ b/lib/ofp-msgs.h @@ -152,6 +152,8 @@ enum ofpraw { OFPRAW_OFPT13_PACKET_IN, /* NXT 1.0+ (17): struct nx_packet_in, uint8_t[]. */ OFPRAW_NXT_PACKET_IN, +/* NXT 1.0+ (30): uint8_t[]. */ +OFPRAW_NXT_PACKET_IN2, /* OFPT 1.0 (11): struct ofp10_flow_removed. */ OFPRAW_OFPT10_FLOW_REMOVED, @@ -514,6 +516,7 @@ enum ofptype { * OFPRAW_OFPT11_PACKET_IN. * OFPRAW_OFPT12_PACKET_IN. * OFPRAW_OFPT13_PACKET_IN. + * OFPRAW_NXT_PACKET_IN2. * OFPRAW_NXT_PACKET_IN. */ OFPTYPE_FLOW_REMOVED,/* OFPRAW_OFPT10_FLOW_REMOVED. * OFPRAW_OFPT11_FLOW_REMOVED. diff --git a/lib/ofp-util.c b/lib/ofp-util.c index d057915..d9dc0ad 100644 --- a/lib/ofp-util.
[ovs-dev] [PATCH v4 06/14] ofp-prop: Add support for putting and parsing nested properties.
It hadn't occurred to me before that any special support was actually necessary or useful for nested properties, but the functions introduced in this commit are nice wrappers to deal with the extra 4-byte padding that ensures that the nested properties begin on 8-byte boundaries just like the outer properties. Signed-off-by: Ben Pfaff --- lib/ofp-prop.c | 49 + lib/ofp-prop.h | 4 2 files changed, 53 insertions(+) diff --git a/lib/ofp-prop.c b/lib/ofp-prop.c index d6fd3a6..b6395b6 100644 --- a/lib/ofp-prop.c +++ b/lib/ofp-prop.c @@ -265,6 +265,27 @@ ofpprop_parse_uuid(const struct ofpbuf *property, struct uuid *uuid) return 0; } +/* Attempts to parse 'property' as a property that contains nested properties. + * If successful, stores the nested data into '*nested' and returns 0; + * otherwise returns an OpenFlow error. + * + * The only thing special about nested properties is that the property header + * is followed by 4 bytes of padding, so that the nested properties begin at an + * 8-byte aligned offset. This function can be used in other situations where + * this is the case. */ +enum ofperr +ofpprop_parse_nested(const struct ofpbuf *property, struct ofpbuf *nested) +{ +size_t nested_offset = ROUND_UP(ofpbuf_headersize(property), 8); +if (property->size < nested_offset) { +return OFPERR_OFPBPC_BAD_LEN; +} + +ofpbuf_use_const(nested, property->data, property->size); +ofpbuf_pull(nested, nested_offset); +return 0; +} + /* Adds a property with the given 'type' and 'len'-byte contents 'value' to * 'msg', padding the property out to a multiple of 8 bytes. */ void @@ -392,6 +413,18 @@ ofpprop_put_uuid(struct ofpbuf *msg, uint64_t type, const struct uuid *uuid) ofpprop_put(msg, type, uuid, sizeof *uuid); } +/* Appends a property of type 'type' to 'msg' whose contents are padding to + * 8-byte alignment followed by 'nested'. This is a suitable way to add nested + * properties to 'msg'. */ +void +ofpprop_put_nested(struct ofpbuf *msg, uint64_t type, + const struct ofpbuf *nested) +{ +size_t start = ofpprop_start_nested(msg, type); +ofpbuf_put(msg, nested->data, nested->size); +ofpprop_end(msg, start); +} + /* Appends a header for a property of type 'type' to 'msg'. The caller should * add the contents of the property to 'msg', then finish it by calling * ofpprop_end(). Returns the offset of the beginning of the property (to pass @@ -429,6 +462,22 @@ ofpprop_end(struct ofpbuf *msg, size_t start_ofs) ofpbuf_padto(msg, ROUND_UP(msg->size, 8)); } +/* Appends a header for a property of type 'type' to 'msg', followed by padding + * suitable for putting nested properties into the property; that is, padding + * to an 8-byte alignment. + * + * This otherwise works like ofpprop_start(). + * + * There's no need for ofpprop_end_nested(), because ofpprop_end() works fine + * for this case. */ +size_t +ofpprop_start_nested(struct ofpbuf *msg, uint64_t type) +{ +size_t start_ofs = ofpprop_start(msg, type); +ofpbuf_put_zeros(msg, 4); +return start_ofs; +} + enum ofperr ofpprop_unknown(struct vlog_module *module, bool loose, const char *msg, uint64_t type) diff --git a/lib/ofp-prop.h b/lib/ofp-prop.h index 921b6c1..4f8e78d 100644 --- a/lib/ofp-prop.h +++ b/lib/ofp-prop.h @@ -85,6 +85,7 @@ enum ofperr ofpprop_parse_u16(const struct ofpbuf *, uint16_t *value); enum ofperr ofpprop_parse_u32(const struct ofpbuf *, uint32_t *value); enum ofperr ofpprop_parse_u64(const struct ofpbuf *, uint64_t *value); enum ofperr ofpprop_parse_uuid(const struct ofpbuf *, struct uuid *); +enum ofperr ofpprop_parse_nested(const struct ofpbuf *, struct ofpbuf *); /* Serializing properties. */ void ofpprop_put(struct ofpbuf *, uint64_t type, @@ -100,10 +101,13 @@ void ofpprop_put_u64(struct ofpbuf *, uint64_t type, uint64_t value); void ofpprop_put_bitmap(struct ofpbuf *, uint64_t type, uint64_t bitmap); void ofpprop_put_flag(struct ofpbuf *, uint64_t type); void ofpprop_put_uuid(struct ofpbuf *, uint64_t type, const struct uuid *); +void ofpprop_put_nested(struct ofpbuf *, uint64_t type, const struct ofpbuf *); size_t ofpprop_start(struct ofpbuf *, uint64_t type); void ofpprop_end(struct ofpbuf *, size_t start_ofs); +size_t ofpprop_start_nested(struct ofpbuf *, uint64_t type); + /* Logging errors while deserializing properties. * * The attitude that a piece of code should take when it deserializes an -- 2.1.3 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v4 08/14] pinctrl: Fix header guard.
Signed-off-by: Ben Pfaff --- ovn/controller/pinctrl.h | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/ovn/controller/pinctrl.h b/ovn/controller/pinctrl.h index 65d5dfe..fd279ff 100644 --- a/ovn/controller/pinctrl.h +++ b/ovn/controller/pinctrl.h @@ -1,4 +1,3 @@ - /* Copyright (c) 2015 Nicira, Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); @@ -14,8 +13,8 @@ * limitations under the License. */ -#ifndef DHCP_H -#define DHCP_H 1 +#ifndef PINCTRL_H +#define PINCTRL_H 1 #include @@ -31,4 +30,4 @@ void pinctrl_run(struct controller_ctx *ctx, void pinctrl_wait(void); void pinctrl_destroy(void); -#endif /* ovn/dhcp.h */ +#endif /* ovn/pinctrl.h */ -- 2.1.3 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v4 10/14] actions: Implement OVN "arp" action.
An upcoming commit will use this as a building block in adding ARP support to the OVN L3 logical router implementation. Signed-off-by: Ben Pfaff Conflicts: tests/ovn.at --- lib/ofp-actions.h | 30 +++ lib/packets.c | 38 + lib/packets.h | 3 +- ovn/controller/ovn-controller.c | 4 +- ovn/controller/pinctrl.c| 177 +++- ovn/controller/pinctrl.h| 5 +- ovn/lib/actions.c | 57 - ovn/lib/actions.h | 18 +++- ovn/ovn-sb.xml | 21 +++-- tests/ovn.at| 3 + 10 files changed, 274 insertions(+), 82 deletions(-) diff --git a/lib/ofp-actions.h b/lib/ofp-actions.h index 24143d3..55058ef 100644 --- a/lib/ofp-actions.h +++ b/lib/ofp-actions.h @@ -254,20 +254,22 @@ struct ofpact_output { * * Used for NXAST_CONTROLLER. */ struct ofpact_controller { -struct ofpact ofpact; -uint16_t max_len; /* Maximum length to send to controller. */ -uint16_t controller_id; /* Controller ID to send packet-in. */ -enum ofp_packet_in_reason reason; /* Reason to put in packet-in. */ - -/* If true, this action freezes packet traversal of the OpenFlow tables and - * adds a continuation to the packet-in message, that a controller can use - * to resume that traversal. */ -bool pause; - -/* Arbitrary data to include in the packet-in message (currently, only in - * NXT_PACKET_IN2). */ -uint16_t userdata_len; -uint8_t userdata[]; +OFPACT_PADDED_MEMBERS( +struct ofpact ofpact; +uint16_t max_len; /* Max length to send to controller. */ +uint16_t controller_id; /* Controller ID to send packet-in. */ +enum ofp_packet_in_reason reason; /* Reason to put in packet-in. */ + +/* If true, this action freezes packet traversal of the OpenFlow + * tables and adds a continuation to the packet-in message, that + * a controller can use to resume that traversal. */ +bool pause; + +/* Arbitrary data to include in the packet-in message (currently, + * only in NXT_PACKET_IN2). */ +uint16_t userdata_len; +); +uint8_t userdata[0]; }; /* OFPACT_ENQUEUE. diff --git a/lib/packets.c b/lib/packets.c index d82341d..b693f78 100644 --- a/lib/packets.c +++ b/lib/packets.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2009, 2010, 2011, 2012, 2013, 2014, 2015 Nicira, Inc. + * Copyright (c) 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016 Nicira, Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. @@ -1191,35 +1191,45 @@ packet_format_tcp_flags(struct ds *s, uint16_t tcp_flags) * 'arp_op', 'arp_sha', 'arp_tha', 'arp_spa', and 'arp_tpa'. The outer * Ethernet frame is initialized with Ethernet source 'arp_sha' and destination * 'arp_tha', except that destination ff:ff:ff:ff:ff:ff is used instead if - * 'broadcast' is true. */ + * 'broadcast' is true. Points the L3 header to the ARP header. */ void compose_arp(struct dp_packet *b, uint16_t arp_op, const struct eth_addr arp_sha, const struct eth_addr arp_tha, bool broadcast, ovs_be32 arp_spa, ovs_be32 arp_tpa) { -struct eth_header *eth; -struct arp_eth_header *arp; +compose_arp__(b); + +struct eth_header *eth = dp_packet_l2(b); +eth->eth_dst = broadcast ? eth_addr_broadcast : arp_tha; +eth->eth_src = arp_sha; + +struct arp_eth_header *arp = dp_packet_l3(b); +arp->ar_op = htons(arp_op); +arp->ar_sha = arp_sha; +arp->ar_tha = arp_tha; +put_16aligned_be32(&arp->ar_spa, arp_spa); +put_16aligned_be32(&arp->ar_tpa, arp_tpa); +} +/* Clears 'b' and replaces its contents by an ARP frame. Sets the fields in + * the Ethernet and ARP headers that are fixed for ARP frames to those fixed + * values, and zeroes the other fields. Points the L3 header to the ARP + * header. */ +void +compose_arp__(struct dp_packet *b) +{ dp_packet_clear(b); dp_packet_prealloc_tailroom(b, ARP_PACKET_SIZE); dp_packet_reserve(b, 2 + VLAN_HEADER_LEN); -eth = dp_packet_put_uninit(b, sizeof *eth); -eth->eth_dst = broadcast ? eth_addr_broadcast : arp_tha; -eth->eth_src = arp_sha; +struct eth_header *eth = dp_packet_put_zeros(b, sizeof *eth); eth->eth_type = htons(ETH_TYPE_ARP); -arp = dp_packet_put_uninit(b, sizeof *arp); +struct arp_eth_header *arp = dp_packet_put_zeros(b, sizeof *arp); arp->ar_hrd = htons(ARP_HRD_ETHERNET); arp->ar_pro = htons(ARP_PRO_IP); arp->ar_hln = sizeof arp->ar_sha; arp->ar_pln = sizeof arp->ar_spa; -arp->ar_op = htons(arp_op); -arp->ar_sha = arp_sha; -arp->ar_tha = arp_tha; - -put_16aligned_be32(&arp->ar_spa, arp_spa); -put_16aligned_be32(&arp->ar_tpa, arp_tpa); dp_packet_reset_offsets(b); dp_packet
[ovs-dev] [PATCH v4 09/14] ofp-actions: Introduce macro for padding struct members.
An upcoming commit will add another case where it's desirable to ensure that a variable-length array is aligned on an 8-byte boundary. This macro makes that a little easier. Signed-off-by: Ben Pfaff CC: Joe Stringer --- lib/ofp-actions.h | 43 ++- 1 file changed, 22 insertions(+), 21 deletions(-) diff --git a/lib/ofp-actions.h b/lib/ofp-actions.h index 58d7857..24143d3 100644 --- a/lib/ofp-actions.h +++ b/lib/ofp-actions.h @@ -185,6 +185,19 @@ BUILD_ASSERT_DECL(sizeof(struct ofpact) == 4); #define OFPACT_ALIGNTO 8 #define OFPACT_ALIGN(SIZE) ROUND_UP(SIZE, OFPACT_ALIGNTO) +/* Expands to an anonymous union that contains: + * + *- MEMBERS in a nested anonymous struct. + * + *- An array as large as MEMBERS plus padding to a multiple of 8 bytes. + * + * The effect is to pad MEMBERS to a multiple of 8 bytes. */ +#define OFPACT_PADDED_MEMBERS(MEMBERS) \ +union { \ +struct { MEMBERS }; \ +uint8_t pad[OFPACT_ALIGN(sizeof(struct { MEMBERS }))]; \ +} + /* Returns the ofpact following 'ofpact'. */ static inline struct ofpact * ofpact_next(const struct ofpact *ofpact) @@ -487,8 +500,7 @@ struct ofpact_meter { * * Used for OFPIT11_WRITE_ACTIONS. */ struct ofpact_nest { -struct ofpact ofpact; -uint8_t pad[PAD_SIZE(sizeof(struct ofpact), OFPACT_ALIGNTO)]; +OFPACT_PADDED_MEMBERS(struct ofpact ofpact;); struct ofpact actions[]; }; BUILD_ASSERT_DECL(offsetof(struct ofpact_nest, actions) % OFPACT_ALIGNTO == 0); @@ -507,21 +519,6 @@ enum nx_conntrack_flags { * that the packet should not be recirculated. */ #define NX_CT_RECIRC_NONE OFPTT_ALL -/* We want to determine the size of these elements at compile time to ensure - * actions alignment, but we also want to allow ofpact_conntrack to have - * basic _put(), _get(), etc accessors defined below which access these - * members directly from ofpact_conntrack. An anonymous struct will serve - * both of these purposes. */ -#define CT_MEMBERS \ -struct {\ -struct ofpact ofpact; \ -uint16_t flags; \ -uint16_t zone_imm; \ -struct mf_subfield zone_src;\ -uint16_t alg; \ -uint8_t recirc_table; \ -} - #if !defined(IPPORT_FTP) #defineIPPORT_FTP 21 #endif @@ -530,10 +527,14 @@ struct {\ * * Used for NXAST_CT. */ struct ofpact_conntrack { -union { -CT_MEMBERS; -uint8_t pad[OFPACT_ALIGN(sizeof(CT_MEMBERS))]; -}; +OFPACT_PADDED_MEMBERS( +struct ofpact ofpact; +uint16_t flags; +uint16_t zone_imm; +struct mf_subfield zone_src; +uint16_t alg; +uint8_t recirc_table; +); struct ofpact actions[0]; }; BUILD_ASSERT_DECL(offsetof(struct ofpact_conntrack, actions) -- 2.1.3 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v4 07/14] Implement serializing the state of packet traversal in "continuations".
One purpose of OpenFlow packet-in messages is to allow a controller to interpose on the path of a packet through the flow tables. If, for example, the controller needs to modify a packet in some way that the switch doesn't directly support, the controller should be able to program the switch to send it the packet, then modify the packet and send it back to the switch to continue through the flow table. That's the theory. In practice, this doesn't work with any but the simplest flow tables. Packet-in messages simply don't include enough context to allow the flow table traversal to continue. For example: * Via "resubmit" actions, an Open vSwitch packet can have an effective "call stack", but a packet-in can't describe it, and so it would be lost. * Via "patch ports", an Open vSwitch packet can traverse multiple OpenFlow logical switches. A packet-in can't describe or resume this context. * A packet-in can't preserve the stack used by NXAST_PUSH and NXAST_POP actions. * A packet-in can't preserve the OpenFlow 1.1+ action set. * A packet-in can't preserve the state of Open vSwitch mirroring or connection tracking. This commit introduces a solution called "continuations". A continuation is the state of a packet's traversal through OpenFlow flow tables. A "controller" action with the "pause" flag, which is newly implemented in this comit, generates a continuation and sends it to the OpenFlow controller in a packet-in asynchronous message (only NXT_PACKET_IN2 supports continuations, so the controller must configure them with NXT_SET_PACKET_IN_FORMAT). The controller processes the packet-in, possibly modifying some of its data, and sends it back to the switch with an NXT_RESUME request, which causes flow table traversal to continue. In principle, a single packet can be paused and resumed multiple times. Another way to look at it is: - "pause" is an extension of the existing OFPAT_CONTROLLER action. It sends the packet to the controller, with full pipeline context (some of which is switch implementation dependent, and may thus vary from switch to switch). - A continuation is an extension of OFPT_PACKET_IN, allowing for implementation dependent metadata. - NXT_RESUME is an extension of OFPT_PACKET_OUT, with the semantics that the pipeline processing is continued with the original translation context from where it was left at the time it was paused. Signed-off-by: Ben Pfaff Acked-by: Jarno Rajahalme --- NEWS | 5 +- include/openflow/nicira-ext.h | 96 - lib/learning-switch.c | 3 +- lib/meta-flow.c | 9 +- lib/meta-flow.h | 3 +- lib/ofp-actions.c | 28 ++- lib/ofp-actions.h | 5 + lib/ofp-errors.h | 16 +- lib/ofp-msgs.h| 4 + lib/ofp-print.c | 78 +-- lib/ofp-util.c| 470 -- lib/ofp-util.h| 57 - lib/rconn.c | 3 +- ofproto/connmgr.c | 23 ++- ofproto/connmgr.h | 2 +- ofproto/fail-open.c | 16 +- ofproto/ofproto-dpif-xlate.c | 199 ++ ofproto/ofproto-dpif-xlate.h | 4 + ofproto/ofproto-dpif.c| 34 +++ ofproto/ofproto-provider.h| 3 + ofproto/ofproto.c | 24 +++ ovn/TODO | 57 - ovn/controller/pinctrl.c | 4 +- tests/ofp-actions.at | 13 +- tests/ofp-print.at| 12 ++ tests/ofproto-dpif.at | 172 tests/ofproto-macros.at | 35 +++- utilities/ovs-ofctl.8.in | 11 +- utilities/ovs-ofctl.c | 109 +++--- 29 files changed, 1239 insertions(+), 256 deletions(-) diff --git a/NEWS b/NEWS index 9ab6cae..ba4b7f7 100644 --- a/NEWS +++ b/NEWS @@ -6,7 +6,10 @@ Post-v2.5.0 * OpenFlow 1.1+ OFPT_QUEUE_GET_CONFIG_REQUEST now supports OFPP_ANY. * OpenFlow 1.4+ OFPMP_QUEUE_DESC is now supported. * New property-based packet-in message format NXT_PACKET_IN2 with support - for arbitrary user-provided data. + for arbitrary user-provided data and for serializing flow table + traversal into a continuation for later resumption. + * New extension message NXT_SET_ASYNC_CONFIG2 to allow OpenFlow 1.4-like + control over asynchronous messages in earlier versions of OpenFlow. - ovs-ofctl: * queue-get-config command now allows a queue ID to be specified. - DPDK: diff --git a/include/openflow/nicira-ext.h b/include/openflow/nicira-ext.h index 7e56066..77a735d 100644 --- a/include/openflow/nicira-ext.h +++ b/include/openflow/nicira-ext.h @@ -260,12 +260,103 @@ struct nx_packet_in { }; OFP_ASSERT(sizeof(struct nx_packet_in) == 24); -/* NXT_PACKET_IN2. +/* NXT_PACKET_IN2 + * == * * NXT_PACKET_IN2 is concep
[ovs-dev] [PATCH v4 11/14] ovn: Use callback function instead of simap for logical port number map.
An simap is convenient but it isn't very flexible. If the client wants to keep extra data with each node then it has to build a second parallel data structure. A callback function is kind of a pain for the clients from the point of view of having to write it and deal with auxiliary data, etc., but it allows the storage to be more flexible. An upcoming commit will make further use of this capability. Signed-off-by: Ben Pfaff --- ovn/controller/lflow.c | 18 +-- ovn/lib/actions.c | 3 +- ovn/lib/actions.h | 10 +++--- ovn/lib/expr.c | 88 +++--- ovn/lib/expr.h | 13 ++-- tests/test-ovn.c | 19 +-- 6 files changed, 105 insertions(+), 46 deletions(-) diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c index d53213c..912c609 100644 --- a/ovn/controller/lflow.c +++ b/ovn/controller/lflow.c @@ -271,6 +271,18 @@ lflow_init(void) symtab_init(); } +static bool +lookup_port_cb(const void *ldp_, const char *port_name, unsigned int *portp) +{ +const struct logical_datapath *ldp = ldp_; +const struct simap_node *node = simap_find(&ldp->ports, port_name); +if (!node) { +return false; +} +*portp = node->data; +return true; +} + /* Translates logical flows in the Logical_Flow table in the OVN_SB database * into OpenFlow flows. See ovn-architecture(7) for more information. */ void @@ -346,7 +358,8 @@ lflow_run(struct controller_ctx *ctx, struct hmap *flow_table, ofpbuf_use_stub(&ofpacts, ofpacts_stub, sizeof ofpacts_stub); struct action_params ap = { .symtab = &symtab, -.ports = &ldp->ports, +.lookup_port = lookup_port_cb, +.aux = ldp, .ct_zones = ct_zones, .n_tables = LOG_PIPELINE_LEN, @@ -387,7 +400,8 @@ lflow_run(struct controller_ctx *ctx, struct hmap *flow_table, expr = expr_simplify(expr); expr = expr_normalize(expr); -uint32_t n_conjs = expr_to_matches(expr, &ldp->ports, &matches); +uint32_t n_conjs = expr_to_matches(expr, lookup_port_cb, ldp, + &matches); expr_destroy(expr); /* Prepare the OpenFlow matches for adding to the flow table. */ diff --git a/ovn/lib/actions.c b/ovn/lib/actions.c index 36f0047..c1583e4 100644 --- a/ovn/lib/actions.c +++ b/ovn/lib/actions.c @@ -108,7 +108,8 @@ parse_set_action(struct action_context *ctx) struct expr *prereqs; char *error; -error = expr_parse_assignment(ctx->lexer, ctx->ap->symtab, ctx->ap->ports, +error = expr_parse_assignment(ctx->lexer, ctx->ap->symtab, + ctx->ap->lookup_port, ctx->ap->aux, ctx->ofpacts, &prereqs); if (error) { action_error(ctx, "%s", error); diff --git a/ovn/lib/actions.h b/ovn/lib/actions.h index 95a8279..d8b26d0 100644 --- a/ovn/lib/actions.h +++ b/ovn/lib/actions.h @@ -17,6 +17,7 @@ #ifndef OVN_ACTIONS_H #define OVN_ACTIONS_H 1 +#include #include #include "compiler.h" #include "util.h" @@ -47,10 +48,11 @@ struct action_params { * expr_parse()). */ const struct shash *symtab; - /* 'ports' must be a map from strings (presumably names of ports) to - * integers (as one would provide to expr_to_matches()). Strings used in - * the actions that are not in 'ports' are translated to zero. */ -const struct simap *ports; +/* Looks up logical port 'port_name'. If found, stores its port number in + * '*portp' and returns true; otherwise, returns false. */ +bool (*lookup_port)(const void *aux, const char *port_name, +unsigned int *portp); +const void *aux; /* A map from a port name to its connection tracking zone. */ const struct simap *ct_zones; diff --git a/ovn/lib/expr.c b/ovn/lib/expr.c index e196922..44fde84 100644 --- a/ovn/lib/expr.c +++ b/ovn/lib/expr.c @@ -1,5 +1,5 @@ /* - * Copyright (c) 2015 Nicira, Inc. + * Copyright (c) 2015, 2016 Nicira, Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. @@ -2287,17 +2287,18 @@ expr_match_add(struct hmap *matches, struct expr_match *match) } static bool -constrain_match(const struct expr *expr, const struct simap *ports, -struct match *m) +constrain_match(const struct expr *expr, +bool (*lookup_port)(const void *aux, const char *port_name, +unsigned int *portp), +const void *aux, struct match *m) { ovs_assert(expr->type == EXPR_T_CMP); if (expr->cmp.symbol->width) { mf_mask_subfield(expr->cmp.symbol->field, &expr->cmp.value, &expr->cmp.mask, m); } else { -const struct simap_node *node; -node = ports ? simap_find(ports, expr->cmp.stri
[ovs-dev] [PATCH v4 05/14] Support userdata in NXT_PACKET_IN2.
Signed-off-by: Ben Pfaff --- NEWS | 3 +- include/openflow/nicira-ext.h | 1 + lib/ofp-actions.c | 168 ++ lib/ofp-actions.h | 9 ++- lib/ofp-print.c | 11 +++ lib/ofp-util.c| 9 +++ lib/ofp-util.h| 4 + ofproto/connmgr.c | 1 + ofproto/ofproto-dpif-xlate.c | 18 +++-- ofproto/ofproto-dpif.c| 1 + tests/ofp-actions.at | 7 ++ tests/ofp-print.at| 7 +- tests/ofproto.at | 7 +- utilities/ovs-ofctl.8.in | 20 +++-- 14 files changed, 232 insertions(+), 34 deletions(-) diff --git a/NEWS b/NEWS index 696168b..9ab6cae 100644 --- a/NEWS +++ b/NEWS @@ -5,7 +5,8 @@ Post-v2.5.0 - OpenFlow: * OpenFlow 1.1+ OFPT_QUEUE_GET_CONFIG_REQUEST now supports OFPP_ANY. * OpenFlow 1.4+ OFPMP_QUEUE_DESC is now supported. - * New property-based packet-in message format NXT_PACKET_IN2. + * New property-based packet-in message format NXT_PACKET_IN2 with support + for arbitrary user-provided data. - ovs-ofctl: * queue-get-config command now allows a queue ID to be specified. - DPDK: diff --git a/include/openflow/nicira-ext.h b/include/openflow/nicira-ext.h index bcc8758..7e56066 100644 --- a/include/openflow/nicira-ext.h +++ b/include/openflow/nicira-ext.h @@ -279,6 +279,7 @@ enum nx_packet_in2_prop_type { /* Other. */ NXPINT_REASON, /* uint8_t, one of OFPR_*. */ NXPINT_METADATA,/* NXM or OXM for metadata fields. */ +NXPINT_USERDATA,/* From NXAST_CONTROLLER2 userdata. */ }; /* Configures the "role" of the sending controller. The default role is: diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c index 7dc852e..38011c3 100644 --- a/lib/ofp-actions.c +++ b/lib/ofp-actions.c @@ -30,6 +30,7 @@ #include "nx-match.h" #include "odp-netlink.h" #include "ofp-parse.h" +#include "ofp-prop.h" #include "ofp-util.h" #include "ofpbuf.h" #include "unaligned.h" @@ -273,6 +274,8 @@ enum ofp_raw_action_type { /* NX1.0+(20): struct nx_action_controller. */ NXAST_RAW_CONTROLLER, +/* NX1.0+(37): struct nx_action_controller2, ... */ +NXAST_RAW_CONTROLLER2, /* NX1.0+(22): struct nx_action_write_metadata. */ NXAST_RAW_WRITE_METADATA, @@ -625,6 +628,27 @@ struct nx_action_controller { }; OFP_ASSERT(sizeof(struct nx_action_controller) == 16); +/* Properties for NXAST_CONTROLLER2. */ +enum nx_action_controller2_prop_type { +NXAC2PT_MAX_LEN,/* ovs_be16 max length to send controller. */ +NXAC2PT_CONTROLLER_ID, /* ovs_be16 controller ID of destination. */ +NXAC2PT_REASON, /* uint8_t reason (OFPR_*). */ +NXAC2PT_USERDATA, /* Data to copy into NXPINT_USERDATA. */ +}; + +/* Action structure for NXAST_CONTROLLER2. + * + * This replacement for NXAST_CONTROLLER makes it extensible via properties. */ +struct nx_action_controller2 { +ovs_be16 type; /* OFPAT_VENDOR. */ +ovs_be16 len; /* Length is 16 or more. */ +ovs_be32 vendor;/* NX_VENDOR_ID. */ +ovs_be16 subtype; /* NXAST_CONTROLLER2. */ +uint8_t zeros[6]; /* Must be zero. */ +/* Followed by NXAC2PT_* properties. */ +}; +OFP_ASSERT(sizeof(struct nx_action_controller2) == 16); + static enum ofperr decode_NXAST_RAW_CONTROLLER(const struct nx_action_controller *nac, enum ofp_version ofp_version OVS_UNUSED, @@ -633,9 +657,77 @@ decode_NXAST_RAW_CONTROLLER(const struct nx_action_controller *nac, struct ofpact_controller *oc; oc = ofpact_put_CONTROLLER(out); +oc->ofpact.raw = NXAST_RAW_CONTROLLER; oc->max_len = ntohs(nac->max_len); oc->controller_id = ntohs(nac->controller_id); oc->reason = nac->reason; +ofpact_finish(out, &oc->ofpact); + +return 0; +} + +static enum ofperr +decode_NXAST_RAW_CONTROLLER2(const struct nx_action_controller2 *nac2, + enum ofp_version ofp_version OVS_UNUSED, + struct ofpbuf *out) +{ +if (!is_all_zeros(nac2->zeros, sizeof nac2->zeros)) { +return OFPERR_NXBRC_MUST_BE_ZERO; +} + +size_t start_ofs = out->size; +struct ofpact_controller *oc = ofpact_put_CONTROLLER(out); +oc->ofpact.raw = NXAST_RAW_CONTROLLER2; +oc->max_len = UINT16_MAX; +oc->reason = OFPR_ACTION; + +struct ofpbuf properties; +ofpbuf_use_const(&properties, nac2, ntohs(nac2->len)); +ofpbuf_pull(&properties, sizeof *nac2); + +while (properties.size > 0) { +struct ofpbuf payload; +uint64_t type; + +enum ofperr error = ofpprop_pull(&properties, &payload, &type); +if (error) { +return error; +} + +switch (type) { +case NXAC2PT_MAX_LEN: +error = ofpprop_parse_u16(&payload,
[ovs-dev] [PATCH v4 12/14] ovn-controller: Add data structure for indexing lports, multicast groups.
This was more or less implemented inside lflow.c until now, but some upcoming code that shouldn't be in that file needs to use it too. This also adds a second index on lports, so that lports can be looked up based on the logical datapath tunnel key and the logical port tunnel key. An upcoming commit will add a user for this new index. Signed-off-by: Ben Pfaff --- ovn/controller/automake.mk | 2 + ovn/controller/lflow.c | 181 ++-- ovn/controller/lflow.h | 10 ++- ovn/controller/lport.c | 157 ++ ovn/controller/lport.h | 67 +++ ovn/controller/ovn-controller.c | 36 6 files changed, 299 insertions(+), 154 deletions(-) create mode 100644 ovn/controller/lport.c create mode 100644 ovn/controller/lport.h diff --git a/ovn/controller/automake.mk b/ovn/controller/automake.mk index cadfa9c..cf57bbd 100644 --- a/ovn/controller/automake.mk +++ b/ovn/controller/automake.mk @@ -8,6 +8,8 @@ ovn_controller_ovn_controller_SOURCES = \ ovn/controller/encaps.h \ ovn/controller/lflow.c \ ovn/controller/lflow.h \ + ovn/controller/lport.c \ + ovn/controller/lport.h \ ovn/controller/ofctrl.c \ ovn/controller/ofctrl.h \ ovn/controller/pinctrl.c \ diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c index 912c609..bd08ad6 100644 --- a/ovn/controller/lflow.c +++ b/ovn/controller/lflow.c @@ -1,4 +1,4 @@ -/* Copyright (c) 2015 Nicira, Inc. +/* Copyright (c) 2015, 2016 Nicira, Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. @@ -15,12 +15,13 @@ #include #include "lflow.h" +#include "lport.h" #include "dynamic-string.h" #include "ofctrl.h" #include "ofp-actions.h" #include "ofpbuf.h" #include "openvswitch/vlog.h" -#include "ovn/controller/ovn-controller.h" +#include "ovn-controller.h" #include "ovn/lib/actions.h" #include "ovn/lib/expr.h" #include "ovn/lib/ovn-sb-idl.h" @@ -43,8 +44,8 @@ add_logical_register(struct shash *symtab, enum mf_field_id id) expr_symtab_add_field(symtab, name, id, NULL, false); } -static void -symtab_init(void) +void +lflow_init(void) { shash_init(&symtab); @@ -155,161 +156,62 @@ symtab_init(void) expr_symtab_add_field(&symtab, "sctp.dst", MFF_SCTP_DST, "sctp", false); } -/* Logical datapaths and logical port numbers. */ - -enum ldp_type { -LDP_TYPE_ROUTER, -LDP_TYPE_SWITCH, -}; - -/* A logical datapath. - * - * 'ports' maps 'logical_port' names to 'tunnel_key' values in the OVN_SB - * Port_Binding table within the logical datapath. */ -struct logical_datapath { -struct hmap_node hmap_node; /* Indexed on 'uuid'. */ -struct uuid uuid; /* UUID from Datapath_Binding row. */ -uint32_t tunnel_key;/* 'tunnel_key' from Datapath_Binding row. */ -struct simap ports; /* Logical port name to port number. */ -enum ldp_type type; /* Type of logical datapath */ +struct lookup_port_aux { +const struct lport_index *lports; +const struct mcgroup_index *mcgroups; +const struct sbrec_datapath_binding *dp; }; -/* Contains "struct logical_datapath"s. */ -static struct hmap logical_datapaths = HMAP_INITIALIZER(&logical_datapaths); - -/* Finds and returns the logical_datapath for 'binding', or NULL if no such - * logical_datapath exists. */ -static struct logical_datapath * -ldp_lookup(const struct sbrec_datapath_binding *binding) -{ -struct logical_datapath *ldp; -HMAP_FOR_EACH_IN_BUCKET (ldp, hmap_node, uuid_hash(&binding->header_.uuid), - &logical_datapaths) { -if (uuid_equals(&ldp->uuid, &binding->header_.uuid)) { -return ldp; -} -} -return NULL; -} - -/* Creates a new logical_datapath for the given 'binding'. */ -static struct logical_datapath * -ldp_create(const struct sbrec_datapath_binding *binding) -{ -struct logical_datapath *ldp; - -ldp = xmalloc(sizeof *ldp); -hmap_insert(&logical_datapaths, &ldp->hmap_node, -uuid_hash(&binding->header_.uuid)); -ldp->uuid = binding->header_.uuid; -ldp->tunnel_key = binding->tunnel_key; -const char *ls = smap_get(&binding->external_ids, "logical-switch"); -ldp->type = ls ? LDP_TYPE_SWITCH : LDP_TYPE_ROUTER; -simap_init(&ldp->ports); -return ldp; -} - -static struct logical_datapath * -ldp_lookup_or_create(const struct sbrec_datapath_binding *binding) -{ -struct logical_datapath *ldp = ldp_lookup(binding); -return ldp ? ldp : ldp_create(binding); -} - -static void -ldp_free(struct logical_datapath *ldp) +static bool +lookup_port_cb(const void *aux_, const char *port_name, unsigned int *portp) { -simap_destroy(&ldp->ports); -hmap_remove(&logical_datapaths, &ldp->hmap_node); -free(ldp); -} +const struct lookup_port_aux *aux = aux_
[ovs-dev] [PATCH v4 14/14] [RFC] lflow: Disable egress table optimization.
I don't understand why, but without this change, the test in the previous commit does not pass. Signed-off-by: Ben Pfaff --- ovn/controller/lflow.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c index f16c5c9..a18c760 100644 --- a/ovn/controller/lflow.c +++ b/ovn/controller/lflow.c @@ -209,7 +209,7 @@ add_logical_flows(struct controller_ctx *ctx, const struct lport_index *lports, if (!ldp) { continue; } -if (!ingress && is_switch(ldp)) { +if (0 && !ingress && is_switch(ldp)) { /* For a logical switch datapath, local_datapaths tells us if there * are any local ports for this datapath. If not, processing * logical flows for the egress pipeline of this datapath is -- 2.1.3 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v4 13/14] ovn: Implement basic ARP support for L3 logical routers.
This is sufficient support that an L3 logical router can now transmit packets to VMs (and other destinations) without having to know the IP-to-MAC binding in advance. The details are carefully documented in all of the appropriate places. There are several important caveats that need to be fixed before this can be taken seriously in production. These are documented in ovn/TODO. The most important of these are renewal, expiration, and limiting the size of the ARP table. Signed-off-by: Ben Pfaff --- ovn/TODO| 55 ++- ovn/controller/lflow.c | 86 +++-- ovn/controller/lflow.h | 1 + ovn/controller/ovn-controller.c | 9 +- ovn/controller/pinctrl.c| 197 +- ovn/controller/pinctrl.h| 8 +- ovn/lib/actions.c | 203 +--- ovn/lib/actions.h | 11 +++ ovn/lib/expr.c | 53 +++ ovn/lib/expr.h | 3 + ovn/northd/ovn-northd.8.xml | 112 +- ovn/northd/ovn-northd.c | 105 - ovn/ovn-architecture.7.xml | 78 ++- ovn/ovn-sb.ovsschema| 15 ++- ovn/ovn-sb.xml | 137 ++- ovn/utilities/ovn-sbctl.c | 4 + tests/ovn.at| 187 +--- tests/test-ovn.c| 1 + 18 files changed, 1053 insertions(+), 212 deletions(-) diff --git a/ovn/TODO b/ovn/TODO index b08a4f4..93ef4a3 100644 --- a/ovn/TODO +++ b/ovn/TODO @@ -4,18 +4,11 @@ ** New OVN logical actions -*** arp - -Generates an ARP packet based on the current IPv4 packet and allows it -to be processed as part of the current pipeline (and then pop back to -processing the original IPv4 packet). +*** rate_limit TCP/IP stacks typically limit the rate at which ARPs are sent, e.g. to one per second for a given target. We might need to do this too. -We probably need to buffer the packet that generated the ARP. I don't -know where to do that. - *** icmp4 { action... } Generates an ICMPv4 packet based on the current IPv4 packet and @@ -60,37 +53,13 @@ the "arp" action, and an action for generating ** Dynamic IP to MAC bindings -Some bindings from IP address to MAC will undoubtedly need to be -discovered dynamically through ARP requests. It's straightforward -enough for a logical L3 router to generate ARP requests and forward -them to the appropriate switch. - -It's more difficult to figure out where the reply should be processed -and stored. It might seem at first that a first-cut implementation -could just keep track of the binding on the hypervisor that needs to -know, but that can't happen easily because the VM that sends the reply -might not be on the same HV as the VM that needs the answer (that is, -the VM that sent the packet that needs the binding to be resolved) and -there isn't an easy way for it to know which HV needs the answer. - -Thus, the HV that processes the ARP reply (which is unknown when the -ARP is sent) has to tell all the HVs the binding. The most obvious -place for this in the OVN_Southbound database. - -Details need to be worked out, including: - -*** OVN_Southbound schema changes. +OVN has basic support for establishing IP to MAC bindings dynamically, +using ARP. -Possibly bindings could be added to the Port_Binding table by adding -or modifying columns. Another possibility is that another table -should be added. +*** Ratelimiting. -*** Logical_Flow representation - -It would be really nice to maintain the general-purpose nature of -logical flows, but these bindings might have to include some -hard-coded special cases, especially when it comes to the relationship -with populating the bindings into the OVN_Southbound table. +From casual observation, Linux appears to generate at most one ARP per +second per destination. *** Tracking queries @@ -104,16 +73,12 @@ into the database. Something needs to make sure that bindings remain valid and expire those that become stale. -** MTU handling (fragmentation on output) - -** Ratelimiting. +*** Table size limiting. -*** ARP. +The table of MAC bindings must not be allowed to grow unreasonably +large. -*** ICMP error generation, TCP reset, UDP unreachable, protocol unreachable, ... - -As a point of comparison, Linux doesn't ratelimit TCP resets but I -think it does everything else. +** MTU handling (fragmentation on output) * ovn-controller diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c index bd08ad6..f16c5c9 100644 --- a/ovn/controller/lflow.c +++ b/ovn/controller/lflow.c @@ -191,15 +191,13 @@ is_switch(const struct sbrec_datapath_binding *ldp) } -/* Translates logical flows in the Logical_Flow table in the OVN_SB database - * into OpenFlow flows. See ovn-architecture(7) for more information. */ -void -lflow_run(struct controller_ctx *ctx,
Re: [ovs-dev] [PATCH v3] netdev_dpdk.c: Add QoS functionality.
> On Wed, 17 Feb 2016 14:40:26 + > "Stokes, Ian" wrote: > > > > -Original Message- > > > From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Stokes, > > > Ian > > > Sent: Thursday, February 11, 2016 2:42 PM > > > To: Flavio Leitner > > > Cc: dev@openvswitch.org > > > Subject: Re: [ovs-dev] [PATCH v3] netdev_dpdk.c: Add QoS > functionality. > > > > > > > On Thu, 11 Feb 2016 12:32:56 + "Stokes, Ian" > > > > wrote: > > > > > > > > > Thanks For the review Flavio, much appreciated, comments inline. > > > > > I'll > > > > re-spin a new version also. > > > > > > > > Thank you for the patch! > > > > comments inline. > > > > > > > > > > > > > > > > > > > -Original Message- > > > > > > From: Flavio Leitner [mailto:f...@sysclose.org] > > > > > > Sent: Wednesday, February 10, 2016 7:55 PM > > > > > > To: Stokes, Ian > > > > > > Cc: dev@openvswitch.org > > > > > > Subject: Re: [ovs-dev] [PATCH v3] netdev_dpdk.c: Add QoS > > > > functionality. > > > > > > > > > > > > On Mon, 1 Feb 2016 20:47:25 + Ian Stokes > > > > > > wrote: > > > > > > > > > > > > > This patch provides the modifications required in > > > > > > > netdev-dpdk.c and vswitch.xml to allow for a DPDK user space > QoS algorithm. > > > > > > > > > > > > > > This patch adds a QoS configuration structure for > > > > > > > netdev-dpdk and expected QoS operations 'dpdk_qos_ops'. > > > > > > > Various helper functions are also supplied. > > > > > > > > > > > > > > Also included are the modifications required for vswitch.xml > > > > > > > to allow a new QoS implementation for netdev-dpdk devices. > > > > > > > This includes a new QoS type `egress-policer` as well as its > > > > > > > expected > > > > QoS table entries. > > > > > > > > > > > > > > The QoS functionality implemented for DPDK devices is > > > > > > > `egress- > > > > > > policer`. > > > > > > > This can be used to drop egress packets at a configurable > rate. > > > > > > > > > > > > > > The INSTALL.DPDK.md guide has also been modified to provide > > > > > > > an example configuration of `egress-policer` QoS. > > > > > > > > > > > > > > Signed-off-by: Ian Stokes > > > > > > > --- > > > > > > > INSTALL.DPDK.md | 20 +++ > > > > > > > lib/netdev-dpdk.c| 439 > > > > > > -- > > > > > > > vswitchd/vswitch.xml | 52 ++ > > > > > > > 3 files changed, 498 insertions(+), 13 deletions(-) > > > > > > > > > > > > > > diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md index > > > > > > > e8ef4b5..dac70cd > > > > > > > 100644 > > > > > > > --- a/INSTALL.DPDK.md > > > > > > > +++ b/INSTALL.DPDK.md > > > > > > > @@ -207,6 +207,26 @@ Using the DPDK with ovs-vswitchd: > > > > > > > ./ovs-ofctl add-flow br0 in_port=2,action=output:1 > > > > > > > ``` > > > > > > > > > > > > > > +8. QoS usage example > > > > > > > + > > > > > > > + Assuming you have a vhost-user port transmitting traffic > > > > > > consisting of > > > > > > > + packets of size 64 bytes, the following command would > > > > > > > + limit the > > > > > > egress > > > > > > > + transmission rate of the port to ~1,000,000 packets per > > > > second: > > > > > > > + > > > > > > > + `ovs-vsctl set port vhost-user0 qos=@newqos -- > > > > > > > + --id=@newqos create > > > > > > qos > > > > > > > + type=egress-policer other-config:cir=4600 > > > > > > > + other-config:cbs=2048` > > > > > > > + > > > > > > > + To examine the QoS configuration of the port: > > > > > > > + > > > > > > > + `ovs-appctl -t ovs-vswitchd qos/show vhost-user0` > > > > > > > + > > > > > > > + To clear the QoS configuration from the port and ovsdb > > > > > > > + use the > > > > > > following: > > > > > > > + > > > > > > > + `ovs-vsctl -- destroy QoS vhost-user0 -- clear Port > > > > > > > + vhost-user0 qos` > > > > > > > + > > > > > > > + For more details regarding egress-policer parameters > > > > > > > + please refer > > > > > > to the > > > > > > > + vswitch.xml. > > > > > > > + > > > > > > > Performance Tuning: > > > > > > > --- > > > > > > > > > > > > > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index > > > > > > > 09ccc2c..716da79 100644 > > > > > > > --- a/lib/netdev-dpdk.c > > > > > > > +++ b/lib/netdev-dpdk.c > > > > > > > @@ -44,6 +44,7 @@ > > > > > > > #include "ovs-rcu.h" > > > > > > > #include "packets.h" > > > > > > > #include "shash.h" > > > > > > > +#include "smap.h" > > > > > > > #include "sset.h" > > > > > > > #include "unaligned.h" > > > > > > > #include "timeval.h" > > > > > > > @@ -52,12 +53,14 @@ > > > > > > > > > > > > > > #include "rte_config.h" > > > > > > > #include "rte_mbuf.h" > > > > > > > +#include "rte_meter.h" > > > > > > > #include "rte_virtio_net.h" > > > > > > > > > > > > > > VLOG_DEFINE_THIS_MODULE(dpdk); static struct > > > > > > > vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20); > > > > > > > > > > > > > > #define DPDK_PORT_WATCHDOG_INTERVAL 5 > > > > > > > +#define DPDK_MAX_Q
Re: [ovs-dev] : ovs-appctl dump and flush command for userspace conntrack
Hi Joe, Thanks for pointing out my mistake. Please find below updated one. m 69e63a45e2773c124deb885bbc3d5deb3e032126 Mon Sep 17 00:00:00 2001 From: soumyadeep chowdhury Date: Fri, 19 Feb 2016 07:39:34 -0500 Subject: [PATCH 4/4] Write the functions for dump-conntrack and flush-conntrack Signed-off-by: Sourabh Bansal --- ovs-userconntrack_20151115/lib/dpif-netdev.c | 82 1 file changed, 82 insertions(+) diff --git a/ovs-userconntrack_20151115/lib/dpif-netdev.c b/ovs-userconntrack_20151115/lib/dpif-netdev.c index 340e37c..f950780 100644 --- a/ovs-userconntrack_20151115/lib/dpif-netdev.c +++ b/ovs-userconntrack_20151115/lib/dpif-netdev.c @@ -715,6 +715,84 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[], unixctl_command_reply(conn, ds_cstr(&reply)); ds_destroy(&reply); } + +/*This function will dump the entries present in conntrack table*/ +static void +dpif_netdev_dump_conntrack(struct unixctl_conn *conn, int argc, const char *argv[], + void *aux OVS_UNUSED) +{ +struct dp_netdev *dp = NULL; +struct ct_dpif_dump_state *dump; +struct ct_dpif_entry cte; +uint16_t *pzone = NULL; +struct dpif *dpif; + +ovs_mutex_lock(&dp_netdev_mutex); + +if (argc == 2) { +dp = shash_find_data(&dp_netdevs, argv[1]); +} else if (shash_count(&dp_netdevs) == 1) { + /* There's only one datapath */ +dp = shash_first(&dp_netdevs)->data; +} + +if (!dp) { + ovs_mutex_unlock(&dp_netdev_mutex); + unixctl_command_reply_error(conn, + "please specify an existing datapath"); + return; +} +ovs_mutex_unlock(&dp_netdev_mutex); +dpif = dp->dpif; + +int verbosity = 1; +int print_statistics = 0; + +struct ds s = DS_EMPTY_INITIALIZER; + +ct_dpif_dump_start(dpif, &dump, pzone); +while (!ct_dpif_dump_next(dump, &cte)) { + +ct_dpif_format_entry(&cte, &s, verbosity, +print_statistics); +ct_dpif_entry_uninit(&cte); +} + +unixctl_command_reply(conn, ds_cstr(&s)); +ds_destroy(&s); + +ct_dpif_dump_done(dump); +return; +} + +/*This function will flush the entries present in conntrack table*/ +static void +dpif_netdev_flush_conntrack(struct unixctl_conn *conn, int argc, const char *argv[], + void *aux OVS_UNUSED) +{ +struct dp_netdev *dp = NULL; +struct dpif *dpif; +uint16_t *pzone = NULL; + +ovs_mutex_lock(&dp_netdev_mutex); +if (argc == 2) { +dp = shash_find_data(&dp_netdevs, argv[1]); +} else if (shash_count(&dp_netdevs) == 1) { +dp = shash_first(&dp_netdevs)->data; +} + +ovs_mutex_unlock(&dp_netdev_mutex); + +struct ds s = DS_EMPTY_INITIALIZER; + +dpif = dp->dpif; +ct_dpif_flush(dpif,pzone); + +unixctl_command_reply(conn, ds_cstr(&s)); +ds_destroy(&s); + +} + static int dpif_netdev_init(void) @@ -728,6 +806,10 @@ dpif_netdev_init(void) unixctl_command_register("dpif-netdev/pmd-stats-clear", "[dp]", 0, 1, dpif_netdev_pmd_info, (void *)&clear_aux); +unixctl_command_register("dump-conntrack", "Dump the entire conntrack table", + 0, 0, dpif_netdev_dump_conntrack, NULL); +unixctl_command_register("flush-conntrack", "Flush the entire conntrack table", + 0, 0, dpif_netdev_flush_conntrack, NULL); return 0; } Regards, Sourabh Bansal -Original Message- From: Joe Stringer [mailto:j...@ovn.org] Sent: Friday, February 19, 2016 3:08 AM To: Sourabh Bansal (NEP) Cc: Daniele Di Proietto ; ovs dev Subject: Re: [ovs-dev] : ovs-appctl dump and flush command for userspace conntrack On 17 February 2016 at 04:50, wrote: > Hi Daniele, > > Like to contribute the following changes to your userconntrack branch. > Please review and provide your comments > > Change Description: > > This implementation is based on top of this branch (By Daniele Di Proietto): > https://github.com/ddiproietto/ovs/tree/userconntrack_20151115 > > and this provides following two commands to ovs-appctl command - > > > 1. Ovs-appctl dump-conntrack - To dump the conntrack table entries > > 2. Ovs-appctl flush-conntrack - To flush the conntrack table > entries > > Corresponding function implementations are - > dpif_netdev_dump_conntrack() > dpif_netdev_flush_conntrack() > > and these functions are registered with unixctl_command_register() so > that it can be accessed via ovs-appctl command > > List of change files: > > 1. /lib/dpif-netdev.c This functionality is already available here, in the same codebase: https://github.com/ddiproietto/ovs/blob/userconntrack_20151115/lib/dpctl.c#L1246 > Change Diff: > > *** /home/bansal/submission_code/ovs-userconntrack_20151115/lib/dpif-netdev.c > 2016-02-17 10:00:12.114559489 -0500 > --- /ho
[ovs-dev] [PATCH V5 1/2] netdev-dpdk: clean up mbuf initialization
Current mbuf initialization relies on magic numbers and does not accomodate mbufs of different sizes. Resolve this issue by ensuring that mbufs are always aligned to a 1k boundary (a typical DPDK NIC Rx buffer alignment). Signed-off-by: Mark Kavanagh --- lib/netdev-dpdk.c | 87 +++ 1 file changed, 42 insertions(+), 45 deletions(-) diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index e4f789b..2a06bb5 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -69,14 +69,14 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20); * The minimum mbuf size is limited to avoid scatter behaviour and drop in * performance for standard Ethernet MTU. */ -#define MTU_TO_MAX_LEN(mtu) ((mtu) + ETHER_HDR_LEN + ETHER_CRC_LEN) -#define MBUF_SIZE_MTU(mtu) (MTU_TO_MAX_LEN(mtu)\ - + sizeof(struct dp_packet) \ - + RTE_PKTMBUF_HEADROOM) -#define MBUF_SIZE_DRIVER (2048 \ - + sizeof (struct rte_mbuf) \ - + RTE_PKTMBUF_HEADROOM) -#define MBUF_SIZE(mtu) MAX(MBUF_SIZE_MTU(mtu), MBUF_SIZE_DRIVER) +#define ETHER_HDR_MAX_LEN (ETHER_HDR_LEN + ETHER_CRC_LEN + (2 * VLAN_HEADER_LEN)) +#define MTU_TO_FRAME_LEN(mtu) ((mtu) + ETHER_HDR_LEN + ETHER_CRC_LEN) +#define MTU_TO_MAX_FRAME_LEN(mtu) ((mtu) + ETHER_HDR_MAX_LEN) +#define FRAME_LEN_TO_MTU(frame_len) ((frame_len)- ETHER_HDR_LEN - ETHER_CRC_LEN) +#define MBUF_SIZE(mtu) ( MTU_TO_MAX_FRAME_LEN(mtu) \ ++ sizeof(struct dp_packet)\ ++ RTE_PKTMBUF_HEADROOM) +#define NETDEV_DPDK_MBUF_ALIGN 1024 /* Max and min number of packets in the mempool. OVS tries to allocate a * mempool with MAX_NB_MBUF: if this fails (because the system doesn't have @@ -252,6 +252,22 @@ is_dpdk_class(const struct netdev_class *class) return class->construct == netdev_dpdk_construct; } +/* DPDK NIC drivers allocate RX buffers at a particular granularity, typically + * aligned at 1k or less. If a declared mbuf size is not a multiple of this + * value, insufficient buffers are allocated to accomodate the packet in its + * entirety. Furthermore, certain drivers need to ensure that there is also + * sufficient space in the Rx buffer to accommodate two VLAN tags (for QinQ + * frames). If the RX buffer is too small, then the driver enables scatter RX + * behaviour, which reduces performance. To prevent this, use a buffer size that + * is closest to 'mtu', but which satisfies the aforementioned criteria. + */ +static uint32_t +dpdk_buf_size(int mtu) +{ +return ROUND_UP((MTU_TO_MAX_FRAME_LEN(mtu) + RTE_PKTMBUF_HEADROOM), + NETDEV_DPDK_MBUF_ALIGN); +} + /* XXX: use dpdk malloc for entire OVS. in fact huge page should be used * for all other segments data, bss and text. */ @@ -278,34 +294,6 @@ free_dpdk_buf(struct dp_packet *p) } static void -__rte_pktmbuf_init(struct rte_mempool *mp, - void *opaque_arg OVS_UNUSED, - void *_m, - unsigned i OVS_UNUSED) -{ -struct rte_mbuf *m = _m; -uint32_t buf_len = mp->elt_size - sizeof(struct dp_packet); - -RTE_MBUF_ASSERT(mp->elt_size >= sizeof(struct dp_packet)); - -memset(m, 0, mp->elt_size); - -/* start of buffer is just after mbuf structure */ -m->buf_addr = (char *)m + sizeof(struct dp_packet); -m->buf_physaddr = rte_mempool_virt2phy(mp, m) + -sizeof(struct dp_packet); -m->buf_len = (uint16_t)buf_len; - -/* keep some headroom between start of buffer and data */ -m->data_off = RTE_MIN(RTE_PKTMBUF_HEADROOM, m->buf_len); - -/* init some constant fields */ -m->pool = mp; -m->nb_segs = 1; -m->port = 0xff; -} - -static void ovs_rte_pktmbuf_init(struct rte_mempool *mp, void *opaque_arg OVS_UNUSED, void *_m, @@ -313,7 +301,7 @@ ovs_rte_pktmbuf_init(struct rte_mempool *mp, { struct rte_mbuf *m = _m; -__rte_pktmbuf_init(mp, opaque_arg, _m, i); +rte_pktmbuf_init(mp, opaque_arg, _m, i); dp_packet_init_dpdk((struct dp_packet *) m, m->buf_len); } @@ -324,6 +312,7 @@ dpdk_mp_get(int socket_id, int mtu) OVS_REQUIRES(dpdk_mutex) struct dpdk_mp *dmp = NULL; char mp_name[RTE_MEMPOOL_NAMESIZE]; unsigned mp_size; +struct rte_pktmbuf_pool_private mbp_priv; LIST_FOR_EACH (dmp, list_node, &dpdk_mp_list) { if (dmp->socket_id == socket_id && dmp->mtu == mtu) { @@ -336,6 +325,8 @@ dpdk_mp_get(int socket_id, int mtu) OVS_REQUIRES(dpdk_mutex) dmp->socket_id = socket_id; dmp->mtu = mtu; dmp->refcount = 1; +mbp_priv.mbuf_data_room_size = MBUF_SIZE(mtu) - sizeof(struct dp_packet); +mbp_priv.mbuf_priv_size = sizeof (struct dp_packet) - sizeof (struct rte_mbuf); mp_size = M
[ovs-dev] [PATCH V5 0/2] netdev-dpdk: add jumbo frame support
This patchset enables DPDK ports to support Jumbo frames up to, and including 13K in size. Jumbo frame support is implemented by increasing the amount of data that each mbuf can encompass, thus allowing each mbuf segment to accomodate a single jumbo frame. The series additionally cleans up mbuf initialization, which previously relied on hardcoded values. v5: Squash NEWS and INSTALL.DPDK.md patches into Jumbo Frame patch Minor rework v4: Split mbuf initialization, NEWS, and INSTALL.DPDK.md into separate patches. http://openvswitch.org/pipermail/dev/2016-February/066191.html v3: Minor rework http://openvswitch.org/pipermail/dev/2016-February/065391.html v2: Add runtime support for Jumbo Frames - Add support for per-port MTUs http://openvswitch.org/pipermail/dev/2016-January/065090.html v1: Add compile time support for Jumbo Frames - Add static support for jumbo frames; the same MTU applies to all DPDK port types. http://openvswitch.org/pipermail/dev/2015-November/062121.html ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH V5 2/2] netdev-dpdk: add jumbo frame support
Add support for Jumbo Frames to DPDK-enabled port types, using single-segment-mbufs. Using this approach, the amount of memory allocated for each mbuf to store frame data is increased to a value greater than 1518B (typical Ethernet maximum frame length). The increased space available in the mbuf means that an entire Jumbo Frame can be carried in a single mbuf, as opposed to partitioning it across multiple mbuf segments. The amount of space allocated to each mbuf to hold frame data is defined dynamically by the user when adding a DPDK port to a bridge. If an MTU value is not supplied, or the user-supplied value is invalid, the MTU for the port defaults to standard Ethernet MTU (i.e. 1500B). Signed-off-by: Mark Kavanagh --- INSTALL.DPDK.md | 60 - NEWS | 3 +- lib/netdev-dpdk.c | 248 +- 3 files changed, 233 insertions(+), 78 deletions(-) diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md index d892788..4ca98cb 100644 --- a/INSTALL.DPDK.md +++ b/INSTALL.DPDK.md @@ -878,10 +878,63 @@ by adding the following string: to sections of all network devices used by DPDK. Parameter 'N' determines how many queues can be used by the guest. +Jumbo Frames + + +Support for Jumbo Frames may be enabled at run-time for DPDK-type ports. + +To avail of Jumbo Frame support, add the 'mtu_request' option to the ovs-vsctl +'add-port' command-line, along with the required MTU for the port. +e.g. + + ``` + ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk options:mtu_request=9000 + ``` + +When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments are +increased, such that a full Jumbo Frame may be accommodated inside a single +mbuf segment. Once set, the MTU for a DPDK port is immutable. + +Note that from an OVSDB perspective, the `mtu_request` option for a specific +port may be disregarded once initially set, as subsequent modifications to this +field are disregarded by the DPDK port. As with non-DPDK ports, the MTU of DPDK +ports is reported by the `Interface` table's 'mtu' field. + +Jumbo frame support has been validated against 13312B frames, using the +DPDK `igb_uio` driver, but larger frames and other DPDK NIC drivers may +theoretically be supported. Supported port types excludes vHost-Cuse ports, as +that feature is pending deprecation. + +vHost Ports and Jumbo Frames + +Jumbo frame support is available for DPDK vHost-User ports only. Some additional +configuration is needed to take advantage of this feature: + + 1. `mergeable buffers` must be enabled for vHost ports, as demonstrated in + the QEMU command line snippet below: + + ``` + '-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \' + '-device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on' + ``` + + 2. Where virtio devices are bound to the Linux kernel driver in a guest + environment (i.e. interfaces are not bound to an in-guest DPDK driver), the + MTU of those logical network interfaces must also be increased. This + avoids segmentation of Jumbo Frames in the guest. Note that 'MTU' refers + to the length of the IP packet only, and not that of the entire frame. + + e.g. To calculate the exact MTU of a standard IPv4 frame, subtract the L2 + header and CRC lengths (i.e. 18B) from the max supported frame size. + So, to set the MTU for a 13312B Jumbo Frame: + + ``` + ifconfig eth1 mtu 13294 + ``` + Restrictions: - - - Work with 1500 MTU, needs few changes in DPDK lib to fix this issue. - Currently DPDK port does not make use any offload functionality. - DPDK-vHost support works with 1G huge pages. @@ -922,6 +975,11 @@ Restrictions: the next release of DPDK (which includes the above patch) is available and integrated into OVS. + Jumbo Frames: + - `virtio-pmd`: DPDK apps in the guest do not exit gracefully. This is a DPDK + issue that is currently being investigated. + - vHost-Cuse: Jumbo Frame support is not available for vHost Cuse ports. + Bug Reporting: -- diff --git a/NEWS b/NEWS index 3e33871..43127f9 100644 --- a/NEWS +++ b/NEWS @@ -10,10 +10,11 @@ Post-v2.5.0 - DPDK: * New option "n_rxq" for PMD interfaces. Old 'other_config:n-dpdk-rxqs' is no longer supported. + * Support Jumbo Frames + - ovs-benchmark: This utility has been removed due to lack of use and bitrot. - v2.5.0 - xx xxx - - Dropped support for Python older than version 2.7. As a consequence, diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 2a06bb5..ac89ee6 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -77,6 +77,8 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20); + sizeof(struct dp_packet)\ + RTE_PKTMBUF_HEADROO
[ovs-dev] failing of rte_eth_dev_start function
Hi Team, When I am trying to call rte_eth_dev_start() function for kni interface it is failing. Can you please let me know what are all the reasons responsible for failing of rte_eth_dev_start() function. Thanks & Regards, Ravali The information contained in this electronic message and any attachments to this message are intended for the exclusive use of the addressee(s) and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately and destroy all copies of this message and any attachments. WARNING: Computer viruses can be transmitted via email. The recipient should check this email and any attachments for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email. www.wipro.com ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH] ovn: Add a section on containers in OVN Tutorial
Signed-Off-by: Numan Siddique --- tutorial/OVN-Tutorial.md | 77 tutorial/automake.mk | 7 ++- tutorial/ovn/env7/add-container-ports.sh | 60 + tutorial/ovn/env7/packet1.sh | 19 tutorial/ovn/env7/packet2.sh | 19 tutorial/ovn/env7/setup.sh | 36 +++ 6 files changed, 216 insertions(+), 2 deletions(-) create mode 100755 tutorial/ovn/env7/add-container-ports.sh create mode 100755 tutorial/ovn/env7/packet1.sh create mode 100755 tutorial/ovn/env7/packet2.sh create mode 100755 tutorial/ovn/env7/setup.sh diff --git a/tutorial/OVN-Tutorial.md b/tutorial/OVN-Tutorial.md index 1188faa..2bcfc59 100644 --- a/tutorial/OVN-Tutorial.md +++ b/tutorial/OVN-Tutorial.md @@ -709,6 +709,78 @@ though. perspective and also provides an example of what the resulting OpenFlow flows look like. +7) Container Ports +-- + +OVN supports containers running directly on the hypervisors and running +containers inside VMs. This example shows how OVN supports network +virtualization to containers when run inside VMs. Details about how to use +docker containers in OVS can be found [here][openvswitch-docker]. + +To support container traffic created inside a VM and to distinguish network +traffic coming from different container vifs, for each container a logical +port needs to be created with parent name set to the VM's logical port and +the tag set to the vlan tag of the container vif. + +Start with a simple logical switch with 3 logical ports. + +[View ovn/env7/setup.sh][env7setup]. + +$ ovn/env7/setup.sh + +Lets create a container vif attached to the logical port 'sw0-port1' and +another container vif attached to the logical port 'sw0-port2'. + +[View ovn/env7/add-container-ports.sh][env7contports] + +$ ovn/env7/add-container-ports.sh + +Run the `ovn-nbctl` command to see the logical ports + +$ovn-nbctl show + + +As you can see a logical port 'csw0-cport1' is created on a logical +switch 'csw0' whose parent is 'sw0-port1' and it has tag set to 42. +And a logical port 'csw0-cport2' is created on the logical switch 'csw0' +whose parent is 'sw0-port2' and it has tag set to 43. + +Bridge 'br-vmport1' represents the ovs bridge running inside the VM +connected to the logical port 'sw0-port1'. In this tutorial the ovs port +to 'sw0-port1' is created as a patch port with its peer connected to the +ovs bridge 'br-vmport1'. An ovs port 'cport1' is added to 'br-vmport1' +which represents the container interface connected to the ovs bridge +and vlan tag set to 42. Similarly 'br-vmport2' represents the ovs bridge +for the logical port 'sw0-port2' and 'cport2' connected to 'br-vmport2' +with vlan tag set to 43. + +This first trace shows a packet from 'csw0-port1' with a destination mac +address of 'csw0-port2'. You can see ovs bridge of the vm 'br-vmport1' tags +the traffic with vlan id 42 and the traffic reaches to the br-int because +of the patch port. As you can see below `ovn-controller` has added a flow +to strip the vlan tag and set the reg6 and metadata appropriately. + +$ ovs-ofctl -O OpenFlow13 dump-flows br-int +OFPST_FLOW reply (OF1.3) (xid=0x2): +cookie=0x0, duration=2767.032s, table=0, n_packets=0, n_bytes=0, priority=150,in_port=3,dl_vlan=42 actions=pop_vlan,set_field:0x3->reg5,set_field:0x2->metadata,set_field:0x1->reg6,resubmit(,16) +cookie=0x0, duration=2767.002s, table=0, n_packets=0, n_bytes=0, priority=150,in_port=4,dl_vlan=43 actions=pop_vlan,set_field:0x4->reg5,set_field:0x2->metadata,set_field:0x2->reg6,resubmit(,16) +cookie=0x0, duration=2767.032s, table=0, n_packets=0, n_bytes=0, priority=100,in_port=3 actions=set_field:0x1->reg5,set_field:0x1->metadata,set_field:0x1->reg6,resubmit(,16) +cookie=0x0, duration=2767.001s, table=0, n_packets=0, n_bytes=0, priority=100,in_port=4 actions=set_field:0x2->reg5,set_field:0x1->metadata,set_field:0x2->reg6,resubmit(,16) + +[View ovn/env7/packet1.sh][env7packet1]. + +$ ovn/env5/packet1.sh + + +The second trace shows a packet from 'csw0-port2' to 'csw0-port1'. + +[View ovn/env7/packet2.sh][env7packet2]. + +$ ovn/env5/packet1.sh + +You can extend this setup by adding additional container ports with two +hypervisors. Please see the tutorial 3 above. + [ovn-architecture(7)]:http://openvswitch.org/support/dist-docs/ovn-architecture.7.html [Tutorial.md]:https://github.com/openvswitch/ovs/blob/master/tutorial/Tutorial.md [ovn-nb(5)]:http://openvswitch.org/support/dist-docs/ovn-nb.5.html @@ -742,4 +814,9 @@ look like. [env5packet2]:https://github.com/openvswitch/ovs/blob/master/tutorial/ovn/env5/packet2.sh [env6setup]:https://github.com/openvswitch/ovs/blob/master/tutorial/ovn/env6/setup.sh [env6acls]:https://github.com/openvswitch/ovs/blob/master/tutorial/ovn/env6/add-acls.sh +[env7setup]:https://github.com/openvswitch/ovs/blob/master/tutorial/ovn/env7/
[ovs-dev] Invoice FEB-93645071
Good morning, Please see the attached invoice and remit payment according to the terms listed at the bottom of the invoice. If you have any questions please let us know. Thank you! Anastasia Dotson Accounting Specialist ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] {RFC] Events that should trigger restarting incremental processing
All- Having chatted with Ben on IRC late yesterday afternoon, I sat down and made a list of events that would require incremental processing to restart at sequence number 0 because they might change how a Logical Flow row gets mapped to OF flows without the Logical Flow changing. Looking at the code, I currently see three events that should trigger a restart: 1) the list of logical_datapaths changes 2) the list of local_datapaths changes 3) the simap of ports associated with a logical_datapath changes My question to the list is: "can anybody think of any other events that should be added to the above?" In the meantime, I'm off looking at what it might take to detect said changes and restart the incremental processing. Ryan (regXboi) ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [OVN] Applying ACL changes to existing connections
On 02/17/2016 09:12 PM, Justin Pettit wrote: > >> On Feb 5, 2016, at 1:30 PM, Russell Bryant wrote: >> >> >> Thank you for the write-up! This approach sounds great to me. Some >> small questions... >> >> 1) If we're only using 1 bit for now, is there any reason to use >> ct_label over ct_mark? The docs in ovs-ofctl(8) seem to suggest they're >> identical other than being 32-bit vs 128-bit. Would using the 32-bit >> ct_mark be beneficial in any way instead? > > I think ct_label is intended more for this bit-level twiddling, > whereas ct_mark is usually treated as a single 32-bit number. > Clearly, this doesn't really matter in practice, since OVS interfaces > with them the same. Also, I figure that we're going to need to > identify the ACL rule at some point, and I've been thinking ct_mark > is what we'd use for that. > It seems the most convenient way to map conntrack entries to ACL entries is by UUID (128-bit), so ct_label would certainly be convenient for that. That'd leave us 32-bits for state - "32 bits of state should be enough for anyone!". We could easily change it, at least in the short term. Eventually it will become an upgrade pain point if we wanted to change it. >> 2) One thing not explicitly addressed in this write-up is traffic marked >> as related. I think the proposal means just adding a match on >> ct_label=0x1 where we match ct_state=+rel today and we just rely on a >> packet in the request direction of the main connection to set ct_label. >> That seems fine, but I wanted to clarify that point. > > That's a good point. Do you know if the existing OpenStack integration > handles related flows? It doesn't look like they're handled explicitly. What it's doing is pretty brute force. It actually looks like it's deleting things based on IP addresses, including connections that should still be allowed. I imagine the current code can result in a few dropped packets when rules change, depending on how rules are configured... -- Russell Bryant ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [OVN] Applying ACL changes to existing connections
On 02/18/2016 07:37 PM, Joe Stringer wrote: > On 17 February 2016 at 18:12, Justin Pettit wrote: >> >>> On Feb 5, 2016, at 1:30 PM, Russell Bryant wrote: >>> >>> >>> Thank you for the write-up! This approach sounds great to me. Some >>> small questions... >>> >>> 1) If we're only using 1 bit for now, is there any reason to use >>> ct_label over ct_mark? The docs in ovs-ofctl(8) seem to suggest they're >>> identical other than being 32-bit vs 128-bit. Would using the 32-bit >>> ct_mark be beneficial in any way instead? >> >> I think ct_label is intended more for this bit-level twiddling, whereas >> ct_mark is usually treated as a single 32-bit number. Clearly, this doesn't >> really matter in practice, since OVS interfaces with them the same. Also, I >> figure that we're going to need to identify the ACL rule at some point, and >> I've been thinking ct_mark is what we'd use for that. >> >>> 2) One thing not explicitly addressed in this write-up is traffic marked >>> as related. I think the proposal means just adding a match on >>> ct_label=0x1 where we match ct_state=+rel today and we just rely on a >>> packet in the request direction of the main connection to set ct_label. >>> That seems fine, but I wanted to clarify that point. >> >> That's a good point. Do you know if the existing OpenStack integration >> handles related flows? >> >> Jarno or Joe, do you know how the connection tracker handles children of >> deleted flows? There's the following comment in nl_ct_flush() in >> lib/netlink-conntrack.c: >> >> /* Expectations are flushed automatically, because they do not >> * have a master connection anymore */ >> >> If we delete a specific entry, will its children get removed, too? If >> that's true, this problem shouldn't be that difficult--although it is a >> little ugly. I think we just need to have ovn-controller periodically sweep >> through the connection tracking entries looking for that bit, and blowing >> them away. > > No. The parent connection holds a list of expected connections, and > when it is cleaned up, those expectations are cleaned up as well. > However, once an expectation has been fulfilled by packets matching > the expectation, if you commit that connection then it has a life of > its own. I don't see any link from parent to child connections in the > conntrack structures, so it seems unlikely that this is possible with > current APIs. > > I think the way to do this currently is to uniquely identify > connection families with a ct_mark, and delete connections based on > this identifier. Right now, I don't think we ever do ct(commit) for a packet with the related state bit set. In that case, if we set ct_mark/ct_label on the parent connection, will we see that value set when we see another related packet? I think that will result in the behavior we're looking for ... -- Russell Bryant ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH/RFC] openvswitch: loosen restriction of output of MPLS to tunnel vports
On Thu, Feb 18, 2016 at 11:59 PM, Simon Horman wrote: > On Tue, Feb 16, 2016 at 02:53:39PM -0800, Jesse Gross wrote: >> On Fri, Feb 12, 2016 at 11:25 AM, Simon Horman >> wrote: >> > If an skb was not MPLS initially then it may be GSO and in that case if it >> > became MPLS then GSO can't be performed because both MPLS and tunnels make >> > use of the inner_protocol field of struct skbuff in order to allow GSO to >> > be performed in the inner packet. >> > >> > On the other hand if an skb was MPLS initially then it will not be GSO, >> > as there is no support for GRO for MPLS. Thus in this case it is safe >> > to allow output of MPLS on tunnel vports. >> > >> > Signed-off-by: Simon Horman >> >> I don't think that any tunnel implementations expose support for MPLS >> offloads as part of their features. In that case, if we have an MPLS >> GSO packet (regardless of how it came to be), I think it will be >> broken apart in software before encapsulation. At that point, it >> should be safe for the tunnel to overwrite any fields MPLS was >> previously using for offloading. As a result, I believe we can allow >> all combinations of MPLS with tunnels. (Note that historically this >> wasn't true, the change is a result of lightweight tunnels.) > > Hi Jesse, > > wow, that does sound very promising. > I would certainly be in favour of allowing MPLS with tunnels. > > I am wondering if you could point me in the general direction of the changes > you mention above. I actually don't think that any changes are really needed to make this work. All we should have to do is remove the checks similar to the one you are modifying here in flow_netlink.c and the the runtime one in actions.c. It's probably good to look through the overall code path to double check but as long as tunnel devices don't enable support for MPLS offloading by the time that encapsulation happens MPLS should really just look like any other packet received off the wire. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v4] netdev_dpdk.c: Add QoS functionality.
This patch provides the modifications required in netdev-dpdk.c and vswitch.xml to allow for a DPDK user space QoS algorithm. This patch adds a QoS configuration structure for netdev-dpdk and expected QoS operations 'dpdk_qos_ops'. Various helper functions are also supplied. Also included are the modifications required for vswitch.xml to allow a new QoS implementation for netdev-dpdk devices. This includes a new QoS type `egress-policer` as well as its expected QoS table entries. The QoS functionality implemented for DPDK devices is `egress-policer`. This can be used to drop egress packets at a configurable rate. The INSTALL.DPDK.md guide has also been modified to provide an example configuration of `egress-policer` QoS. Signed-off-by: Ian Stokes --- v4: *INSTALL.DPDK -Remove unneeded "--" in ovs-vsctl destroy QoS command. *NEWS -Add QoS functionality to DPDK in post 2.5 section *vswitch.xml -Remove references to srtcm & DPDK sample app from egress-policer description. *netdev-dpdk.c -Remove DPDK_MAX_QOS_NAME_SIZE. -Replace 'qos_alg_process()' with 'qos_run()'. -Replace 'netdev_dpdk_qos_process__()' with 'netdev_dpdk_qos_run__()'. -Replace 'egress_policer_process()' with 'egress_policer_run()'. -Remove enum egress_policer_action to avoid mixed internal & external declarations. -Replace strncmp with strcmp in qos_lookup_name function. -Simplify 'egress_pkt_handle' to make direct reference to e_RTE_METER_GREEN instead of using enum. -egress_policer_run() fixed so as not to return incorrect packet count containing freed packets. -In netdev_dpdk_set_qos() add mutex lock and unlock. -In netdev_dpdk_set_qos() treat 'type' not found or 'type'=="" equally and delete QoS configuration. --- INSTALL.DPDK.md | 20 +++ NEWS |1 + lib/netdev-dpdk.c| 428 -- vswitchd/vswitch.xml | 47 ++ 4 files changed, 483 insertions(+), 13 deletions(-) diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md index d892788..d8e04dd 100644 --- a/INSTALL.DPDK.md +++ b/INSTALL.DPDK.md @@ -207,6 +207,26 @@ Using the DPDK with ovs-vswitchd: ./ovs-ofctl add-flow br0 in_port=2,action=output:1 ``` +8. QoS usage example + + Assuming you have a vhost-user port transmitting traffic consisting of + packets of size 64 bytes, the following command would limit the egress + transmission rate of the port to ~1,000,000 packets per second: + + `ovs-vsctl set port vhost-user0 qos=@newqos -- --id=@newqos create qos + type=egress-policer other-config:cir=4600 other-config:cbs=2048` + + To examine the QoS configuration of the port: + + `ovs-appctl -t ovs-vswitchd qos/show vhost-user0` + + To clear the QoS configuration from the port and ovsdb use the following: + + `ovs-vsctl destroy QoS vhost-user0 -- clear Port vhost-user0 qos` + + For more details regarding egress-policer parameters please refer to the + vswitch.xml. + Performance Tuning: --- diff --git a/NEWS b/NEWS index 18fca10..484975f 100644 --- a/NEWS +++ b/NEWS @@ -10,6 +10,7 @@ Post-v2.5.0 - DPDK: * New option "n_rxq" for PMD interfaces. Old 'other_config:n-dpdk-rxqs' is no longer supported. + * QoS functionality with sample egress-policer implementation. - ovs-benchmark: This utility has been removed due to lack of use and bitrot. - ovs-appctl: diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index e4f789b..e6673f4 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -45,6 +45,7 @@ #include "ovs-rcu.h" #include "packets.h" #include "shash.h" +#include "smap.h" #include "sset.h" #include "unaligned.h" #include "timeval.h" @@ -53,6 +54,7 @@ #include "rte_config.h" #include "rte_mbuf.h" +#include "rte_meter.h" #include "rte_virtio_net.h" VLOG_DEFINE_THIS_MODULE(dpdk); @@ -143,6 +145,101 @@ static int rte_eal_init_ret = ENODEV; static struct ovs_mutex dpdk_mutex = OVS_MUTEX_INITIALIZER; +/* Quality of Service */ + +/* An instance of a QoS configuration. Always associated with a particular + * network device. + * + * Each QoS implementation subclasses this with whatever additional data it + * needs. + */ +struct qos_conf { +const struct dpdk_qos_ops *ops; +}; + +/* A particular implementation of dpdk QoS operations. + * + * The functions below return 0 if successful or a positive errno value on + * failure, except where otherwise noted. All of them must be provided, except + * where otherwise noted. + */ +struct dpdk_qos_ops { + +/* Name of the QoS type */ +const char *qos_name; + +/* Called to construct the QoS implementation on 'netdev'. The + * implementation should make the appropriate calls to configure QoS + * according to 'details'. The implementation may assume that any current + * QoS configuration already installed should be destroyed before + * constructing the new configuration. + * + * The contents of 'details' should be documented as valid
[ovs-dev] [PATCH v7 3/6] [ovn-controller] Make flow table persistent
From: RYAN D. MOATS This is a prerequisite for incremental processing. Signed-off-by: RYAN D. MOATS --- ovn/controller/ofctrl.c | 118 +++ ovn/controller/ofctrl.h |2 + ovn/controller/ovn-controller.c |4 +- 3 files changed, 87 insertions(+), 37 deletions(-) diff --git a/ovn/controller/ofctrl.c b/ovn/controller/ofctrl.c index 3297fb3..7280c8b 100644 --- a/ovn/controller/ofctrl.c +++ b/ovn/controller/ofctrl.c @@ -484,16 +484,53 @@ ofctrl_add_flow(struct hmap *desired_flows, f->ofpacts_len = actions->size; f->hmap_node.hash = ovn_flow_hash(f); -if (ovn_flow_lookup(desired_flows, f)) { -static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 5); -if (!VLOG_DROP_INFO(&rl)) { -char *s = ovn_flow_to_string(f); -VLOG_INFO("dropping duplicate flow: %s", s); -free(s); -} +/* loop through all the flows to see if there is an old flow to be + * removed - do so if the old flow has the same priority, table, and match + * but a different action or if the old flow has the same priority, table + * and action, but the new match is either a superset or subset of the + * old match */ + +struct ovn_flow *d, *next; +HMAP_FOR_EACH_SAFE (d, next, hmap_node, desired_flows) { +if (f->table_id == d->table_id && f->priority == d->priority) { +if ((match_equal(&f->match, &d->match) + && !ofpacts_equal(f->ofpacts, f->ofpacts_len, + d->ofpacts, d->ofpacts_len)) +|| (ofpacts_equal(f->ofpacts, f->ofpacts_len, + d->ofpacts, d->ofpacts_len) +&& ((flow_wildcards_has_extra(&f->match.wc,&d->match.wc) + && flow_equal_except(&f->match.flow, &d->match.flow, + &f->match.wc)) +|| (flow_wildcards_has_extra(&d->match.wc,&f->match.wc) +&& flow_equal_except(&d->match.flow, + &f->match.flow, + &d->match.wc) { +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 5); +if (!VLOG_DROP_INFO(&rl)) { +char *s = ovn_flow_to_string(d); +VLOG_INFO("removing superceded flow: %s", s); +free(s); +} -ovn_flow_destroy(f); -return; +hmap_remove(desired_flows, &d->hmap_node); +ovn_flow_destroy(d); +} + +/* if this is a complete duplicate, remove the new flow */ +if (match_equal(&f->match, &d->match) +&& ofpacts_equal(f->ofpacts, f->ofpacts_len, + d->ofpacts, d->ofpacts_len)) { +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 5); +if (!VLOG_DROP_INFO(&rl)) { +char *s = ovn_flow_to_string(f); +VLOG_INFO("dropping duplicate flow: %s", s); +free(s); +} + +ovn_flow_destroy(f); +return; +} +} } hmap_insert(desired_flows, &f->hmap_node, f->hmap_node.hash); @@ -501,6 +538,20 @@ ofctrl_add_flow(struct hmap *desired_flows, /* ovn_flow. */ +/* duplicate an ovn_flow structure */ +struct ovn_flow * +ofctrl_dup_flow(struct ovn_flow *source) +{ +struct ovn_flow *answer = xmalloc(sizeof *answer); +answer->table_id = source->table_id; +answer->priority = source->priority; +answer->match = source->match; +answer->ofpacts = xmemdup(source->ofpacts, source->ofpacts_len); +answer->ofpacts_len = source->ofpacts_len; +answer->hmap_node.hash = ovn_flow_hash(answer); +return answer; +} + /* Returns a hash of the key in 'f'. */ static uint32_t ovn_flow_hash(const struct ovn_flow *f) @@ -595,19 +646,16 @@ queue_flow_mod(struct ofputil_flow_mod *fm) * flows from 'flow_table' and frees them. (The hmap itself isn't * destroyed.) * - * This called be called be ofctrl_run() within the main loop. */ + * This can be called by ofctrl_run() within the main loop. */ void ofctrl_put(struct hmap *flow_table) { /* The flow table can be updated if the connection to the switch is up and * in the correct state and not backlogged with existing flow_mods. (Our * criteria for being backlogged appear very conservative, but the socket - * between ovn-controller and OVS provides some buffering.) Otherwise, - * discard the flows. A solution to either of those problems will cause us - * to wake up and retry. */ + * between ovn-controller and OVS provides some buffering.) */ if (state != S_UPDATE_FLOWS || rconn_packet_counter_n_packets(tx_counter)) { -ovn_flow_table_
[ovs-dev] [PATCH v7 2/6] [ovn-controller] Reverse change tracking order
From: RYAN D. MOATS Currently changes are added to the front of the track list, so they are looped through in LIFO order. Incremental processing is more efficient with a FIFO presentation, so add changes to the back of the track list. --- lib/ovsdb-idl.c |6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c index 4cb1c81..c973d37 100644 --- a/lib/ovsdb-idl.c +++ b/lib/ovsdb-idl.c @@ -1351,8 +1351,8 @@ ovsdb_idl_row_change__(struct ovsdb_idl_row *row, const struct json *row_json, = row->table->idl->change_seqno + 1; if (table->modes[column_idx] & OVSDB_IDL_TRACK) { if (list_is_empty(&row->track_node)) { -list_push_front(&row->table->track_list, -&row->track_node); +list_push_back(&row->table->track_list, + &row->track_node); } if (!row->updated) { row->updated = bitmap_allocate(class->n_columns); @@ -1572,7 +1572,7 @@ ovsdb_idl_row_destroy(struct ovsdb_idl_row *row) = row->table->idl->change_seqno + 1; } if (list_is_empty(&row->track_node)) { -list_push_front(&row->table->track_list, &row->track_node); +list_push_back(&row->table->track_list, &row->track_node); } } } -- 1.7.1 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v7 4/6] [ovn-controller] Persist ports simap in logical_datapath
From: RYAN D. MOATS Persist across runs so that a change to this simap can be used as a trigger for resetting incremental processing. Signed-off-by: RYAN D. MOATS --- ovn/controller/lflow.c | 18 -- 1 files changed, 12 insertions(+), 6 deletions(-) diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c index d53213c..18f0970 100644 --- a/ovn/controller/lflow.c +++ b/ovn/controller/lflow.c @@ -231,21 +231,27 @@ static void ldp_run(struct controller_ctx *ctx) { struct logical_datapath *ldp; -HMAP_FOR_EACH (ldp, hmap_node, &logical_datapaths) { -simap_clear(&ldp->ports); -} +//HMAP_FOR_EACH (ldp, hmap_node, &logical_datapaths) { +//simap_clear(&ldp->ports); +//} const struct sbrec_port_binding *binding; SBREC_PORT_BINDING_FOR_EACH (binding, ctx->ovnsb_idl) { struct logical_datapath *ldp = ldp_lookup_or_create(binding->datapath); - -simap_put(&ldp->ports, binding->logical_port, binding->tunnel_key); +struct simap_node *old = simap_find(&ldp->ports, +binding->logical_port); +if (!old || old->data != binding->tunnel_key) { +simap_put(&ldp->ports, binding->logical_port, binding->tunnel_key); +} } const struct sbrec_multicast_group *mc; SBREC_MULTICAST_GROUP_FOR_EACH (mc, ctx->ovnsb_idl) { struct logical_datapath *ldp = ldp_lookup_or_create(mc->datapath); -simap_put(&ldp->ports, mc->name, mc->tunnel_key); +struct simap_node *old = simap_find(&ldp->ports, mc->name); +if (!old || old->data != mc->tunnel_key) { +simap_put(&ldp->ports, mc->name, mc->tunnel_key); +} } struct logical_datapath *next_ldp; -- 1.7.1 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v7 5/6] [ovn-controller] Persist logical_datapaths
From: RYAN D. MOATS Persist logical_datapaths across runs so that a change can be used as a trigger to reset incremental flow processing. Signed-off-by: RYAN D. MOATS --- ovn/controller/ovn-controller.c | 14 +- 1 files changed, 1 insertions(+), 13 deletions(-) diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-controller.c index 5a4174e..c34dce9 100644 --- a/ovn/controller/ovn-controller.c +++ b/ovn/controller/ovn-controller.c @@ -206,6 +206,7 @@ main(int argc, char *argv[]) int retval; struct hmap flow_table = HMAP_INITIALIZER(&flow_table); +struct hmap local_datapaths = HMAP_INITIALIZER(&local_datapaths); ovs_cmdl_proctitle_init(argc, argv); set_program_name(argv[0]); @@ -282,7 +283,6 @@ main(int argc, char *argv[]) /* Contains bare "struct hmap_node"s whose hash values are the tunnel_key * of datapaths with at least one local port binding. */ -struct hmap local_datapaths = HMAP_INITIALIZER(&local_datapaths); const struct ovsrec_bridge *br_int = get_br_int(&ctx); const char *chassis_id = get_chassis_id(ctx.ovs_idl); @@ -309,18 +309,6 @@ main(int argc, char *argv[]) ofctrl_put(&flow_table); } -/* local_datapaths contains bare hmap_node instances. - * We use this wrapper so that we can make use of - * HMAP_FOR_EACH_SAFE to tear down the hmap. */ -struct { -struct hmap_node node; -} *cur_node, *next_node; -HMAP_FOR_EACH_SAFE (cur_node, next_node, node, &local_datapaths) { -hmap_remove(&local_datapaths, &cur_node->node); -free(cur_node); -} -hmap_destroy(&local_datapaths); - unixctl_server_run(unixctl); unixctl_server_wait(unixctl); -- 1.7.1 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v7 6/6] [ovn-controller] Add incremental logical flow proessing
From: RYAN D. MOATS This code changes lflow_run to do incremental process of the logical flow table rather than processing the full table each run. Signed-off-by: RYAN D. MOATS --- ovn/controller/binding.c|2 + ovn/controller/lflow.c | 53 +++--- ovn/controller/lflow.h |7 +++- ovn/controller/ofctrl.c |4 +- ovn/controller/ofctrl.h |2 + ovn/controller/ovn-controller.c | 14 +- 6 files changed, 72 insertions(+), 10 deletions(-) diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c index cb12cea..24c89f4 100644 --- a/ovn/controller/binding.c +++ b/ovn/controller/binding.c @@ -15,6 +15,7 @@ #include #include "binding.h" +#include "lflow.h" #include "lib/bitmap.h" #include "lib/hmap.h" @@ -132,6 +133,7 @@ add_local_datapath(struct hmap *local_datapaths, ld = xmalloc(sizeof *ld); hmap_insert(local_datapaths, ld, binding_rec->datapath->tunnel_key); +reset_flow_processing(); } } diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c index 18f0970..f6eb24a 100644 --- a/ovn/controller/lflow.c +++ b/ovn/controller/lflow.c @@ -174,6 +174,13 @@ struct logical_datapath { enum ldp_type type; /* Type of logical datapath */ }; +bool restart_flow_processing = false; + +void +reset_flow_processing(void) { +restart_flow_processing = true; +} + /* Contains "struct logical_datapath"s. */ static struct hmap logical_datapaths = HMAP_INITIALIZER(&logical_datapaths); @@ -206,6 +213,7 @@ ldp_create(const struct sbrec_datapath_binding *binding) const char *ls = smap_get(&binding->external_ids, "logical-switch"); ldp->type = ls ? LDP_TYPE_SWITCH : LDP_TYPE_ROUTER; simap_init(&ldp->ports); +reset_flow_processing(); return ldp; } @@ -222,6 +230,7 @@ ldp_free(struct logical_datapath *ldp) simap_destroy(&ldp->ports); hmap_remove(&logical_datapaths, &ldp->hmap_node); free(ldp); +reset_flow_processing(); } /* Iterates through all of the records in the Port_Binding table, updating the @@ -242,6 +251,7 @@ ldp_run(struct controller_ctx *ctx) binding->logical_port); if (!old || old->data != binding->tunnel_key) { simap_put(&ldp->ports, binding->logical_port, binding->tunnel_key); +reset_flow_processing(); } } @@ -251,6 +261,7 @@ ldp_run(struct controller_ctx *ctx) struct simap_node *old = simap_find(&ldp->ports, mc->name); if (!old || old->data != mc->tunnel_key) { simap_put(&ldp->ports, mc->name, mc->tunnel_key); +reset_flow_processing(); } } @@ -279,22 +290,53 @@ lflow_init(void) /* Translates logical flows in the Logical_Flow table in the OVN_SB database * into OpenFlow flows. See ovn-architecture(7) for more information. */ -void +unsigned int lflow_run(struct controller_ctx *ctx, struct hmap *flow_table, const struct simap *ct_zones, - struct hmap *local_datapaths) + struct hmap *local_datapaths, + unsigned int seqno) { struct hmap flows = HMAP_INITIALIZER(&flows); uint32_t conj_id_ofs = 1; +unsigned int processed_seqno = seqno; ldp_run(ctx); +if (restart_flow_processing) { +seqno = 0; +ovn_flow_table_clear(flow_table); +restart_flow_processing = false; +} + const struct sbrec_logical_flow *lflow; -SBREC_LOGICAL_FLOW_FOR_EACH (lflow, ctx->ovnsb_idl) { -/* Find the "struct logical_datapath" associated with this +SBREC_LOGICAL_FLOW_FOR_EACH_TRACKED (lflow, ctx->ovnsb_idl) { +unsigned int del_seqno = sbrec_logical_flow_row_get_seqno(lflow, +OVSDB_IDL_CHANGE_DELETE); +unsigned int mod_seqno = sbrec_logical_flow_row_get_seqno(lflow, +OVSDB_IDL_CHANGE_MODIFY); +unsigned int ins_seqno = sbrec_logical_flow_row_get_seqno(lflow, +OVSDB_IDL_CHANGE_INSERT); +if (del_seqno <= seqno && mod_seqno <= seqno && ins_seqno <= seqno) { +continue; +} +if (del_seqno > 0) { +if (del_seqno > processed_seqno) { +processed_seqno = del_seqno; +} +/* we really don't want to re-process a deleted record so */ +continue; +} +if (mod_seqno > processed_seqno) { +processed_seqno = mod_seqno; +} +if (ins_seqno > processed_seqno) { +processed_seqno = ins_seqno; +} + +/* Find the "struct logical_datapath" asssociated with this * Logical_Flow row. If there's no such struct, that must be because * no logical ports are bound to that logical datapath, so there's no - * point in maintaining any flows for it anyway, so skip it. */ + * point in maintaining any flows for it anyway, so stop
[ovs-dev] [PATCH v7 1/6] [ovn-controller] Add useful information to ovn E2E test
From: RYAN D. MOATS Make test 1737 output the OF flows from all three hypervisors to help in case something goes wrong. Signed-off-by: RYAN D. MOATS --- tests/ovn.at | 12 1 files changed, 12 insertions(+), 0 deletions(-) diff --git a/tests/ovn.at b/tests/ovn.at index f4117b6..e13a9c3 100644 --- a/tests/ovn.at +++ b/tests/ovn.at @@ -559,6 +559,18 @@ ovn_populate_arp sleep 1 ovn-sbctl dump-flows -- list multicast_group +echo "-- hv1 dump --" +as hv1 ovs-vsctl show +as hv1 ovs-ofctl -O OpenFlow13 dump-flows br-int + +echo "-- hv2 dump --" +as hv2 ovs-vsctl show +as hv2 ovs-ofctl -O OpenFlow13 dump-flows br-int + +echo "-- hv3 dump --" +as hv3 ovs-vsctl show +as hv3 ovs-ofctl -O OpenFlow13 dump-flows br-int + # Given the name of a logical port, prints the name of the hypervisor # on which it is located. vif_to_hv() { -- 1.7.1 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v7 0/6] [ovn-controller] Add incremental logical flow processing
From: RYAN D. MOATS This series of patches changes lflow_run from processing the entire southbound logical flow table to doing incremental processing during each run. To make this work, several pre-requisites are necessary, and these are contained in the initial patches of the series. RYAN D. MOATS (6): Add useful information to ovn E2E test Reverse change tracking order Make flow table persistent in ovn controller Persist ports simap in logical_datapath Persist logical_datapaths Add incremental proessing lib/ovsdb-idl.c |6 +- ovn/controller/binding.c|2 + ovn/controller/lflow.c | 71 +++ ovn/controller/lflow.h |7 ++- ovn/controller/ofctrl.c | 122 +++ ovn/controller/ofctrl.h |4 + ovn/controller/ovn-controller.c | 32 +- tests/ovn.at| 12 8 files changed, 187 insertions(+), 69 deletions(-) ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] : ovs-appctl dump and flush command for userspace conntrack
On 19 February 2016 at 02:35, wrote: > Hi Joe, > > Thanks for pointing out my mistake. Please find below updated one. > > > m 69e63a45e2773c124deb885bbc3d5deb3e032126 Mon Sep 17 00:00:00 2001 > From: soumyadeep chowdhury > Date: Fri, 19 Feb 2016 07:39:34 -0500 > Subject: [PATCH 4/4] Write the functions for dump-conntrack and > flush-conntrack > > Signed-off-by: Sourabh Bansal While the format of the patch looks closer to correct, I think my primary piece of feedback may have been misunderstood. What does this patch provide which isn't already available via the "ovs-appctl dpctl/{dump,flush}-conntrack" commands? ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCHv2 1/6] datapath: vxlan: Relax MTU constraints.
From: David Wragg Upstream commit: Allow the MTU of vxlan devices without an underlying device to be set to larger values (up to a maximum based on IP packet limits and vxlan overhead). Previously, their MTUs could not be set to higher than the conventional ethernet value of 1500. This is a very arbitrary value in the context of vxlan, and prevented vxlan devices from being able to take advantage of jumbo frames etc. The default MTU remains 1500, for compatibility. Signed-off-by: David Wragg Acked-by: Roopa Prabhu Signed-off-by: David S. Miller Upstream: 72564b59ffc4 ("vxlan: Relax MTU constraints") Signed-off-by: Joe Stringer --- datapath/linux/compat/vxlan.c | 36 +--- 1 file changed, 25 insertions(+), 11 deletions(-) diff --git a/datapath/linux/compat/vxlan.c b/datapath/linux/compat/vxlan.c index dd0015d01c32..f443a1b0352a 100644 --- a/datapath/linux/compat/vxlan.c +++ b/datapath/linux/compat/vxlan.c @@ -1608,29 +1608,43 @@ static void vxlan_set_multicast_list(struct net_device *dev) { } -static int vxlan_change_mtu(struct net_device *dev, int new_mtu) +static int __vxlan_change_mtu(struct net_device *dev, + struct net_device *lowerdev, + struct vxlan_rdst *dst, int new_mtu, bool strict) { - struct vxlan_dev *vxlan = netdev_priv(dev); - struct vxlan_rdst *dst = &vxlan->default_dst; - struct net_device *lowerdev; - int max_mtu; + int max_mtu = IP_MAX_MTU; - lowerdev = __dev_get_by_index(vxlan->net, dst->remote_ifindex); - if (lowerdev == NULL) - return eth_change_mtu(dev, new_mtu); + if (lowerdev) + max_mtu = lowerdev->mtu; if (dst->remote_ip.sa.sa_family == AF_INET6) - max_mtu = lowerdev->mtu - VXLAN6_HEADROOM; + max_mtu -= VXLAN6_HEADROOM; else - max_mtu = lowerdev->mtu - VXLAN_HEADROOM; + max_mtu -= VXLAN_HEADROOM; - if (new_mtu < 68 || new_mtu > max_mtu) + if (new_mtu < 68) return -EINVAL; + if (new_mtu > max_mtu) { + if (strict) + return -EINVAL; + + new_mtu = max_mtu; + } + dev->mtu = new_mtu; return 0; } +static int vxlan_change_mtu(struct net_device *dev, int new_mtu) +{ + struct vxlan_dev *vxlan = netdev_priv(dev); + struct vxlan_rdst *dst = &vxlan->default_dst; + struct net_device *lowerdev = __dev_get_by_index(vxlan->net, +dst->remote_ifindex); + return __vxlan_change_mtu(dev, lowerdev, dst, new_mtu, true); +} + static netdev_tx_t vxlan_dev_xmit(struct sk_buff *skb, struct net_device *dev) { /* Drop All packets coming from networking stack. OVS-CB is -- 2.1.4 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCHv2 3/6] datapath: Set a large MTU on tunnel devices.
From: David Wragg Upstream commit: Prior to 4.3, openvswitch tunnel vports (vxlan, gre and geneve) could transmit vxlan packets of any size, constrained only by the ability to send out the resulting packets. 4.3 introduced netdevs corresponding to tunnel vports. These netdevs have an MTU, which limits the size of a packet that can be successfully encapsulated. The default MTU values are low (1500 or less), which is awkwardly small in the context of physical networks supporting jumbo frames, and leads to a conspicuous change in behaviour for userspace. Instead, set the MTU on openvswitch-created netdevs to be the relevant maximum (i.e. the maximum IP packet size minus any relevant overhead), effectively restoring the behaviour prior to 4.3. Signed-off-by: David Wragg Signed-off-by: David S. Miller Upstream: 7e059158d57b ("vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devices") Signed-off-by: Joe Stringer --- acinclude.m4 | 1 + datapath/linux/compat/geneve.c | 18 ++ datapath/linux/compat/include/net/ip_tunnels.h | 6 ++ datapath/linux/compat/ip_gre.c | 8 datapath/linux/compat/ip_tunnel.c | 19 --- datapath/linux/compat/vxlan.c | 11 --- datapath/vport-vxlan.c | 2 ++ 7 files changed, 55 insertions(+), 10 deletions(-) diff --git a/acinclude.m4 b/acinclude.m4 index dc06be6323ee..11c77877d46c 100644 --- a/acinclude.m4 +++ b/acinclude.m4 @@ -354,6 +354,7 @@ AC_DEFUN([OVS_CHECK_LINUX_COMPAT], [ OVS_GREP_IFELSE([$KSRC/include/net/ip.h], [IPSKB_FRAG_PMTU], [OVS_DEFINE([HAVE_CORRECT_MRU_HANDLING])]) + OVS_GREP_IFELSE([$KSRC/include/net/ip_tunnels.h], [__ip_tunnel_change_mtu]) OVS_GREP_IFELSE([$KSRC/include/net/inet_frag.h], [hashfn.*const], [OVS_DEFINE([HAVE_INET_FRAGS_CONST])]) OVS_GREP_IFELSE([$KSRC/include/net/inet_frag.h], [last_in], diff --git a/datapath/linux/compat/geneve.c b/datapath/linux/compat/geneve.c index 29a65a0a49d2..50ed3936d038 100644 --- a/datapath/linux/compat/geneve.c +++ b/datapath/linux/compat/geneve.c @@ -1039,11 +1039,21 @@ struct net_device *rpl_geneve_dev_create_fb(struct net *net, const char *name, return dev; err = geneve_configure(net, dev, 0, 0, 0, 0, htons(dst_port), true); - if (err) { - free_netdev(dev); - return ERR_PTR(err); - } + if (err) + goto err; + + /* openvswitch users expect packet sizes to be unrestricted, +* so set the largest MTU we can. +*/ + err = geneve_change_mtu(dev, IP_MAX_MTU); + if (err) + goto err; + return dev; + +err: + free_netdev(dev); + return ERR_PTR(err); } EXPORT_SYMBOL_GPL(rpl_geneve_dev_create_fb); diff --git a/datapath/linux/compat/include/net/ip_tunnels.h b/datapath/linux/compat/include/net/ip_tunnels.h index 3e1ceef21930..5eda8a2e3b0f 100644 --- a/datapath/linux/compat/include/net/ip_tunnels.h +++ b/datapath/linux/compat/include/net/ip_tunnels.h @@ -327,4 +327,10 @@ int rpl_ip_tunnel_get_iflink(const struct net_device *dev); #define ip_tunnel_get_link_net rpl_ip_tunnel_get_link_net struct net *rpl_ip_tunnel_get_link_net(const struct net_device *dev); #endif /* HAVE_METADATA_DST */ + +#ifndef HAVE___IP_TUNNEL_CHANGE_MTU +#define __ip_tunnel_change_mtu rpl___ip_tunnel_change_mtu +int rpl___ip_tunnel_change_mtu(struct net_device *dev, int new_mtu, bool strict); +#endif + #endif /* __NET_IP_TUNNELS_H */ diff --git a/datapath/linux/compat/ip_gre.c b/datapath/linux/compat/ip_gre.c index c9197e9652fd..f6a841fbc4d1 100644 --- a/datapath/linux/compat/ip_gre.c +++ b/datapath/linux/compat/ip_gre.c @@ -613,6 +613,14 @@ struct net_device *rpl_gretap_fb_dev_create(struct net *net, const char *name, #endif if (err < 0) goto out; + + /* openvswitch users expect packet sizes to be unrestricted, +* so set the largest MTU we can. +*/ + err = __ip_tunnel_change_mtu(dev, IP_MAX_MTU, false); + if (err) + goto out; + return dev; out: free_netdev(dev); diff --git a/datapath/linux/compat/ip_tunnel.c b/datapath/linux/compat/ip_tunnel.c index 2d4070eccb01..81909370271c 100644 --- a/datapath/linux/compat/ip_tunnel.c +++ b/datapath/linux/compat/ip_tunnel.c @@ -137,18 +137,31 @@ static int ip_tunnel_bind_dev(struct net_device *dev) return mtu; } -int rpl_ip_tunnel_change_mtu(struct net_device *dev, int new_mtu) +int rpl___ip_tunnel_change_mtu(struct net_device *dev, int new_mtu, bool strict) { struct ip_tunnel *tunnel = netdev_priv(dev); int t_hlen = tunnel->hlen + sizeof(struct iphdr); + int max_mtu = 0xFFF8 - dev->hard_header_len - t_hlen; - if (new_mtu < 68 || - new_mtu
[ovs-dev] [PATCHv2 2/6] datapath: geneve: Relax MTU constraints.
From: David Wragg Upstream commit: Allow the MTU of geneve devices to be set to large values, in order to exploit underlying networks with larger frame sizes. GENEVE does not have a fixed encapsulation overhead (an openvswitch rule can add variable length options), so there is no relevant maximum MTU to enforce. A maximum of IP_MAX_MTU is used instead. Encapsulated packets that are too big for the underlying network will get dropped on the floor. Signed-off-by: David Wragg Signed-off-by: David S. Miller Upstream: 55e5bfb53cff ("geneve: Relax MTU constraints") Signed-off-by: Joe Stringer --- datapath/linux/compat/geneve.c | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/datapath/linux/compat/geneve.c b/datapath/linux/compat/geneve.c index 297593ce6e1a..29a65a0a49d2 100644 --- a/datapath/linux/compat/geneve.c +++ b/datapath/linux/compat/geneve.c @@ -742,6 +742,17 @@ static netdev_tx_t geneve_dev_xmit(struct sk_buff *skb, struct net_device *dev) return NETDEV_TX_OK; } +static int geneve_change_mtu(struct net_device *dev, int new_mtu) +{ + /* GENEVE overhead is not fixed, so we can't enforce a more +* precise max MTU. +*/ + if (new_mtu < 68 || new_mtu > IP_MAX_MTU) + return -EINVAL; + dev->mtu = new_mtu; + return 0; +} + static const struct net_device_ops geneve_netdev_ops = { #ifdef HAVE_DEV_TSTATS .ndo_init = geneve_init, @@ -751,7 +762,7 @@ static const struct net_device_ops geneve_netdev_ops = { .ndo_open = geneve_open, .ndo_stop = geneve_stop, .ndo_start_xmit = geneve_dev_xmit, - .ndo_change_mtu = eth_change_mtu, + .ndo_change_mtu = geneve_change_mtu, .ndo_validate_addr = eth_validate_addr, .ndo_set_mac_address= eth_mac_addr, }; -- 2.1.4 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCHv2 4/6] datapath: geneve: Refine MTU limit.
From: David Wragg Upstream commit: Calculate the maximum MTU taking into account the size of headers involved in GENEVE encapsulation, as for other tunnel types. Changes in v3: - Correct comment style Changes in v2: - Conform more closely to ip_tunnel_change_mtu - Exclude GENEVE options from max MTU calculation Signed-off-by: David Wragg Acked-by: Jesse Gross Signed-off-by: David S. Miller Upstream: aeee0e66c6b4 ("geneve: Refine MTU limit") Signed-off-by: Joe Stringer --- datapath/linux/compat/geneve.c | 27 ++- 1 file changed, 22 insertions(+), 5 deletions(-) diff --git a/datapath/linux/compat/geneve.c b/datapath/linux/compat/geneve.c index 50ed3936d038..92feeef0900c 100644 --- a/datapath/linux/compat/geneve.c +++ b/datapath/linux/compat/geneve.c @@ -742,17 +742,34 @@ static netdev_tx_t geneve_dev_xmit(struct sk_buff *skb, struct net_device *dev) return NETDEV_TX_OK; } -static int geneve_change_mtu(struct net_device *dev, int new_mtu) +static int __geneve_change_mtu(struct net_device *dev, int new_mtu, bool strict) { - /* GENEVE overhead is not fixed, so we can't enforce a more -* precise max MTU. + /* The max_mtu calculation does not take account of GENEVE +* options, to avoid excluding potentially valid +* configurations. */ - if (new_mtu < 68 || new_mtu > IP_MAX_MTU) + int max_mtu = IP_MAX_MTU - GENEVE_BASE_HLEN - sizeof(struct iphdr) + - dev->hard_header_len; + + if (new_mtu < 68) return -EINVAL; + + if (new_mtu > max_mtu) { + if (strict) + return -EINVAL; + + new_mtu = max_mtu; + } + dev->mtu = new_mtu; return 0; } +static int geneve_change_mtu(struct net_device *dev, int new_mtu) +{ + return __geneve_change_mtu(dev, new_mtu, true); +} + static const struct net_device_ops geneve_netdev_ops = { #ifdef HAVE_DEV_TSTATS .ndo_init = geneve_init, @@ -1045,7 +1062,7 @@ struct net_device *rpl_geneve_dev_create_fb(struct net *net, const char *name, /* openvswitch users expect packet sizes to be unrestricted, * so set the largest MTU we can. */ - err = geneve_change_mtu(dev, IP_MAX_MTU); + err = __geneve_change_mtu(dev, IP_MAX_MTU, false); if (err) goto err; -- 2.1.4 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCHv2 0/6] Loosen tunnel device MTU constraints.
Since the commit e23775f20e1a ("datapath: Add support for lwtunnel"), devices underlying each tunnel type have applied quite strict restrictions on the MTU which can be set for the tunnel device, even if the underlying physical device has no such restriction. This made it impossible to, for instance, send jumbo frames over a tunnel which should otherwise support it. This series loosens the restrictions for these tunnel devices and sets the default MTU to a very high value, meaning that in practice they will handle any size packet that the underlying devices support sending. David Wragg (4): datapath: vxlan: Relax MTU constraints. datapath: geneve: Relax MTU constraints. datapath: Set a large MTU on tunnel devices. datapath: geneve: Refine MTU limit. Joe Stringer (2): datapath: stt: Relax MTU constraints. datapath: lisp: Relax MTU constraints. acinclude.m4 | 1 + datapath/linux/compat/geneve.c | 48 +++--- datapath/linux/compat/include/net/ip_tunnels.h | 6 datapath/linux/compat/ip_gre.c | 8 + datapath/linux/compat/ip_tunnel.c | 19 -- datapath/linux/compat/lisp.c | 16 - datapath/linux/compat/stt.c| 30 +++- datapath/linux/compat/vxlan.c | 47 + datapath/vport-vxlan.c | 2 ++ 9 files changed, 153 insertions(+), 24 deletions(-) -- 2.1.4 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCHv2 6/6] datapath: lisp: Relax MTU constraints.
Currently, even if the entire path supports jumbo frames, the LISP netdev limits the path MTU to 1500 bytes, and cannot be configured otherwise. Relax the constraints on modifying the device MTU, and set it to the maximum by default. Signed-off-by: Joe Stringer --- v2: Fix device registration leak on change_mtu failure. Fix max_mtu calculation. --- datapath/linux/compat/lisp.c | 16 +++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/datapath/linux/compat/lisp.c b/datapath/linux/compat/lisp.c index e5a6a7fe00a4..fa6da6c9b17b 100644 --- a/datapath/linux/compat/lisp.c +++ b/datapath/linux/compat/lisp.c @@ -114,6 +114,7 @@ struct lisphdr { }; #define LISP_HLEN (sizeof(struct udphdr) + sizeof(struct lisphdr)) +#define LISP_MAX_MTU (IP_MAX_MTU - LISP_HLEN - sizeof(struct iphdr)) static inline struct lisphdr *lisp_hdr(const struct sk_buff *skb) { @@ -447,6 +448,15 @@ static netdev_tx_t lisp_dev_xmit(struct sk_buff *skb, struct net_device *dev) #endif } +static int lisp_change_mtu(struct net_device *dev, int new_mtu) +{ + if (new_mtu < 68 || new_mtu > LISP_MAX_MTU) + return -EINVAL; + + dev->mtu = new_mtu; + return 0; +} + static const struct net_device_ops lisp_netdev_ops = { #ifdef HAVE_DEV_TSTATS .ndo_init = lisp_init, @@ -456,7 +466,7 @@ static const struct net_device_ops lisp_netdev_ops = { .ndo_open = lisp_open, .ndo_stop = lisp_stop, .ndo_start_xmit = lisp_dev_xmit, - .ndo_change_mtu = eth_change_mtu, + .ndo_change_mtu = lisp_change_mtu, .ndo_validate_addr = eth_validate_addr, .ndo_set_mac_address= eth_mac_addr, }; @@ -549,6 +559,10 @@ static int lisp_configure(struct net *net, struct net_device *dev, if (find_dev(net, dst_port)) return -EBUSY; + err = lisp_change_mtu(dev, LISP_MAX_MTU); + if (err) + return err; + err = register_netdevice(dev); if (err) return err; -- 2.1.4 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCHv2 5/6] datapath: stt: Relax MTU constraints.
Currently, even if the entire path supports jumbo frames, the STT netdev limits the path MTU to 1500 bytes, and cannot be configured otherwise. Relax the constraints on modifying the device MTU, and set it to the maximum by default. Signed-off-by: Joe Stringer --- v2: Use dev->hard_header_len. Follow upstream style. Fix device registration leak on change_mtu failure. --- datapath/linux/compat/stt.c | 30 +- 1 file changed, 29 insertions(+), 1 deletion(-) diff --git a/datapath/linux/compat/stt.c b/datapath/linux/compat/stt.c index 527dfeeec236..34efc4f56468 100644 --- a/datapath/linux/compat/stt.c +++ b/datapath/linux/compat/stt.c @@ -1683,6 +1683,30 @@ static int stt_stop(struct net_device *dev) return 0; } +static int __stt_change_mtu(struct net_device *dev, int new_mtu, bool strict) +{ + int max_mtu = IP_MAX_MTU - STT_HEADER_LEN - sizeof(struct iphdr) + - dev->hard_header_len; + + if (new_mtu < 68) + return -EINVAL; + + if (new_mtu < 68) { + if (strict) + return -EINVAL; + + new_mtu = max_mtu; + } + + dev->mtu = new_mtu; + return 0; +} + +static int stt_change_mtu(struct net_device *dev, int new_mtu) +{ + return __stt_change_mtu(dev, new_mtu, true); +} + static const struct net_device_ops stt_netdev_ops = { .ndo_init = stt_init, .ndo_uninit = stt_uninit, @@ -1690,7 +1714,7 @@ static const struct net_device_ops stt_netdev_ops = { .ndo_stop = stt_stop, .ndo_start_xmit = stt_dev_xmit, .ndo_get_stats64= ip_tunnel_get_stats64, - .ndo_change_mtu = eth_change_mtu, + .ndo_change_mtu = stt_change_mtu, .ndo_validate_addr = eth_validate_addr, .ndo_set_mac_address= eth_mac_addr, }; @@ -1782,6 +1806,10 @@ static int stt_configure(struct net *net, struct net_device *dev, if (find_dev(net, dst_port)) return -EBUSY; + err = __stt_change_mtu(dev, IP_MAX_MTU, false); + if (err) + return err; + err = register_netdevice(dev); if (err) return err; -- 2.1.4 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCHv2 5/6] datapath: stt: Relax MTU constraints.
On 19 February 2016 at 10:44, Joe Stringer wrote: > Currently, even if the entire path supports jumbo frames, the STT netdev > limits the path MTU to 1500 bytes, and cannot be configured otherwise. > Relax the constraints on modifying the device MTU, and set it to the > maximum by default. > > Signed-off-by: Joe Stringer > --- > v2: Use dev->hard_header_len. > Follow upstream style. > Fix device registration leak on change_mtu failure. I neglected to actually add this fragment into the commit, incremental: diff --git a/datapath/linux/compat/stt.c b/datapath/linux/compat/stt.c index 34efc4f56468..5b6857717091 100644 --- a/datapath/linux/compat/stt.c +++ b/datapath/linux/compat/stt.c @@ -1691,7 +1691,7 @@ static int __stt_change_mtu(struct net_device *dev, int new_mtu, bool strict) if (new_mtu < 68) return -EINVAL; - if (new_mtu < 68) { + if (new_mtu > max_mtu) { if (strict) return -EINVAL; ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH] lib/ovs-thread: Add Transactional Memory (TM) support.
The patch (for references, not fully tested yet) shows the preliminary results of enabling RTM (Restricted Transactional Memory). A successful transactional execution elides the lock, i.e., the lock is by-passed, and exposes concurrency. However, transactions might abort due to several reasons such as data conflicts, I/O operations, syscall, etc. When transaction aborts, it falls back to the original locking mechanisms. Thus, the performance improvement depends on the abort rate, and the overhead of speculative execution and rollback. The patch adds OVS_LOCK_ELISION, OVS_TRYLOCK_ELISION, and OVS_UNLOCK_ELISION. Experiments show that for cmap, the TM does not seem to have observable improvements at 2% mutations. For cmap search over 5% mutation, the cmap search time of TM shows better numbers. The most significant improvement is at hmap search, where with TM, there is around x2 or x3 speedup. Results are shown by using test-cmap benchmark. $ tests/ovstest test-cmap benchmark 2000 4 2 1 Benchmarking with n=200, 4 threads, 2.00% mutations, batch size 1: < baseline > < TM > cmap insert 293 ms 293 ms cmap iterate37 ms 37 ms batch search197 ms 186 ms cmap destroy177 ms 166 ms cmap insert 284 ms 283 ms cmap iterate 38 ms 38 ms cmap search 121 ms 120 ms * cmap destroy 168 ms 245 ms hmap insert 141 ms 164 ms hmap iterate 138 ms 134 ms hmap search 780 ms 414 ms * hmap destroy 126 ms 122 ms Benchmarking with n=200, 4 threads, 5.00% mutations, batch size 1: cmap insert 296 ms 298 ms cmap iterate38 ms 38 ms batch search239 ms 332 ms cmap destroy162 ms 159 ms cmap insert 285 ms 284 ms cmap iterate37 ms 37 ms cmap search 143 ms 126 ms * cmap destroy163 ms 159 ms hmap insert 130 ms 128 ms hmap iterate140 ms 134 ms hmap search 1577 ms 489 ms * hmap destroy117 ms 108 ms Benchmarking with n=200, 4 threads, 10.00% mutations, batch size 1: cmap insert 301 ms 295 ms cmap iterate37 ms 37 ms batch search355 ms 380 ms cmap destroy148 ms 145 ms cmap insert 286 ms 288 ms cmap iterate37 ms 37 ms cmap search 240 ms 133 ms * cmap destroy146 ms 148 ms hmap insert 133 ms 145 ms hmap iterate139 ms 134 ms hmap search 2087 ms 575 ms * hmap destroy94 ms 87 ms === Analysis of Abort rate === Linux Perf shows PMU counters for transactional memory as below. $ perf record -e tx-start,tx-abort,tx-commit,tx-conflict The results from 'perf report' of the three TM experiments show: 2% 5% 10% tx-start 10K 14K 16K (count start transaction) tx-abort 7K 13K 14K (count all aborts) tx-commit10K 14K 16K (count commit of transactions) tx-conflict 7K 13K 14K (count aborts due to conflict with other CPUs) Note: While test-cmap benchmark works fine, the patch fails a few cases when doing "make check". Still working on it. Signed-off-by: William Tu --- lib/ovs-thread.c | 107 +++ 1 file changed, 107 insertions(+) diff --git a/lib/ovs-thread.c b/lib/ovs-thread.c index b0e10ee..7be8524 100644 --- a/lib/ovs-thread.c +++ b/lib/ovs-thread.c @@ -52,6 +52,110 @@ static const char *must_not_fork; /* True if we created any threads beyond the main initial thread. */ static bool multithreaded; +/* Intel Transactional Memory (TSX) support */ +#ifdef __RTM__ +#define _XBEGIN_STARTED (~0u) +#define _ABORT_EXPLICIT (1 << 0) +#define _ABORT_RETRY(1 << 1) +#define _ABORT_CONFLICT (1 << 2) +#define _ABORT_CAPACITY (1 << 3) +#define _ABORT_DEBUG(1 << 4) +#define _ABORT_NESTED (1 << 5) +#define _XABORT_CODE(x) (((x) >> 24) & 0xff) +#define _ABORT_LOCK_BUSY0xff + +#define MAX_RETRY_XBEGIN10 +#define MAX_ABORT_RETRY 5 + +#define __force_inline __attribute__((__always_inline__)) inline + +/* See: Intel 64 and IA-32 Architectures Software Developer's Manual + * Instruction Set Reference. */ + +static __force_inline int _xbegin(void) +{ +int ret = _XBEGIN_STARTED; +__asm__ volatile (".byte 0xc7,0xf8 ; .long 0" : "+a" (ret) :: "memory"); +return ret; +} + +static __force_inline void _xend(void) +{ +__asm__ volatile (".byte 0x0f,0x01,0xd5" ::: "memory"); +} + +static __force_inline void _xabort(const unsigned int status) +{ + __asm__ volatile (".byte 0xc6,0xf8,%P0" :: "i" (status) : "memory"); +} + +static __force_inline int _xtest(void) +{ +unsigned char out; +/* return 1 if RTM_ACTIVE = 1 */ +__asm__ volatile (".byte 0x0f,0x01,0xd6 ; setnz %0" : "=r" (out) :: "memory"); +return out; +} + +#define OVS_LOCK_ELISION(mutex) \ +{\ +unsigned status; \ +int i
Re: [ovs-dev] [PATCHv2 0/6] Loosen tunnel device MTU constraints.
On Fri, Feb 19, 2016 at 10:44 AM, Joe Stringer wrote: > Since the commit e23775f20e1a ("datapath: Add support for lwtunnel"), > devices underlying each tunnel type have applied quite strict restrictions > on the MTU which can be set for the tunnel device, even if the underlying > physical device has no such restriction. This made it impossible to, for > instance, send jumbo frames over a tunnel which should otherwise support it. > > This series loosens the restrictions for these tunnel devices and sets the > default MTU to a very high value, meaning that in practice they will > handle any size packet that the underlying devices support sending. Acked-by: Jesse Gross ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH] rhel: Add 'rpm-fedora' and 'rpm-fedora-kmod' targets
Add make targets for Fedora and RHEL7 RPMs, update INSTALL.Fedora.md to document their use Added distribution tarball and rpm build directory to .gitignore. Signed-off-by: Lance Richardson --- .gitignore| 2 + INSTALL.Fedora.md | 197 +++--- rhel/automake.mk | 18 + 3 files changed, 134 insertions(+), 83 deletions(-) diff --git a/.gitignore b/.gitignore index 97af69a..dbf33d2 100644 --- a/.gitignore +++ b/.gitignore @@ -69,3 +69,5 @@ odp-netlink.h OvsDpInterface.h /.vagrant/ testsuite.tmp.orig +/rpm/ +/openvswitch-*.tar.gz diff --git a/INSTALL.Fedora.md b/INSTALL.Fedora.md index 09defec..588760a 100644 --- a/INSTALL.Fedora.md +++ b/INSTALL.Fedora.md @@ -1,91 +1,120 @@ How to Install Open vSwitch on Fedora Linux === -This document describes how to build and install Open vSwitch on a Fedora -Linux host. If you want to install Open vSwitch on a generic Linux host, -see [INSTALL.md] instead. +This document provides instructions for building and installing Open vSwitch +RPM packages on a Fedora Linux host. Instructions for the installation of +Open vSwitch on a Fedora Linux host without using RPM packages can be found +in [INSTALL.md]. + +These instructions have been tested with Fedora 23, and are also applicable +for RHEL 7.x and its derivatives, including CentOS 7.x and Scientific Linux +7.x. + +Build Requirements +-- +The tools and packages that are required for building Open vSwitch are +documented in [INSTALL.md]. Specific packages (by package name) include: + + - rpm-build + - autoconf automake libtool + - systemd-units openssl openssl-devel + - python python-twisted-core python-zope-interface PyQt4 python-six + - desktop-file-utils + - groff graphviz + - procps-ng + +And (optionally): + + - libcap-ng libcap-ng-devel + - dpdk-devel + +Building Open vSwitch RPMs for Fedora +- + +RPMs may be built from an Open vSwitch distribution tarball or from an +Open vSwitch Git tree. The build procedure for each scenario is described +below. + +### Preparing to Build Open vSwitch RPMs with a GIT Tree +From the top-level directory of the git tree, execute the following +commands: + +``` +./boot.sh +./configure +``` + +### Preparing to Build Open vSwitch RPMs from a Tarball +From a directory with appropriate permissions, execute the following commands +(substituting the relevant Open vSwitch release version for "x.y.z"): + +``` +tar xzf openvswitch-x.y.z.tar.gz +cd openvswitch-x.y.z +./configure +``` + +### Building the User-Space RPMs +To build Open vSwitch user-space RPMs, after having completed the appropriate +preparation steps described above, execute the following from the directory +in which `./configure` was executed: + +``` +make rpm-fedora +``` + +This will create the RPMs `openvswitch`, `python-openvswitch`, +`openvswitch-test`, `openvswitch-devel`, `openvswitch-ovn`, +and `openvswitch-debuginfo`. + +To enable DPDK support in the openvswitch package, +the `--with dpdk` option can be added: + +``` +make rpm-fedora RPMBUILD_OPT="--with dpdk" +``` + +The above commands automatically run the Open vSwitch unit tests, +which can take several minutes. To reduce the build time by +disabling the execution of these tests, the `--without check` +option can be added: + +``` +make rpm-fedora RPMBUILD_OPT="--without check" +``` + +### Building the Kernel OVS Tree Datapath RPM +To build the Open vSwitch kernel module for the currently running +kernel version, execute: + +``` +make rpm-fedora-kmod +``` + +To build the Open vSwitch kernel module for another kernel version, +the desired kernel version can be specified via the `kversion` macro. +For example: + +``` +make rpm-fedora-kmod \ + RPMBUILD_OPT='-D "kversion 4.3.4-300.fc23.x86_64"' +``` + +Installing Open vSwitch RPMs + +RPM packages can be installed by using the command `rpm -i`. Package +installation requires superuser privileges. + +The openvswitch-kmod RPM should be installed first if the Linux OVS tree datapath +module is to be used. The openvswitch-kmod RPM should not be installed if +only the in-tree Linux datapath or user-space datapath is needed. See [FAQ.md] +for more information about the various Open vSwitch datapath options. -We have tested these instructions with Fedora 23. These instructions are also -used for RHEL 7.x and its derivatives, such as CentOS 7.x. +In most cases only the `openvswitch` and (when using OVN) `openvswitch-ovn` RPMs +will need to be installed. The `python-openvswitch`, `openvswitch-test`, `openvswitch-devel`, +and `openvswitch-debuginfo` RPMs are optional unless required for a specific purpose. -Building Open vSwitch for Fedora - - -You may build from an Open vSwitch distribution tarball or from an -Open vSwitch Git tree. - -The default RPM build directory (_topdir) has five director
Re: [ovs-dev] [PATCH V4] Separation of NB & SB dbs.
> From V3 removed any unnecessary code change and removed cloning of the > running ovsdb configuration database. We should address SSL in a specific > patch without depending on the default database. > > Signed-off-by: Michael Arnaldi > > --- > NEWS | 4 ++ > ovn/northd/ovn-northd.c | 33 ++--- > ovn/utilities/ovn-ctl | 167 +++ +-- [snip for bandwidth] > diff --git a/ovn/utilities/ovn-ctl b/ovn/utilities/ovn-ctl > index b171934..2ad0eb0 100755 > --- a/ovn/utilities/ovn-ctl > +++ b/ovn/utilities/ovn-ctl > @@ -30,32 +30,81 @@ done > ## start ## > ## - ## > > -upgrade_ovn_dbs () { > -ovn_dbs=$(ovs-appctl -t ovsdb-server ovsdb-server/list-dbs 2>/dev/null) > -for db in $ovn_dbs; do > -case $db in > -OVN*) > -action "Removing $db from ovsdb-server" \ > -ovs-appctl -t ovsdb-server ovsdb-server/remove-db $db > -;; > -esac > -done > -upgrade_db "$DB_NB_FILE" "$DB_NB_SCHEMA" > -upgrade_db "$DB_SB_FILE" "$DB_SB_SCHEMA" > -for db in $DB_NB_FILE $DB_SB_FILE; do > -action "Adding $db to ovsdb-server" \ > -ovs-appctl -t ovsdb-server ovsdb-server/add-db $db || exit 1 > -done > +pidfile_is_running () { > +pidfile=$1 > +test -e "$pidfile" && pid=`cat "$pidfile"` && pid_exists "$pid" > +} >/dev/null 2>&1 > + > +stop_ovsdb () { > +if pidfile_is_running $DB_NB_PID; then > +kill -9 $(cat $DB_NB_PID) 1>/dev/null 2>/dev/null > +rm -f $DB_NB_PID 1>/dev/null 2>/dev/null > +fi > + > +if pidfile_is_running $DB_SB_PID; then > +kill -9 $(cat $DB_SB_PID) 1>/dev/null 2>/dev/null > +rm -f $DB_SB_PID 1>/dev/null 2>/dev/null > +fi > +} > + > +start_ovsdb () { > +# Check and eventually start ovsdb-server for Northbound DB > +if ! pidfile_is_running $DB_NB_PID; then > +upgrade_db "$DB_NB_FILE" "$DB_NB_SCHEMA" 1>/dev/null 2>/dev/null > + > +set ovsdb-server > + > +set "$@" --detach $OVN_OVSDB_LOG --remote=punix:$DB_NB_SOCK --remote=ptcp:$DB_NB_PORT --pidfile=$DB_NB_PID OVN_OVSDB_LOG really needs to also be split between the NB and SB processes, otherwise the logs conflate and are unusable > + > +$@ $DB_NB_FILE > +fi > + > +# Check and eventually start ovsdb-server for Southbound DB > +if ! pidfile_is_running $DB_SB_PID; then > +upgrade_db "$DB_SB_FILE" "$DB_SB_SCHEMA" 1>/dev/null 2>/dev/null > + > +set ovsdb-server > + > +set "$@" --detach $OVN_OVSDB_LOG --remote=punix:$DB_SB_SOCK --remote=ptcp:$DB_SB_PORT --pidfile=$DB_SB_PID See above > +$@ $DB_SB_FILE > +fi > +} > + > +status_ovsdb () { > + if ! pidfile_is_running $DB_NB_PID; then > + log_success_msg "OVN Northbound DB is not running" > + else > + log_success_msg "OVN Northbound DB is running" > + fi > + > + if ! pidfile_is_running $DB_SB_PID; then > + log_success_msg "OVN Southbound DB is not running" > + else > + log_success_msg "OVN Southbound DB is running" > + fi > } > > start_northd () { > -# We expect ovn-northd to be co-located with ovsdb-server handling both the > -# OVN_Northbound and OVN_Southbound dbs. > -upgrade_ovn_dbs > + if test X"$OVN_MANAGE_OVSDB" = Xyes; then > + start_ovsdb > + fi > > -set ovn-northd > -set "$@" -vconsole:emer -vsyslog:err -vfile:info > -OVS_RUNDIR=${OVN_RUNDIR} start_daemon "$OVN_NORTHD_PRIORITY" "$OVN_NORTHD_WRAPPER" "$@" > + if ! pidfile_is_running $DB_NB_PID; then > + log_failure_msg "OVN Northbound DB is not running" > + exit > + fi > + if ! pidfile_is_running $DB_SB_PID; then > + log_failure_msg "OVN Southbound DB is not running" > + exit > + fi > + > + if daemon_is_running ovn-northd; then > + log_success_msg "OVN Northbound is already running" > + else > + set ovn-northd > + set "$@" $OVN_NORTHD_LOG --ovnnb-db=unix:$DB_NB_SOCK --ovnsb-db=unix:$DB_SB_SOCK > + OVS_RUNDIR=${OVN_RUNDIR} start_daemon "$OVN_NORTHD_PRIORITY" "$OVN_NORTHD_WRAPPER" "$@" > + fi > } > > start_controller () { [snip] > @@ -90,24 +143,46 @@ restart_controller () { > start_controller > } > > +restart_ovsdb () { > +stop_ovsdb > +start_ovsdb > +} > + > ## ## > ## main ## > ## ## > > set_defaults () { > -DB_SOCK=$rundir/db.sock > -DB_NB_FILE=$dbdir/ovnnb.db > -DB_SB_FILE=$dbdir/ovnsb.db > -DB_NB_SCHEMA=$datadir/ovn-nb.ovsschema > -DB_SB_SCHEMA=$datadir/ovn-sb.ovsschema > - > -OVN_NORTHD_PRIORITY=-10 > -OVN_NORTHD_WRAPPER= > -OVN_CONTROLLER_PRIORITY=-10 > -OVN_CONTROLLER_WRAPPER= > - > -OVS_RUNDIR=${OVS_RUNDIR:-${rundir}} > -OVN_RUNDIR=${OVN_RUNDIR:-${OVS_RUNDIR}} > + OVN_DIR=$rundir > + OVN_MANAGE_OVSDB=yes > + > + DB_NB_SOCK=$OVN_DIR/ovnnb_db.sock > + DB_NB_PID=$OVN_DIR/ovnnb_db.pid > + DB_NB_FILE=$OVN_DIR/ovnnb_db.db > + DB_NB_PORT=6641 > +
[ovs-dev] [poll group RFC 6/6] ovsdb: Change jsonrpc server to make use of poll group
On Linux jsonrpc server now users poll group by default. It can be disabled by using an undocumented --disable-epoll command line options. For ovsdb-server to maintain 1000 idle connections over TCP with the default 5s probe interval, the CPU load dropped from 48% to 14%. Signed-off-by: Andy Zhou --- NEWS | 5 +++- ovsdb/jsonrpc-server.c | 75 ++ ovsdb/jsonrpc-server.h | 2 +- ovsdb/ovsdb-server.c | 43 + tests/ovsdb-server.at | 11 5 files changed, 117 insertions(+), 19 deletions(-) diff --git a/NEWS b/NEWS index 18fca10..e5eeba4 100644 --- a/NEWS +++ b/NEWS @@ -17,7 +17,10 @@ Post-v2.5.0 - ovsdb-server: * Remove max number of sessions limit, to enable connection scaling testing. - + - ovsdb-server: + * On Linux, ovsdb server uses poll group, which uses epoll(7) to + improve connection scaling. Add an undocumented --disable-epoll + switch for testing. v2.5.0 - xx xxx - diff --git a/ovsdb/jsonrpc-server.c b/ovsdb/jsonrpc-server.c index 0d23b77..d2f16c8 100644 --- a/ovsdb/jsonrpc-server.c +++ b/ovsdb/jsonrpc-server.c @@ -28,6 +28,7 @@ #include "ovsdb-error.h" #include "ovsdb-parser.h" #include "ovsdb.h" +#include "poll-group.h" #include "poll-loop.h" #include "reconnect.h" #include "row.h" @@ -103,8 +104,11 @@ struct ovsdb_jsonrpc_server { struct ovsdb_server up; unsigned int n_sessions; struct shash remotes; /* Contains "struct ovsdb_jsonrpc_remote *"s. */ +struct poll_group *poll_group; /* group poll jsonrpc sessions. */ }; +static void ovsdb_jsonrpc_server_run_poll_group(struct poll_group *); + /* A configured remote. This is either a passive stream listener plus a list * of the currently connected sessions, or a list of exactly one active * session. */ @@ -124,13 +128,24 @@ static void ovsdb_jsonrpc_server_del_remote(struct shash_node *); /* Creates and returns a new server to provide JSON-RPC access to an OVSDB. * * The caller must call ovsdb_jsonrpc_server_add_db() for each database to - * which 'server' should provide access. */ + * which 'server' should provide access. + * + * On platforms where epoll() is supported, using epoll() can be more + * efficient than using poll(), and it will be enabled by default. + * To disable it, set 'disable_epoll to false. + */ struct ovsdb_jsonrpc_server * -ovsdb_jsonrpc_server_create(void) +ovsdb_jsonrpc_server_create(bool disable_epoll) { struct ovsdb_jsonrpc_server *server = xzalloc(sizeof *server); +struct poll_group *poll_group = NULL; + ovsdb_server_init(&server->up); shash_init(&server->remotes); +if (!disable_epoll) { +poll_group = poll_group_create("jsonrpc-sessions-pg"); +} +server->poll_group = poll_group; return server; } @@ -318,6 +333,7 @@ void ovsdb_jsonrpc_server_run(struct ovsdb_jsonrpc_server *svr) { struct shash_node *node; +struct poll_group *poll_group = svr->poll_group; SHASH_FOR_EACH (node, &svr->remotes) { struct ovsdb_jsonrpc_remote *remote = node->data; @@ -329,10 +345,13 @@ ovsdb_jsonrpc_server_run(struct ovsdb_jsonrpc_server *svr) error = pstream_accept(remote->listener, &stream); if (!error) { struct jsonrpc_session *js; +struct ovsdb_jsonrpc_session *ovsdb_js; js = jsonrpc_session_open_unreliably(jsonrpc_open(stream), remote->dscp); -ovsdb_jsonrpc_session_create(remote, js); -} else if (error != EAGAIN) { +ovsdb_js = ovsdb_jsonrpc_session_create(remote, js); +stream_join(stream, svr->poll_group, ovsdb_js); +} +else if (error != EAGAIN) { VLOG_WARN_RL(&rl, "%s: accept failed: %s", pstream_get_name(remote->listener), ovs_strerror(error)); @@ -341,11 +360,16 @@ ovsdb_jsonrpc_server_run(struct ovsdb_jsonrpc_server *svr) ovsdb_jsonrpc_session_run_all(remote); } + +if (poll_group) { +ovsdb_jsonrpc_server_run_poll_group(poll_group); +} } void ovsdb_jsonrpc_server_wait(struct ovsdb_jsonrpc_server *svr) { +struct poll_group *poll_group = svr->poll_group; struct shash_node *node; SHASH_FOR_EACH (node, &svr->remotes) { @@ -357,6 +381,7 @@ ovsdb_jsonrpc_server_wait(struct ovsdb_jsonrpc_server *svr) ovsdb_jsonrpc_session_wait_all(remote); } +poll_group_poll_wait(poll_group); } /* Adds some memory usage statistics for 'svr' into 'usage', for use with @@ -491,6 +516,27 @@ ovsdb_jsonrpc_session_run_all(struct ovsdb_jsonrpc_remote *remote) struct ovsdb_jsonrpc_session *s, *next; LIST_FOR_EACH_SAFE (s, next, node, &remote->sessions) { +if (poll_block_waken_by_timer() || +
[ovs-dev] [poll group RFC 0/6] Poll group
Introduce a new 'poll group' feature to improve the efficiency of poll loop in dealing with large number of connections, and its first application to improve ovsdb-server to scale up connections. Patch 6/6 commit log as some performance numbers. They are copied here for reference. For ovsdb-server to maintain 1000 idle connections over TCP with the default 5s probe interval, the CPU load droped from 48% to 14%. Andy Zhou (6): lib: Introduce poll group provider class lib: Add epoll poll group implementation for the Linux platform lib: Update stream class to support poll group. lib: Add poll group support in jsonrpc library. lib: add poll_block_waken_by_timer ovsdb: Change jsonrpc server to make use of poll group NEWS | 5 +- lib/automake.mk | 4 + lib/jsonrpc.c | 74 ++- lib/jsonrpc.h | 7 + lib/poll-group-epoll.c| 321 ++ lib/poll-group-provider.h | 156 ++ lib/poll-group.c | 224 lib/poll-group.h | 51 lib/poll-loop.c | 11 ++ lib/poll-loop.h | 3 + lib/stream-fd.c | 32 + lib/stream-provider.h | 17 +++ lib/stream-ssl.c | 40 +- lib/stream-tcp.c | 3 + lib/stream-unix.c | 3 + lib/stream.c | 83 +++- lib/stream.h | 5 + ovsdb/jsonrpc-server.c| 75 ++- ovsdb/jsonrpc-server.h| 2 +- ovsdb/ovsdb-server.c | 43 +-- tests/ovsdb-server.at | 11 ++ 21 files changed, 1145 insertions(+), 25 deletions(-) create mode 100644 lib/poll-group-epoll.c create mode 100644 lib/poll-group-provider.h create mode 100644 lib/poll-group.c create mode 100644 lib/poll-group.h -- 1.9.1 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [poll group RFC 5/6] lib: add poll_block_waken_by_timer
--- lib/poll-loop.c | 11 +++ lib/poll-loop.h | 3 +++ 2 files changed, 14 insertions(+) diff --git a/lib/poll-loop.c b/lib/poll-loop.c index e83d989..63c7a42 100644 --- a/lib/poll-loop.c +++ b/lib/poll-loop.c @@ -53,6 +53,7 @@ struct poll_loop { * wake up immediately, or LLONG_MAX to wait forever. */ long long int timeout_when; /* In msecs as returned by time_msec(). */ const char *timeout_where; /* Where 'timeout_when' was set. */ +bool waken_by_timer; }; static struct poll_loop *poll_loop(void); @@ -365,11 +366,14 @@ poll_block(void) loop->timeout_when, &elapsed); if (retval < 0) { static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); +loop->waken_by_timer = false; VLOG_ERR_RL(&rl, "poll: %s", ovs_strerror(-retval)); } else if (!retval) { +loop->waken_by_timer = true; log_wakeup(loop->timeout_where, NULL, elapsed); } else if (get_cpu_usage() > 50 || VLOG_IS_DBG_ENABLED()) { i = 0; +loop->waken_by_timer = false; HMAP_FOR_EACH (node, hmap_node, &loop->poll_nodes) { if (pollfds[i].revents) { log_wakeup(node->where, &pollfds[i], 0); @@ -389,6 +393,13 @@ poll_block(void) seq_woke(); } + +bool +poll_block_waken_by_timer(void) +{ +return poll_loop()->waken_by_timer; +} + static void free_poll_loop(void *loop_) diff --git a/lib/poll-loop.h b/lib/poll-loop.h index 01e1aa8..b0a4cff 100644 --- a/lib/poll-loop.h +++ b/lib/poll-loop.h @@ -70,6 +70,9 @@ void poll_immediate_wake_at(const char *where); /* Wait until an event occurs. */ void poll_block(void); +/* Return true if the last poll block was waken by timer*/ +bool poll_block_waken_by_timer(void); + #ifdef __cplusplus } #endif -- 1.9.1 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [poll group RFC 4/6] lib: Add poll group support in jsonrpc library.
Signed-off-by: Andy Zhou --- lib/jsonrpc.c | 74 +++ lib/jsonrpc.h | 7 ++ 2 files changed, 77 insertions(+), 4 deletions(-) diff --git a/lib/jsonrpc.c b/lib/jsonrpc.c index 35428a6..e8ea8b4 100644 --- a/lib/jsonrpc.c +++ b/lib/jsonrpc.c @@ -138,11 +138,15 @@ jsonrpc_run(struct jsonrpc *rpc) } /* Arranges for the poll loop to wake up when 'rpc' needs to perform - * maintenance activities. */ + * maintenance activities. + * + * This function should not be called when 'rpc' has joined a poll + * group. Use poll_group_get_events() instead. */ void jsonrpc_wait(struct jsonrpc *rpc) { if (!rpc->status) { +ovs_assert(!stream_joined(rpc->stream)); stream_run_wait(rpc->stream); if (!list_is_empty(&rpc->output)) { stream_send_wait(rpc->stream); @@ -353,11 +357,36 @@ jsonrpc_recv(struct jsonrpc *rpc, struct jsonrpc_msg **msgp) return EAGAIN; } +bool +jsonrpc_joined_poll_group(const struct jsonrpc *rpc) +{ +return rpc->status ? false : stream_joined(rpc->stream); +} + +bool +jsonrpc_has_pending_input(const struct jsonrpc *rpc) +{ +return rpc->status ? false : !byteq_is_empty(&rpc->input); +} + +void +jsonrpc_poll_group_update(struct jsonrpc *rpc, bool write) +{ +if (!rpc->status) { +ovs_assert(stream_joined(rpc->stream)); +stream_update(rpc->stream, write); +} +} + /* Causes the poll loop to wake up when jsonrpc_recv() may return a value other * than EAGAIN. */ void jsonrpc_recv_wait(struct jsonrpc *rpc) { +if (!rpc->status) { +ovs_assert(!stream_joined(rpc->stream)); +} + if (rpc->status || !byteq_is_empty(&rpc->input)) { poll_immediate_wake_at(rpc->name); } else { @@ -377,6 +406,8 @@ jsonrpc_send_block(struct jsonrpc *rpc, struct jsonrpc_msg *msg) fatal_signal_run(); +ovs_assert(!stream_joined(rpc->stream)); + error = jsonrpc_send(rpc, msg); if (error) { return error; @@ -397,6 +428,8 @@ jsonrpc_send_block(struct jsonrpc *rpc, struct jsonrpc_msg *msg) int jsonrpc_recv_block(struct jsonrpc *rpc, struct jsonrpc_msg **msgp) { +ovs_assert(!stream_joined(rpc->stream)); + for (;;) { int error = jsonrpc_recv(rpc, msgp); if (error != EAGAIN) { @@ -827,7 +860,6 @@ jsonrpc_session_open_unreliably(struct jsonrpc *jsonrpc, uint8_t dscp) s->stream = NULL; s->pstream = NULL; s->seqno = 0; - return s; } @@ -976,12 +1008,26 @@ jsonrpc_session_run(struct jsonrpc_session *s) } } +/* Only wait for stream within the 's'. Poll group does not + * handle pstream, and stream's initial connection, these + * are still using poll loop. */ void jsonrpc_session_wait(struct jsonrpc_session *s) { +/* When s->rpc is set, The jsonrpc session is in connected + * state. Check if 's' has already registered with a poll group. + * + * s->stream is set when the stream is not in a connected state + * continue to let poll loop handle it. + * + * poll group currently does not work with pstream. Let + * poll loop handle all pstreams. */ if (s->rpc) { -jsonrpc_wait(s->rpc); -} else if (s->stream) { + if (!jsonrpc_joined_poll_group(s->rpc)) { +jsonrpc_wait(s->rpc); +} +} +if (s->stream) { stream_run_wait(s->stream); stream_connect_wait(s->stream); } @@ -1055,6 +1101,26 @@ jsonrpc_session_recv(struct jsonrpc_session *s) return NULL; } +bool +jsonrpc_session_joined_poll_group(const struct jsonrpc_session *s) +{ +return s->rpc ? jsonrpc_joined_poll_group(s->rpc) : false; +} + +bool +jsonrpc_session_has_pending_input(const struct jsonrpc_session *s) +{ +return s->rpc ? jsonrpc_has_pending_input(s->rpc) : false; +} + +void +jsonrpc_session_poll_group_update(struct jsonrpc_session *s, bool write) +{ +if (s->rpc) { +jsonrpc_poll_group_update(s->rpc, write); +} +} + void jsonrpc_session_recv_wait(struct jsonrpc_session *s) { diff --git a/lib/jsonrpc.h b/lib/jsonrpc.h index 5f46e3b..c8e6690 100644 --- a/lib/jsonrpc.h +++ b/lib/jsonrpc.h @@ -53,6 +53,10 @@ size_t jsonrpc_get_backlog(const struct jsonrpc *); unsigned int jsonrpc_get_received_bytes(const struct jsonrpc *); const char *jsonrpc_get_name(const struct jsonrpc *); +bool jsonrpc_joined_poll_group(const struct jsonrpc *); +bool jsonrpc_has_pending_input(const struct jsonrpc *); +void jsonrpc_poll_group_update(struct jsonrpc *, bool); + int jsonrpc_send(struct jsonrpc *, struct jsonrpc_msg *); int jsonrpc_recv(struct jsonrpc *, struct jsonrpc_msg **); void jsonrpc_recv_wait(struct jsonrpc *); @@ -107,6 +111,9 @@ void jsonrpc_session_run(struct jsonrpc_session *); void jsonrpc_session_wait(struct jsonrpc_session *); size_t jsonrpc_session_get_backlog(const struct jsonrpc_session *); +bool jsonrpc_session_joined_poll_group(const struct jsonrpc_session *); +bool jso
[ovs-dev] [poll group RFC 1/6] lib: Introduce poll group provider class
Poll group is a new poll class that sits between application and the stream class. Poll group compliments the poll loop facility and the stream class to make main loop more efficient when dealing with large number of current connections. See comments in poll-group-provider.h for more details. Signed-off-by: Andy Zhou --- lib/automake.mk | 3 + lib/poll-group-provider.h | 152 +++ lib/poll-group.c | 221 ++ lib/poll-group.h | 51 +++ 4 files changed, 427 insertions(+) create mode 100644 lib/poll-group-provider.h create mode 100644 lib/poll-group.c create mode 100644 lib/poll-group.h diff --git a/lib/automake.mk b/lib/automake.mk index 27a1669..dfda1bf 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ -197,6 +197,9 @@ lib_libopenvswitch_la_SOURCES = \ lib/perf-counter.c \ lib/pktbuf.c \ lib/pktbuf.h \ + lib/poll-group.c \ + lib/poll-group.h \ + lib/poll-group-provider.h \ lib/poll-loop.c \ lib/poll-loop.h \ lib/process.c \ diff --git a/lib/poll-group-provider.h b/lib/poll-group-provider.h new file mode 100644 index 000..a8bfe63 --- /dev/null +++ b/lib/poll-group-provider.h @@ -0,0 +1,152 @@ +/* + * Copyright (c) 2016 Nicira, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifndef POLL_GROUP_PROVIDER_H +#define POLL_GROUP_PROVIDER_H 1 + +#include +#include +#include "util.h" + +struct poll_group; +/* + * Poll group: An application aware poll interface + * + * Poll group implements a higher level interface than those offered by + * poll loop and aims to improve efficiency with high number concurrent + * connections. + * + * Poll group sits between the application and the stream layer. Same kind + * of application sessions can ask their individual streams to join + * a common poll group, which is created by the application. + * In addition, the application session is required to provide an + * unique pointer that is only associated with the session, called + * 'caller_event' when joining a poll group. Each poll group then + * in term registers with the main poll loop. + * + * When poll_block() wakes up, each poll group delivers only a set of + * 'caller_event's that correspond to the sessions that are responsible + * for poll_block waken up. Thus, the main loop can avoid interrogate all + * sessions which is required when using poll loop directly. + * + * On Linux, poll group maps well into the epoll(7) facility. Each + * poll group creates an file descriptor that is created with epoll_create(2) + * system call. Epoll implementation provides additional performance + * benefits by aggregate multiple system calls into a single one. + * + * Poll group is designed to be closely integrated with the stream class. + * An newly created application session can join a poll group at any time + * during its life cycle. Internally, a stream only turns on 'poll group' + * when a stream is fully connected. Stream connection set up and tear down + * is still managed by the poll loop, but those mode switches are managed + * internally with the stream class, transparent to the application. + * + * An application session can always leave poll group at any time, or + * simply never join one. In both cases, the stream will fall back to + * use poll loop directly. + */ + +struct poll_group_class { +/* Prefix of a poll group name, e.g. "default", "epoll" . */ +const char *name; + +/* The following APIs are required to be implemented in any + * class. + */ + +/* Create a new poll group. */ +struct poll_group *(*create)(const char *name); + +/* Shut down a poll group, release memory and other resources. */ +void (*close)(struct poll_group *group); + +/* 'join' allows a new 'fd' to join a poll group. 'caller_event' are user + * provided pointer that is opaque to poll group. Those pointers + * are passed back to the user when poll_group_get_events() are + * called. */ +int (*join)(struct poll_group *group, int fd, void *caller_event); + +/* 'updates' allows application to inform whether the 'fd' is + * interested to be waken up when 'fd' is ready for transmission. + */ +int (*update)(struct poll_group *group, int fd, bool write, + void *caller_event); + +/* 'leave' is the opposite of 'join'. 'fd' will no longer be watched +
Re: [ovs-dev] [PATCHv2 0/6] Loosen tunnel device MTU constraints.
On 19 February 2016 at 11:05, Jesse Gross wrote: > On Fri, Feb 19, 2016 at 10:44 AM, Joe Stringer wrote: >> Since the commit e23775f20e1a ("datapath: Add support for lwtunnel"), >> devices underlying each tunnel type have applied quite strict restrictions >> on the MTU which can be set for the tunnel device, even if the underlying >> physical device has no such restriction. This made it impossible to, for >> instance, send jumbo frames over a tunnel which should otherwise support it. >> >> This series loosens the restrictions for these tunnel devices and sets the >> default MTU to a very high value, meaning that in practice they will >> handle any size packet that the underlying devices support sending. > > Acked-by: Jesse Gross Thanks for the review, I pushed the series to master and branch-2.5. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [poll group RFC 2/6] lib: Add epoll poll group implementation for the Linux platform
Add the first poll group implementation. Signed-off-by: Andy Zhou --- lib/automake.mk | 1 + lib/poll-group-epoll.c| 321 ++ lib/poll-group-provider.h | 4 + lib/poll-group.c | 9 +- 4 files changed, 332 insertions(+), 3 deletions(-) create mode 100644 lib/poll-group-epoll.c diff --git a/lib/automake.mk b/lib/automake.mk index dfda1bf..e9eaf03 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ -350,6 +350,7 @@ lib_libopenvswitch_la_SOURCES += \ lib/dpif-netlink.h \ lib/if-notifier.c \ lib/if-notifier.h \ + lib/poll-group-epoll.c \ lib/netdev-linux.c \ lib/netdev-linux.h \ lib/netlink-conntrack.c \ diff --git a/lib/poll-group-epoll.c b/lib/poll-group-epoll.c new file mode 100644 index 000..f842708 --- /dev/null +++ b/lib/poll-group-epoll.c @@ -0,0 +1,321 @@ +/* + * Copyright (c) 2016 Nicira, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include +#include +#include +#include +#include +#include "openvswitch/vlog.h" +#include "hash.h" +#include "hmap.h" +#include "poll-group.h" +#include "poll-group-provider.h" +#include "poll-loop.h" +#include "util.h" + +VLOG_DEFINE_THIS_MODULE(epoll_group); + + +struct epoll_group; +struct fd_node { +struct hmap_node hmap_node; +int fd; +struct stream *stream;/* Stream associated with the fd */ +struct poll_group *poll_group; + +/* events that was last set using epoll_ctl(), called from + * epoll_group_get_events(). */ +struct epoll_event current; + +/* epoll_group_join() call will initially create this node, + * the event.data.ptr will be set to 'caller_event'. This + * value can not be changed for the life of this node. + * + * The initial value of event.events will be set to EPOLLIN. + * It can be changed to EPOLLIN | EPOLLOUT if epoll_group_update() + * is called with 'write' set to true. There is no way to + * remove EPOLLIN flags. + * + * epoll_group_update() can be called multiple times to update the + * same node. The union of all updates will take effect when + * when epoll_group_get_events() is called. If applied, + * the 'current' will also be update to reflect the new setting, + * while 'event' will be reset to the creation state, ready to + * accumulate the next batch of epoll_group_update() calls. + */ +struct epoll_event event; +}; + +static struct fd_node *find_fd_node(struct epoll_group *, int fd); +static struct fd_node *create_fd_node(struct epoll_group *, int, void *); +static void find_and_delete_fd_node_assert(struct epoll_group *, int fd); + +struct epoll_group { +struct poll_group up; +int epoll_fd; + +/* Maintain a buffer used for epoll_wait() call */ +struct epoll_event *epoll_events; +size_t n_allocated; + +/* Stores fds that have joined. */ +struct hmap fd_nodes; +}; + + +static struct epoll_group * +epoll_group_cast(struct poll_group *group) +{ +poll_group_assert_class(group, &epoll_group_class); +return CONTAINER_OF(group, struct epoll_group, up); +} + +static struct poll_group * +epoll_group_create(const char *name) +{ +int epoll_fd = epoll_create(10); + +if (epoll_fd == -1) { +VLOG_ERR("epoll_create: %s", ovs_strerror(errno)); +return NULL; +} + +struct epoll_group *group; +group = xmalloc(sizeof *group); +poll_group_init(&group->up, name, &epoll_group_class); + +group->epoll_fd = epoll_fd; +group->epoll_events = NULL; +group->n_allocated = 0; + +hmap_init(&group->fd_nodes); +return &group->up; +} + +static void +epoll_group_close(struct poll_group *group_) +{ +struct epoll_group *group = epoll_group_cast(group_); +int retval; + +retval = close(group->epoll_fd); +if (retval == -1) { +VLOG_ERR("close: %s", ovs_strerror(errno)); +} +free(group->epoll_events); + +hmap_destroy(&group->fd_nodes); +} + +static int +epoll_group_join(struct poll_group *group_, int fd, void *caller_event) +{ +struct epoll_group *group = epoll_group_cast(group_); +size_t n_joined = group_->n_joined; +struct fd_node *node; +int retval; + +ovs_assert(!find_fd_node(group, fd)); +node = create_fd_node(group, fd, caller_event); + +retval = epoll_ctl(group->epoll_fd, EPOLL_CTL_ADD, fd, &node->current); +if (retval == -1) { +VLOG_ERR
[ovs-dev] [poll group RFC 3/6] lib: Update stream class to support poll group.
Add new APIs in stream class that works with poll group. Signed-off-by: Andy Zhou --- lib/stream-fd.c | 32 lib/stream-provider.h | 17 +++ lib/stream-ssl.c | 40 - lib/stream-tcp.c | 3 ++ lib/stream-unix.c | 3 ++ lib/stream.c | 83 ++- lib/stream.h | 5 7 files changed, 181 insertions(+), 2 deletions(-) diff --git a/lib/stream-fd.c b/lib/stream-fd.c index 31bfc6e..496a517 100644 --- a/lib/stream-fd.c +++ b/lib/stream-fd.c @@ -28,6 +28,7 @@ #include "socket-util.h" #include "util.h" #include "stream-provider.h" +#include "poll-group.h" #include "stream.h" #include "openvswitch/vlog.h" @@ -160,6 +161,34 @@ fd_wait(struct stream *stream, enum stream_wait_type wait) } } +static int +fd_join(struct stream *stream) +{ +struct stream_fd *s = stream_fd_cast(stream); +struct poll_group *group = stream->poll_group; +void *caller_event = stream->caller_event; + +return poll_group_join(group, s->fd, caller_event); +} + +static int +fd_update(struct stream *stream, bool write) +{ +struct stream_fd *s = stream_fd_cast(stream); +struct poll_group *group = stream->poll_group; + +return poll_group_update(group, s->fd, write, stream->caller_event); +} + +static int +fd_leave(struct stream *stream) +{ +struct stream_fd *s = stream_fd_cast(stream); +struct poll_group *group = stream->poll_group; + +return poll_group_leave(group, s->fd); +} + static const struct stream_class stream_fd_class = { "fd", /* name */ false, /* needs_probes */ @@ -171,6 +200,9 @@ static const struct stream_class stream_fd_class = { NULL, /* run */ NULL, /* run_wait */ fd_wait,/* wait */ +fd_join,/* join */ +fd_update, /* update */ +fd_leave, /* leave */ }; /* Passive file descriptor stream. */ diff --git a/lib/stream-provider.h b/lib/stream-provider.h index 2226a80..74dad08 100644 --- a/lib/stream-provider.h +++ b/lib/stream-provider.h @@ -20,6 +20,7 @@ #include #include "stream.h" +struct poll_group; /* Active stream connection. */ /* Active stream connection. @@ -30,6 +31,11 @@ struct stream { int state; int error; char *name; + +/* poll_group related states. */ +struct poll_group *poll_group; +void *caller_event; +bool joined; }; void stream_init(struct stream *, const struct stream_class *, @@ -123,6 +129,17 @@ struct stream_class { /* Arranges for the poll loop to wake up when 'stream' is ready to take an * action of the given 'type'. */ void (*wait)(struct stream *stream, enum stream_wait_type type); + +/* Arranges for 'stream' to join poll group when it is connected. */ +int (*join)(struct stream *stream); + +/* Let 'stream' inform pull group about its interest to be waken up + * by tx_ready. */ +int (*update)(struct stream *stream, bool write); + +/* Disconnects stream from poll group. A disconnected stream will fall + * back to use poll loop directly. */ +int (*leave)(struct stream *stream); }; /* Passive listener for incoming stream connections. diff --git a/lib/stream-ssl.c b/lib/stream-ssl.c index 2699633..88dea03 100644 --- a/lib/stream-ssl.c +++ b/lib/stream-ssl.c @@ -38,6 +38,7 @@ #include "ofpbuf.h" #include "openflow/openflow.h" #include "packets.h" +#include "poll-group.h" #include "poll-loop.h" #include "shash.h" #include "socket-util.h" @@ -670,7 +671,6 @@ ssl_send(struct stream *stream, const void *buffer, size_t n) case EAGAIN: return n; default: -sslv->txbuf = NULL; return -error; } } @@ -684,6 +684,12 @@ ssl_run(struct stream *stream) if (sslv->txbuf && ssl_do_tx(stream) != EAGAIN) { ssl_clear_txbuf(sslv); } + +if (sslv->tx_want != SSL_NOTHING) { +if (want_to_poll_events(sslv->tx_want) == SSL_WRITING) { +stream_update(stream, true); +} +} } static void @@ -747,6 +753,35 @@ ssl_wait(struct stream *stream, enum stream_wait_type wait) } } +static int +ssl_join(struct stream *stream) +{ +struct ssl_stream *sslv = ssl_stream_cast(stream); +struct poll_group *group = stream->poll_group; +void *caller_event = stream->caller_event; + +return poll_group_join(group, sslv->fd, caller_event); +} + +static int +ssl_update(struct stream *stream, bool write) +{ +struct ssl_stream *sslv = ssl_stream_cast(stream); +struct poll_group *group = stream->poll_group; +void *caller_event = stream->caller_event; + +return poll_group_update(group, sslv->fd, write, caller_event); +} + +static int +ssl_leave(struct stream *stream) +{ +struct ssl_stream *sslv = ssl_stream_c
Re: [ovs-dev] [poll group RFC 0/6] poll group RFC
On Thu, Feb 18, 2016 at 8:20 PM, Andy Zhou wrote: > This is the new 'poll group' feature I have been working on improve > the efficiency in dealing with large number of connections, and its > first application to improve ovsdb-sever. > > There is a known limitation: > > The probing feature does not work properly. Poll group avoids > waking up a connection if there is no active I/O going on. An inactive > session won't send out probes. Timed wake up currently bypass those > connections per design. I can think of a few options to improve / restore > this feature if necessary. In the mean time, the TCP keep alive option, > suggested on OVN IRC channel also looks attractive. > I have since figured out what the problem was. Poll_block() can be waken either by fd events or by time out. Sessions joined poll group should still be visited if it was time out. I have reposted the series with the fix. IMHO, TCP keep alive may still be worth looking into. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 2/2] ofp-msgs: Move most OpenFlow header definitions here.
Thanks for the reviews. I applied these to master. On Thu, Feb 18, 2016 at 10:47:06AM -0800, Jarno Rajahalme wrote: > Acked-by: Jarno Rajahalme > > > On Jan 27, 2016, at 3:50 PM, Ben Pfaff wrote: > > > > This code was the only user for OpenFlow header definitions other than > > struct ofp_header itself. > > > > Signed-off-by: Ben Pfaff > > --- > > include/openflow/nicira-ext.h | 9 - > > include/openflow/openflow-1.0.h | 20 > > include/openflow/openflow-1.1.h | 11 +-- > > lib/ofp-msgs.c | 39 +++ > > 4 files changed, 40 insertions(+), 39 deletions(-) > > > > diff --git a/include/openflow/nicira-ext.h b/include/openflow/nicira-ext.h > > index fdee330..a209e6a 100644 > > --- a/include/openflow/nicira-ext.h > > +++ b/include/openflow/nicira-ext.h > > @@ -68,15 +68,6 @@ struct nx_vendor_error { > > > > /* Nicira vendor requests and replies. */ > > > > -/* Header for Nicira vendor stats request and reply messages in OpenFlow > > - * 1.0. */ > > -struct nicira10_stats_msg { > > -struct ofp10_vendor_stats_msg vsm; /* Vendor NX_VENDOR_ID. */ > > -ovs_be32 subtype; /* One of NXST_* below. */ > > -uint8_t pad[4]; /* Align to 64-bits. */ > > -}; > > -OFP_ASSERT(sizeof(struct nicira10_stats_msg) == 24); > > - > > /* Fields to use when hashing flows. */ > > enum nx_hash_fields { > > /* Ethernet source address (NXM_OF_ETH_SRC) only. */ > > diff --git a/include/openflow/openflow-1.0.h > > b/include/openflow/openflow-1.0.h > > index db629f7..68c7952 100644 > > --- a/include/openflow/openflow-1.0.h > > +++ b/include/openflow/openflow-1.0.h > > @@ -331,15 +331,6 @@ struct ofp10_flow_removed { > > }; > > OFP_ASSERT(sizeof(struct ofp10_flow_removed) == 80); > > > > -/* Statistics request or reply message. */ > > -struct ofp10_stats_msg { > > -struct ofp_header header; > > -ovs_be16 type; /* One of the OFPST_* constants. */ > > -ovs_be16 flags; /* Requests: always 0. > > - * Replies: 0 or OFPSF_REPLY_MORE. */ > > -}; > > -OFP_ASSERT(sizeof(struct ofp10_stats_msg) == 12); > > - > > /* Stats request of type OFPST_AGGREGATE or OFPST_FLOW. */ > > struct ofp10_flow_stats_request { > > struct ofp10_match match; /* Fields to match. */ > > @@ -444,15 +435,4 @@ struct ofp10_queue_stats { > > }; > > OFP_ASSERT(sizeof(struct ofp10_queue_stats) == 32); > > > > -/* Vendor extension stats message. */ > > -struct ofp10_vendor_stats_msg { > > -struct ofp10_stats_msg osm; /* Type OFPST_VENDOR. */ > > -ovs_be32 vendor;/* Vendor ID: > > - * - MSB 0: low-order bytes are IEEE OUI. > > - * - MSB != 0: defined by OpenFlow > > - * consortium. */ > > -/* Followed by vendor-defined arbitrary additional data. */ > > -}; > > -OFP_ASSERT(sizeof(struct ofp10_vendor_stats_msg) == 16); > > - > > #endif /* openflow/openflow-1.0.h */ > > diff --git a/include/openflow/openflow-1.1.h > > b/include/openflow/openflow-1.1.h > > index 63d8b41..805f222 100644 > > --- a/include/openflow/openflow-1.1.h > > +++ b/include/openflow/openflow-1.1.h > > @@ -1,4 +1,4 @@ > > -/* Copyright (c) 2008, 2011, 2012, 2013, 2014 The Board of Trustees of The > > Leland Stanford > > +/* Copyright (c) 2008, 2011, 2012, 2013, 2014, 2016 The Board of Trustees > > of The Leland Stanford > > * Junior University > > * > > * We are making the OpenFlow specification and associated documentation > > @@ -382,15 +382,6 @@ struct ofp11_queue_get_config_reply { > > }; > > OFP_ASSERT(sizeof(struct ofp11_queue_get_config_reply) == 8); > > > > -struct ofp11_stats_msg { > > -struct ofp_header header; > > -ovs_be16 type; /* One of the OFPST_* constants. */ > > -ovs_be16 flags; /* OFPSF_REQ_* flags (none yet defined). */ > > -uint8_t pad[4]; > > -/* Followed by the body of the request. */ > > -}; > > -OFP_ASSERT(sizeof(struct ofp11_stats_msg) == 16); > > - > > /* Stats request of type OFPST_FLOW. */ > > struct ofp11_flow_stats_request { > > uint8_t table_id; /* ID of table to read (from ofp_table_stats), > > diff --git a/lib/ofp-msgs.c b/lib/ofp-msgs.c > > index fcfbb50..bc38f40 100644 > > --- a/lib/ofp-msgs.c > > +++ b/lib/ofp-msgs.c > > @@ -52,6 +52,35 @@ struct ofp_vendor_header { > > }; > > OFP_ASSERT(sizeof(struct ofp_vendor_header) == 16); > > > > +/* Statistics request or reply message. */ > > +struct ofp10_stats_msg { > > +struct ofp_header header; > > +ovs_be16 type; /* One of the OFPST_* constants. */ > > +ovs_be16 flags; /* Requests: always 0. > > + * Replies: 0 or OFPSF_REPLY_MORE. */ > > +}; > > +OFP_ASSERT(sizeof(struct ofp10_stats_msg) == 12); > > + > > +/* Vendor extension stats message. */ > > +struct ofp10_vendor_stats_msg {
Re: [ovs-dev] [poll group RFC 0/6] Poll group
The patches are also available at github. https://github.com/azhou-nicira/ovs-review/tree/poll-group On Fri, Feb 19, 2016 at 12:40 PM, Andy Zhou wrote: > Introduce a new 'poll group' feature to improve the efficiency of poll > loop in dealing with large number of connections, and its first application > to improve ovsdb-server to scale up connections. > > Patch 6/6 commit log as some performance numbers. They are copied here > for reference. > >For ovsdb-server to maintain 1000 idle connections over TCP with the >default 5s probe interval, the CPU load droped from 48% to 14%. > > Andy Zhou (6): > lib: Introduce poll group provider class > lib: Add epoll poll group implementation for the Linux platform > lib: Update stream class to support poll group. > lib: Add poll group support in jsonrpc library. > lib: add poll_block_waken_by_timer > ovsdb: Change jsonrpc server to make use of poll group > > NEWS | 5 +- > lib/automake.mk | 4 + > lib/jsonrpc.c | 74 ++- > lib/jsonrpc.h | 7 + > lib/poll-group-epoll.c| 321 > ++ > lib/poll-group-provider.h | 156 ++ > lib/poll-group.c | 224 > lib/poll-group.h | 51 > lib/poll-loop.c | 11 ++ > lib/poll-loop.h | 3 + > lib/stream-fd.c | 32 + > lib/stream-provider.h | 17 +++ > lib/stream-ssl.c | 40 +- > lib/stream-tcp.c | 3 + > lib/stream-unix.c | 3 + > lib/stream.c | 83 +++- > lib/stream.h | 5 + > ovsdb/jsonrpc-server.c| 75 ++- > ovsdb/jsonrpc-server.h| 2 +- > ovsdb/ovsdb-server.c | 43 +-- > tests/ovsdb-server.at | 11 ++ > 21 files changed, 1145 insertions(+), 25 deletions(-) > create mode 100644 lib/poll-group-epoll.c > create mode 100644 lib/poll-group-provider.h > create mode 100644 lib/poll-group.c > create mode 100644 lib/poll-group.h > > -- > 1.9.1 > > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 2/3] ofproto-dpif-xlate: Don't consider mirrors used when excluded by VLAN.
On Mon, Feb 08, 2016 at 11:14:23AM -0800, Jarno Rajahalme wrote: > > > On Feb 5, 2016, at 7:27 PM, Ben Pfaff wrote: > > > > On Fri, Feb 05, 2016 at 04:41:29PM -0800, Jarno Rajahalme wrote: > >> > >>> On Feb 5, 2016, at 3:30 PM, Ben Pfaff wrote: > >>> > >>> Mirroring is supposed to happen at most once for any destination on a > >>> given > >>> packet, so the implementation keeps track of which mirrors have already > >>> been used. However, until this commit it did that incorrectly: it > >>> considered a mirror "used" even if it had been rejected on the basis of > >>> VLAN. This commit fixes the problem. > >> > >> So even if a mirror has been rejected on the basis of a VLAN, it > >> should still be considered for output (later)? Can you describe a > >> scenario where this makes a difference? E.g., is there a case where a > >> packet is not sent when it should have been sent, or did we get > >> duplicate mirroring due to this, as tested against by the new test > >> case? > > > > I think the best way to answer this is to rewrite the commit message. > > Here's a new version. It also adds more comments to the code. > > > > Ben, > > Thanks for taking the time to explain this in detail. I guess we must have > the documentary piece of the commit message in a document somewhere, but I > had missed that so was not aware of it in this level of detail. It makes all > sense now, > > Acked-by: Jarno Rajahalme Thanks for the reviews. I applied these patches to master and branch-2.5. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH RFC v2] netdev-dpdk: vhost-user: Fix sending packets to queues not enabled by guest.
Would it be safe to assume that the enabled queues are sequential? In this case we could just play with 'real_n_txq' instead of keeping a mapping and the patch would be simpler. I'm not sure if that's a reasonable assumption though. Also, on my system, with qemu-2.5.0, vring_state_changed() is called after new_device(), leading to the following scenario: 2016-02-19T19:04:13.617Z|1|dpdk(vhost_thread2)|DBG|TX queue mapping for /home/daniele/ovs/_run/run/dv0 2016-02-19T19:04:13.618Z|2|dpdk(vhost_thread2)|DBG| 0 --> 0 2016-02-19T19:04:13.618Z|3|dpdk(vhost_thread2)|INFO|State of queue 0 ( tx_qid 0 ) of vhost device '/home/daniele/ovs/_run/run/dv0' 0 changed to 'enabled' 2016-02-19T19:04:13.618Z|4|dpdk(vhost_thread2)|DBG|TX queue mapping for /home/daniele/ovs/_run/run/dv0 2016-02-19T19:04:13.618Z|5|dpdk(vhost_thread2)|DBG| 0 --> 0 2016-02-19T19:04:13.618Z|6|dpdk(vhost_thread2)|INFO|State of queue 2 ( tx_qid 1 ) of vhost device '/home/daniele/ovs/_run/run/dv0' 0 changed to 'disabled' 2016-02-19T19:04:13.619Z|7|dpdk(vhost_thread2)|DBG|TX queue mapping for /home/daniele/ovs/_run/run/dv1 2016-02-19T19:04:13.619Z|8|dpdk(vhost_thread2)|DBG| 0 --> 0 2016-02-19T19:04:13.619Z|9|dpdk(vhost_thread2)|INFO|State of queue 0 ( tx_qid 0 ) of vhost device '/home/daniele/ovs/_run/run/dv1' 1 changed to 'enabled' 2016-02-19T19:04:13.619Z|00010|dpdk(vhost_thread2)|DBG|TX queue mapping for /home/daniele/ovs/_run/run/dv1 2016-02-19T19:04:13.619Z|00011|dpdk(vhost_thread2)|DBG| 0 --> 0 2016-02-19T19:04:13.619Z|00012|dpdk(vhost_thread2)|INFO|State of queue 2 ( tx_qid 1 ) of vhost device '/home/daniele/ovs/_run/run/dv1' 1 changed to 'disabled' 2016-02-19T19:04:13.619Z|00013|dpdk(vhost_thread2)|INFO|vHost Device '/home/daniele/ovs/_run/run/dv0' 0 has been added 2016-02-19T19:04:13.620Z|00014|dpdk(vhost_thread2)|INFO|vHost Device '/home/daniele/ovs/_run/run/dv1' 1 has been added tx_qid 1 of both devices is still mapped on -1, causing the packets to be lost One more comment inline. Thanks, Daniele On 11/02/2016 22:21, "Ilya Maximets" wrote: >Currently virtio driver in guest operating system have to be configured >to use exactly same number of queues. If number of queues will be less, >some packets will get stuck in queues unused by guest and will not be >received. > >Fix that by using new 'vring_state_changed' callback, which is >available for vhost-user since DPDK 2.2. >Implementation uses additional mapping from configured tx queues to >enabled by virtio driver. This requires mandatory locking of TX queues >in __netdev_dpdk_vhost_send(), but this locking was almost always anyway >because of calling set_multiq with n_txq = 'ovs_numa_get_n_cores() + 1'. > >Fixes: 4573fbd38fa1 ("netdev-dpdk: Add vhost-user multiqueue support") >Signed-off-by: Ilya Maximets >Reviewed-by: Aaron Conole >Acked-by: Flavio Leitner >--- > lib/netdev-dpdk.c | 105 >+++--- > 1 file changed, 92 insertions(+), 13 deletions(-) > >diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c >index e4f789b..c50b1dd 100644 >--- a/lib/netdev-dpdk.c >+++ b/lib/netdev-dpdk.c >@@ -173,6 +173,8 @@ struct dpdk_tx_queue { > * from concurrent access. It is >used only > * if the queue is shared among >different > * pmd threads (see >'txq_needs_locking'). */ >+int map; /* Mapping of configured vhost-user >queues >+* to enabled by guest. */ > uint64_t tsc; > struct rte_mbuf *burst_pkts[MAX_TX_QUEUE_LEN]; > }; >@@ -559,6 +561,7 @@ netdev_dpdk_alloc_txq(struct netdev_dpdk *netdev, >unsigned int n_txqs) > unsigned i; > > netdev->tx_q = dpdk_rte_mzalloc(n_txqs * sizeof *netdev->tx_q); >+ > for (i = 0; i < n_txqs; i++) { > int numa_id = ovs_numa_get_numa_id(i); > >@@ -572,6 +575,9 @@ netdev_dpdk_alloc_txq(struct netdev_dpdk *netdev, >unsigned int n_txqs) > /* Queues are shared among CPUs. Always flush */ > netdev->tx_q[i].flush_tx = true; > } >+ >+/* Initialize map for vhost devices. */ >+netdev->tx_q[i].map = -1; > rte_spinlock_init(&netdev->tx_q[i].tx_lock); > } > } >@@ -1117,17 +1123,16 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, >int qid, > unsigned int total_pkts = cnt; > uint64_t start = 0; > >-if (OVS_UNLIKELY(!is_vhost_running(virtio_dev))) { >+qid = vhost_dev->tx_q[qid % vhost_dev->real_n_txq].map; >+ >+if (OVS_UNLIKELY(!is_vhost_running(virtio_dev) || qid == -1)) { > rte_spinlock_lock(&vhost_dev->stats_lock); > vhost_dev->stats.tx_dropped+= cnt; > rte_spinlock_unlock(&vhost_dev->stats_lock); > goto out; > } > >-if (vhost_dev->txq_needs_locking) { >-qid = qid % vhost_dev->real_n_txq; >-rte_spinlock_lock(&vhost_dev->tx_q[qid].tx_lock); >-} >+rte_
[ovs-dev] [PATCH] types: Fix defined but not used warning.
warning: ‘OVS_BE128_MAX’ defined but not used [-Wunused-const-variable] Found using CentOS 6.6 with gcc 6.6.0. Signed-off-by: William Tu --- include/openvswitch/types.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/openvswitch/types.h b/include/openvswitch/types.h index 658fb50..5f3347d 100644 --- a/include/openvswitch/types.h +++ b/include/openvswitch/types.h @@ -19,6 +19,7 @@ #include #include +#include "openvswitch/compiler.h" #ifdef __CHECKER__ #define OVS_BITWISE __attribute__((bitwise)) @@ -104,7 +105,7 @@ typedef union { * So we use these static definitions rather than using initializer macros. */ static const ovs_u128 OVS_U128_MAX = { { UINT32_MAX, UINT32_MAX, UINT32_MAX, UINT32_MAX } }; -static const ovs_be128 OVS_BE128_MAX = { { OVS_BE32_MAX, OVS_BE32_MAX, +static const ovs_be128 OVS_BE128_MAX OVS_UNUSED = { { OVS_BE32_MAX, OVS_BE32_MAX, OVS_BE32_MAX, OVS_BE32_MAX } }; /* A 64-bit value, in network byte order, that is only aligned on a 32-bit -- 2.5.0 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v4 01/14] ofpbuf: New function ofpbuf_const_initializer().
With a note and a question below: Acked-by: Jarno Rajahalme > On Feb 19, 2016, at 12:34 AM, Ben Pfaff wrote: > > A number of times I've looked at code and thought that it would be easier > to understand if I could write an initializer instead of > ofpbuf_use_const(). This commit adds a function for that purpose and > adapts a lot of code to use it, in the places where I thought it made > the code better. > > In theory this could improve code generation since the new function can > be inlined whereas ofpbuf_use_const() isn't. But I guess that's probably > insignificant; the intent of this change is code readability. > > Signed-off-by: Ben Pfaff > — > (snip) > diff --git a/lib/ofpbuf.h b/lib/ofpbuf.h > index dd72cde..489a543 100644 > --- a/lib/ofpbuf.h > +++ b/lib/ofpbuf.h > @@ -64,8 +64,9 @@ struct ofpbuf { > enum ofpbuf_source source; /* Source of memory allocated as 'base'. */ > }; > > -/* An initializer for a struct ofpbuf that will be initially empty and > - * uses the space in STUB (which should be an array) as a stub. > +/* An initializer for a struct ofpbuf that will be initially empty and uses > the > + * space in STUB (which should be an array) as a stub. This is the > initializer > + * form of ofpbuf_use_stub(). > * > * Usage example: > * > @@ -83,6 +84,27 @@ struct ofpbuf { > .source = OFPBUF_STUB, \ > } > > +/* An initializer for a struct ofpbuf whose data starts at DATA and continues > + * for SIZE bytes. This is appropriate for an ofpbuf that will be used to > + * inspect existing data, without moving it around or reallocating it, and > + * generally without modifying it at all. This is the initializer form of > + * ofpbuf_use_const(). > + */ > +static inline struct ofpbuf > +ofpbuf_const_initializer(const void *data, size_t size) > +{ > +return (struct ofpbuf) { > +.base = CONST_CAST(void *, data), > +.data = CONST_CAST(void *, data), > +.size = size, > +.allocated = size, > +.header = NULL, > +.msg = NULL, > +.list_node = OVS_LIST_POISON, > +.source = OFPBUF_STACK, > +}; > +} > + I’m curious if there is a specific reason to define ofpbuf_const_initializer() as an inline function, and keep the OFPBUF_STUB_INITIALIZER() as a macro? > void ofpbuf_use_ds(struct ofpbuf *, const struct ds *); > void ofpbuf_use_stack(struct ofpbuf *, void *, size_t); > void ofpbuf_use_stub(struct ofpbuf *, void *, size_t); (snip) > diff --git a/ovn/controller/pinctrl.c b/ovn/controller/pinctrl.c > index 97b7f6c..7b7532d 100644 > --- a/ovn/controller/pinctrl.c > +++ b/ovn/controller/pinctrl.c > @@ -70,6 +70,15 @@ set_switch_config(struct rconn *swconn, > queue_msg(request); > } > > +unsigned int init_ofpbuf(const void *p, size_t size); > +unsigned int > +init_ofpbuf(const void *p, size_t size) > +{ > +struct ofpbuf b = ofpbuf_const_initializer(p, size); > +ofpbuf_pull(&b, 8); > +return b.size; > +} > + This seems unused. > static void > process_packet_in(struct controller_ctx *ctx OVS_UNUSED, > const struct ofp_header *msg) (snip) ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] types: Fix defined but not used warning.
On Fri, Feb 19, 2016 at 01:35:55PM -0800, William Tu wrote: > warning: ‘OVS_BE128_MAX’ defined but not used [-Wunused-const-variable] > Found using CentOS 6.6 with gcc 6.6.0. Are you sure about that GCC version number? ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v4 02/14] ofp-util: Remove 'const' from struct ofputil_packet_in's 'packet' member.
Acked-by: Jarno Rajahalme > On Feb 19, 2016, at 12:34 AM, Ben Pfaff wrote: > > It's not const in all cases so it doesn't entirely make sense to mark > it const here. > > Signed-off-by: Ben Pfaff > --- > lib/ofp-util.c| 2 +- > lib/ofp-util.h| 2 +- > ofproto/connmgr.c | 2 +- > 3 files changed, 3 insertions(+), 3 deletions(-) > > diff --git a/lib/ofp-util.c b/lib/ofp-util.c > index 23f7eda..d057915 100644 > --- a/lib/ofp-util.c > +++ b/lib/ofp-util.c > @@ -3341,7 +3341,7 @@ ofputil_decode_packet_in(const struct ofp_header *oh, > > opi = ofpbuf_pull(&b, offsetof(struct ofp10_packet_in, data)); > > -pin->packet = opi->data; > +pin->packet = CONST_CAST(uint8_t *, opi->data); > pin->len = b.size; > > match_init_catchall(&pin->flow_metadata); > diff --git a/lib/ofp-util.h b/lib/ofp-util.h > index 866e1bc..19bfc4b 100644 > --- a/lib/ofp-util.h > +++ b/lib/ofp-util.h > @@ -426,7 +426,7 @@ struct ofputil_packet_in { > * On decoding, the 'len' bytes in 'packet' might only be the first part > of > * the original packet. ofputil_decode_packet_in() reports the full > * original length of the packet using its 'total_len' output parameter. > */ > -const void *packet; /* The packet. */ > +void *packet; /* The packet. */ > size_t len; /* Length of 'packet' in bytes. */ > > /* Input port and other metadata for packet. */ > diff --git a/ofproto/connmgr.c b/ofproto/connmgr.c > index d4f64b2..cc947f0 100644 > --- a/ofproto/connmgr.c > +++ b/ofproto/connmgr.c > @@ -2247,6 +2247,6 @@ ofmonitor_wait(struct connmgr *mgr) > void > ofproto_async_msg_free(struct ofproto_async_msg *am) > { > -free(CONST_CAST(void *, am->pin.up.packet)); > +free(am->pin.up.packet); > free(am); > } > -- > 2.1.3 > > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev