Re: [ovs-dev] why we need op wait when add-port or delete port
Three more questions:

1. When I use ovs-vsctl to add or delete port p0 in br0, is it OK to change the "wait" operations in the transaction as follows?
   - For adding a port: "wait" only on br0 (condition: must exist) and p0 (condition: must not exist).
   - For deleting a port: "wait" only on br0 (condition: must exist) and p0 (condition: must exist).

2. I noticed that when I add port p0 to br0, the transaction also uses "wait" on another bridge and on all the ports in that bridge. Why is that needed?

3. When I use ovs-vsctl add-port, can I use a "mutate" operation instead of an "update" operation in the transaction? For example:

old format:
{
  "where": [["_uuid", "==", ["uuid", "774222f2-9ab9-427f-a634-b818dc13cb2e"]]],
  "row": {
    "ports": ["set", [
      ["named-uuid", "row35edc309_3d4a_4791_8aae_ab3a1ac73aa8"],
      ["uuid", "46b3b3f3-0e75-4d3c-bea0-088b4d7bbec2"],
      ["uuid", "d31a00e7-edf7-4ae6-a589-5c0537b7dde7"]
    ]]
  },
  "op": "update",
  "table": "Bridge"
}

new format:
{
  "mutations": [["ports", "insert", ["named-uuid", "row35edc309_3d4a_4791_8aae_ab3a1ac73aa8"]]],
  "table": "Bridge",
  "where": [["_uuid", "==", ["uuid", "774222f2-9ab9-427f-a634-b818dc13cb2e"]]],
  "op": "mutate"
}

At 2016-03-08 12:32:30, "Ben Pfaff" wrote:
>On Tue, Mar 08, 2016 at 11:43:53AM +0800, ychen wrote:
>> I noticed that when add or delete port, the transaction always send op
>> wait with all the ports in the same bridge.
>> If the port p0 and p1 are totally independent, and have no relationship
>> with each other, why when I add port p1, I need to wait port p0?
>
>This implements transactional atomicity.  If two transactions each add a
>different port, and they execute at the same time, one of them needs to
>fail and be retried, otherwise only one of the ports will be added in
>the end.

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
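To make Ben's atomicity argument concrete, here is a toy C model of two clients that each add a port to the same bridge. It is not OVSDB or ovs-vsctl code; every name in it is made up for illustration, and a version counter stands in for an OVSDB "wait". With a whole-column "update" and no precondition, the second commit silently drops the first client's port. With the version check, the second commit fails and a retry picks up both ports. With a "mutate"/"insert", both ports survive without a retry. The model only covers lost updates to the ports column; it says nothing about any other consistency checks ovs-vsctl may rely on its "wait" operations for.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model of one Bridge row: a version counter plus a small set of port
 * IDs.  Not OVSDB code; it only models the race Ben describes. */
#define MAX_PORTS 16

struct bridge_row {
    unsigned version;            /* bumped on every committed change */
    int ports[MAX_PORTS];
    size_t n_ports;
};

/* A client's cached copy of the row, taken when it builds its transaction. */
struct snapshot {
    unsigned version;
    int ports[MAX_PORTS];
    size_t n_ports;
};

static void take_snapshot(const struct bridge_row *row, struct snapshot *s)
{
    s->version = row->version;
    s->n_ports = row->n_ports;
    memcpy(s->ports, row->ports, row->n_ports * sizeof row->ports[0]);
}

/* "update"-style commit: overwrite the whole ports column with the cached
 * set plus one new port.  With 'use_wait', first check (in the spirit of a
 * "wait" operation) that the row is unchanged since the snapshot. */
static bool commit_update(struct bridge_row *row, const struct snapshot *s,
                          int new_port, bool use_wait)
{
    if (use_wait && row->version != s->version) {
        return false;            /* precondition failed; caller retries */
    }
    if (s->n_ports >= MAX_PORTS) {
        return false;
    }
    memcpy(row->ports, s->ports, s->n_ports * sizeof s->ports[0]);
    row->ports[s->n_ports] = new_port;
    row->n_ports = s->n_ports + 1;
    row->version++;
    return true;
}

/* "mutate"-style commit: insert into whatever the row holds right now. */
static bool commit_mutate_insert(struct bridge_row *row, int new_port)
{
    for (size_t i = 0; i < row->n_ports; i++) {
        if (row->ports[i] == new_port) {
            return true;         /* already present: sets ignore duplicates */
        }
    }
    if (row->n_ports >= MAX_PORTS) {
        return false;
    }
    row->ports[row->n_ports++] = new_port;
    row->version++;
    return true;
}

enum add_mode { UPDATE_NO_WAIT, UPDATE_WITH_WAIT, MUTATE_INSERT };

/* Clients A and B both snapshot a bridge holding port 1, then A adds port 2
 * and B adds port 3.  Returns how many ports the bridge ends up with. */
static size_t race_two_adds(enum add_mode mode)
{
    struct bridge_row row = { .version = 0, .ports = { 1 }, .n_ports = 1 };
    struct snapshot a, b;

    take_snapshot(&row, &a);
    take_snapshot(&row, &b);

    if (mode == MUTATE_INSERT) {
        commit_mutate_insert(&row, 2);
        commit_mutate_insert(&row, 3);
        return row.n_ports;
    }

    bool use_wait = (mode == UPDATE_WITH_WAIT);
    commit_update(&row, &a, 2, use_wait);
    if (!commit_update(&row, &b, 3, use_wait)) {
        take_snapshot(&row, &b);  /* re-read, then retry once */
        commit_update(&row, &b, 3, use_wait);
    }
    return row.n_ports;
}
```

In this model race_two_adds(UPDATE_NO_WAIT) ends with 2 ports (port 2 is lost), while the other two modes end with all 3.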
[ovs-dev] [PATCH][RESEND] openvswitch: add support for kernel 4.4
A bit cleaner than my previous patch:
http://patchwork.ozlabs.org/patch/595969/

Though I couldn't figure out a clean solution for ip6_local_out(),
genl_notify(), and vport-vxlan.

Signed-off-by: Alexandru Ardelean
---
 acinclude.m4                                       |  4 +--
 datapath/actions.c                                 |  6 ++--
 datapath/conntrack.c                               |  4 +--
 datapath/datapath.c                                |  6 +++-
 .../linux/compat/include/linux/netfilter_ipv6.h    |  2 +-
 datapath/linux/compat/include/net/ip.h             | 29 ++--
 datapath/linux/compat/include/net/ip6_tunnel.h     |  4 +++
 datapath/linux/compat/include/net/vxlan.h          | 10 +++
 datapath/linux/compat/ip_fragment.c                |  4 +--
 datapath/linux/compat/stt.c                        |  6
 datapath/vport-vxlan.c                             |  5
 11 files changed, 62 insertions(+), 18 deletions(-)

diff --git a/acinclude.m4 b/acinclude.m4
index 11c7787..07dd647 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -134,10 +134,10 @@ AC_DEFUN([OVS_CHECK_LINUX], [
     AC_MSG_RESULT([$kversion])
     if test "$version" -ge 4; then
-       if test "$version" = 4 && test "$patchlevel" -le 3; then
+       if test "$version" = 4 && test "$patchlevel" -le 4; then
           : # Linux 4.x
        else
-          AC_ERROR([Linux kernel in $KBUILD is version $kversion, but version newer than 4.3.x is not supported (please refer to the FAQ for advice)])
+          AC_ERROR([Linux kernel in $KBUILD is version $kversion, but version newer than 4.4.x is not supported (please refer to the FAQ for advice)])
        fi
     elif test "$version" = 3; then
        : # Linux 3.x
diff --git a/datapath/actions.c b/datapath/actions.c
index 20413c9..719c43d 100644
--- a/datapath/actions.c
+++ b/datapath/actions.c
@@ -706,7 +706,8 @@ static void ovs_fragment(struct vport *vport, struct sk_buff *skb, u16 mru,
 		skb_dst_set_noref(skb, &ovs_dst);
 		IPCB(skb)->frag_max_size = mru;
-		ip_do_fragment(skb->sk, skb, ovs_vport_output);
+		ip_do_fragment(NET_ARG(dev_net(ovs_dst.dev))
+			       skb->sk, skb, ovs_vport_output);
 		refdst_drop(orig_dst);
 	} else if (ethertype == htons(ETH_P_IPV6)) {
 		const struct nf_ipv6_ops *v6ops = nf_get_ipv6_ops();
@@ -727,7 +728,8 @@ static void ovs_fragment(struct vport *vport, struct sk_buff *skb, u16 mru,
 		skb_dst_set_noref(skb, &ovs_rt.dst);
 		IP6CB(skb)->frag_max_size = mru;
-		v6ops->fragment(skb->sk, skb, ovs_vport_output);
+		v6ops->fragment(NET_ARG(dev_net(ovs_rt.dst.dev))
+				skb->sk, skb, ovs_vport_output);
 		refdst_drop(orig_dst);
 	} else {
 		WARN_ONCE(1, "Failed fragment ->%s: eth=%04x, MRU=%d, MTU=%d.",
diff --git a/datapath/conntrack.c b/datapath/conntrack.c
index 795ed91..3b9bfba 100644
--- a/datapath/conntrack.c
+++ b/datapath/conntrack.c
@@ -323,7 +323,7 @@ static int handle_fragments(struct net *net, struct sw_flow_key *key,
 		int err;
 		memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
-		err = ip_defrag(skb, user);
+		err = ip_defrag(NET_ARG(net) skb, user);
 		if (err)
 			return err;
@@ -374,7 +374,7 @@ ovs_ct_expect_find(struct net *net, const struct nf_conntrack_zone *zone,
 {
 	struct nf_conntrack_tuple tuple;
-	if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb), proto, &tuple))
+	if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb), proto, NET_ARG(net) &tuple))
 		return NULL;
 	return __nf_ct_expect_find(net, zone, &tuple);
 }
diff --git a/datapath/datapath.c b/datapath/datapath.c
index e3d3c8c..a4157f4 100644
--- a/datapath/datapath.c
+++ b/datapath/datapath.c
@@ -96,8 +96,12 @@ static bool ovs_must_notify(struct genl_family *family, struct genl_info *info,
 static void ovs_notify(struct genl_family *family, struct genl_multicast_group *grp,
 		       struct sk_buff *skb, struct genl_info *info)
 {
-	genl_notify(family, skb, genl_info_net(info),
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(4,4,0)
+genl_notify(family, skb, info, GROUP_ID(grp), GFP_KERNEL);
+#else
+genl_notify(family, skb, genl_info_net(info),
 		    info->snd_portid, GROUP_ID(grp), info->nlhdr, GFP_KERNEL);
+#endif
 }
 /**
diff --git a/datapath/linux/compat/include/linux/netfilter_ipv6.h b/datapath/linux/compat/include/linux/netfilter_ipv6.h
index 3939e14..b724623 100644
--- a/datapath/linux/compat/include/linux/netfilter_ipv6.h
+++ b/datapath/linux/compat/include/linux/netfilter_ipv6.h
@@ -13,7 +13,7 @@
  * the callback parameter needs to be in the form that older kernels accept.
  * We don't backport the other ipv6_ops as they're currently unus
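The NET_ARG() calls in this patch rely on a compat macro (presumably defined in the compat net/ip.h header, whose hunk is not shown above) that either supplies a `struct net *` argument followed by a comma or expands to nothing, so one call site compiles against both the pre-4.4 and the 4.4 signatures. Below is a userspace sketch of that technique; DEMO_KERNEL_VERSION, DEMO_TARGET_VERSION and demo_defrag() are hypothetical stand-ins, and this is not the actual OVS definition of NET_ARG(). Where argument lists differ in several positions, as with genl_notify(), the patch instead falls back to an explicit #if/#else on LINUX_VERSION_CODE.

```c
/* Userspace stand-in for the kernel's version encoding (a simplified
 * assumption modeled on <linux/version.h>). */
#define DEMO_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Pretend we build against 4.4; override with -DDEMO_TARGET_VERSION=... */
#ifndef DEMO_TARGET_VERSION
#define DEMO_TARGET_VERSION DEMO_KERNEL_VERSION(4, 4, 0)
#endif

struct net { int id; };

#if DEMO_TARGET_VERSION >= DEMO_KERNEL_VERSION(4, 4, 0)
/* Newer API shape: an extra leading 'struct net *' parameter. */
static int demo_defrag(struct net *net, int skb, unsigned int user)
{
    return net->id * 1000 + skb + (int) user;
}
/* Expands to the argument *and* a trailing comma. */
#define NET_ARG(x) (x),
#else
/* Older API shape: no namespace parameter. */
static int demo_defrag(int skb, unsigned int user)
{
    return skb + (int) user;
}
/* Expands to nothing, so the call site drops the argument entirely. */
#define NET_ARG(x)
#endif

/* One call site that compiles against either API shape. */
static int call_demo_defrag(struct net *net, int skb, unsigned int user)
{
    (void) net;                  /* unused with the older API */
    return demo_defrag(NET_ARG(net) skb, user);
}
```

Note the missing comma after NET_ARG(net) at the call site: the comma lives inside the macro, exactly as in the `ip_defrag(NET_ARG(net) skb, user)` hunk above.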
Re: [ovs-dev] RFC: OVN database options
> Zookeeper transactions can be isolated depending on what level of
> isolation you need.
> A setData on a node operation can contain a version, so that it fails
> if that node has changed since the version. This means with a multi[1]
> of setData operations, you can effectively get a snapshot isolation
> level of isolation. For serializable, you could probably shoehorn it
> in by rewriting all nodes that you've written.

Thinking about this more, and refreshing my cache on isolation levels,
I realized that ZooKeeper doesn't in fact offer SI, since reads are
done one at a time, and the state in the database may change between
reads to the same node. So ZooKeeper offers "read committed" rather
than SI.

-Ivan
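The versioned setData idea can be modeled as a compare-and-set on each node's version, applied all-or-nothing across a batch. The toy C sketch below (hypothetical names; not the ZooKeeper client API) shows the write side: two clients that read the same versions race, and the second batch is rejected as a whole instead of being partially applied. As the follow-up above notes, this guards writes only; the reads that fed those writes were still taken one node at a time.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy znode: data plus a version that increments on each write.  This
 * models the idea only; it is not the ZooKeeper client API. */
struct znode {
    int data;
    int version;
};

/* One conditional write: succeeds only if the node is still at
 * 'expected_version'.  Ops in one batch are assumed to name distinct nodes. */
struct set_op {
    size_t node;
    int expected_version;
    int new_data;
};

/* All-or-nothing batch, in the spirit of a multi of versioned setData
 * calls: if any version check fails, no node is modified. */
static bool multi_set(struct znode *nodes, const struct set_op *ops,
                      size_t n_ops)
{
    for (size_t i = 0; i < n_ops; i++) {
        if (nodes[ops[i].node].version != ops[i].expected_version) {
            return false;
        }
    }
    for (size_t i = 0; i < n_ops; i++) {
        struct znode *z = &nodes[ops[i].node];
        z->data = ops[i].new_data;
        z->version++;
    }
    return true;
}

/* Clients A and B both read nodes 0 and 1 at version 0, then each submits
 * a batch rewriting both.  Returns how many batches committed and reports
 * the final node contents. */
static int race_two_batches(int *final0, int *final1)
{
    struct znode nodes[2] = { { 10, 0 }, { 20, 0 } };
    const struct set_op batch_a[] = { { 0, 0, 11 }, { 1, 0, 21 } };
    const struct set_op batch_b[] = { { 0, 0, 12 }, { 1, 0, 22 } };
    int committed = 0;

    committed += multi_set(nodes, batch_a, 2);
    committed += multi_set(nodes, batch_b, 2);  /* versions are now 1 */
    *final0 = nodes[0].data;
    *final1 = nodes[1].data;
    return committed;
}
```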
[ovs-dev] Fwd: Where to add GTP tunnel headers in datapath flow table of open vSwitch
-- Forwarded message --
From: Ajmer Singh
Date: Fri, Mar 11, 2016 at 3:16 PM
Subject: Re: [ovs-dev] Where to add GTP tunnel headers in datapath flow table of open vSwitch
To: Jesse Gross

Hi,

The OpenFlow 1.3 specification defines the following match fields:

enum oxm_ofb_match_fields {
    OFPXMT_OFB_IN_PORT = 0,        /* Switch input port. */
    OFPXMT_OFB_IN_PHY_PORT = 1,    /* Switch physical input port. */
    OFPXMT_OFB_METADATA = 2,       /* Metadata passed between tables. */
    OFPXMT_OFB_ETH_DST = 3,        /* Ethernet destination address. */
    OFPXMT_OFB_ETH_SRC = 4,        /* Ethernet source address. */
    OFPXMT_OFB_ETH_TYPE = 5,       /* Ethernet frame type. */
    OFPXMT_OFB_VLAN_VID = 6,       /* VLAN id. */
    ---
    OFPXMT_OFB_IPV6_EXTHDR = 39,   /* IPv6 Extension Header pseudo-field */
}

but Open vSwitch 2.4 has different enum constants for match fields:

enum OVS_PACKED_ENUM mf_field_id {
    MFF_DP_HASH, MFF_RECIRC_ID, MFF_CONJ_ID,
    MFF_TUN_ID, MFF_TUN_SRC, MFF_TUN_DST, MFF_TUN_FLAGS, MFF_TUN_TTL,
    MFF_TUN_TOS, MFF_TUN_GBP_ID, MFF_TUN_GBP_FLAGS,
    MFF_METADATA, MFF_IN_PORT, MFF_IN_PORT_OXM, MFF_ACTSET_OUTPUT,
    MFF_SKB_PRIORITY, MFF_PKT_MARK,
    MFF_REG0, MFF_REG1, MFF_REG2, MFF_REG3,
    MFF_REG4, MFF_REG5, MFF_REG6, MFF_REG7,
#error "Need to update MFF_REG* to match FLOW_N_REGS"
    MFF_XREG0, MFF_XREG1, MFF_XREG2, MFF_XREG3,
#error "Need to update MFF_REG* to match FLOW_N_XREGS"
    MFF_ETH_SRC, MFF_ETH_DST, MFF_ETH_TYPE,
    MFF_VLAN_TCI, MFF_DL_VLAN, MFF_VLAN_VID, MFF_DL_VLAN_PCP, MFF_VLAN_PCP,
    MFF_MPLS_LABEL, MFF_MPLS_TC, MFF_MPLS_BOS,
    /* Update mf_is_l3_or_higher() if MFF_IPV4_SRC is no longer the first element
    MFF_IPV4_SRC, MFF_IPV4_DST, MFF_IPV6_SRC, MFF_IPV6_DST, MFF_IPV6_LABEL,
    MFF_IP_PROTO, MFF_IP_DSCP, MFF_IP_DSCP_SHIFTED, MFF_IP_ECN, MFF_IP_TTL,
    MFF_IP_FRAG,
    MFF_ARP_OP, MFF_ARP_SPA, MFF_ARP_TPA, MFF_ARP_SHA, MFF_ARP_THA,
    MFF_TCP_SRC, MFF_TCP_DST, MFF_TCP_FLAGS,
    MFF_UDP_SRC, MFF_UDP_DST,
    MFF_SCTP_SRC, MFF_SCTP_DST,
    MFF_ICMPV4_TYPE, MFF_ICMPV4_CODE, MFF_ICMPV6_TYPE, MFF_ICMPV6_CODE,
    MFF_ND_TARGET, MFF_ND_SLL, MFF_ND_TLL,
    MFF_N_IDS
}

Could you please explain how these map to the standard OpenFlow specification match fields?

Regards,
Ajmer

On Fri, Mar 11, 2016 at 3:10 PM, Ajmer Singh wrote:
> thanks for your response.
>
> On Wed, Mar 9, 2016 at 10:02 PM, Jesse Gross wrote:
>
>> On Tue, Mar 8, 2016 at 9:24 PM, Ajmer Singh
>> wrote:
>> > I have now question related to mapping of ofp_header->type (OFPT_) with
>> > OFPRAW_contants
>> >
>> > struct ofp_header {
>> >     uint8_t version;  /* An OpenFlow version number, e.g. OFP10_VERSION. */
>> >     uint8_t type;     /* One of the OFPT_ constants. */
>> >     ovs_be16 length;  /* Length including this ofp_header. */
>> >     ovs_be32 xid;     /* Transaction id associated with this packet.
>> >                          Replies use the same id as was in the request
>> >                          to facilitate pairing. */
>> > };
>> > OFP_ASSERT(sizeof(struct ofp_header) == 8);
>> >
>> > ofphdrs_decode(): openVswitch/lib/ofp-msg.c
>> > this function assumes that openflow header type contains only below enum
>> > constants. My query is why it is not taking care of lot many openflow
>> > messages (PACKET_OUT, FLOW_MOD etc..). your response is much appreciated.
>> > OFPT_VENDOR
>> > OFPT10_STATS_REQUEST
>> > OFPT10_STATS_REPLY
>> > OFPT11_STATS_REQUEST
>> > OFPT11_STATS_REQUEST
>>
>> The comment above struct ofphdrs describes what information is contained
>> in it:
>>
>> /* A thin abstraction of OpenFlow headers:
>>  *
>>  *   - 'version' and 'type' come straight from struct ofp_header, so these are
>>  *     always present and meaningful.
>>  *
>>  *   - 'stat' comes from the 'type' member in statistics messages only.  It is
>>  *     meaningful, therefore, only if 'version' and 'type' taken together
>>  *     specify a statistics request or reply.  Otherwise it is 0.
>>  *
>>  *   - 'vendor' is meaningful only for vendor messages, that is, if 'version'
>>  *     and 'type' specify a vendor message or if 'version' and 'type' specify
>>  *     a statistics message and 'stat' specifies a vendor statistic type.
>>  *     Otherwise it is 0.
>>  *
>>  *   - 'subtype' is meaningful only for vendor messages and otherwise 0.  It
>>  *     specifies a vendor-defined subtype.  There is no standard format for
>>  *     these but 32 bits seems like it should be enough. */
>>
>> It is not handling any message types, just extracting this
>> information. However, none of this needs to be modified to add a new
>> tunnel type.
>>
>
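As Jesse says, ofphdrs_decode() extracts header information rather than handling message types. The very first step of that kind of extraction is pulling the fixed 8-byte ofp_header fields off the wire. The sketch below does only that, using a hypothetical host-order struct (demo_ofp_header, not OVS's struct ofp_header, which keeps length and xid in network byte order); deriving 'stat', 'vendor' and 'subtype' for statistics and vendor messages is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Host-order copy of the fixed OpenFlow header fields (hypothetical type). */
struct demo_ofp_header {
    uint8_t version;
    uint8_t type;
    uint16_t length;
    uint32_t xid;
};

/* Decodes the 8-byte header at the start of 'buf'.  Multi-byte fields are
 * big-endian on the wire.  Returns 0 on success, -1 if the buffer is too
 * short or the length field is smaller than the header itself. */
static int demo_decode_ofp_header(const uint8_t *buf, size_t n,
                                  struct demo_ofp_header *h)
{
    if (n < 8) {
        return -1;
    }
    h->version = buf[0];
    h->type = buf[1];
    h->length = (uint16_t) ((buf[2] << 8) | buf[3]);
    h->xid = ((uint32_t) buf[4] << 24) | ((uint32_t) buf[5] << 16)
             | ((uint32_t) buf[6] << 8) | (uint32_t) buf[7];
    return h->length < 8 ? -1 : 0;
}
```

For example, the bytes 04 0e 00 38 00 00 00 2a decode to version 4 (OpenFlow 1.3), type 14 (FLOW_MOD in OpenFlow 1.3), length 56 and xid 42.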
Re: [ovs-dev] [PATCH] ofproto-dpif : propagate may_enable flag as link aliveness
Hi,

Could someone please respond to this e-mail or give any feedback?

Thank you,
Zoltan

-Original Message-
From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Zoltán Balogh
Sent: Tuesday, March 01, 2016 1:41 PM
To: Ben Pfaff
Cc: dev@openvswitch.org
Subject: Re: [ovs-dev] [PATCH] ofproto-dpif : propagate may_enable flag as link aliveness

Hi Ben,

This small patch modifies the port_run() function in ofproto-dpif. This function is invoked indirectly from ofproto_run() when ofproto_class->run() is called.
Sending of an OFPT_PORT_STATUS message can be triggered by invoking update_port() in ofproto.

I tried to create a 'call tree' that shows where the LIVE bit is set or cleared and where an OFPT_PORT_STATUS message can be sent to a controller.
We start from the main loop, which invokes bridge_run(), xxx_run(), xxx_wait(), poll_block() and so on.
I used an exclamation mark (!) to indicate where the LIVE bit can be updated if needed and an asterisk (*) to show where OFPT_PORT_STATUS can be sent via update_port().
So, here comes the tree. It looks quite ugly in my Outlook. It can be copy-pasted into an editor with a fixed-width font to get a better look.

  main()
    -> bridge_run()
        +-> bridge_run__()
        |   -> ofproto_run()
        |       +-> ofproto_class->run()
!       |       |   -> port_run()  # updates OFPUTIL_PS_LIVE bit if needed !!!
        |       |
        |       +-> process_port_change()
        |       |   +-> reinit_ports()
*       |       |   |   -> update_port()
        |       |   |
*       |       |   +-> update_port()
        |       |
*       |       +-> update_port()
        |
        +-> bridge_reconfigure()
            +-> bridge_delete_or_reconfigure_ports()
            |   -> ofproto_port_del()
*           |       -> update_port()
            |
            +-> bridge_add_ports()
            |   -> bridge_add_ports__()
            |       -> iface_create()
            |           +-> iface_do_create()
            |           |   -> ofproto_port_add()
*           |           |       -> update_port()
            |           |
            |           +-> ofproto_port_add()
*           |               -> update_port()
            |
            +-> bridge_run__()
                -> ofproto_run()
                    +-> ofproto_class->run()
!                   |   -> port_run()  # updates OFPUTIL_PS_LIVE bit if needed !!!
                    |
                    +-> process_port_change()
                    |   +-> reinit_ports()
*                   |   |   -> update_port()
                    |   |
*                   |   +-> update_port()
                    |
*                   +-> update_port()

So you can see that the LIVE bit is updated before update_port() can be called in each cycle of the main loop. When update_port() is called, it checks whether any properties of the port have changed. If they have, it calls ofport_modified(), which sends a port status message to the controller.

I did a test with ovs-testcontroller in verbose mode. I created a bridge, added a physical port to it, then changed the port's state to up.

This is the controller output:

2016-03-01T08:40:58Z|00050|vconn|DBG|tcp:192.168.2.145:42767: received: OFPT_PORT_STATUS (OF1.3) (xid=0x0): MOD: 1(eth3): addr:aa:55:aa:55:00:07
     config:     0
     state:      LINK_DOWN
     speed: 0 Mbps now, 0 Mbps max
2016-03-01T08:40:58Z|00051|learning_switch|DBG|6e81cdce5141: OpenFlow packet ignored: OFPT_PORT_STATUS (OF1.3) (xid=0x0): MOD: 1(eth3): addr:aa:55:aa:55:00:07
     config:     0
     state:      LINK_DOWN
     speed: 0 Mbps now, 0 Mbps max
2016-03-01T08:40:58Z|00052|poll_loop|DBG|wakeup due to [POLLIN] on fd 8 (192.168.2.145:6653<->192.168.2.145:42767) at lib/stream-fd.c:155
2016-03-01T08:40:58Z|00053|vconn|DBG|tcp:192.168.2.145:42767: received: OFPT_PORT_STATUS (OF1.3) (xid=0x0): MOD: 1(eth3): addr:aa:55:aa:55:00:07
     config:     0
     state:      LIVE
     speed: 0 Mbps now, 0 Mbps max
2016-03-01T08:40:58Z|00054|learning_switch|DBG|6e81cdce5141: OpenFlow packet ignored: OFPT_PORT_STATUS (OF1.3) (xid=0x0): MOD: 1(eth3): addr:aa:55:aa:55:00:07
     config:     0
     state:      LIVE
     speed: 0 Mbps now, 0 Mbps max

This is what ovs-ofctl shows before and after the port state modification:

# ovs-ofctl show br
OFPT_FEATURES_REPLY (xid=0x2): dpid:6e81cdce5141
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
 1(eth3): addr:aa:55:aa:55:00:07
     config:     PORT_DOWN
     state:      LINK_DOWN
     speed: 0 Mbps now, 0 Mbps max
 LO
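The ordering argument above (recompute the LIVE bit first, then let update_port() compare and notify only on change) can be sketched in a few lines of C. Only the bit positions follow OpenFlow 1.1+ (OFPPS_LINK_DOWN = 1 << 0, OFPPS_LIVE = 1 << 2); the struct and function names are hypothetical, and treating may_enable as liveness is the assumption named in the patch subject, not OVS's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Port state bits as numbered in OpenFlow 1.1+; everything else in this
 * sketch is hypothetical, not OVS code. */
#define DEMO_PS_LINK_DOWN (UINT32_C(1) << 0)
#define DEMO_PS_LIVE      (UINT32_C(1) << 2)

struct demo_port {
    uint32_t state;              /* last state reported to the controller */
    unsigned int n_status_msgs;  /* OFPT_PORT_STATUS messages "sent" */
};

/* port_run()-like step: derive the state bits, treating may_enable as
 * liveness (the idea in the patch subject). */
static uint32_t demo_compute_state(bool link_up, bool may_enable)
{
    uint32_t state = link_up ? 0 : DEMO_PS_LINK_DOWN;
    if (may_enable) {
        state |= DEMO_PS_LIVE;
    }
    return state;
}

/* update_port()-like step: notify only if something actually changed. */
static void demo_update_port(struct demo_port *p, uint32_t new_state)
{
    if (p->state != new_state) {
        p->state = new_state;
        p->n_status_msgs++;
    }
}

/* One main-loop iteration: the state is recomputed before it is compared. */
static void demo_main_loop_iteration(struct demo_port *p, bool link_up,
                                     bool may_enable)
{
    demo_update_port(p, demo_compute_state(link_up, may_enable));
}
```

Starting from LINK_DOWN, an iteration with the link still down sends nothing; the first iteration with the link up and may_enable set moves the port to LIVE and sends one message, matching the LINK_DOWN to LIVE transition in the log; further identical iterations send nothing.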
Re: [ovs-dev] RFC: OVN database options
On Thu, Mar 10 2016, Russell Bryant wrote:

> Specific to the OVN+OpenStack use case, I imagine a frequent question would
> be, "why do I have to use MariaDB+Galera AND PostgreSQL in the same
> environment?!"  I suppose OpenStack works with PostgreSQL, too, and it's
> just a deployment choice that most people seem to be using MariaDB+Galera.

OpenStack (supposedly) works with PostgreSQL. Galera is picked for most
deployments because it's an HA layer on top of MariaDB that is much easier
to deploy than the PostgreSQL solutions are. Other than that, in terms of
RDBMS/SQL features, PostgreSQL is generally ahead of MySQL.

FTR, Gnocchi¹ is one of the OpenStack projects pushing PostgreSQL as
recommended over MySQL, so it can leverage features such as precise
timestamps or time range computing, which MySQL was/is unable to do (I
know, it sounds ridiculous). We actually wish we could use even more
PostgreSQL features and drop MySQL, but that'd be a bit harsh.

A great thing for OpenStack would be to capitalize on the work that OVN
could do if it picked the path of PostgreSQL in HA mode.

And sorry for getting a little bit off-topic here. :-)

Cheers,

¹ http://gnocchi.xyz/configuration.html

--
Julien Danjou
-- Free Software hacker
-- https://julien.danjou.info
Re: [ovs-dev] RFCv2: OVN database options
On Thu, Mar 10, 2016 at 9:41 PM, Han Zhou wrote:

> On Thu, Mar 10, 2016 at 5:45 PM, Ben Pfaff wrote:
> >
> > On Thu, Mar 10, 2016 at 05:31:18PM -0800, Ben Pfaff wrote:
> > > I have been considering this as a minimum interesting scale.  It's hard
> > > for me to know what the interesting scale range is.  I am really happy
> > > to hear what is important to you?
> >
> > That ? was supposed to be an !
> >
> > > Can you tell me about what you want to scale to?
> >
> > But that's really a question.
>
> It is hard to tell an exact number since it increases over time. But
> considering scale of modern data centers, it is not uncommon to have more
> than 1k hypervisors in a single control plane. Would 5k - 10k clients be a
> realistic target?
> For number of ports, considering number of cores in a BM, maybe something
> around 100 lports per hypervisor sounds better.
>

I had a discussion with a group last week that wanted to target 10k
hypervisors in a single OpenStack environment.

We should also not forget the vast number of environments that really
only need tens to a few hundred hypervisors to work *really* well. There
are other pain points for OpenStack at the higher end of scale targets,
and OpenStack already works great for these tens-to-hundreds enterprise
private cloud use cases. With that context, HA is far more of an immediate
concern for me than the much higher scale targets.

--
Russell Bryant
Re: [ovs-dev] RFC: OVN database options
On Fri, Mar 11, 2016 at 09:58:18AM +0100, Ivan Kelly wrote:
> > Zookeeper transactions can be isolated depending on what level of
> > isolation you need.
> > A setData on a node operation can contain a version, so that it fails
> > if that node has changed since the version. This means with a multi[1]
> > of setData operations, you can effectively get a snapshot isolation
> > level of isolation. For serializable, you could probably shoehorn it
> > in by rewriting all nodes that you've written.
> Thinking about this more, and refreshing my cache on isolation levels,
> I realized that zookeeper doesn't in fact offer SI, since reads are
> done one at a time, and the state in the database may change between
> reads to the same node. So zookeeper offers "Read committed" rather
> than SI.

Just to make sure: does this mean that a ZooKeeper client cannot read a
consistent snapshot of the entire database?
Re: [ovs-dev] RFC: OVN database options
> Just to make sure, does this means that a Zookeeper client cannot read a
> consistent snapshot of the entire database?

Yes, exactly. It can only read one node at a time, so writes can occur
between the reading of two nodes.

-Ivan
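What Ivan describes is a read-skew anomaly: each individual read returns committed data, yet the combination may correspond to no state the database was ever in. The deterministic, single-threaded C sketch below (hypothetical names, not ZooKeeper code) interleaves one atomic write between two node-by-node reads and compares that with reading from a single snapshot.

```c
/* Toy database of two nodes whose values must always sum to 100; each
 * "transfer" writes both nodes atomically.  Not ZooKeeper code. */
struct two_nodes {
    int a;
    int b;
};

static void atomic_transfer(struct two_nodes *db, int amount)
{
    db->a -= amount;             /* both writes commit together */
    db->b += amount;
}

/* Reading node by node: a committed write lands between the two reads. */
static int sum_seen_reading_one_node_at_a_time(void)
{
    struct two_nodes db = { 100, 0 };
    int seen_a = db.a;           /* read #1, before the write */
    atomic_transfer(&db, 30);
    int seen_b = db.b;           /* read #2, after the write */
    return seen_a + seen_b;      /* 100 + 30: never a real database state */
}

/* Reading a consistent snapshot: both values come from the same state. */
static int sum_seen_from_snapshot(void)
{
    struct two_nodes db = { 100, 0 };
    struct two_nodes snap = db;  /* copy taken at a single point in time */
    atomic_transfer(&db, 30);
    return snap.a + snap.b;
}
```

The node-by-node reader sees a sum of 130, while the snapshot reader sees 100, the invariant the database actually maintains.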
Re: [ovs-dev] RFC: OVN database options
On Fri, Mar 11, 2016 at 05:10:15PM +0100, Ivan Kelly wrote:
> > Just to make sure, does this means that a Zookeeper client cannot read a
> > consistent snapshot of the entire database?
> Yes, exactly. It can only read one node at a time, so writes can occur
> between the reading of two nodes.

OK.  That's a major downside for this use case, because the OVN clients
are accustomed to viewing a consistent snapshot of the database.
Re: [ovs-dev] RFC: OVN database options
On Fri, Mar 11, 2016 at 5:20 PM, Ben Pfaff wrote:
> On Fri, Mar 11, 2016 at 05:10:15PM +0100, Ivan Kelly wrote:
>> > Just to make sure, does this means that a Zookeeper client cannot read a
>> > consistent snapshot of the entire database?
>> Yes, exactly. It can only read one node at a time, so writes can occur
>> between the reading of two nodes.
>
> OK.  That's a major downside for this use case, because the OVN clients
> are accustomed to viewing a consistent snapshot of the database.

Well, if you do the log tailing thing I suggested, then the client
will have access to a consistent snapshot, since it would only read
from the database directly once, and all client updates after that
would come from the log, which arrive in a well-defined order.

-Ivan
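Here is one way to picture the log-tailing idea, as a C sketch under stated assumptions: the log assigns gap-free sequence numbers, and the client's one-time snapshot records the last sequence number it reflects. The names (log_entry, replica, tail_log) are hypothetical. The client skips entries its snapshot already covers and applies the rest strictly in order; a gap means it has to re-read a snapshot.

```c
#include <stdbool.h>
#include <stddef.h>

#define N_KEYS 4

/* Hypothetical log record: "set key to value", with a gap-free sequence
 * number assigned by the log. */
struct log_entry {
    unsigned long seq;
    int key;
    int value;
};

/* A client's local replica: a snapshot plus the last log position it
 * reflects. */
struct replica {
    unsigned long applied_seq;
    int values[N_KEYS];
};

/* Applies log entries newer than the replica, strictly in sequence order.
 * Entries already covered by the snapshot are skipped.  Returns false on a
 * gap, in which case the client would need to re-read a fresh snapshot. */
static bool tail_log(struct replica *r, const struct log_entry *entries,
                     size_t n)
{
    for (size_t i = 0; i < n; i++) {
        const struct log_entry *e = &entries[i];
        if (e->seq <= r->applied_seq) {
            continue;
        }
        if (e->seq != r->applied_seq + 1) {
            return false;
        }
        if (e->key >= 0 && e->key < N_KEYS) {
            r->values[e->key] = e->value;
        }
        r->applied_seq = e->seq;
    }
    return true;
}
```

Because every client applies the same entries in the same order after a common starting point, each replica always equals some committed database state.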
Re: [ovs-dev] RFCv2: OVN database options
On Thu, Mar 10, 2016 at 06:41:51PM -0800, Han Zhou wrote:
> On Thu, Mar 10, 2016 at 5:45 PM, Ben Pfaff wrote:
> >
> > On Thu, Mar 10, 2016 at 05:31:18PM -0800, Ben Pfaff wrote:
> > > I have been considering this as a minimum interesting scale.  It's hard
> > > for me to know what the interesting scale range is.  I am really happy
> > > to hear what is important to you?
> >
> > That ? was supposed to be an !
> >
> > > Can you tell me about what you want to scale to?
> >
> > But that's really a question.
>
> It is hard to tell an exact number since it increases over time. But
> considering scale of modern data centers, it is not uncommon to have more
> than 1k hypervisors in a single control plane. Would 5k - 10k clients be a
> realistic target?
> For number of ports, considering number of cores in a BM, maybe something
> around 100 lports per hypervisor sounds better.

That's a significantly higher goal.  10,000 * 100 == 1,000,000 lports, so
if we keep the 1 kB to 5 kB per lport figure then that's 1 GB to 5 GB of
data.

Let me revise my requirements, then:

    - Size: For 1,000 hypervisors at 20 lports/HV and 1 kB/lport, 20 MB;
      for 10,000 hypervisors at 100 lports/HV and 5 kB/lport, 5 GB.

    - Scale: The northbound database has only a single-digit number of
      clients.  Each hypervisor is a client to the southbound database,
      so about 1,000 clients for 1,000 hypervisors or 10,000 clients for
      10,000 hypervisors.

and the analysis:

    - OVSDB.  If we choose to use OVSDB, we'll have to add
      high-availability support.  Also, the table doesn't mention
      scaling, since it's hard to compare objectively, but the OVSDB
      server probably does not scale to 10,000 clients.

(Incidentally, Martin also mentioned today in a meeting that 1,000 HVs
sounded low.)
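The size figures above are a straight product of hypervisors, lports per hypervisor, and bytes per lport. A trivial C helper (hypothetical name) reproduces them, assuming decimal units (1 kB = 1,000 bytes, 1 GB = 10^9 bytes).

```c
#include <stdint.h>

/* Back-of-the-envelope database size: hypervisors x lports/HV x bytes/lport,
 * in decimal units. */
static uint64_t estimate_db_bytes(uint64_t hypervisors, uint64_t lports_per_hv,
                                  uint64_t bytes_per_lport)
{
    return hypervisors * lports_per_hv * bytes_per_lport;
}
```

estimate_db_bytes(1000, 20, 1000) gives 20,000,000 bytes (20 MB), and estimate_db_bytes(10000, 100, 5000) gives 5,000,000,000 bytes (5 GB), matching the two endpoints of the revised size requirement.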
Re: [ovs-dev] RFC: OVN database options
On Fri, Mar 11, 2016 at 05:26:06PM +0100, Ivan Kelly wrote:
> On Fri, Mar 11, 2016 at 5:20 PM, Ben Pfaff wrote:
> > On Fri, Mar 11, 2016 at 05:10:15PM +0100, Ivan Kelly wrote:
> >> > Just to make sure, does this means that a Zookeeper client cannot read a
> >> > consistent snapshot of the entire database?
> >> Yes, exactly. It can only read one node at a time, so writes can occur
> >> between the reading of two nodes.
> >
> > OK.  That's a major downside for this use case, because the OVN clients
> > are accustomed to viewing a consistent snapshot of the database.
> Well, if you do the log tailing thing I suggested, then the client
> will have access to a consistent snapshot, since they would only read
> from the database directly once, and all client updates after that
> would come from the log which arrive in a well defined order.

OK.

I'm concerned about the log tailing solution, because it seems likely to
me that each hypervisor would have to examine every transaction, not
just those related to the logical switches that they're interested in.
This could become a scale issue.
Re: [ovs-dev] Fwd: Where to add GTP tunnel headers in datapath flow table of open vSwitch
On Fri, Mar 11, 2016 at 03:18:00PM +0530, Ajmer Singh wrote:
> -- Forwarded message --
> From: Ajmer Singh
> Date: Fri, Mar 11, 2016 at 3:16 PM
> Subject: Re: [ovs-dev] Where to add GTP tunnel headers in datapath flow
> table of open vSwitch
> To: Jesse Gross
>
>
> Hi,
>
>
> openflow1.3 specification supports below match fields.
>
> enum oxm_ofb_match_fields {
>     OFPXMT_OFB_IN_PORT = 0,        /* Switch input port. */
>     OFPXMT_OFB_IN_PHY_PORT = 1,    /* Switch physical input port. */
>     OFPXMT_OFB_METADATA = 2,       /* Metadata passed between tables. */
>     OFPXMT_OFB_ETH_DST = 3,        /* Ethernet destination address. */
>     OFPXMT_OFB_ETH_SRC = 4,        /* Ethernet source address. */
>     OFPXMT_OFB_ETH_TYPE = 5,       /* Ethernet frame type. */
>     OFPXMT_OFB_VLAN_VID = 6,       /* VLAN id. */
>     ---
>     OFPXMT_OFB_IPV6_EXTHDR = 39,   /* IPv6 Extension Header pseudo-field */
> }
> but open Vswitch2.4 does have different enum constants for match fields.
>
> enum OVS_PACKED_ENUM mf_field_id{
>     MFF_DP_HASH, MFF_RECIRC_ID, MFF_CONJ_ID,
>     MFF_TUN_ID, MFF_TUN_SRC, MFF_TUN_DST, MFF_TUN_FLAGS, MFF_TUN_TTL,
>     MFF_TUN_TOS, MFF_TUN_GBP_ID, MFF_TUN_GBP_FLAGS,
>     MFF_METADATA, MFF_IN_PORT, MFF_IN_PORT_OXM, MFF_ACTSET_OUTPUT,
>     MFF_SKB_PRIORITY, MFF_PKT_MARK,
>     MFF_REG0, MFF_REG1, MFF_REG2, MFF_REG3,
>     MFF_REG4, MFF_REG5, MFF_REG6, MFF_REG7,
> #error "Need to update MFF_REG* to match FLOW_N_REGS"
>     MFF_XREG0, MFF_XREG1, MFF_XREG2, MFF_XREG3,
> #error "Need to update MFF_REG* to match FLOW_N_XREGS"
>     MFF_ETH_SRC, MFF_ETH_DST, MFF_ETH_TYPE,
>     MFF_VLAN_TCI, MFF_DL_VLAN, MFF_VLAN_VID, MFF_DL_VLAN_PCP, MFF_VLAN_PCP,
>     MFF_MPLS_LABEL, MFF_MPLS_TC, MFF_MPLS_BOS,
>     /* Update mf_is_l3_or_higher() if MFF_IPV4_SRC is no longer the first
> element
>     MFF_IPV4_SRC, MFF_IPV4_DST, MFF_IPV6_SRC, MFF_IPV6_DST, MFF_IPV6_LABEL,
>     MFF_IP_PROTO, MFF_IP_DSCP, MFF_IP_DSCP_SHIFTED, MFF_IP_ECN, MFF_IP_TTL,
>     MFF_IP_FRAG,
>     MFF_ARP_OP, MFF_ARP_SPA, MFF_ARP_TPA, MFF_ARP_SHA, MFF_ARP_THA,
>     MFF_TCP_SRC, MFF_TCP_DST, MFF_TCP_FLAGS,
>     MFF_UDP_SRC, MFF_UDP_DST,
>     MFF_SCTP_SRC, MFF_SCTP_DST,
>     MFF_ICMPV4_TYPE, MFF_ICMPV4_CODE, MFF_ICMPV6_TYPE, MFF_ICMPV6_CODE,
>     MFF_ND_TARGET, MFF_ND_SLL, MFF_ND_TLL,
>     MFF_N_IDS
> }
> Could you please guide how these maps to standard openflow specification
> match fields?

There's a *huge* comment in meta-flow.h that explains the whole thing.
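The authoritative mapping is documented in meta-flow.h, as Ben says. Purely as a hand-written illustration of what such a mapping looks like (this table is not OVS code and covers only a few fields), the sketch below pairs some MFF_* names from the enum above with their OpenFlow 1.3 OXM field numbers and builds the 32-bit OXM header as class << 16 | field << 9 | hasmask << 8 | length, with class OFPXMC_OPENFLOW_BASIC (0x8000) and hasmask 0. MFF_DP_HASH is an OVS-specific field with no OpenFlow-basic OXM, so it maps to 0 here. Note that OVS keeps MFF_IN_PORT and MFF_IN_PORT_OXM as separate fields; only the latter corresponds to OXM_OF_IN_PORT.

```c
#include <stdint.h>
#include <string.h>

#define OFPXMC_OPENFLOW_BASIC UINT32_C(0x8000)

/* Illustrative subset only; not the table OVS actually uses. */
struct field_map {
    const char *mff_name;   /* OVS internal field name (from the enum above) */
    int oxm_field;          /* OFPXMT_OFB_* number, or -1 if none */
    unsigned int length;    /* payload length in bytes */
};

static const struct field_map demo_map[] = {
    { "MFF_IN_PORT_OXM",  0, 4 },
    { "MFF_METADATA",     2, 8 },
    { "MFF_ETH_DST",      3, 6 },
    { "MFF_ETH_SRC",      4, 6 },
    { "MFF_ETH_TYPE",     5, 2 },
    { "MFF_VLAN_VID",     6, 2 },
    { "MFF_IPV4_SRC",    11, 4 },
    { "MFF_IPV4_DST",    12, 4 },
    { "MFF_TCP_SRC",     13, 2 },
    { "MFF_TCP_DST",     14, 2 },
    { "MFF_DP_HASH",     -1, 4 },   /* OVS-specific: no OpenFlow-basic OXM */
};

/* Returns the unmasked OXM header for 'mff_name', or 0 if there is none. */
static uint32_t oxm_header_for(const char *mff_name)
{
    for (size_t i = 0; i < sizeof demo_map / sizeof demo_map[0]; i++) {
        if (!strcmp(demo_map[i].mff_name, mff_name)) {
            if (demo_map[i].oxm_field < 0) {
                return 0;
            }
            return (OFPXMC_OPENFLOW_BASIC << 16)
                   | ((uint32_t) demo_map[i].oxm_field << 9)
                   | demo_map[i].length;
        }
    }
    return 0;
}
```

For example, oxm_header_for("MFF_ETH_DST") yields 0x80000606, the OpenFlow 1.3 OXM_OF_ETH_DST header.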
Re: [ovs-dev] RFC: OVN database options
>> Well, if you do the log tailing thing I suggested, then the client
>> will have access to a consistent snapshot, since they would only read
>> from the database directly once, and all client updates after that
>> would come from the log which arrive in a well defined order.
>
> OK.
>
> I'm concerned about the log tailing solution, because it seems likely to
> me that each hypervisor would have to examine every transaction, not
> just those related to the logical switches that they're interested in.
> This could become a scale issue.

As I have it in my head, the hypervisors wouldn't access the log
directly; instead, there'd be a facade process which handles it. This
facade could easily do filtering to avoid having the whole log go to all
clients.

TBH, I don't fully understand the semantics of OVSDB yet, and its
interaction with OVN, so I'm not 100% sure that this approach would work.
It's something I plan to study next week though.

-Ivan
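A minimal sketch of the facade filtering step, assuming (hypothetically) that each log record is tagged with the single logical switch it touches; real transactions may touch several switches, which this ignores. The facade scans the full log once and forwards to each hypervisor only the records for switches it subscribed to, in log order. This moves the per-transaction scan into the facade rather than eliminating it, so it addresses Ben's per-hypervisor concern but not the total work. All names are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical log record tagged with the logical switch it touches. */
struct demo_txn {
    unsigned long seq;
    int lswitch;
};

/* The set of logical switches one hypervisor cares about. */
struct demo_subscription {
    const int *lswitches;
    size_t n;
};

static bool demo_is_interested(const struct demo_subscription *s, int lswitch)
{
    for (size_t i = 0; i < s->n; i++) {
        if (s->lswitches[i] == lswitch) {
            return true;
        }
    }
    return false;
}

/* Facade step: scan the full log once and copy only matching records,
 * preserving log order.  'out' must have room for 'n' entries.  Returns
 * the number of records written to 'out'. */
static size_t demo_facade_filter(const struct demo_txn *entries, size_t n,
                                 const struct demo_subscription *s,
                                 struct demo_txn *out)
{
    size_t n_out = 0;
    for (size_t i = 0; i < n; i++) {
        if (demo_is_interested(s, entries[i].lswitch)) {
            out[n_out++] = entries[i];
        }
    }
    return n_out;
}
```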
Re: [ovs-dev] RFC: OVN database options
On 03/10/2016 06:50 PM, Ben Pfaff wrote:
> I've been a fan of Postgres since I used in the 1990s for a web-based
> application.  It didn't occur to me that it was appropriate here.
> Julien, thanks so much for joining the discussion.
>
>> So yes, it has everything OVN needs. It can push notifications to
>> clients via the NOTIFY¹ command (that you can use in any
>> procedure/trigger). For example, you could imagine creating a trigger
>> that sends a JSON payload for each new update/insert in the database.
>> That's literally 10 lines of PL/SQL.
>
> That's good to know.  I hadn't figured out how to do this kind of thing
> with SQL-based systems.
>
>> ¹ http://www.postgresql.org/docs/9.5/static/sql-notify.html
>>
>> I think that PostgreSQL would be the safer bet in this move, as:
>> - building something on top of etcd would seem weak w.r.t your
>>   schema/table requirements
>> - investing in OVSDB (though keep in mind I don't know it :-) would
>>   probably end up in redoing a job PostgreSQL people already have done
>>   better than you would ;-)
>>
>> The only questions that this raises to me are:
>> - whether PostgreSQL is too large/complex to deploy for OVN. Seeing the
>>   list of candidates that were evaluated, I wouldn't think so, but there
>>   can be a lot of different opinions on that based on different
>>   perception of PostgreSQL. And since you're targeting a network DB, you
>>   definitely need a daemon configured and set-up so I'm only partially
>>   worried here. :)

Hi there, Russell Bryant invited me to this list to chime in on this
discussion.

If it were me, I *might* not build out based on NOTIFY as the core system
of notifying clients, and I'd likely stick with a tool that's designed for
cluster communication. In this case the custom service that's already
there seems like it might be the best bet; I'd actually build out the
service and use RAFT to keep it in sync with itself.

The reason is that PostgreSQL is not supplying you with an easy
out-of-the-box HA component in any case (Galera does, but then you don't
get NOTIFY), so you're going to have to build out something like RAFT or
such on the PG side in any case in order to handle failover. PostgreSQL's
HA story is not very good right now; it's very much roll-your-own, and it
is nowhere near the sophistication of Galera's multi-master approach,
which would be an enormous multi-year undertaking to recreate on
PostgreSQL.

IMO building out the HA part from scratch is the difficult part; being
able to send events to clients is pretty easy from any kind of custom
service. Since to do HA in PG you'd have to build your own event-dispatch
system anyway (e.g. to determine that a node is down and send out the call
to pick a new master node, as well as some method to get all the clients
to send data updates to this node), you might as well just build your
custom service to do just the thing you need.
Re: [ovs-dev] [PATCH] ofproto-dpif : propagate may_enable flag as link aliveness
Can you briefly state the question you're asking? I didn't know how to respond to the previous email. On Fri, Mar 11, 2016 at 10:42:26AM +, Zoltán Balogh wrote: > Hi, > > Could someone please respond to this e-mail or give any feedback? > > Thank you, > Zoltan > > -Original Message- > From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Zoltán Balogh > Sent: Tuesday, March 01, 2016 1:41 PM > To: Ben Pfaff > Cc: dev@openvswitch.org > Subject: Re: [ovs-dev] [PATCH] ofproto-dpif : propagate may_enable flag as > link aliveness > > Hi Ben, > > This small patch modifies the port_run( ) function in ofproto_dpif. This > function is invoked indirectly from ofproto_run() when ofproto_class->run() > is called. > Sending of OFPT_PORT_STATUS message can be triggered by invoking > update_port() in ofproto. > > I tried to create a 'call tree' that shows where the LIVE bit is set/cleared > and where OFPT_PORT_STATUS message can be sent to a controller. > We start from the main loop, that invokes bridge_run(), xxx_run(), > xxx_wait(), poll_block() and so on. > I used exclamation mark (!) to indicate where LIVE bit can be updated if > needed and asterisk (*) to show where OFPT_PORT_STATUS can be sent via > update_port(). > So, here comes the tree. It looks quite ugly in my outlook. It can be > copy-pasted to an editor with fixed width character set to get a better look. > >main() > -> bridge_run() > +-> bridge_run__() > | -> ofproto_run() > | +-> ofproto_class->run() > ! | | -> port_run() # updates OFPUTIL_PS_LIVE bit if > needed !!! 
> | | > | +-> process_port_change() > | | +-> reinit_ports() * | | | -> update_port() > | | | * | | +-> update_port() > | | * | +-> update_port() > | > +-> bridge_reconfigure() >+-> bridge_delete_or_reconfigure_ports() >| -> ofproto_port_del() > * | -> update_port() >| >+-> bridge_add_ports() >| -> bridge_add_ports__() >| -> iface_create() >| +-> iface_do_create() >| | -> ofproto_port_add() > * | | -> update_port() >| | >| +-> ofproto_port_add() > * | -> update_port() >| >+-> bridge_run__() > -> ofproto_run() > +-> ofproto_class->run() > !| -> port_run() # updates OFPUTIL_PS_LIVE bit > if needed !!! > | > +-> process_port_change() > | +-> reinit_ports() > *| | -> update_port() > | | > *| +-> update_port() > | > *+-> update_port() > > > So, you can see that the LIVE bit is updated before update_port() can be called > in each cycle of the main loop. When update_port() is called, it checks whether > any properties of the port have changed. If so, it calls > ofport_modified(), which sends a port status message to the controller. > > I did a test with ovs-testcontroller in verbose mode. I created a bridge, > added a physical port to it, then changed the port's state to up.
> This is the controller output: > > 2016-03-01T08:40:58Z|00050|vconn|DBG|tcp:192.168.2.145:42767: received: > OFPT_PORT_STATUS (OF1.3) (xid=0x0): MOD: 1(eth3): addr:aa:55:aa:55:00:07 > config: 0 > state: LINK_DOWN > speed: 0 Mbps now, 0 Mbps max > 2016-03-01T08:40:58Z|00051|learning_switch|DBG|6e81cdce5141: OpenFlow > packet ignored: OFPT_PORT_STATUS (OF1.3) (xid=0x0): MOD: 1(eth3): > addr:aa:55:aa:55:00:07 > config: 0 > state: LINK_DOWN > speed: 0 Mbps now, 0 Mbps max > 2016-03-01T08:40:58Z|00052|poll_loop|DBG|wakeup due to [POLLIN] on fd 8 > (192.168.2.145:6653<->192.168.2.145:42767) at lib/stream-fd.c:155 > 2016-03-01T08:40:58Z|00053|vconn|DBG|tcp:192.168.2.145:42767: received: > OFPT_PORT_STATUS (OF1.3) (xid=0x0): MOD: 1(eth3): addr:aa:55:aa:55:00:07 > config: 0 > state: LIVE > speed: 0 Mbps now, 0 Mbps max > 2016-03-01T08:40:58Z|00054|learning_switch|DBG|6e81cdce5141: OpenFlow > packet ignored: OFPT_PORT_STATUS (OF1.3) (xid=0x0): MOD: 1(eth3): > addr:aa:55:aa:55:00:07 > config: 0 > state: LIVE > speed: 0 Mbps now, 0 Mbps max > > > This is what ovs-ofctl shows before and after the port state modification: > > # ovs-ofctl show br > OFPT_FEATURES_
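The argument above rests on ordering: port_run() refreshes the LIVE bit earlier in each main-loop iteration than any update_port() call, and update_port() only notifies the controller when a cached port property differs. A minimal C sketch of that compare-then-notify pattern follows. The struct, the bit values, and the sketch_* functions are hypothetical stand-ins, not the actual ofproto code, and the real port_run() likely consults more than carrier state when deciding liveness.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical state bits, loosely modeled on OFPUTIL_PS_LINK_DOWN and
 * OFPUTIL_PS_LIVE; the values are illustrative only. */
enum {
    SKETCH_PS_LINK_DOWN = 1 << 0,
    SKETCH_PS_LIVE      = 1 << 2,
};

/* Hypothetical cached view of a port, as last reported to a controller. */
struct sketch_port {
    uint32_t config;
    uint32_t state;
    uint32_t curr_speed;
};

/* Counts the PORT_STATUS (MOD) notifications that would be sent. */
static int port_status_sent;

/* Stand-in for port_run(): derive LINK_DOWN/LIVE from carrier alone. */
static void
sketch_port_run(struct sketch_port *p, bool carrier_up)
{
    if (carrier_up) {
        p->state &= ~SKETCH_PS_LINK_DOWN;
        p->state |= SKETCH_PS_LIVE;
    } else {
        p->state |= SKETCH_PS_LINK_DOWN;
        p->state &= ~SKETCH_PS_LIVE;
    }
}

/* Stand-in for update_port(): notify only if some property changed. */
static void
sketch_update_port(struct sketch_port *cached, const struct sketch_port *fresh)
{
    if (memcmp(cached, fresh, sizeof *cached)) {
        *cached = *fresh;       /* Remember the new properties... */
        port_status_sent++;     /* ...and "send" OFPT_PORT_STATUS. */
    }
}
```

Calling sketch_update_port() a second time with no intervening state change sends nothing, so a LIVE change can only be reported in an iteration where port_run() has already updated the bit.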
Re: [ovs-dev] [PATCH 6/6] tunneling: Enable IPv6 tuneling.
On Wed, Mar 9, 2016 at 4:40 PM, Pravin B Shelar wrote: > From: Pravin B Shelar > > There is a check that disables IPv6 tunneling. The following patch > removes it and reintroduces the tunneling automake tests. > > This reverts commit 250bd94d1e500a89c76cac944e660bd9c07ac364. > > Signed-off-by: Pravin B Shelar When we reintroduce IPv6 tunneling, it would be nice to update the documentation and add it to NEWS as well.
Re: [ovs-dev] RFC: OVN database options
On Fri, Mar 11, 2016 at 12:13:25PM -0500, Mike Bayer wrote: > On 03/10/2016 06:50 PM, Ben Pfaff wrote: > > > >I've been a fan of Postgres since I used it in the 1990s for a web-based > >application. It didn't occur to me that it was appropriate here. > >Julien, thanks so much for joining the discussion. > > > >>>So yes, it has everything OVN needs. It can push notifications to > >>>clients via the NOTIFY¹ command (that you can use in any > >>>procedure/trigger). For example, you could imagine creating a trigger > >>>that sends a JSON payload for each new update/insert in the database. > >>>That's literally 10 lines of PL/SQL. > > > >That's good to know. I hadn't figured out how to do this kind of thing > >with SQL-based systems. > > > >>>¹ http://www.postgresql.org/docs/9.5/static/sql-notify.html > >>> > >>>I think that PostgreSQL would be the safer bet in this move, as: > >>>- building something on top of etcd would seem weak w.r.t your > >>>schema/table requirements > >>>- investing in OVSDB (though keep in mind I don't know it :-) would > >>>probably end up in redoing a job PostgreSQL people already have done > >>>better than you would ;-) > >>> > >>>The only questions that this raises to me are: > >>>- whether PostgreSQL is too large/complex to deploy for OVN. Seeing the > >>> list of candidates that were evaluated, I wouldn't think so, but there > >>> can be a lot of different opinions on that based on different > >>> perception of PostgreSQL. And since you're targeting a network DB, you > >>> definitely need a daemon configured and set-up so I'm only partially > >>> worried here. :) > > Hi there, Russell Bryant invited me to this list to chime in on this > discussion.
If it were me, I *might* not build out based on NOTIFY as the > core system of notifying clients, and I'd likely stick with a tool > that's designed for cluster communication; in this case the custom > service that's already there seems like it might be the best bet. I'd > actually build out the service and use RAFT to keep it in sync with itself. > > The reason is that PostgreSQL is not supplying you with an easy > out-of-the-box HA component in any case (Galera does, but then you don't get > NOTIFY), so you're going to have to build out something like RAFT or such on > the PG side in any case in order to handle failover. PostgreSQL's HA story > is not very good right now; it's very much roll-your-own, and it is nowhere > near the sophistication of Galera's multi-master approach, which would be an > enormous multi-year undertaking to recreate on PostgreSQL. IMO building > out the HA part from scratch is the difficult part; being able to send > events to clients is pretty easy from any kind of custom service. Since to > do HA in PG you'd have to build your own event-dispatch system anyway (e.g. > to determine a node is down and send out the call to pick a new master node, > as well as some method to get all the clients to send data updates to this > node), you might as well just build your custom service to do just the thing you > need. Thanks a lot for the comments! I've added this to my notes.
Re: [ovs-dev] RFC: OVN database options
On Fri, Mar 11, 2016 at 05:49:07PM +0100, Ivan Kelly wrote: > >> Well, if you do the log tailing thing I suggested, then the client > >> will have access to a consistent snapshot, since they would only read > >> from the database directly once, and all client updates after that > >> would come from the log, which arrive in a well-defined order. > > > > OK. > > > > I'm concerned about the log tailing solution, because it seems likely to > > me that each hypervisor would have to examine every transaction, not > > just those related to the logical switches that they're interested in. > > This could become a scale issue. > As I have it in my head, the hypervisors wouldn't access the log > directly, but there'd be a facade process which handles it. This facade > could easily do filtering to avoid having the whole log go to all > clients. TBH, I don't fully understand the semantics of ovsdb yet, and > its interaction with OVN, so I'm not 100% sure that this approach would > work. It's something I plan to study next week though. Thanks. I've added that to my notes.
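Ivan's facade idea can be illustrated with a small filter: the facade reads the totally ordered log once and forwards to each hypervisor only the transactions that touch logical switches it subscribed to, keeping a per-subscriber cursor so delivery stays in log order. This is a sketch under simplifying assumptions (each transaction touches exactly one logical switch; struct log_entry, struct subscriber, and facade_deliver() are hypothetical), not a description of OVSDB or of any existing OVN component.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical log entry: one committed transaction that modified a
 * single logical switch.  Real transactions can touch many rows. */
struct log_entry {
    uint64_t seq;        /* Position in the totally ordered log. */
    uint32_t lswitch;    /* Logical switch the transaction modified. */
};

/* Hypothetical per-hypervisor subscription kept by the facade. */
struct subscriber {
    const uint32_t *lswitches;  /* Logical switches of interest. */
    size_t n_lswitches;
    uint64_t next_seq;          /* First log position not yet examined. */
};

static bool
subscriber_wants(const struct subscriber *s, uint32_t lswitch)
{
    for (size_t i = 0; i < s->n_lswitches; i++) {
        if (s->lswitches[i] == lswitch) {
            return true;
        }
    }
    return false;
}

/* Copies into 'out' (capacity 'max') the entries of 'log' (sorted by
 * 'seq') at or after the subscriber's cursor that it cares about, in log
 * order, and advances the cursor past every entry examined.  Returns the
 * number of entries copied. */
static size_t
facade_deliver(struct subscriber *s, const struct log_entry *log, size_t n_log,
               struct log_entry *out, size_t max)
{
    size_t n_out = 0;

    for (size_t i = 0; i < n_log; i++) {
        if (log[i].seq < s->next_seq) {
            continue;           /* Already examined on an earlier call. */
        }
        if (subscriber_wants(s, log[i].lswitch)) {
            if (n_out == max) {
                break;          /* Out of room: resume here next time. */
            }
            out[n_out++] = log[i];
        }
        s->next_seq = log[i].seq + 1;
    }
    return n_out;
}
```

Filtering in one place means each hypervisor receives only its share of the log, which addresses the scale concern, at the cost of the facade itself reading every transaction.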
Re: [ovs-dev] [PATCH 3/6] lib: Fix compose nd
On Wed, Mar 09, 2016 at 04:40:42PM -0800, Pravin B Shelar wrote: > Following patch fixes number of issues with compose nd, like > setting ip packet header, set ICMP opt-len, checksum. > > Signed-off-by: Pravin B Shelar > --- > lib/packets.c | 60 > +-- > 1 file changed, 42 insertions(+), 18 deletions(-) > > diff --git a/lib/packets.c b/lib/packets.c > index daca1b3..6e2c68b 100644 > --- a/lib/packets.c > +++ b/lib/packets.c > @@ -794,7 +794,7 @@ eth_compose(struct dp_packet *b, const struct eth_addr > eth_dst, > dp_packet_prealloc_tailroom(b, 2 + ETH_HEADER_LEN + VLAN_HEADER_LEN + > size); > dp_packet_reserve(b, 2 + VLAN_HEADER_LEN); > eth = dp_packet_put_uninit(b, ETH_HEADER_LEN); > -data = dp_packet_put_uninit(b, size); > +data = dp_packet_put_zeros(b, size); > > eth->eth_dst = eth_dst; > eth->eth_src = eth_src; > @@ -845,7 +845,7 @@ packet_rh_present(struct dp_packet *packet) > size_t remaining; > uint8_t *data = dp_packet_l3(packet); > > -remaining = packet->l4_ofs - packet->l3_ofs; > +remaining = dp_packet_size(packet) - packet->l3_ofs; > > if (remaining < sizeof *nh) { > return false; > @@ -1027,9 +1027,7 @@ packet_set_ipv6(struct dp_packet *packet, uint8_t > proto, const ovs_be32 src[4], > } > > packet_set_ipv6_tc(&nh->ip6_flow, key_tc); > - > packet_set_ipv6_flow_label(&nh->ip6_flow, key_fl); > - > nh->ip6_hlim = key_hl; > } > > @@ -1116,7 +1114,8 @@ packet_set_icmp(struct dp_packet *packet, uint8_t type, > uint8_t code) > > void > packet_set_nd(struct dp_packet *packet, const ovs_be32 target[4], > - const struct eth_addr sll, const struct eth_addr tll) { > + const struct eth_addr sll, const struct eth_addr tll) > +{ > struct ovs_nd_msg *ns; > struct ovs_nd_opt *nd_opt; > int bytes_remain = dp_packet_l4_size(packet); > @@ -1288,34 +1287,60 @@ compose_arp(struct dp_packet *b, uint16_t arp_op, > dp_packet_set_l3(b, arp); > } > > +/* This function expect packet with ehernet header with correct > + * l3 pointer set. 
*/ > +static void * > +compose_ipv6(struct dp_packet *packet, uint8_t proto, const ovs_be32 src[4], > +const ovs_be32 dst[4], uint8_t key_tc, ovs_be32 key_fl, > +uint8_t key_hl, int size) > +{ > +struct ip6_hdr *nh; > +void *data; > + > +nh = dp_packet_l3(packet); > +nh->ip6_vfc = 0x60; > +nh->ip6_nxt = proto; > +nh->ip6_plen = htons(size); > +data = dp_packet_put_zeros(packet, size); > +dp_packet_set_l4(packet, data); > +packet_set_ipv6(packet, proto, src, dst, key_tc, key_fl, key_hl); > +return data; > +} > + > void > compose_nd(struct dp_packet *b, const struct eth_addr eth_src, > - struct in6_addr * ipv6_src, struct in6_addr * ipv6_dst) > + struct in6_addr * ipv6_src, struct in6_addr *ipv6_dst) Remove the extra space before ipv6_src too. Otherwise, it seems good to me. It passes my manual test of setting up two hosts connected through IPv6 + VXLAN, and copying a file with scp between them. Also, it passes my checksum tests. I have a patch to test-csum, inline below, and I have added a receive test for VXLAN with checksummed UDP. The test packet was checked using tcpdump and Wireshark, i.e., they both report a good checksum. This and other patches I have written (some similar to yours, but, besides the tests, none are needed with your patchset) are hosted here [1], in case anyone would like to take a look, review, or use them.
[1] http://git.cascardo.eti.br/?p=cascardo/ovs.git;a=shortlog;h=refs/heads/ipv6 > { > struct in6_addr sn_addr; > struct eth_addr eth_dst; > struct ovs_nd_msg *ns; > struct ovs_nd_opt *nd_opt; > +uint32_t icmp_csum; > > in6_addr_solicited_node(&sn_addr, ipv6_dst); > ipv6_multicast_to_ethernet(&eth_dst, &sn_addr); > > -eth_compose(b, eth_dst, eth_src, ETH_TYPE_IPV6, > -IPV6_HEADER_LEN + ICMP6_HEADER_LEN + ND_OPT_LEN); > -packet_set_ipv6(b, IPPROTO_ICMPV6, > -ALIGNED_CAST(ovs_be32 *, ipv6_src->s6_addr), > -ALIGNED_CAST(ovs_be32 *, sn_addr.s6_addr), > -0, 0, 255); > - > -ns = dp_packet_l4(b); > -nd_opt = &ns->options[0]; > +eth_compose(b, eth_dst, eth_src, ETH_TYPE_IPV6, IPV6_HEADER_LEN); > +ns = compose_ipv6(b, IPPROTO_ICMPV6, > + ALIGNED_CAST(ovs_be32 *, ipv6_src->s6_addr), > + ALIGNED_CAST(ovs_be32 *, sn_addr.s6_addr), > + 0, 0, 255, > + ND_MSG_LEN + ND_OPT_LEN); > > ns->icmph.icmp6_type = ND_NEIGHBOR_SOLICIT; > ns->icmph.icmp6_code = 0; > +put_16aligned_be32(&ns->rco_flags, htons(0)); > > +nd_opt = &ns->options[0]; > nd_opt->nd_opt_type = ND_OPT_SOURCE_LINKADDR; > +nd_opt->nd_opt_len = 1; > + > packet_set_nd(b, ALIGNED_CAST(o
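Part of what this patch fixes is the ICMP option length and checksum of the composed neighbor solicitation. For reference, ICMPv6 (RFC 4443) uses the Internet checksum over an IPv6 pseudo-header (source address, destination address, 32-bit upper-layer length, next header 58) followed by the ICMPv6 message with its checksum field zeroed. The standalone C sketch below shows that computation and the matching receive-side check; the helper names are hypothetical and are not lib/packets.c or lib/csum.c functions. The 32-bit accumulator is adequate for packet-sized inputs such as an ND message, not for arbitrarily large buffers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Adds 'len' bytes to a running ones'-complement sum, reading them as
 * big-endian 16-bit words and padding an odd trailing byte with zero. */
static uint32_t
sum_bytes(uint32_t sum, const uint8_t *p, size_t len)
{
    while (len >= 2) {
        sum += (uint32_t) p[0] << 8 | p[1];
        p += 2;
        len -= 2;
    }
    if (len) {
        sum += (uint32_t) p[0] << 8;
    }
    return sum;
}

/* Folds carries back into the low 16 bits (end-around carry). */
static uint16_t
fold(uint32_t sum)
{
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) sum;
}

/* Sums the IPv6 pseudo-header (RFC 8200, section 8.1) plus 'msg'. */
static uint32_t
icmp6_sum(const uint8_t src[16], const uint8_t dst[16],
          const uint8_t *msg, uint32_t msg_len)
{
    uint32_t sum = 0;

    sum = sum_bytes(sum, src, 16);
    sum = sum_bytes(sum, dst, 16);
    sum += msg_len >> 16;       /* Upper-layer packet length, 32 bits. */
    sum += msg_len & 0xffff;
    sum += 58;                  /* Next header: IPPROTO_ICMPV6. */
    return sum_bytes(sum, msg, msg_len);
}

/* Computes the checksum and stores it big-endian at msg[2..3], where the
 * ICMPv6 header keeps it.  'msg' must be at least 4 bytes long. */
static void
icmp6_set_csum(const uint8_t src[16], const uint8_t dst[16],
               uint8_t *msg, uint32_t msg_len)
{
    assert(msg_len >= 4);
    msg[2] = msg[3] = 0;
    uint16_t csum = (uint16_t) ~fold(icmp6_sum(src, dst, msg, msg_len));
    msg[2] = csum >> 8;
    msg[3] = csum & 0xff;
}

/* A received message checks out when the sum over the pseudo-header and
 * the message, stored checksum included, folds to 0xffff. */
static bool
icmp6_csum_ok(const uint8_t src[16], const uint8_t dst[16],
              const uint8_t *msg, uint32_t msg_len)
{
    return fold(icmp6_sum(src, dst, msg, msg_len)) == 0xffff;
}
```

A 32-byte neighbor solicitation (4-byte ICMPv6 header, 4 reserved bytes, 16-byte target, one 8-byte source link-layer address option) is a convenient input for trying this out.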
[ovs-dev] [PATCH v9 02/10] Present tracked changes in increasing change number order
From: RYAN D. MOATS Currently, changes are added to the front of the track list, so they are looped through in LIFO order. Incremental processing is more efficient with a FIFO presentation, so (1) add new changes to the back of the track list, and (2) move updated changes to the back of the track list. Signed-off-by: RYAN D. MOATS --- lib/ovsdb-idl.c |9 + 1 files changed, 5 insertions(+), 4 deletions(-) diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c index 4cb1c81..5dc8565 100644 --- a/lib/ovsdb-idl.c +++ b/lib/ovsdb-idl.c @@ -1350,10 +1350,11 @@ ovsdb_idl_row_change__(struct ovsdb_idl_row *row, const struct json *row_json, = row->table->change_seqno[change] = row->table->idl->change_seqno + 1; if (table->modes[column_idx] & OVSDB_IDL_TRACK) { -if (list_is_empty(&row->track_node)) { -list_push_front(&row->table->track_list, -&row->track_node); +if (!list_is_empty(&row->track_node)) { +list_remove(&row->track_node); } +list_push_back(&row->table->track_list, + &row->track_node); if (!row->updated) { row->updated = bitmap_allocate(class->n_columns); } @@ -1572,7 +1573,7 @@ ovsdb_idl_row_destroy(struct ovsdb_idl_row *row) = row->table->idl->change_seqno + 1; } if (list_is_empty(&row->track_node)) { -list_push_front(&row->table->track_list, &row->track_node); +list_push_back(&row->table->track_list, &row->track_node); } } } -- 1.7.1
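The ordering change in this patch amounts to two rules for an intrusive doubly linked list: append new entries at the tail, and when an already-tracked row changes again, unlink it and re-append it. The C sketch below uses a hypothetical minimal list in the style of OVS's lib/list.h (the node_* names are illustrative, not the OVS API) to show that a front-to-back walk then visits rows in order of their most recent change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list.  A list head and an unlinked
 * node both point at themselves. */
struct node {
    struct node *prev, *next;
};

static void node_init(struct node *n) { n->prev = n->next = n; }
static bool node_is_empty(const struct node *n) { return n->next == n; }

/* Unlinks 'n' and re-initializes it to mean "not on any list". */
static void
node_remove(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    node_init(n);
}

/* Appends 'n' just before the head 'list', i.e. at the tail. */
static void
node_push_back(struct node *list, struct node *n)
{
    n->prev = list->prev;
    n->next = list;
    list->prev->next = n;
    list->prev = n;
}

/* Records a change to a tracked row.  A row already on the track list is
 * moved to the tail, so the list stays ordered by each row's latest
 * change and a walk from the front is oldest-first (FIFO). */
static void
track_change(struct node *track_list, struct node *row)
{
    if (!node_is_empty(row)) {
        node_remove(row);
    }
    node_push_back(track_list, row);
}
```

With the old push-front behavior, the same sequence of changes would be walked newest-first, which is the LIFO order the commit message describes.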
[ovs-dev] [PATCH v9 09/10] Reset lflow processing when adding/removing patch ports
From: RYAN D. MOATS As lflow processing is incremental, reset it whenever a patch port is added or removed. Signed-off-by: RYAN D. MOATS --- ovn/controller/patch.c |4 +++- 1 files changed, 3 insertions(+), 1 deletions(-) diff --git a/ovn/controller/patch.c b/ovn/controller/patch.c index 9c519b0..e8f107a 100644 --- a/ovn/controller/patch.c +++ b/ovn/controller/patch.c @@ -15,6 +15,7 @@ #include +#include "lflow.h" #include "patch.h" #include "hash.h" @@ -92,7 +93,7 @@ create_patch_port(struct controller_ctx *ctx, ports[src->n_ports] = port; ovsrec_bridge_verify_ports(src); ovsrec_bridge_set_ports(src, ports, src->n_ports + 1); - +reset_flow_processing(); free(ports); } @@ -125,6 +126,7 @@ remove_port(struct controller_ctx *ctx, return; } } +reset_flow_processing(); } /* Obtains external-ids:ovn-bridge-mappings from OVSDB and adds patch ports for -- 1.7.1
[ovs-dev] [PATCH v9 04/10] Persist ports simap in logical_datapath
From: RYAN D. MOATS Persist across runs so that a change to this simap can be used as a trigger for resetting incremental processing. --- ovn/controller/lflow.c | 125 1 files changed, 115 insertions(+), 10 deletions(-) diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c index a66dcd0..4856362 100644 --- a/ovn/controller/lflow.c +++ b/ovn/controller/lflow.c @@ -1,4 +1,4 @@ -/* Copyright (c) 2015 Nicira, Inc. +/*te Copyright (c) 2016 Nicira, Inc. * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. @@ -226,28 +226,133 @@ ldp_free(struct logical_datapath *ldp) free(ldp); } +/* Whether a particular port has been seen or not + * + * the hmap_node is based on the name string, while the logical + * datapath pointer handles back cleanup */ +struct ldp_port { +struct hmap_node hmap_node; /* Indexed on 'ins_seqno'. */ +uint32_t ins_seqno; +char *name; +struct logical_datapath *ldp; /* Associated logical datapath */ +}; + +struct hmap ldp_ports = HMAP_INITIALIZER(&ldp_ports); + +void +ldp_port_create(uint32_t ins_seqno, char *name, struct logical_datapath *ldp) +{ +struct ldp_port *psp; + +psp = xmalloc(sizeof *psp); +psp->ins_seqno = ins_seqno; +psp->name = xmemdup(name, strlen(name)); +psp->ldp = ldp; +psp->hmap_node.hash = hash_int(ins_seqno, 0); +hmap_insert(&ldp_ports, &psp->hmap_node, psp->hmap_node.hash); +} + +static struct ldp_port * +ldp_port_lookup(uint32_t ins_seqno) +{ +struct ldp_port *psp; +HMAP_FOR_EACH_IN_BUCKET (psp, hmap_node, hash_int(ins_seqno, 0), + &ldp_ports) { +if (ins_seqno == psp->ins_seqno) { +return psp; +} +} +return NULL; +} + +void +ldp_port_update(uint32_t ins_seqno, char *name, struct logical_datapath *ldp) +{ +struct ldp_port *ldpp = ldp_port_lookup(ins_seqno); +if (!ldpp) { +ldp_port_create(ins_seqno, name, ldp); +} +} + +static void +ldp_port_free(struct ldp_port *psp) +{ +if (psp->name) { +free(psp->name); +} +free(psp); +} + /* Iterates through all 
of the records in the Port_Binding table, updating the * table of logical_datapaths to match the values found in active * Port_Bindings. */ static void ldp_run(struct controller_ctx *ctx) { -struct logical_datapath *ldp; -HMAP_FOR_EACH (ldp, hmap_node, &logical_datapaths) { -simap_clear(&ldp->ports); -} +struct logical_datapath *ldp = NULL; const struct sbrec_port_binding *binding; -SBREC_PORT_BINDING_FOR_EACH (binding, ctx->ovnsb_idl) { -struct logical_datapath *ldp = ldp_lookup_or_create(binding->datapath); +SBREC_PORT_BINDING_FOR_EACH_TRACKED (binding, ctx->ovnsb_idl) { +unsigned int del_seqno = sbrec_port_binding_row_get_seqno(binding, +OVSDB_IDL_CHANGE_DELETE); +unsigned int ins_seqno = sbrec_port_binding_row_get_seqno(binding, +OVSDB_IDL_CHANGE_INSERT); + +/* if the row has a del_seqno > 0, then trying to process the row + * isn't going to work (as it has already been freed). */ +if (del_seqno > 0) { +struct ldp_port *oldp = ldp_port_lookup(ins_seqno); +if (oldp) { +struct simap_node *old = simap_find(&oldp->ldp->ports, +oldp->name); +if (old) { +simap_delete(&oldp->ldp->ports, old); +} +hmap_remove(&ldp_ports, &oldp->hmap_node); +ldp_port_free(oldp); +} +continue; +} -simap_put(&ldp->ports, binding->logical_port, binding->tunnel_key); +struct logical_datapath *ldp = ldp_lookup_or_create(binding->datapath); +struct simap_node *old = simap_find(&ldp->ports, +binding->logical_port); +if (!old || old->data != binding->tunnel_key) { +simap_put(&ldp->ports, binding->logical_port, binding->tunnel_key); +} + +ldp_port_update(ins_seqno, binding->logical_port, ldp); } const struct sbrec_multicast_group *mc; -SBREC_MULTICAST_GROUP_FOR_EACH (mc, ctx->ovnsb_idl) { +SBREC_MULTICAST_GROUP_FOR_EACH_TRACKED (mc, ctx->ovnsb_idl) { +unsigned int del_seqno = sbrec_multicast_group_row_get_seqno(mc, +OVSDB_IDL_CHANGE_DELETE); +unsigned int ins_seqno = sbrec_multicast_group_row_get_seqno(mc, +OVSDB_IDL_CHANGE_INSERT); + +/* if the row has a del_seqno > 0, then trying to process 
the row + * isn't going to work (as it has already been freed). */ +if (del_seqno > 0) { +struct ldp_port *oldp = ldp_port_lookup(ins_seqno); +if (oldp) { +struct simap_node *old = simap_find(&oldp->ldp->ports,
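The ldp_ports table in this patch exists because, once the IDL reports a Port_Binding row as deleted, its columns can no longer be read, so the logical port name has to be remembered under a key that still works afterwards: the row's insert seqno. A self-contained C sketch of such a side table follows. It uses a fixed-size chained hash with a trivial modulo hash instead of OVS's hmap, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define N_BUCKETS 64   /* Hypothetical fixed size; OVS's hmap grows. */

/* Maps a row's insert seqno to its logical port name, so the name can
 * still be found after the IDL reports the row as deleted. */
struct seqno_port {
    struct seqno_port *next;    /* Next entry in the same bucket. */
    uint32_t ins_seqno;
    char *name;
};

static struct seqno_port *buckets[N_BUCKETS];

/* Returns the bucket head for 'ins_seqno' (a trivial modulo hash). */
static struct seqno_port **
bucket_for(uint32_t ins_seqno)
{
    return &buckets[ins_seqno % N_BUCKETS];
}

static struct seqno_port *
seqno_port_lookup(uint32_t ins_seqno)
{
    for (struct seqno_port *p = *bucket_for(ins_seqno); p; p = p->next) {
        if (p->ins_seqno == ins_seqno) {
            return p;
        }
    }
    return NULL;
}

/* Adds an entry unless one already exists for 'ins_seqno', as
 * ldp_port_update() does.  Returns 0 on success, -1 if out of memory. */
static int
seqno_port_update(uint32_t ins_seqno, const char *name)
{
    if (seqno_port_lookup(ins_seqno)) {
        return 0;
    }
    struct seqno_port *p = malloc(sizeof *p);
    char *copy = p ? malloc(strlen(name) + 1) : NULL;
    if (!copy) {
        free(p);
        return -1;
    }
    strcpy(copy, name);
    p->ins_seqno = ins_seqno;
    p->name = copy;
    p->next = *bucket_for(ins_seqno);
    *bucket_for(ins_seqno) = p;
    return 0;
}

/* Handles a deleted row: copies the remembered name into 'name_out'
 * (capacity 'n') so the caller can clean up by name, then unlinks and
 * frees the entry.  Returns 0 if found, -1 otherwise. */
static int
seqno_port_delete(uint32_t ins_seqno, char *name_out, size_t n)
{
    for (struct seqno_port **pp = bucket_for(ins_seqno); *pp;
         pp = &(*pp)->next) {
        struct seqno_port *p = *pp;
        if (p->ins_seqno == ins_seqno) {
            if (n) {
                snprintf(name_out, n, "%s", p->name);
            }
            *pp = p->next;
            free(p->name);
            free(p);
            return 0;
        }
    }
    return -1;
}
```

In the patch, the recovered name is then used to delete the port from the logical datapath's ports simap.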
[ovs-dev] [PATCH v9 10/10] Change physical_run to incremental processing
From: RYAN D. MOATS Persist localvif_to_ofport and tunnels structures and change physical_run to incremental processing. Signed-off-by: RYAN D. MOATS --- ovn/controller/lflow.c|3 + ovn/controller/physical.c | 113 +++-- ovn/controller/physical.h |2 + 3 files changed, 93 insertions(+), 25 deletions(-) diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c index 6d0d417..7943df8 100644 --- a/ovn/controller/lflow.c +++ b/ovn/controller/lflow.c @@ -15,6 +15,7 @@ #include #include "lflow.h" +#include "physical.h" #include "dynamic-string.h" #include "ofctrl.h" #include "ofp-actions.h" @@ -410,6 +411,8 @@ lflow_run(struct controller_ctx *ctx, if (restart_flow_processing) { seqno = 0; ovn_flow_table_clear(); +localvif_to_ofports_clear(); +tunnels_clear(); restart_flow_processing = false; } diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c index f86e2f5..b9db5ba 100644 --- a/ovn/controller/physical.c +++ b/ovn/controller/physical.c @@ -144,15 +144,38 @@ get_localnet_port(struct hmap *local_datapaths, int64_t tunnel_key) return ld ? 
ld->localnet_port : NULL; } +struct simap localvif_to_ofport = SIMAP_INITIALIZER(&localvif_to_ofport); +struct hmap tunnels = HMAP_INITIALIZER(&tunnels); +unsigned int port_binding_seqno = 0; + +void +localvif_to_ofports_clear(void) +{ +simap_clear(&localvif_to_ofport); +} + +void +tunnels_clear(void) +{ +struct chassis_tunnel *tun, *next; +HMAP_FOR_EACH_SAFE (tun, next, hmap_node, &tunnels) { +hmap_remove(&tunnels, &tun->hmap_node); +free(tun); +} +} + +static void +reset_physical_seqnos(void) +{ +port_binding_seqno = 0; +} + void physical_run(struct controller_ctx *ctx, enum mf_field_id mff_ovn_geneve, const struct ovsrec_bridge *br_int, const char *this_chassis_id, const struct simap *ct_zones, struct hmap *local_datapaths) { -struct simap localvif_to_ofport = SIMAP_INITIALIZER(&localvif_to_ofport); -struct hmap tunnels = HMAP_INITIALIZER(&tunnels); - for (int i = 0; i < br_int->n_ports; i++) { const struct ovsrec_port *port_rec = br_int->ports[i]; if (!strcmp(port_rec->name, br_int->name)) { @@ -187,11 +210,21 @@ physical_run(struct controller_ctx *ctx, enum mf_field_id mff_ovn_geneve, bool is_patch = !strcmp(iface_rec->type, "patch"); if (is_patch && localnet) { /* localnet patch ports can be handled just like VIFs. */ -simap_put(&localvif_to_ofport, localnet, ofport); +struct simap_node *old = simap_find(&localvif_to_ofport, +localnet); +if (!old || old->data != ofport) { +simap_put(&localvif_to_ofport, localnet, ofport); +reset_physical_seqnos(); +} break; } else if (is_patch && logpatch) { /* Logical patch ports can be handled just like VIFs. 
*/ -simap_put(&localvif_to_ofport, logpatch, ofport); +struct simap_node *old = simap_find(&localvif_to_ofport, +logpatch); +if (!old || old->data != ofport) { +simap_put(&localvif_to_ofport, logpatch, ofport); +reset_physical_seqnos(); +} break; } else if (chassis_id) { enum chassis_tunnel_type tunnel_type; @@ -208,18 +241,39 @@ physical_run(struct controller_ctx *ctx, enum mf_field_id mff_ovn_geneve, continue; } -struct chassis_tunnel *tun = xmalloc(sizeof *tun); -hmap_insert(&tunnels, &tun->hmap_node, -hash_string(chassis_id, 0)); -tun->chassis_id = chassis_id; -tun->ofport = u16_to_ofp(ofport); -tun->type = tunnel_type; +struct chassis_tunnel *old = chassis_tunnel_find(&tunnels, + chassis_id); +if (!old) { +struct chassis_tunnel *tun = xmalloc(sizeof *tun); +hmap_insert(&tunnels, &tun->hmap_node, +hash_string(chassis_id, 0)); +tun->chassis_id = chassis_id; +tun->ofport = u16_to_ofp(ofport); +tun->type = tunnel_type; +reset_physical_seqnos(); +} else { +ofp_port_t new_port = u16_to_ofp(ofport); +if (new_port != old->ofport) { +old->ofport = new_port; +reset_physical_seqnos(); +} +if (tunnel_type != old-
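Because localvif_to_ofport and tunnels now persist across runs, physical_run() only writes an entry, and only calls reset_physical_seqnos(), when the value is new or different. The sketch below isolates that update-if-changed step with a tiny array-backed map standing in for the simap; struct port_map, MAX_PORTS, and port_map_put_if_changed() are hypothetical, and port_binding_seqno only mirrors the patch's variable name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_PORTS 16   /* Hypothetical capacity for the sketch. */

/* Tiny name -> ofport map standing in for the persistent simap. */
struct port_map {
    const char *names[MAX_PORTS];
    unsigned int ofports[MAX_PORTS];
    size_t n;
};

/* High-water mark of processed Port_Binding changes; zero means
 * "reprocess everything on the next run". */
static unsigned int port_binding_seqno;

/* Stores 'ofport' for 'name' only if the name is new or its ofport
 * differs, and in that case resets port_binding_seqno so that dependent
 * flows are recomputed.  Returns true if the map changed.  'name' is
 * not copied, so it must outlive the map. */
static bool
port_map_put_if_changed(struct port_map *m, const char *name,
                        unsigned int ofport)
{
    for (size_t i = 0; i < m->n; i++) {
        if (!strcmp(m->names[i], name)) {
            if (m->ofports[i] == ofport) {
                return false;       /* Unchanged: nothing to redo. */
            }
            m->ofports[i] = ofport;
            port_binding_seqno = 0;
            return true;
        }
    }
    assert(m->n < MAX_PORTS);
    m->names[m->n] = name;
    m->ofports[m->n++] = ofport;
    port_binding_seqno = 0;
    return true;
}
```

Skipping the write when nothing changed is what keeps a steady-state run from triggering a full recomputation every time physical_run() walks br_int's ports.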
[ovs-dev] [PATCH v9 00/10] Implement incremental processing in ovn-controller
From: RYAN D. MOATS The delta from v8 is that patch 8 has been (a) rebased and (b) had a fix added to address a bug found during scaling testing. RYAN D. MOATS (10): Add useful information to ovn E2E tests Present tracked changes in increasing change number order Make flow table persistent in ovn controller Persist ports simap in logical_datapath Persist local_datapaths Add incremental proessing to lflow_run Change encaps_run to work incrementally Convert binding_run to incremental processing. Reset lflow processing when adding/removing patch ports Change physical_run to incremental processing lib/ofp-actions.c | 12 ++ lib/ofp-actions.h |2 + lib/ovsdb-idl.c |9 +- ovn/controller/binding.c| 97 ++-- ovn/controller/binding.h|1 + ovn/controller/encaps.c | 123 --- ovn/controller/lflow.c | 201 ++--- ovn/controller/lflow.h |6 +- ovn/controller/ofctrl.c | 318 +- ovn/controller/ofctrl.h | 15 ++- ovn/controller/ovn-controller.c | 30 ++-- ovn/controller/ovn-controller.h |2 + ovn/controller/patch.c |7 +- ovn/controller/physical.c | 206 -- ovn/controller/physical.h |4 +- tests/ovn.at| 14 ++ 16 files changed, 807 insertions(+), 240 deletions(-)
[ovs-dev] [PATCH v9 08/10] Convert binding_run to incremental processing.
From: RYAN D. MOATS Persist all_lports structure and ensure that binding_run resets to process the entire port binding table when chassis are added/removed or when get_local_iface_ids finds new ports on the local vswitch. Signed-off-by: RYAN D. MOATS --- ovn/controller/binding.c| 55 ++ ovn/controller/binding.h|1 + ovn/controller/encaps.c |4 +- ovn/controller/lflow.h |1 + ovn/controller/ovn-controller.h |1 + 5 files changed, 48 insertions(+), 14 deletions(-) diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c index 87cae99..dd80d56 100644 --- a/ovn/controller/binding.c +++ b/ovn/controller/binding.c @@ -50,6 +50,16 @@ binding_register_ovs_idl(struct ovsdb_idl *ovs_idl) &ovsrec_interface_col_ingress_policing_burst); } +struct sset all_lports = SSET_INITIALIZER(&all_lports); + +unsigned int binding_seqno = 0; + +void +reset_binding_seqno(void) +{ +binding_seqno = 0; +} + static void get_local_iface_ids(const struct ovsrec_bridge *br_int, struct shash *lports) { @@ -73,6 +83,10 @@ get_local_iface_ids(const struct ovsrec_bridge *br_int, struct shash *lports) continue; } shash_add(lports, iface_id, iface_rec); +if (!sset_find(&all_lports, iface_id)) { +sset_add(&all_lports, iface_id); +reset_binding_seqno(); +} } } } @@ -130,7 +144,12 @@ struct hmap local_datapaths_by_seqno = static struct local_datapath * local_datapath_lookup_by_seqno(unsigned int ins_seqno) { -return hmap_first_with_hash(&local_datapaths_by_seqno, ins_seqno); +struct hmap_node *ld = hmap_first_with_hash(&local_datapaths_by_seqno, +ins_seqno); +if (ld) { +return CONTAINER_OF(ld, struct local_datapath, seqno_hmap_node); +} +return NULL; } static void @@ -138,8 +157,13 @@ remove_local_datapath(struct hmap *local_datapaths, unsigned int ins_seqno) { struct local_datapath *ld = local_datapath_lookup_by_seqno(ins_seqno); if (ld) { +if (ld->logical_port) { +sset_find_and_delete(&all_lports, ld->logical_port); +free(ld->logical_port); +} hmap_remove(local_datapaths, &ld->hmap_node); 
hmap_remove(&local_datapaths_by_seqno, &ld->seqno_hmap_node); +free(ld); reset_flow_processing(); } } @@ -155,6 +179,8 @@ add_local_datapath(struct hmap *local_datapaths, } struct local_datapath *ld = xzalloc(sizeof *ld); +ld->logical_port = xmemdup(binding_rec->logical_port, + strlen(binding_rec->logical_port)); hmap_insert(local_datapaths, &ld->hmap_node, binding_rec->datapath->tunnel_key); hmap_insert(&local_datapaths_by_seqno, &ld->seqno_hmap_node, ins_seqno); @@ -193,28 +219,38 @@ binding_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int, * We'll remove our chassis from all port binding records below. */ } -struct sset all_lports = SSET_INITIALIZER(&all_lports); -struct shash_node *node; -SHASH_FOR_EACH (node, &lports) { -sset_add(&all_lports, node->name); -} - /* Run through each binding record to see if it is resident on this * chassis and update the binding accordingly. This includes both * directly connected logical ports and children of those ports. */ SBREC_PORT_BINDING_FOR_EACH_TRACKED(binding_rec, ctx->ovnsb_idl) { unsigned int del_seqno = sbrec_port_binding_row_get_seqno(binding_rec, OVSDB_IDL_CHANGE_DELETE); +unsigned int mod_seqno = sbrec_port_binding_row_get_seqno(binding_rec, +OVSDB_IDL_CHANGE_MODIFY); unsigned int ins_seqno = sbrec_port_binding_row_get_seqno(binding_rec, OVSDB_IDL_CHANGE_INSERT); +if (del_seqno <= binding_seqno && mod_seqno <= binding_seqno +&& ins_seqno <= binding_seqno) { +continue; +} /* if the row has a del_seqno > 0, then trying to process the row * isn't going to work (as it has already been freed) */ if (del_seqno > 0) { remove_local_datapath(local_datapaths, ins_seqno); +if (del_seqno >= binding_seqno) { +binding_seqno = del_seqno; +} continue; } +if (mod_seqno >= binding_seqno) { +binding_seqno = mod_seqno; +} +if (ins_seqno >= binding_seqno) { +binding_seqno = ins_seqno; +} + const struct ovsrec_interface *iface_rec = shash_find_and_delete(&lports, binding_rec->logical_port); if (iface_rec @@ -259,14 
+295,9 @@ binding_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int, } } -SHASH_FOR_EACH (node, &lports) { -V
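binding_run() now skips any tracked row whose insert, modify, and delete seqnos are all at or below binding_seqno, and raises binding_seqno to the largest seqno it handles. A minimal C sketch of that high-water-mark filter follows; struct row_seqnos and incremental_pass() are hypothetical. Unlike the patch, which raises binding_seqno inside the loop, the sketch compares every row against the mark from the previous pass, so its result does not depend on the order in which rows are visited.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-row change seqnos, standing in for what the IDL's
 * *_row_get_seqno() accessors report for INSERT, MODIFY and DELETE. */
struct row_seqnos {
    unsigned int ins, mod, del;
};

static unsigned int
max_u(unsigned int a, unsigned int b)
{
    return a > b ? a : b;
}

/* True if the row changed after 'processed', the highest seqno that an
 * earlier pass already handled. */
static bool
row_needs_processing(const struct row_seqnos *r, unsigned int processed)
{
    return r->ins > processed || r->mod > processed || r->del > processed;
}

/* Runs one incremental pass over 'rows', skipping rows with no change
 * newer than 'processed'.  Stores the number of rows handled in
 * '*handled' and returns the new high-water mark. */
static unsigned int
incremental_pass(const struct row_seqnos *rows, size_t n,
                 unsigned int processed, size_t *handled)
{
    unsigned int high = processed;

    *handled = 0;
    for (size_t i = 0; i < n; i++) {
        const struct row_seqnos *r = &rows[i];
        if (!row_needs_processing(r, processed)) {
            continue;
        }
        (*handled)++;   /* Real code would add or remove bindings here. */
        high = max_u(high, max_u(r->ins, max_u(r->mod, r->del)));
    }
    return high;
}
```

Resetting the mark to zero, as reset_binding_seqno() does in the patch, makes the next pass handle every row again.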
[ovs-dev] [PATCH v9 03/10] Make flow table persistent in ovn controller
From: RYAN D. MOATS This is a prerequisite for incremental processing. Side effects: 1. Table rows are now tracked so that removed rows are correctly handled. 2. Hash by table id+priority+action added to help detect superseded flows. 3. Hash by insert seqno added to help find deleted flows. Signed-off-by: RYAN D. MOATS --- lib/ofp-actions.c | 12 ++ lib/ofp-actions.h |2 + ovn/controller/lflow.c | 30 +++- ovn/controller/lflow.h |3 +- ovn/controller/ofctrl.c | 316 +- ovn/controller/ofctrl.h | 13 +- ovn/controller/ovn-controller.c | 12 +- ovn/controller/physical.c | 105 ++--- ovn/controller/physical.h |2 +- 9 files changed, 377 insertions(+), 118 deletions(-) diff --git a/lib/ofp-actions.c b/lib/ofp-actions.c index 702575d..36d80d0 100644 --- a/lib/ofp-actions.c +++ b/lib/ofp-actions.c @@ -7309,6 +7309,18 @@ ofpacts_equal(const struct ofpact *a, size_t a_len, return a_len == b_len && !memcmp(a, b, a_len); } +uint32_t +ofpacts_hash(const struct ofpact *a, size_t a_len, uint32_t basis) +{ +size_t i; +uint32_t interim = basis; +for (i = 0; i < a_len; i += 4) { + uint32_t *term = (uint32_t *) ((uint8_t *)a+i); + interim = hash_add(*term, interim); +} +return hash_finish(interim, a_len); +} + /* Finds the OFPACT_METER action, if any, in the 'ofpacts_len' bytes of * 'ofpacts'. If found, returns its meter ID; if not, returns 0. 
* diff --git a/lib/ofp-actions.h b/lib/ofp-actions.h index 24143d3..400ee48 100644 --- a/lib/ofp-actions.h +++ b/lib/ofp-actions.h @@ -885,6 +885,8 @@ bool ofpacts_output_to_group(const struct ofpact[], size_t ofpacts_len, uint32_t group_id); bool ofpacts_equal(const struct ofpact a[], size_t a_len, const struct ofpact b[], size_t b_len); +uint32_t ofpacts_hash(const struct ofpact a[], size_t a_len, uint32_t basis); + const struct mf_field *ofpact_get_mf_dst(const struct ofpact *ofpact); uint32_t ofpacts_get_meter(const struct ofpact[], size_t ofpacts_len); diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c index 33dca9b..a66dcd0 100644 --- a/ovn/controller/lflow.c +++ b/ovn/controller/lflow.c @@ -276,7 +276,7 @@ lflow_init(void) /* Translates logical flows in the Logical_Flow table in the OVN_SB database * into OpenFlow flows. See ovn-architecture(7) for more information. */ void -lflow_run(struct controller_ctx *ctx, struct hmap *flow_table, +lflow_run(struct controller_ctx *ctx, const struct simap *ct_zones, struct hmap *local_datapaths) { @@ -286,7 +286,25 @@ lflow_run(struct controller_ctx *ctx, struct hmap *flow_table, ldp_run(ctx); const struct sbrec_logical_flow *lflow; -SBREC_LOGICAL_FLOW_FOR_EACH (lflow, ctx->ovnsb_idl) { +SBREC_LOGICAL_FLOW_FOR_EACH_TRACKED (lflow, ctx->ovnsb_idl) { +unsigned int del_seqno = sbrec_logical_flow_row_get_seqno(lflow, +OVSDB_IDL_CHANGE_DELETE); +unsigned int mod_seqno = sbrec_logical_flow_row_get_seqno(lflow, +OVSDB_IDL_CHANGE_MODIFY); +unsigned int ins_seqno = sbrec_logical_flow_row_get_seqno(lflow, +OVSDB_IDL_CHANGE_INSERT); +// this offset is to protect the hard coded rules in physical.c +ins_seqno += 4; + +/* if the row has a del_seqno > 0, then trying to process the + * row isn't going to work (as it has already been freed). 
+ * Therefore all we can do is to pass the ins_seqno to + * ofctrl_remove_flow() to remove the flow */ +if (del_seqno > 0) { +ofctrl_remove_flow(ins_seqno); +continue; +} + /* Find the "struct logical_datapath" associated with this * Logical_Flow row. If there's no such struct, that must be because * no logical ports are bound to that logical datapath, so there's no @@ -400,8 +418,8 @@ lflow_run(struct controller_ctx *ctx, struct hmap *flow_table, m->match.flow.conj_id += conj_id_ofs; } if (!m->n) { -ofctrl_add_flow(flow_table, ptable, lflow->priority, -&m->match, &ofpacts); +ofctrl_add_flow(ptable, lflow->priority, &m->match, &ofpacts, +ins_seqno, mod_seqno); } else { uint64_t conj_stubs[64 / 8]; struct ofpbuf conj; @@ -416,8 +434,8 @@ lflow_run(struct controller_ctx *ctx, struct hmap *flow_table, dst->clause = src->clause; dst->n_clauses = src->n_clauses; } -ofctrl_add_flow(flow_table, ptable, lflow->priority, -&m->match, &conj); +ofctrl_add_flow(ptable, lflow->priority, &m->match, &conj, +in
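The new ofpacts_hash() lets ofctrl index flows by table id, priority, and actions so that a superseded flow can be found quickly. The sketch below shows the same idea using FNV-1a, swapped in as a simple, self-contained stand-in for OVS's hash_add()/hash_finish(); flow_key_hash() and fnv1a() are hypothetical names. It reads the action buffer byte by byte, so it makes no assumption about the buffer's alignment or about its length being a multiple of 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a over a byte buffer, continuing from hash state 'h'. */
static uint32_t
fnv1a(const void *data, size_t len, uint32_t h)
{
    const uint8_t *p = data;

    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;             /* FNV 32-bit prime. */
    }
    return h;
}

/* Hashes the fields that identify a flow for supersession checks: the
 * table id, the priority (big-endian), and the encoded action bytes. */
static uint32_t
flow_key_hash(uint8_t table_id, uint16_t priority,
              const void *actions, size_t actions_len)
{
    uint32_t h = 2166136261u;       /* FNV 32-bit offset basis. */
    uint8_t hdr[3] = { table_id, (uint8_t) (priority >> 8),
                       (uint8_t) (priority & 0xff) };

    h = fnv1a(hdr, sizeof hdr, h);
    return fnv1a(actions, actions_len, h);
}
```

As with any hash index, equal hashes only nominate candidates; a lookup still has to compare the table id, priority, and actions (for example with ofpacts_equal()) before treating two flows as the same.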
[ovs-dev] [PATCH v9 06/10] Add incremental processing to lflow_run
From: RYAN D. MOATS This code changes lflow_run to do incremental processing of the logical flow table rather than processing the full table each run. Signed-off-by: RYAN D. MOATS --- ovn/controller/binding.c|3 ++ ovn/controller/lflow.c | 53 +-- ovn/controller/lflow.h |4 +- ovn/controller/ofctrl.c |4 +- ovn/controller/ofctrl.h |2 + ovn/controller/ovn-controller.c |5 +++- 6 files changed, 58 insertions(+), 13 deletions(-) diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c index 602a8fe..87cae99 100644 --- a/ovn/controller/binding.c +++ b/ovn/controller/binding.c @@ -15,6 +15,7 @@ #include #include "binding.h" +#include "lflow.h" #include "lib/bitmap.h" #include "lib/hmap.h" @@ -139,6 +140,7 @@ remove_local_datapath(struct hmap *local_datapaths, unsigned int ins_seqno) if (ld) { hmap_remove(local_datapaths, &ld->hmap_node); hmap_remove(&local_datapaths_by_seqno, &ld->seqno_hmap_node); +reset_flow_processing(); } } @@ -156,6 +158,7 @@ add_local_datapath(struct hmap *local_datapaths, hmap_insert(local_datapaths, &ld->hmap_node, binding_rec->datapath->tunnel_key); hmap_insert(&local_datapaths_by_seqno, &ld->seqno_hmap_node, ins_seqno); +reset_flow_processing(); } static void diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c index 4856362..6d0d417 100644 --- a/ovn/controller/lflow.c +++ b/ovn/controller/lflow.c @@ -176,6 +176,20 @@ struct logical_datapath { enum ldp_type type; /* Type of logical datapath */ }; +void reset_flow_processing(void); +void ldp_port_create(uint32_t ins_seqno, char *name, + struct logical_datapath *ldp); +void ldp_port_update(uint32_t ins_seqno, char *name, + struct logical_datapath *ldp); + +bool restart_flow_processing = false; + +void +reset_flow_processing(void) +{ +restart_flow_processing = true; +} + /* Contains "struct logical_datapath"s.
*/ static struct hmap logical_datapaths = HMAP_INITIALIZER(&logical_datapaths); @@ -208,6 +222,7 @@ ldp_create(const struct sbrec_datapath_binding *binding) const char *ls = smap_get(&binding->external_ids, "logical-switch"); ldp->type = ls ? LDP_TYPE_SWITCH : LDP_TYPE_ROUTER; simap_init(&ldp->ports); +reset_flow_processing(); return ldp; } @@ -224,6 +239,7 @@ ldp_free(struct logical_datapath *ldp) simap_destroy(&ldp->ports); hmap_remove(&logical_datapaths, &ldp->hmap_node); free(ldp); +reset_flow_processing(); } /* Whether a particular port has been seen or not @@ -319,6 +335,7 @@ ldp_run(struct controller_ctx *ctx) binding->logical_port); if (!old || old->data != binding->tunnel_key) { simap_put(&ldp->ports, binding->logical_port, binding->tunnel_key); +reset_flow_processing(); } ldp_port_update(ins_seqno, binding->logical_port, ldp); @@ -380,13 +397,21 @@ lflow_init(void) /* Translates logical flows in the Logical_Flow table in the OVN_SB database * into OpenFlow flows. See ovn-architecture(7) for more information. */ -void +unsigned int lflow_run(struct controller_ctx *ctx, const struct simap *ct_zones, - struct hmap *local_datapaths) + struct hmap *local_datapaths, + unsigned int seqno) { struct hmap flows = HMAP_INITIALIZER(&flows); uint32_t conj_id_ofs = 1; +unsigned int processed_seqno = seqno; + +if (restart_flow_processing) { +seqno = 0; +ovn_flow_table_clear(); +restart_flow_processing = false; +} ldp_run(ctx); @@ -398,17 +423,29 @@ lflow_run(struct controller_ctx *ctx, OVSDB_IDL_CHANGE_MODIFY); unsigned int ins_seqno = sbrec_logical_flow_row_get_seqno(lflow, OVSDB_IDL_CHANGE_INSERT); -// this offset is to protect the hard coded rules in physical.c -ins_seqno += 4; - +if (del_seqno <= seqno && mod_seqno <= seqno && ins_seqno <= seqno) { +continue; +} /* if the row has a del_seqno > 0, then trying to process the * row isn't going to work (as it has already been freed). 
- * Therefore all we can do is to pass the ins_seqno to + * Therefore all we can do is to pass the offset ins_seqno to * ofctrl_remove_flow() to remove the flow */ if (del_seqno > 0) { -ofctrl_remove_flow(ins_seqno); +ofctrl_remove_flow(ins_seqno+4); +if (del_seqno > processed_seqno) { +processed_seqno = del_seqno; +} continue; } +if (mod_seqno > processed_seqno) { +processed_seqno = mod_seqno; +} +if (ins_seqno > processed_seqno) { +processed_seqno = ins_seqno; +} + +
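The core of the lflow_run change above is a watermark test: a tracked row is skipped when its insert, modify, and delete sequence numbers are all at or below the seqno processed in the previous run, and the function returns the highest seqno it saw so the caller can pass it back on the next run. The following is a minimal, self-contained sketch of that pattern only. The `tracked_row` struct and the `row_changed_since()` and `process_changes()` functions are hypothetical stand-ins, not the OVSDB IDL API; the restart path (`restart_flow_processing`) and the actual flow installation and removal are omitted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a tracked IDL row: the seqnos at which the row
 * was inserted, modified, and deleted (0 means "never"). */
struct tracked_row {
    unsigned int ins_seqno;
    unsigned int mod_seqno;
    unsigned int del_seqno;
};

static unsigned int
max_u(unsigned int a, unsigned int b)
{
    return a > b ? a : b;
}

/* True if any change to 'row' happened after 'watermark'.  This is the
 * inverse of the patch's "del <= seqno && mod <= seqno && ins <= seqno"
 * skip condition. */
static bool
row_changed_since(const struct tracked_row *row, unsigned int watermark)
{
    return row->ins_seqno > watermark
           || row->mod_seqno > watermark
           || row->del_seqno > watermark;
}

/* Visits the rows changed since 'watermark', counts them in '*n_processed',
 * and returns the new watermark (the highest relevant seqno seen). */
static unsigned int
process_changes(const struct tracked_row rows[], size_t n,
                unsigned int watermark, size_t *n_processed)
{
    unsigned int processed = watermark;

    *n_processed = 0;
    for (size_t i = 0; i < n; i++) {
        const struct tracked_row *row = &rows[i];

        if (!row_changed_since(row, watermark)) {
            continue;           /* Already handled in an earlier run. */
        }
        (*n_processed)++;
        if (row->del_seqno > 0) {
            /* Deleted row: its flows would be removed by ins_seqno here. */
            processed = max_u(processed, row->del_seqno);
            continue;
        }
        /* Inserted or modified row: its flows would be (re)installed. */
        processed = max_u(processed, max_u(row->ins_seqno, row->mod_seqno));
    }
    return processed;
}
```

For example, with rows whose (ins, mod, del) seqnos are (1, 0, 0), (2, 5, 0), and (3, 0, 6), a first call with watermark 0 visits all three rows and returns 6; a second call with watermark 6 visits none.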
[ovs-dev] [PATCH v9 05/10] Persist local_datapaths
From: RYAN D. MOATS Persist local_datapaths across runs so that a change can be used as a trigger to reset incremental flow processing. Signed-off-by: RYAN D. MOATS --- ovn/controller/binding.c| 41 -- ovn/controller/ovn-controller.c | 15 +++-- ovn/controller/ovn-controller.h |1 + ovn/controller/patch.c |3 +- 4 files changed, 45 insertions(+), 15 deletions(-) diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c index d3ca9c9..602a8fe 100644 --- a/ovn/controller/binding.c +++ b/ovn/controller/binding.c @@ -121,9 +121,31 @@ update_ct_zones(struct sset *lports, struct simap *ct_zones, } } +/* Contains "struct local_datpath" nodes whose hash values are the + * ins_seqno of datapaths with at least one local port binding. */ +struct hmap local_datapaths_by_seqno = +HMAP_INITIALIZER(&local_datapaths_by_seqno); + +static struct local_datapath * +local_datapath_lookup_by_seqno(unsigned int ins_seqno) +{ +return hmap_first_with_hash(&local_datapaths_by_seqno, ins_seqno); +} + +static void +remove_local_datapath(struct hmap *local_datapaths, unsigned int ins_seqno) +{ +struct local_datapath *ld = local_datapath_lookup_by_seqno(ins_seqno); +if (ld) { +hmap_remove(local_datapaths, &ld->hmap_node); +hmap_remove(&local_datapaths_by_seqno, &ld->seqno_hmap_node); +} +} + static void add_local_datapath(struct hmap *local_datapaths, -const struct sbrec_port_binding *binding_rec) +const struct sbrec_port_binding *binding_rec, +unsigned int ins_seqno) { if (hmap_first_with_hash(local_datapaths, binding_rec->datapath->tunnel_key)) { @@ -133,6 +155,7 @@ add_local_datapath(struct hmap *local_datapaths, struct local_datapath *ld = xzalloc(sizeof *ld); hmap_insert(local_datapaths, &ld->hmap_node, binding_rec->datapath->tunnel_key); +hmap_insert(&local_datapaths_by_seqno, &ld->seqno_hmap_node, ins_seqno); } static void @@ -176,7 +199,19 @@ binding_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int, /* Run through each binding record to see if it is resident on this 
* chassis and update the binding accordingly. This includes both * directly connected logical ports and children of those ports. */ -SBREC_PORT_BINDING_FOR_EACH(binding_rec, ctx->ovnsb_idl) { +SBREC_PORT_BINDING_FOR_EACH_TRACKED(binding_rec, ctx->ovnsb_idl) { +unsigned int del_seqno = sbrec_port_binding_row_get_seqno(binding_rec, +OVSDB_IDL_CHANGE_DELETE); +unsigned int ins_seqno = sbrec_port_binding_row_get_seqno(binding_rec, +OVSDB_IDL_CHANGE_INSERT); + +/* if the row has a del_seqno > 0, then trying to process the row + * isn't going to work (as it has already been freed) */ +if (del_seqno > 0) { +remove_local_datapath(local_datapaths, ins_seqno); +continue; +} + const struct ovsrec_interface *iface_rec = shash_find_and_delete(&lports, binding_rec->logical_port); if (iface_rec @@ -186,7 +221,7 @@ binding_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int, /* Add child logical port to the set of all local ports. */ sset_add(&all_lports, binding_rec->logical_port); } -add_local_datapath(local_datapaths, binding_rec); +add_local_datapath(local_datapaths, binding_rec, ins_seqno); if (iface_rec && ctx->ovs_idl_txn) { update_qos(iface_rec, binding_rec); } diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-controller.c index 8f3873d..cb8536b 100644 --- a/ovn/controller/ovn-controller.c +++ b/ovn/controller/ovn-controller.c @@ -198,6 +198,10 @@ get_ovnsb_remote(struct ovsdb_idl *ovs_idl) } } +/* Contains "struct local_datpath" nodes whose hash values are the + * tunnel_key of datapaths with at least one local port binding. */ +struct hmap local_datapaths = HMAP_INITIALIZER(&local_datapaths); + int main(int argc, char *argv[]) { @@ -282,10 +286,6 @@ main(int argc, char *argv[]) .ovnsb_idl_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop), }; -/* Contains "struct local_datpath" nodes whose hash values are the - * tunnel_key of datapaths with at least one local port binding. 
*/ -struct hmap local_datapaths = HMAP_INITIALIZER(&local_datapaths); - const struct ovsrec_bridge *br_int = get_br_int(&ctx); const char *chassis_id = get_chassis_id(ctx.ovs_idl); @@ -312,13 +312,6 @@ main(int argc, char *argv[]) ofctrl_put(); } -struct local_datapath *cur_node, *next_node; -HMAP_FOR_EACH_SAFE (cur_node, next_node, hmap_node, &local_datapaths) { -hmap_remove(&local_datapaths, &cur_node->hmap_node); -free(cur_node); -
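Patch 05 keeps `local_datapaths` alive across runs and adds a second index keyed by the port binding's `ins_seqno`. Per the patch comments, the reason is that a deleted row's contents have already been freed, so its insertion seqno is the only handle left for finding what to remove. Below is a minimal sketch of that dual-key idea, assuming a small fixed-size array in place of OVS's `struct hmap`; every name in it (`local_dp`, `local_dp_add()`, `local_dp_remove_by_seqno()`, and so on) is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_LOCAL_DPS 16 /* Hypothetical capacity for the sketch. */

/* Hypothetical simplified stand-in for struct local_datapath. */
struct local_dp {
    bool in_use;
    unsigned int tunnel_key; /* Primary key: datapath tunnel_key. */
    unsigned int ins_seqno;  /* Secondary key: port binding ins_seqno. */
};

/* Meant to persist across runs, like the file-scope local_datapaths. */
struct local_dp_table {
    struct local_dp slots[MAX_LOCAL_DPS];
};

static struct local_dp *
find_by_key(struct local_dp_table *t, unsigned int tunnel_key)
{
    for (size_t i = 0; i < MAX_LOCAL_DPS; i++) {
        if (t->slots[i].in_use && t->slots[i].tunnel_key == tunnel_key) {
            return &t->slots[i];
        }
    }
    return NULL;
}

static struct local_dp *
find_by_seqno(struct local_dp_table *t, unsigned int ins_seqno)
{
    for (size_t i = 0; i < MAX_LOCAL_DPS; i++) {
        if (t->slots[i].in_use && t->slots[i].ins_seqno == ins_seqno) {
            return &t->slots[i];
        }
    }
    return NULL;
}

/* Adds a datapath unless one with the same tunnel_key is already present. */
static bool
local_dp_add(struct local_dp_table *t, unsigned int tunnel_key,
             unsigned int ins_seqno)
{
    if (find_by_key(t, tunnel_key)) {
        return false;
    }
    for (size_t i = 0; i < MAX_LOCAL_DPS; i++) {
        if (!t->slots[i].in_use) {
            t->slots[i] = (struct local_dp) { true, tunnel_key, ins_seqno };
            return true;
        }
    }
    return false; /* Table full. */
}

/* Removes the entry for a deleted binding.  It is found by ins_seqno
 * because the deleted row's other fields are no longer available. */
static bool
local_dp_remove_by_seqno(struct local_dp_table *t, unsigned int ins_seqno)
{
    struct local_dp *dp = find_by_seqno(t, ins_seqno);

    if (!dp) {
        return false;
    }
    dp->in_use = false;
    return true;
}
```

As in `add_local_datapath()`, a second binding on an already known datapath (same tunnel_key) is not added again, so only the first binding's ins_seqno is indexed for that datapath.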
[ovs-dev] [PATCH v9 01/10] Add useful information to ovn E2E tests
From: RYAN D. MOATS Modify E2E test to output the OF flows from all three hypervisors to help debug when something goes wrong. Signed-off-by: RYAN D. MOATS --- tests/ovn.at | 14 ++ 1 files changed, 14 insertions(+), 0 deletions(-) diff --git a/tests/ovn.at b/tests/ovn.at index 5cb7d8b..6fcec99 100644 --- a/tests/ovn.at +++ b/tests/ovn.at @@ -1022,8 +1022,22 @@ ovn_populate_arp # Allow some time for ovn-northd and ovn-controller to catch up. # XXX This should be more systematic. sleep 1 +echo "-- OVN dump --" +ovn-nbctl show ovn-sbctl show +echo "-- hv1 dump --" +as hv1 ovs-vsctl show +as hv1 ovs-ofctl -O OpenFlow13 dump-flows br-int + +echo "-- hv2 dump --" +as hv2 ovs-vsctl show +as hv2 ovs-ofctl -O OpenFlow13 dump-flows br-int + +echo "-- hv3 dump --" +as hv3 ovs-vsctl show +as hv3 ovs-ofctl -O OpenFlow13 dump-flows br-int + # test_packet INPORT DST SRC ETHTYPE OUTPORT... # # This shell function causes a packet to be received on INPORT. The packet's -- 1.7.1 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v9 07/10] Change encaps_run to work incrementally
From: RYAN D. MOATS Side effects include tunnel context being persisted and no need to collect already defined OVS tunnels during each execution. Signed-off-by: RYAN D. MOATS --- ovn/controller/encaps.c | 123 +-- 1 files changed, 66 insertions(+), 57 deletions(-) diff --git a/ovn/controller/encaps.c b/ovn/controller/encaps.c index dfb11c0..282594e 100644 --- a/ovn/controller/encaps.c +++ b/ovn/controller/encaps.c @@ -115,40 +115,6 @@ static void tunnel_add(struct tunnel_ctx *tc, const char *new_chassis_id, const struct sbrec_encap *encap) { -struct port_hash_node *hash_node; - -/* Check whether such a row already exists in OVS. If so, remove it - * from 'tc->tunnel_hmap' and we're done. */ -HMAP_FOR_EACH_WITH_HASH (hash_node, node, - port_hash(new_chassis_id, - encap->type, encap->ip), - &tc->tunnel_hmap) { -const struct ovsrec_port *port = hash_node->port; -const char *chassis_id = smap_get(&port->external_ids, - "ovn-chassis-id"); -const struct ovsrec_interface *iface; -const char *ip; - -if (!chassis_id || !port->n_interfaces) { -continue; -} - -iface = port->interfaces[0]; -ip = smap_get(&iface->options, "remote_ip"); -if (!ip) { -continue; -} - -if (!strcmp(new_chassis_id, chassis_id) -&& !strcmp(encap->type, iface->type) -&& !strcmp(encap->ip, ip)) { -hmap_remove(&tc->tunnel_hmap, &hash_node->node); -free(hash_node); -return; -} -} - -/* No such port, so add one. 
*/ struct smap options = SMAP_INITIALIZER(&options); struct ovsrec_port *port, **ports; struct ovsrec_interface *iface; @@ -224,6 +190,19 @@ preferred_encap(const struct sbrec_chassis *chassis_rec) return best_encap; } +unsigned int encaps_seqno = 0; + +struct tunnel_ctx tc = { +.tunnel_hmap = HMAP_INITIALIZER(&tc.tunnel_hmap), +.port_names = SSET_INITIALIZER(&tc.port_names), +}; + +static struct port_hash_node * +port_lookup(unsigned int seqno) +{ +return hmap_first_with_hash(&tc.tunnel_hmap, seqno); +} + void encaps_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int, const char *chassis_id) @@ -235,12 +214,7 @@ encaps_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int, const struct sbrec_chassis *chassis_rec; const struct ovsrec_bridge *br; -struct tunnel_ctx tc = { -.tunnel_hmap = HMAP_INITIALIZER(&tc.tunnel_hmap), -.port_names = SSET_INITIALIZER(&tc.port_names), -.br_int = br_int -}; - +tc.br_int = br_int; tc.ovs_txn = ctx->ovs_idl_txn; ovsdb_idl_txn_add_comment(tc.ovs_txn, "ovn-controller: modifying OVS tunnels '%s'", @@ -267,27 +241,62 @@ encaps_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int, } } -SBREC_CHASSIS_FOR_EACH(chassis_rec, ctx->ovnsb_idl) { -if (strcmp(chassis_rec->name, chassis_id)) { -/* Create tunnels to the other chassis. 
*/ -const struct sbrec_encap *encap = preferred_encap(chassis_rec); -if (!encap) { -VLOG_INFO("No supported encaps for '%s'", chassis_rec->name); -continue; +SBREC_CHASSIS_FOR_EACH_TRACKED(chassis_rec, ctx->ovnsb_idl) { +unsigned int del_seqno = sbrec_chassis_row_get_seqno(chassis_rec, +OVSDB_IDL_CHANGE_DELETE); +unsigned int mod_seqno = sbrec_chassis_row_get_seqno(chassis_rec, +OVSDB_IDL_CHANGE_MODIFY); +unsigned int ins_seqno = sbrec_chassis_row_get_seqno(chassis_rec, +OVSDB_IDL_CHANGE_INSERT); + +if (del_seqno <= encaps_seqno && mod_seqno <= encaps_seqno +&& ins_seqno <= encaps_seqno) { +continue; +} + +if (del_seqno > 0) { +/* remove the tunnel by looking it up based on its ins_seqno + * and be done with it */ +struct port_hash_node *port_hash = port_lookup(ins_seqno); +if (port_hash) { +bridge_delete_port(port_hash->bridge, port_hash->port); +sset_delete(&tc.port_names, port_hash->port->name); +hmap_remove(&tc.tunnel_hmap, &port_hash->node); +free(port_hash); +reset_flow_processing(); +} +if (encaps_seqno <= del_seqno) { +encaps_seqno = del_seqno; } -tunnel_add(&tc, chassis_rec->name, encap); } -} -/* Delete any existing OVN tunnels that were not still around. */ -struct port_hash_node *hash_node, *next_hash_node; -HMAP_FOR_EACH_SAFE (hash_node, next_hash_node, node, &tc.tunnel_hmap) { -
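In the encaps_run change, tunnel state is persisted between runs, and when a chassis row is deleted its tunnel is found by that row's `ins_seqno`. Removing a tunnel also calls `reset_flow_processing()`, which (per patch 06) makes the next `lflow_run` clear the flow table and start again from seqno 0. The sketch below illustrates only that coupling, under simplifying assumptions: a fixed array replaces `tc.tunnel_hmap`, and the helpers `tunnel_create()`, `chassis_deleted()`, and `lflow_start_seqno()` are hypothetical, not part of OVS. The tunnel creation path of the real patch is not shown in the excerpt above and is not modeled here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_TUNNELS 8 /* Hypothetical capacity for the sketch. */

/* Hypothetical equivalents of restart_flow_processing and
 * reset_flow_processing() from patch 06. */
static bool restart_flow_processing = false;

static void
reset_flow_processing(void)
{
    restart_flow_processing = true;
}

/* Hypothetical stand-in for struct port_hash_node. */
struct tunnel {
    bool in_use;
    unsigned int chassis_ins_seqno; /* Key used to find the tunnel later. */
    char port_name[32];
};

/* Persisted across runs instead of being rebuilt on every call. */
static struct tunnel tunnels[MAX_TUNNELS];
static unsigned int encaps_seqno = 0;

/* Records a tunnel for a newly inserted chassis row (hypothetical). */
static bool
tunnel_create(unsigned int ins_seqno, const char *name)
{
    for (size_t i = 0; i < MAX_TUNNELS; i++) {
        if (!tunnels[i].in_use) {
            tunnels[i].in_use = true;
            tunnels[i].chassis_ins_seqno = ins_seqno;
            snprintf(tunnels[i].port_name, sizeof tunnels[i].port_name,
                     "%s", name);
            return true;
        }
    }
    return false; /* No free slot. */
}

/* Handles a deleted chassis row.  Only its seqnos are usable, so the
 * tunnel is located by ins_seqno; the watermark advances either way. */
static void
chassis_deleted(unsigned int ins_seqno, unsigned int del_seqno)
{
    for (size_t i = 0; i < MAX_TUNNELS; i++) {
        if (tunnels[i].in_use && tunnels[i].chassis_ins_seqno == ins_seqno) {
            tunnels[i].in_use = false;
            reset_flow_processing(); /* Force a full flow recompute. */
            break;
        }
    }
    if (encaps_seqno < del_seqno) {
        encaps_seqno = del_seqno;
    }
}

/* Consumer side, like the start of lflow_run(): a pending reset means
 * every logical flow row is reprocessed from seqno 0. */
static unsigned int
lflow_start_seqno(unsigned int last_seqno)
{
    if (restart_flow_processing) {
        restart_flow_processing = false;
        return 0;
    }
    return last_seqno;
}
```

The design trade-off this shows is that the incremental path stays simple: any change that might invalidate already-computed flows falls back to one full recompute instead of being patched in place.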
Re: [ovs-dev] [PATCH v3 1/2] datapath: Drop support for kernel older than 3.10
On Mon, Feb 29, 2016 at 9:54 AM, Pravin B Shelar wrote:
> diff --git a/FAQ.md b/FAQ.md
> index 8bd7ab9..12ef2fa 100644
> --- a/FAQ.md
> +++ b/FAQ.md
[...]
> Open vSwitch userspace is not sensitive to the Linux kernel version.
> - It should build against almost any kernel, certainly against 2.6.32
> - and later.
> + It should build against almost any kernel compatible with the release.

Even after we drop support for compiling the out of tree module on older kernels, I think that we'll continue to support userspace on those kernels - for example, it's still possible to run the latest userspace on a 3.3 kernel with the upstream module or purely in userspace. As a result, I think we can leave the above sentence as it is currently.

> diff --git a/INSTALL.md b/INSTALL.md
> index 9c96bbe..3836ec4 100644
> --- a/INSTALL.md
> +++ b/INSTALL.md
> @@ -335,18 +325,6 @@ Building the Sources
> module loading, please include the output from the `dmesg` and
> `modinfo` commands mentioned above.

I think only the last reason for the module failing to load is still possible (compiled for a different kernel).

> diff --git a/acinclude.m4 b/acinclude.m4
> index 11c7787..60890ef 100644
> --- a/acinclude.m4
> +++ b/acinclude.m4
[...]
> OVS_GREP_IFELSE([$KSRC/include/linux/netdevice.h],
> [netdev_rx_handler_register])
> OVS_GREP_IFELSE([$KSRC/include/linux/netdevice.h], [net_device_extended])

I don't think either of these symbols are referenced any more.

Otherwise this looks good - we can continue to remove dead code in the compat directory as time goes on.

Acked-by: Jesse Gross