Re: [ovs-dev] [PATCH] dpif-netdev: Check for PKT_RX_RSS_HASH flag.
On 06/24/2015 05:06 AM, Pravin Shelar wrote: On Tue, Jun 23, 2015 at 2:51 PM, Jesse Gross wrote: On Mon, Jun 22, 2015 at 8:08 PM, Pravin Shelar wrote: On Fri, Jun 19, 2015 at 11:24 AM, Daniele Di Proietto wrote: On 18/06/2015 23:57, "Traynor, Kevin" wrote: -Original Message- From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Daniele Di Proietto Sent: Tuesday, June 16, 2015 7:39 PM To: dev@openvswitch.org Subject: [ovs-dev] [PATCH] dpif-netdev: Check for PKT_RX_RSS_HASH flag. DPDK mbufs contain a valid RSS hash only if PKT_RX_RSS_HASH is set in 'ol_flags'. Otherwise the hash is garbage and doesn't relate to the packet. This fixes an issue with vhost, which, being a virtual NIC, doesn't compute the hash. Unfortunately the ixgbe vPMD doesn't set the PKT_RX_RSS_HASH, forcing OVS to compute an hash is software. This has a significant impact on performance (-30% throughput in a single flow setup) which can be mitigated in the CPU supports crc32c instructions. As per the other thread on this I'm a bit concerned about the performance drop from this patch, so I did some testing of this and alternative/ complimentary solutions. Here's the options I looked at and some comments: 1. This patch in isolation: vhost drops about ~15% vhost-vhost and phy-vhost-phy (because of sw hash) but also there is drops of ~25% for phy-phy and ~15% drop for phy-ivshmem-phy. 2. Leave the code as is and let EMC misses happen for vhost rx pkts: I measure this at ~35% drop if missed *everytime* for vhost-vhost. We see in testing that it can also never happen, but this is not realistic. There should be no impact to other DPDK interfaces. 3. Add hash reset for packets from vhost: This is another way of forcing the software hash for vhost rx and it is roughly equivalent in performance to 1. for vhost-vhost (~15% drop). While there is a no significant drop for phy-vhost-phy. There should be no impact to other DPDK interfaces. 4. Apply this patch and turn off Rx Vectorisation. vhost-vhost will drop ~15% as per 1. and there should be nothing significant for phy-vhost-phy. We would lose the 10% gain that rx vectorisation gave us for phy-phy. There should be no impact for dpdkr ports. In terms of not knowing whether the hw hash is valid or not if the flag is not checked, I would have expected the pmd to return an error on config if the hash wasn't supported, but I'm not sure that it does. In the worst case where there was an incorrect hash, it would miss the EMC which is about a 45% drop for phy-phy. I would think it's pretty safe that if we configure it, the hash will be correct but I guess there is a possibility it wouldn't be. Even if it is possible to get a smaller patch to fix the underlying issue in DPDK, it would be in DPDK 2.1 at the earliest meaning the performance would remain low until sometime in August. If it's DPDK 2.2, then it would be sometime in December. This would mean any performance drops would be present in OVS 2.4 and possibly OVS 2.5. Sorry :( but based on the performance drop with this patch in isolation it would be a NAK from me. My preference would be 3 which gives best performance, or 4 which is a bit lower for phy-phy but safer. Kevin. Thanks for all the testing. I guess it might make sense to stretch our interpretation of the API in this case, because it wouldn't affect correctness. Unless there any other objection I'm fine with the 3rd approach. We can use 3rd approach to fix issue on branch 2.4. Then have patch to check the PKT_RX_RSS_HASH flag on master. By the time we release branch 2.5 we will have proper fix in DPDK and performance will bounce back. I think this is probably a reasonable compromise. I think it's better to not keep a workaround in for an unbounded amount of time, otherwise we'll forget about it and it will come back to bite us in the future. ok, Once the DPDK fix is backported to DPDK 2.0, we can remove the workaround. That's assuming there will be a DPDK 2.0.1 release, but I have seen no evidence of such plans in the DPDK camp. - Panu - ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] dpif-netdev: Check for PKT_RX_RSS_HASH flag.
> -Original Message- > From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Panu Matilainen > Sent: Wednesday, June 24, 2015 9:33 AM > To: Pravin Shelar; Jesse Gross > Cc: dev@openvswitch.org; Flavio Leitner > Subject: Re: [ovs-dev] [PATCH] dpif-netdev: Check for PKT_RX_RSS_HASH flag. > > On 06/24/2015 05:06 AM, Pravin Shelar wrote: > > On Tue, Jun 23, 2015 at 2:51 PM, Jesse Gross wrote: > >> On Mon, Jun 22, 2015 at 8:08 PM, Pravin Shelar wrote: > >>> On Fri, Jun 19, 2015 at 11:24 AM, Daniele Di Proietto > >>> wrote: > > > On 18/06/2015 23:57, "Traynor, Kevin" wrote: > > > > > > >> -Original Message- > > > >> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Daniele Di > > > >> Proietto > > > >> Sent: Tuesday, June 16, 2015 7:39 PM > > > >> To: dev@openvswitch.org > > > >> Subject: [ovs-dev] [PATCH] dpif-netdev: Check for PKT_RX_RSS_HASH > flag. > > > >> > > > >> DPDK mbufs contain a valid RSS hash only if PKT_RX_RSS_HASH is > > > >> set in 'ol_flags'. Otherwise the hash is garbage and doesn't > > > >> relate to the packet. > > > >> > > > >> This fixes an issue with vhost, which, being a virtual NIC, doesn't > > > >> compute the hash. > > > >> > > > >> Unfortunately the ixgbe vPMD doesn't set the PKT_RX_RSS_HASH, forcing > > > >> OVS to compute an hash is software. This has a significant impact on > > > >> performance (-30% throughput in a single flow setup) which can be > > > >> mitigated in the CPU supports crc32c instructions. > > > > > > > > As per the other thread on this I'm a bit concerned about the > performance > > > > drop from this patch, so I did some testing of this and alternative/ > > > > complimentary solutions. > > > > > > > > Here's the options I looked at and some comments: > > > > 1. This patch in isolation: vhost drops about ~15% vhost-vhost and > > > > phy-vhost-phy (because of sw hash) but also there is drops of ~25% for > > > > phy-phy and ~15% drop for phy-ivshmem-phy. > > > > > > > > 2. Leave the code as is and let EMC misses happen for vhost rx pkts: > > > > I measure this at ~35% drop if missed *everytime* for vhost-vhost. We > > > > see in testing that it can also never happen, but this is not > realistic. > > > > There should be no impact to other DPDK interfaces. > > > > > > > > 3. Add hash reset for packets from vhost: This is another way of > forcing > > > > the software hash for vhost rx and it is roughly equivalent in > performance > > > > to 1. for vhost-vhost (~15% drop). While there is a no significant drop > > > > for phy-vhost-phy. There should be no impact to other DPDK interfaces. > > > > > > > > 4. Apply this patch and turn off Rx Vectorisation. vhost-vhost will > drop > > > > ~15% as per 1. and there should be nothing significant for phy-vhost- > phy. > > > > We would lose the 10% gain that rx vectorisation gave us for phy-phy. > > > > There should be no impact for dpdkr ports. > > > > > > > > > > > > In terms of not knowing whether the hw hash is valid or not if the flag > is > > > > not checked, I would have expected the pmd to return an error on config > if > > > > the hash wasn't supported, but I'm not sure that it does. > > > > In the worst case where there was an incorrect hash, it would miss the > EMC > > > > which is about a 45% drop for phy-phy. I would think it's pretty safe > that > > > > if we configure it, the hash will be correct but I guess there is a > > > > possibility it wouldn't be. > > > > > > > > Even if it is possible to get a smaller patch to fix the underlying > issue > > > > in DPDK, it would be in DPDK 2.1 at the earliest meaning the > performance > > > > would remain low until sometime in August. If it's DPDK 2.2, then it > would > > > > be sometime in December. This would mean any performance drops would be > > > > present in OVS 2.4 and possibly OVS 2.5. > > > > > > > > Sorry :( but based on the performance drop with this patch in isolation > it > > > > would be a NAK from me. My preference would be 3 which gives best > > performance, > > > > or 4 which is a bit lower for phy-phy but safer. > > > > > > > > Kevin. > > Thanks for all the testing. I guess it might make sense to stretch our > interpretation of the API in this case, because it wouldn't affect > correctness. > > Unless there any other objection I'm fine with the 3rd approach. > > >>> > >>> We can use 3rd approach to fix issue on branch 2.4. Then have patch to > >>> check the P
Re: [ovs-dev] [PATCH v9] ovs-ofctl:Implementation of eviction on the basis of Importance
Hi Ben, Thanks, my query has been resolved and the code is updated in version 10 --"[PATCH v10] ovs-ofctl:Implementation of eviction on the basis of Importance" sent on 17-June-2015. Sending a new patch version 11, rebased with the latest master. Thanks and Regards, Saloni Jain Tata Consultancy Services Mailto: saloni.j...@tcs.com Website: http://www.tcs.com Experience certainty. IT Services Business Solutions Consulting -Ben Pfaff wrote: - To: Saloni Jain From: Ben Pfaff Date: 06/22/2015 09:10PM Cc: dev@openvswitch.org, saloni.jai...@gmail.com, Deepankar Gupta , Partha Datta Subject: Re: [ovs-dev] [PATCH v9] ovs-ofctl:Implementation of eviction on the basis of Importance On Mon, Jun 08, 2015 at 05:44:37PM +0530, Saloni Jain wrote: > Hi Ben, > > Thanks for your inputs. We will share the updated patch. > Kindly suggest for the below point.. > > >This adds abstracted OFPUTIL_TABLE_CONFIG_* but it doesn't do any > >translation between them and OFPTC14_*. > > As per the openflow specs. 1.4, OFPTC14_* values are defined only for > eviction and vacancy events (as OFPTC14_EVICTION and OFPTC14_VACANCY_EVENTS), > but there is no OFPTC14_* value defined for "no-evict" feature. > Please suggest can we use OFPTC11_TABLE_MISS_* values for the same, as the > comment in the definition of ofp_table_config for OFPTC11_TABLE_MISS_* > values, in openflow-common.h suggest that > " /* OpenFlow 1.1 and 1.2 defined this field as shown. > * OpenFlow 1.3 and later mark this field as deprecated, but have not > * reused it for any new purpose. */" > > So when "no-evict" is set through mod-table command (for OF1.4 or greater) to > disable eviction , can any one of OFPTC11_TABLE_MISS_* values be used? I'm not sure I understand the question. OFPTC14_EVICTION enables eviction through OF1.4 mechanisms; if it is not set, then eviction is not enabled through OF1.4 mechanisms. What does the table miss setting to do with this? =-=-= Notice: The information contained in this e-mail message and/or attachments to it may contain confidential or privileged information. If you are not the intended recipient, any dissemination, use, review, distribution, printing or copying of the information contained in this e-mail message and/or attachments to it are strictly prohibited. If you have received this communication in error, please notify us by reply e-mail or telephone and immediately and permanently delete the message and any attachments. Thank you ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v11] ovs-ofctl:Implementation of eviction on the basis of Importance
From: Saloni Jain This commit enables the eviction mechanism on the basis of importance as per the openflow specification 1.4. "ovs-ofctl -O OpenFlow14 mod-table evict" Enable eviction on of . Eviction adds a mechanism enabling the switch to automatically eliminate entries of lower importance to make space for newer entries.If want to enable eviction on all tables, user can set the as 'ALL'. "ovs-ofctl -O OpenFlow14 mod-table noevict" Disable eviction on of . "ovs-ofctl -O OpenFlow14 dump-tables-desc " This command provides a way to list the current configuration (eviction for importance) of the tables on a , which is set using the mod-table command. Signed-off-by: Saloni Jain --- version 11: - rebased with latest master - Corrected indentation of if statements in ofputil_table_config_from_config() and ofputil_table_config_to_config() functions version 10: - Incorporated review comments received on June 05,2015 as follows: - Translation has been done between OFPUTIL_TABLE_CONFIG_* values and OFPTC14_* values and vice versa in ofputil_table_config_from_config() and ofputil_table_config_to_config() functions. For both eviction and no-eviction OFPTC14_EVICTION is passed in openflow message. The above 2 mentioned functions are used in mod-table and table-desc encoding decoding code. OFPUTIL_TABLE_CONFIG_DEFAULT and OFPUTIL_TABLE_CONFIG_NO_EVICTION are defined separately, so as to distinguish between the default behavior and whether user has given no-evict command explicitly - CLEAR_EVICTION_BIT has been removed from the code. Instead when no eviction is configured, Eviction bit is cleared in the table-config property in configure_table_config(). - /* We do not understand any properties yet, so we do not bother parsing them. */ -- comment has been removed. - return UINT32_MAX - (rule->importance) is changed to return (UINT32_MAX - rule->importance) DESIGN.md | 18 ++- include/openflow/openflow-1.4.h | 11 ++ lib/learning-switch.c |2 + lib/ofp-msgs.h | 10 ++ lib/ofp-parse.c |4 + lib/ofp-print.c | 58 ++ lib/ofp-util.c | 241 ++- lib/ofp-util.h | 44 +++ lib/rconn.c |2 + ofproto/ofproto-provider.h |2 + ofproto/ofproto.c | 136 +- tests/ofp-print.at |1 + tests/ofproto.at| 105 + utilities/ovs-ofctl.8.in| 25 +++- utilities/ovs-ofctl.c | 19 +++ 15 files changed, 664 insertions(+), 14 deletions(-) diff --git a/DESIGN.md b/DESIGN.md index e533b7c..daecfb6 100644 --- a/DESIGN.md +++ b/DESIGN.md @@ -277,13 +277,19 @@ The table for 1.3 is the same as the one shown above for 1.2. OpenFlow 1.4 - +--- + +OpenFlow 1.4 makes these changes: + + - Adds the "importance" field to flow_mods, but it does not +explicitly specify which kinds of flow_mods set the importance. +For consistency, Open vSwitch uses the same rule for importance +as for idle_timeout and hard_timeout, that is, only an "ADD" +flow_mod sets the importance. (This issue has been filed with +the ONF as EXT-496.) -OpenFlow 1.4 adds the "importance" field to flow_mods, but it does not -explicitly specify which kinds of flow_mods set the importance. For -consistency, Open vSwitch uses the same rule for importance as for -idle_timeout and hard_timeout, that is, only an "ADD" flow_mod sets -the importance. (This issue has been filed with the ONF as EXT-496.) + - Eviction Mechanism to automatically delete entries of lower +importance to make space for newer entries. OpenFlow 1.4 Bundles diff --git a/include/openflow/openflow-1.4.h b/include/openflow/openflow-1.4.h index 7631e47..326b220 100644 --- a/include/openflow/openflow-1.4.h +++ b/include/openflow/openflow-1.4.h @@ -155,6 +155,17 @@ struct ofp14_table_mod { }; OFP_ASSERT(sizeof(struct ofp14_table_mod) == 8); +/* Body of reply to OFPMP_TABLE_DESC request. */ +struct ofp14_table_desc { +ovs_be16 length; /* Length is padded to 64 bits. */ +uint8_t table_id; /* Identifier of table. Lower numbered tables + are consulted first. */ +uint8_t pad[1];/* Align to 32-bits. */ +ovs_be32 config; /* Bitmap of OFPTC_* values. */ +/* Followed by 0 or more OFPTMPT14_* properties. */ +}; +OFP_ASSERT(sizeof(struct ofp14_table_desc) == 8); + /* ## ## */ /* ## ofp14_port_stats ## */ diff --git a/lib/learning-switch.c b/lib/learning-switch.c index 3c8536d..6e0a82a 100644 --- a/lib/learning-switch.c +++ b/lib/learning-switch.c @@ -447,6 +447,8 @@ lswitch_process_packet(struct lswitch *sw, const struct ofpbuf *msg) case OFPTYPE_METER_FEATURES_STATS_REPLY: case OFPTYPE_TABLE_FEATURE
[ovs-dev] [PATCH] datapath-windows: Wrong cleanup of newly created multiple NBLs
Bug found in OvsPartialCopyToMultipleNBLs function in the cleanup part of the code. Before completing the current NBL (newNbl) the NEXT link for the following NBL (firstNbl) was broken, instead of the current one (newNbl). This patch is to be applied both on master and on 2.4 branch. Signed-off-by: Sorin Vinturis Reported-by: Sorin Vinturis Reported-at: https://github.com/openvswitch/ovs-issues/issues/87 --- datapath-windows/ovsext/BufferMgmt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/datapath-windows/ovsext/BufferMgmt.c b/datapath-windows/ovsext/BufferMgmt.c index 5adbb25..83d6cde 100644 --- a/datapath-windows/ovsext/BufferMgmt.c +++ b/datapath-windows/ovsext/BufferMgmt.c @@ -881,7 +881,7 @@ cleanup: newNbl = firstNbl; while (newNbl) { firstNbl = NET_BUFFER_LIST_NEXT_NBL(newNbl); -NET_BUFFER_LIST_NEXT_NBL(firstNbl) = NULL; +NET_BUFFER_LIST_NEXT_NBL(newNbl) = NULL; OvsCompleteNBL(context, newNbl, TRUE); newNbl = firstNbl; } -- 1.9.0.msysgit.0 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] OVN and OpenStack Provider Networks
On 06/23/2015 06:56 PM, Ben Pfaff wrote: > On Tue, Jun 23, 2015 at 11:58:25PM +0200, Salvatore Orlando wrote: >> I'm afraid I have to start bike shedding on this thread too. >> Apparently that's what I do best. > > These are important clarifications, not bikeshedding. > > More below (with lots of trimming and reflowing for clarity): > >> On 06/23/2015 05:10 PM, Ben Pfaff wrote: >>> A "tenant network" is what OVN calls a logical network. OVN can >>> construct it as an L2-over-L3 overlay with STT or Geneve or >>> whatever. Tenant networks can be connected to physical networks via >>> OVN gateways. >>> >>> A "provider network" is just a physical L2 network (possibly >>> VLAN-tagged). In such a network, on the sending side, OVN would >>> rely on normal L2 switching for packets to reach their destinations, >>> and on the receiving side, OVN would not have a reliable way to >>> determine the source of a packet (it would have to infer it from the >>> source MAC). Is that accurate? >> >> While this is correct, it is also restrictive - as that would imply that a >> "provider network" is just a physical L2 segment on the data centre. >> Therefore logical ports on a provider networks would be pretty much pass >> through to the physical network. While it is correct that they might be >> mapped to OVS ports on a bridge doings plain L2 forwarding onto a physical >> network, this does not mean that L2 forwarding is the only thing that one >> can do on provider networks. >> >> A provider network is, from the neutron perspective, exactly like any other >> logical network, including tenant networks. What changes are bindings (or >> mappings, I don't know what's the correct OVN terminology). These bindings >> define three aspects: >> 1 - the transport type (VLAN, GRE, STT, VxLAN, etc) >> 2 - the physical network, if any >> 3 - the segmentation id on the physical network, if any, >> >> For tenant networks, bindings are implicit and depend on what the control >> plan defaults to. As Ben was suggesting this could STT or Geneve. >> For provider networks, these bindings are explicit, as the admin defines >> them. For instance I want this network to be mapped to VLAN 666 on physical >> network MEH. >> >> In practical terms with provider networks the control plane must honour the >> specification made in the neutron request concerning transport bindings for >> the logical networks. If it can't honour these mapping - for instance if it >> does not support the select transport type - it must return an error. >> Nevertheless the control plane still treats provider networks like any >> other network. You can have services like DHCP on them (even if often is >> not a great idea), apply security groups to its ports, uplink them to >> logical routers, and so on. This is great clarification. Thanks, Salvatore. I've been looking at this very focused on some specific use cases (using the provider networks extensions to specify 'flat' or 'vlan' transport type on a specific physical network) as that is what I have seen come up regularly. Do people actually use this to specify other transport types? Do you know of any good references on uses? For example, I'm not sure what is expected if you only set the transport type to VxLAN or whatever. Is it just "use VxLAN, but exactly how is left up to the backend" ? Or is it more well defined than that? > OK, let me take another stab at a recap, then. > > For a tenant network, it is outside the scope of Neutron to dictate or > configure how packets are transported among VMs. Instead, that is > delegated to the plugin itself or to whatever the plugin configures > (e.g. in this case, to OVN). > > For a provider network, the administrator (via Neutron) configures use > of a specific transport in a specific way. A Neutron plugin must > operate over that configured transport in that way, or if cannot then it > must refuse to operate at all. > > OK, if that all that is correct, does the following logical extension > also hold? A provider network implementation is expected to > transparently interoperate with preexisting software that shares a given > transport. For example, if I set up a provider network with Neutron on > a particular Ethernet network, and I have a bunch of physical machines > attached to the same Ethernet network, then I would expect my Neutron > VMs attached to the physical network to be able to communicate back and > forth with those physical machines. > > If that is the case, then I guess one description of the two different > types of network is this: a Neutron plugin may *define* a tenant > network, but a Neutron plugin only *participates* in a provider network. > Is that fair? > > (I apologize if all this should be obvious to Neutron veterans. I'm new > to this!) Based on this thread, and some conversations I've seen between other Neutron devs, this is definitely *not* obvious. The recap you have here sounds right to me, though. However, as discussed above,
Re: [ovs-dev] [PATCH] dpif-netdev: log port/core affinity
>When using multiple PMDs and numerous ports, a performance gain >may be achieved in some use cases by pinning a PMD/port to a >particular (set of) core(s). > >This patch provides a summary of the switch's port/core affinities >each time that the status of the switch's ports is modified. >Based on this information, a user may determine what affinity >modifications are required to optimise performance for their >particular use case. > >Signed-off-by: Mark Kavanagh >Signed-off-by: Wojciech Andralojc >--- > lib/dpif-netdev.c | 5 + > 1 file changed, 5 insertions(+) > >diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c >index 7df9523..17a6c08 100644 >--- a/lib/dpif-netdev.c >+++ b/lib/dpif-netdev.c >@@ -2664,6 +2664,11 @@ reload: > emc_cache_init(&pmd->flow_cache); > poll_cnt = pmd_load_queues(pmd, &poll_list, poll_cnt); > >+/* List port/core affinity */ >+for (i = 0; i < poll_cnt; i++) { >+ VLOG_INFO("Core %d processing port \'%s\'\n", pmd->core_id, >netdev_get_name(poll_list[i].port->netdev)); >+} >+ > /* Signal here to make sure the pmd finishes > * reloading the updated configuration. */ > dp_netdev_pmd_reload_done(pmd); >-- >1.9.3 Are there any outstanding issues with this patch that might prevent it from being upstreamed? Any feedback would be appreciated. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 3/3] Update windows test documentation
On Tue, Jun 23, 2015 at 6:11 PM, Alin Serdean wrote: > How about the following: > > Before running the unit tests make sure to add the pthread libraries to your > PATH environment variable. The above is already part of the documentation with the following wording: You should add the pthread-win32's dll path (e.g.: C:\pthread\dll\x86) to the Windows' PATH environment variable. So why repeat it? > > One of the unit tests starts Open vSwitch as a system wide service. This > test expects the pthread libraries path to be in the Windows' SYSTEM PATH > environment variable. If while running unit tests, you do not have the > permissions to start a system wide service or if you already have a Open > vSwitch > service running, you can disable the unit test > with: make check TESTSUITEFLAGS="-k \!windows-service –j8” That looks fine to me and I will include it. > > Alin. > > -Mesaj original- > De la: Gurucharan Shetty [mailto:shet...@nicira.com] > Trimis: Tuesday, June 23, 2015 5:51 PM > Către: Alin Serdean > Cc: dev@openvswitch.org > Subiect: Re: [ovs-dev] [PATCH 3/3] Update windows test documentation > > On Mon, Jun 22, 2015 at 3:45 PM, Alin Serdean > wrote: >> Describe explictly where to add the pthread library to the PATH variable. >> >> In case the pthread library directory was added to the user PATH >> variable the service failed to start. >> >> Signed-off-by: Alin Gabriel Serdean >> --- >> INSTALL.Windows.md | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/INSTALL.Windows.md b/INSTALL.Windows.md index >> 6d870ed..52acf6e 100644 >> --- a/INSTALL.Windows.md >> +++ b/INSTALL.Windows.md >> @@ -92,6 +92,9 @@ or from a distribution tar ball. >> >> % make check TESTSUITEFLAGS="-j8" >> >> + For "daemon --service" test make sure to have the pthread libaries >> + in the the Windows' SYSTEM PATH environment variable >> + > How about the following wording instead? > (One of the unit tests starts Open vSwitch as a system wide service. This > test expects the pthread libraries path to be in the Windows' SYSTEM PATH > environment variable. If while running unit tests, you do not have the > permissions to start a system wide service or if you already have a Open > vSwitch > service running, you can disable the unit test > with: make check TESTSUITEFLAGS="-k \!windows-service –j8”) > > If you think above is correct, I will update your patch before committing. > >> * To install all the compiled executables on the local machine, run: >> >> % make install >> -- >> 1.9.5.msysgit.0 >> ___ >> dev mailing list >> dev@openvswitch.org >> http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 1/3] Add OVS_VSWITCHD_STOP to bfd unit tests
You can look at "commit 5e65e080ad4d57e" for more information on the exact problem with taskkill. One option is to check for tskill and if it does not exist, use taskkill. On Tue, Jun 23, 2015 at 6:37 PM, Alin Serdean wrote: > Think I found why tskill is missing on my system. It is part of the role > remote desktop services role > https://technet.microsoft.com/en-us/library/cc725766.aspx while taskkill is > part of the default installation. > > Theoretically taskkill with option /f > "Specifies that processes be forcefully terminated. This parameter is ignored > for remote processes; all remote processes are forcefully terminated." > and /t > " Terminates the specified process and any child processes started by it." > would suffice. > > Alin. > > -Mesaj original- > De la: Gurucharan Shetty [mailto:shet...@nicira.com] > Trimis: Tuesday, June 23, 2015 5:40 PM > Către: Alin Serdean > Cc: dev@openvswitch.org > Subiect: Re: [ovs-dev] [PATCH 1/3] Add OVS_VSWITCHD_STOP to bfd unit tests > > I applied this patch as-is. We still need to figure out the issue around > tskill. > > On Tue, Jun 23, 2015 at 7:05 AM, Gurucharan Shetty wrote: >> On Mon, Jun 22, 2015 at 5:21 PM, Alin Serdean >> wrote: >>> I could send another patch with the following: >>> -tskill $i >>> +taskkill //PID $i //F >/dev/null >>> >>> Alin. >> >> Your current patch is something that I think we should apply because >> from Windows' perspective it does a clean kill instead of the force >> kill. >> >> Do you know why your system does not have tskill? The code originally >> had taskkill instead of tskill. One thing I observed was that taskkill >> cannot kill deadlocked processes and unit tests would hang whenever >> processes deadlock. That is the reason I changed it to tskill. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] OVN and OpenStack Provider Networks
[resending to the ovs-dev as I sent my original reply to Russell only] Comments inline Salvatore On 24 June 2015 at 16:15, Russell Bryant wrote: > On 06/23/2015 06:56 PM, Ben Pfaff wrote: > > On Tue, Jun 23, 2015 at 11:58:25PM +0200, Salvatore Orlando wrote: > >> I'm afraid I have to start bike shedding on this thread too. > >> Apparently that's what I do best. > > > > These are important clarifications, not bikeshedding. > > > > More below (with lots of trimming and reflowing for clarity): > > > >> On 06/23/2015 05:10 PM, Ben Pfaff wrote: > >>> A "tenant network" is what OVN calls a logical network. OVN can > >>> construct it as an L2-over-L3 overlay with STT or Geneve or > >>> whatever. Tenant networks can be connected to physical networks via > >>> OVN gateways. > >>> > >>> A "provider network" is just a physical L2 network (possibly > >>> VLAN-tagged). In such a network, on the sending side, OVN would > >>> rely on normal L2 switching for packets to reach their destinations, > >>> and on the receiving side, OVN would not have a reliable way to > >>> determine the source of a packet (it would have to infer it from the > >>> source MAC). Is that accurate? > >> > >> While this is correct, it is also restrictive - as that would imply > that a > >> "provider network" is just a physical L2 segment on the data centre. > >> Therefore logical ports on a provider networks would be pretty much pass > >> through to the physical network. While it is correct that they might be > >> mapped to OVS ports on a bridge doings plain L2 forwarding onto a > physical > >> network, this does not mean that L2 forwarding is the only thing that > one > >> can do on provider networks. > >> > >> A provider network is, from the neutron perspective, exactly like any > other > >> logical network, including tenant networks. What changes are bindings > (or > >> mappings, I don't know what's the correct OVN terminology). These > bindings > >> define three aspects: > >> 1 - the transport type (VLAN, GRE, STT, VxLAN, etc) > >> 2 - the physical network, if any > >> 3 - the segmentation id on the physical network, if any, > >> > >> For tenant networks, bindings are implicit and depend on what the > control > >> plan defaults to. As Ben was suggesting this could STT or Geneve. > >> For provider networks, these bindings are explicit, as the admin defines > >> them. For instance I want this network to be mapped to VLAN 666 on > physical > >> network MEH. > >> > >> In practical terms with provider networks the control plane must honour > the > >> specification made in the neutron request concerning transport bindings > for > >> the logical networks. If it can't honour these mapping - for instance > if it > >> does not support the select transport type - it must return an error. > >> Nevertheless the control plane still treats provider networks like any > >> other network. You can have services like DHCP on them (even if often is > >> not a great idea), apply security groups to its ports, uplink them to > >> logical routers, and so on. > > This is great clarification. Thanks, Salvatore. > > I've been looking at this very focused on some specific use cases (using > the provider networks extensions to specify 'flat' or 'vlan' transport > type on a specific physical network) as that is what I have seen come up > regularly. > > Do people actually use this to specify other transport types? Do you > know of any good references on uses? For example, I'm not sure what is > expected if you only set the transport type to VxLAN or whatever. Is it > just "use VxLAN, but exactly how is left up to the backend" ? Or is it > more well defined than that? > While the API allows you to map a provider network to a specific VxLAN VNI or GRE tunnel key, I have never seen this in practice. Also, I have no idea of what concrete use case might require this scenario. But it won't be the first time someone would come up with some science-fi use case that is a key requirement for their deployment! Nevertheless, I don't think that an implementation of provider networks must support all transport types. Flat and Vlan only is fine for me. API requests for different transport types should be rejected. I believe that the provider networks code already has hooks to allow the plugin to declare which transport types are allowed. > > > OK, let me take another stab at a recap, then. > > > > For a tenant network, it is outside the scope of Neutron to dictate or > > configure how packets are transported among VMs. Instead, that is > > delegated to the plugin itself or to whatever the plugin configures > > (e.g. in this case, to OVN). > > > > For a provider network, the administrator (via Neutron) configures use > > of a specific transport in a specific way. A Neutron plugin must > > operate over that configured transport in that way, or if cannot then it > > must refuse to operate at all. > That's how neutron is supposed to work. > > > > OK, if that all that is c
Re: [ovs-dev] [PATCH] datapath-windows: Wrong cleanup of newly created multiple NBLs
> On Jun 24, 2015, at 3:56 AM, Sorin Vinturis > wrote: > > Bug found in OvsPartialCopyToMultipleNBLs function in the cleanup part of > the code. Before completing the current NBL (newNbl) the NEXT link for the > following NBL (firstNbl) was broken, instead of the current one (newNbl). > > This patch is to be applied both on master and on 2.4 branch. > > Signed-off-by: Sorin Vinturis Good catch. Ben, as Sorin has indicated this is for master as well as for the 2.4 branch. Acked-by: Nithin Raju ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH 1/2] netdev-dpdk: Fix sparse and clang warnings
Signed-off-by: Daniele Di Proietto --- lib/netdev-dpdk.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 3af1ee7..c03ff3d 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -93,8 +93,8 @@ BUILD_ASSERT_DECL((MAX_NB_MBUF / ROUND_DOWN_POW2(MAX_NB_MBUF/MIN_NB_MBUF)) #define NIC_PORT_RX_Q_SIZE 2048 /* Size of Physical NIC RX Queue, Max (n+32<=4096)*/ #define NIC_PORT_TX_Q_SIZE 2048 /* Size of Physical NIC TX Queue, Max (n+32<=4096)*/ -char *cuse_dev_name = NULL;/* Character device cuse_dev_name. */ -char *vhost_sock_dir = NULL; /* Location of vhost-user sockets */ +static char *cuse_dev_name = NULL;/* Character device cuse_dev_name. */ +static char *vhost_sock_dir = NULL; /* Location of vhost-user sockets */ /* * Maximum amount of time in micro seconds to try and enqueue to vhost. @@ -600,7 +600,7 @@ dpdk_dev_parse_name(const char dev_name[], const char prefix[], } static int -vhost_construct_helper(struct netdev *netdev_) +vhost_construct_helper(struct netdev *netdev_) OVS_REQUIRES(dpdk_mutex) { struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_); @@ -2091,7 +2091,7 @@ static const struct netdev_class dpdk_ring_class = netdev_dpdk_get_status, netdev_dpdk_rxq_recv); -static const struct netdev_class dpdk_vhost_cuse_class = +static const struct netdev_class OVS_UNUSED dpdk_vhost_cuse_class = NETDEV_DPDK_CLASS( "dpdkvhostcuse", dpdk_vhost_cuse_class_init, @@ -2105,7 +2105,7 @@ static const struct netdev_class dpdk_vhost_cuse_class = NULL, netdev_dpdk_vhost_rxq_recv); -const struct netdev_class dpdk_vhost_user_class = +static const struct netdev_class OVS_UNUSED dpdk_vhost_user_class = NETDEV_DPDK_CLASS( "dpdkvhostuser", dpdk_vhost_user_class_init, -- 2.1.4 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH 2/2] netdev-dpdk: Reset RSS hash when receiving from a vhost device.
A vhost device, being a virtual interface, doesn't provide a valid RSS hash. As a workaround we set the value to 0, which suggests the datapath to recompute the hash in software. Reported-by: Dongjun CC: Traynor, Kevin CC: Flavio Leitner Signed-off-by: Daniele Di Proietto --- lib/netdev-dpdk.c | 9 + 1 file changed, 9 insertions(+) diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index c03ff3d..7565462 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -897,6 +897,7 @@ netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq_, struct virtio_net *virtio_dev = netdev_dpdk_get_virtio(vhost_dev); int qid = 1; uint16_t nb_rx = 0; +uint16_t i; if (OVS_UNLIKELY(!is_vhost_running(virtio_dev))) { return EAGAIN; @@ -910,6 +911,14 @@ netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq_, return EAGAIN; } +/* Vhost doesn't provide a valid RSS hash. We tell the datapath to + * compute the hash in software by setting the field to 0. This is + * a temporary workaround until we can rely on mbuf ol_flags + * PKT_RX_RSS_HASH. */ +for (i = 0; i < nb_rx; i++) { +dp_packet_set_rss_hash(packets[i], 0); +} + rte_spinlock_lock(&vhost_dev->stats_lock); vhost_dev->stats.rx_packets += (uint64_t)nb_rx; rte_spinlock_unlock(&vhost_dev->stats_lock); -- 2.1.4 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] dpif-netdev: Check for PKT_RX_RSS_HASH flag.
On 24/06/2015 09:47, "Traynor, Kevin" wrote: > > >> -Original Message- > >> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Panu >>Matilainen > >> Sent: Wednesday, June 24, 2015 9:33 AM > >> To: Pravin Shelar; Jesse Gross > >> Cc: dev@openvswitch.org; Flavio Leitner > >> Subject: Re: [ovs-dev] [PATCH] dpif-netdev: Check for PKT_RX_RSS_HASH >>flag. > >> > >> On 06/24/2015 05:06 AM, Pravin Shelar wrote: > >> > On Tue, Jun 23, 2015 at 2:51 PM, Jesse Gross wrote: > >> >> On Mon, Jun 22, 2015 at 8:08 PM, Pravin Shelar >>wrote: > >> >>> On Fri, Jun 19, 2015 at 11:24 AM, Daniele Di Proietto > >> >>> wrote: > >> > >> > >> On 18/06/2015 23:57, "Traynor, Kevin" >>wrote: > >> > >> > > >> > > >> >> -Original Message- > >> > > >> >> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of >>Daniele Di > >> > > >> >> Proietto > >> > > >> >> Sent: Tuesday, June 16, 2015 7:39 PM > >> > > >> >> To: dev@openvswitch.org > >> > > >> >> Subject: [ovs-dev] [PATCH] dpif-netdev: Check for PKT_RX_RSS_HASH > >> flag. > >> > > >> >> > >> > > >> >> DPDK mbufs contain a valid RSS hash only if PKT_RX_RSS_HASH is > >> > > >> >> set in 'ol_flags'. Otherwise the hash is garbage and doesn't > >> > > >> >> relate to the packet. > >> > > >> >> > >> > > >> >> This fixes an issue with vhost, which, being a virtual NIC, >>doesn't > >> > > >> >> compute the hash. > >> > > >> >> > >> > > >> >> Unfortunately the ixgbe vPMD doesn't set the PKT_RX_RSS_HASH, >>forcing > >> > > >> >> OVS to compute an hash is software. This has a significant >>impact on > >> > > >> >> performance (-30% throughput in a single flow setup) which can be > >> > > >> >> mitigated in the CPU supports crc32c instructions. > >> > > >> > > >> > > >> > As per the other thread on this I'm a bit concerned about the > >> performance > >> > > >> > drop from this patch, so I did some testing of this and >>alternative/ > >> > > >> > complimentary solutions. > >> > > >> > > >> > > >> > Here's the options I looked at and some comments: > >> > > >> > 1. This patch in isolation: vhost drops about ~15% vhost-vhost and > >> > > >> > phy-vhost-phy (because of sw hash) but also there is drops of >>~25% for > >> > > >> > phy-phy and ~15% drop for phy-ivshmem-phy. > >> > > >> > > >> > > >> > 2. Leave the code as is and let EMC misses happen for vhost rx >>pkts: > >> > > >> > I measure this at ~35% drop if missed *everytime* for >>vhost-vhost. We > >> > > >> > see in testing that it can also never happen, but this is not > >> realistic. > >> > > >> > There should be no impact to other DPDK interfaces. > >> > > >> > > >> > > >> > 3. Add hash reset for packets from vhost: This is another way of > >> forcing > >> > > >> > the software hash for vhost rx and it is roughly equivalent in > >> performance > >> > > >> > to 1. for vhost-vhost (~15% drop). While there is a no >>significant drop > >> > > >> > for phy-vhost-phy. There should be no impact to other DPDK >>interfaces. > >> > > >> > > >> > > >> > 4. Apply this patch and turn off Rx Vectorisation. vhost-vhost >>will > >> drop > >> > > >> > ~15% as per 1. and there should be nothing significant for >>phy-vhost- > >> phy. > >> > > >> > We would lose the 10% gain that rx vectorisation gave us for >>phy-phy. > >> > > >> > There should be no impact for dpdkr ports. > >> > > >> > > >> > > >> > > >> > > >> > In terms of not knowing whether the hw hash is valid or not if >>the flag > >> is > >> > > >> > not checked, I would have expected the pmd to return an error on >>config > >> if > >> > > >> > the hash wasn't supported, but I'm not sure that it does. > >> > > >> > In the worst case where there was an incorrect hash, it would >>miss the > >> EMC > >> > > >> > which is about a 45% drop for phy-phy. I would think it's pretty >>safe > >> that > >> > > >> > if we configure it, the hash will be correct but I guess there is >>a > >> > > >> > possibility it wouldn't be. > >> > > >> > > >> > > >> > Even if it is possible to get a smaller patch to fix the >>underlying > >> issue > >> > > >> > in DPDK, it would be in DPDK 2.1 at the earliest meaning the > >> performance > >> > > >> > would remain low until sometime in August. If it's DPDK 2.2, then >>it > >> would > >> > > >> > be sometime in December. This would mean any performance drops >>would be > >> > > >> > present in OVS 2.4 and possibly OVS 2.5. > >> > > >> > > >> > > >> > Sorry :( but based on the performance drop with this patch in >>isolation > >> it > >> > > >> >
Re: [ovs-dev] [PATCH] INSTALL.DPDK: remove experimental statement
On 24/06/2015 02:48, "Pravin Shelar" wrote: >On Tue, Jun 23, 2015 at 11:42 AM, Ben Pfaff wrote: >> Do you two have an opinion on this? If DPDK support is pretty solid now >> then it makes sense to apply this to master and backport it to >> branch-2.4. > >Personally I would like to have better testing with vhost, tunneling >and PMD thread management before removing experimental status. Once >that is done I will be more comfortable with it. I agree. While we've been testing basic forwarding for over a year, we just started stressing those features (tunnelling and vhost). On the bright side, I think OVN will give us a nice opportunity to prove that everything works as it should ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH 1/6] ofproto: Hold ofproto_mutex when enabling or disabling eviction.
ofproto_enable_eviction() and ofproto_disable_eviction() require ofproto_mutex (and they were even annotated that way, though not on their prototypes but only at definition), but it wasn't being held. This fixes the problem. Found by inspection. Signed-off-by: Ben Pfaff --- ofproto/ofproto.c | 21 - 1 file changed, 12 insertions(+), 9 deletions(-) diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c index 08ba043..278c97f 100644 --- a/ofproto/ofproto.c +++ b/ofproto/ofproto.c @@ -83,10 +83,12 @@ static void oftable_set_name(struct oftable *, const char *name); static enum ofperr evict_rules_from_table(struct oftable *) OVS_REQUIRES(ofproto_mutex); -static void oftable_disable_eviction(struct oftable *); +static void oftable_disable_eviction(struct oftable *) +OVS_REQUIRES(ofproto_mutex); static void oftable_enable_eviction(struct oftable *, const struct mf_subfield *fields, -size_t n_fields); +size_t n_fields) +OVS_REQUIRES(ofproto_mutex); /* A set of rules within a single OpenFlow table (oftable) that have the same * values for the oftable's eviction_fields. A rule to be evicted, when one is @@ -1419,13 +1421,6 @@ ofproto_configure_table(struct ofproto *ofproto, int table_id, return; } -if (s->groups) { -oftable_enable_eviction(table, s->groups, s->n_groups); -} else { -oftable_disable_eviction(table); -} - -table->max_flows = s->max_flows; if (classifier_set_prefix_fields(&table->cls, s->prefix_fields, s->n_prefix_fields)) { @@ -1433,6 +1428,12 @@ ofproto_configure_table(struct ofproto *ofproto, int table_id, } ovs_mutex_lock(&ofproto_mutex); +if (s->groups) { +oftable_enable_eviction(table, s->groups, s->n_groups); +} else { +oftable_disable_eviction(table); +} +table->max_flows = s->max_flows; evict_rules_from_table(table); ovs_mutex_unlock(&ofproto_mutex); } @@ -7431,7 +7432,9 @@ static void oftable_destroy(struct oftable *table) { ovs_assert(classifier_is_empty(&table->cls)); +ovs_mutex_lock(&ofproto_mutex); oftable_disable_eviction(table); +ovs_mutex_unlock(&ofproto_mutex); classifier_destroy(&table->cls); free(table->name); } -- 2.1.3 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH 2/6] ofproto: Make 'importance' immutable.
OpenFlow provides no means to change the importance of a flow after it is added, so make it immutable so that it is reasonable to read it without locking. Signed-off-by: Ben Pfaff --- ofproto/ofproto-provider.h | 2 +- ofproto/ofproto.c | 5 ++--- 2 files changed, 3 insertions(+), 4 deletions(-) diff --git a/ofproto/ofproto-provider.h b/ofproto/ofproto-provider.h index 527823a..495f787 100644 --- a/ofproto/ofproto-provider.h +++ b/ofproto/ofproto-provider.h @@ -354,7 +354,7 @@ struct rule { uint16_t idle_timeout OVS_GUARDED; /* In seconds from ->used. */ /* Eviction precedence. */ -uint16_t importance OVS_GUARDED; +const uint16_t importance; /* Removal reason for sending flow removed message. * Used only if 'flags' has OFPUTIL_FF_SEND_FLOW_REM set and if the diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c index 278c97f..d48c31c 100644 --- a/ofproto/ofproto.c +++ b/ofproto/ofproto.c @@ -4633,7 +4633,7 @@ replace_rule_create(struct ofproto *ofproto, struct ofputil_flow_mod *fm, ovs_mutex_lock(&rule->mutex); rule->idle_timeout = fm->idle_timeout; rule->hard_timeout = fm->hard_timeout; -rule->importance = fm->importance; +*CONST_CAST(uint16_t *, &rule->importance) = fm->importance; rule->removed_reason = OVS_OFPRR_NONE; *CONST_CAST(uint8_t *, &rule->table_id) = table_id; @@ -4649,7 +4649,6 @@ replace_rule_create(struct ofproto *ofproto, struct ofputil_flow_mod *fm, /* Copy values from old rule for modify semantics. */ if (old_rule && fm->delete_reason != OFPRR_EVICTION) { -/* 'fm' says that */ bool change_cookie = (fm->modify_cookie && fm->new_cookie != OVS_BE64_MAX && fm->new_cookie != old_rule->flow_cookie); @@ -4658,7 +4657,7 @@ replace_rule_create(struct ofproto *ofproto, struct ofputil_flow_mod *fm, if (fm->command != OFPFC_ADD) { rule->idle_timeout = old_rule->idle_timeout; rule->hard_timeout = old_rule->hard_timeout; -rule->importance = old_rule->importance; +*CONST_CAST(uint16_t *, &rule->importance) = old_rule->importance; rule->flags = old_rule->flags; rule->created = old_rule->created; } -- 2.1.3 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH 3/6] ofp-util: Fix typo in comment.
Signed-off-by: Ben Pfaff --- lib/ofp-util.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/ofp-util.c b/lib/ofp-util.c index 89359c1..c1b2394 100644 --- a/lib/ofp-util.c +++ b/lib/ofp-util.c @@ -1317,7 +1317,7 @@ ofputil_decode_hello_bitmap(const struct ofp_hello_elem_header *oheh, /* Only use the first 32-bit element of the bitmap as that is all the * current implementation supports. Subsequent elements are ignored which - * should have no effect on session negotiation until Open vSwtich supports + * should have no effect on session negotiation until Open vSwitch supports * wire-protocol versions greater than 31. */ allowed_versions = ntohl(bitmap[0]); -- 2.1.3 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH 4/6] ofproto: Use OF1.4+ "importance" as part of eviction criteria.
The "importance" field is considered before flow timeout because I figure that if you set the importance, you think it's important. Signed-off-by: Ben Pfaff Co-authored-by: Saloni Jain Signed-off-by: Saloni Jain --- NEWS | 1 + ofproto/ofproto.c| 40 +++--- tests/ofproto.at | 79 vswitchd/vswitch.xml | 19 ++--- 4 files changed, 118 insertions(+), 21 deletions(-) diff --git a/NEWS b/NEWS index 395444b..363bd8c 100644 --- a/NEWS +++ b/NEWS @@ -6,6 +6,7 @@ Post-v2.4.0 - OpenFlow: * Group chaining (where one OpenFlow group triggers another) is now supported. + * OpenFlow 1.4+ "importance" is now considered for flow eviction. v2.4.0 - xx xxx diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c index d48c31c..fd14030 100644 --- a/ofproto/ofproto.c +++ b/ofproto/ofproto.c @@ -114,7 +114,7 @@ struct eviction_group { static bool choose_rule_to_evict(struct oftable *table, struct rule **rulep) OVS_REQUIRES(ofproto_mutex); -static uint32_t rule_eviction_priority(struct ofproto *ofproto, struct rule *) +static uint64_t rule_eviction_priority(struct ofproto *ofproto, struct rule *) OVS_REQUIRES(ofproto_mutex);; static void eviction_group_add_rule(struct rule *) OVS_REQUIRES(ofproto_mutex); @@ -7331,23 +7331,21 @@ eviction_group_find(struct oftable *table, uint32_t id) } /* Returns an eviction priority for 'rule'. The return value should be - * interpreted so that higher priorities make a rule more attractive candidates - * for eviction. - * Called only if have a timeout. */ -static uint32_t + * interpreted so that higher priorities make a rule a more attractive + * candidate for eviction. */ +static uint64_t rule_eviction_priority(struct ofproto *ofproto, struct rule *rule) OVS_REQUIRES(ofproto_mutex) { +/* Calculate absolute time when this flow will expire. If it will never + * expire, then return 0 to make it unevictable. */ long long int expiration = LLONG_MAX; -long long int modified; -uint32_t expiration_offset; - -/* 'modified' needs protection even when we hold 'ofproto_mutex'. */ -ovs_mutex_lock(&rule->mutex); -modified = rule->modified; -ovs_mutex_unlock(&rule->mutex); - if (rule->hard_timeout) { +/* 'modified' needs protection even when we hold 'ofproto_mutex'. */ +ovs_mutex_lock(&rule->mutex); +long long int modified = rule->modified; +ovs_mutex_unlock(&rule->mutex); + expiration = modified + rule->hard_timeout * 1000; } if (rule->idle_timeout) { @@ -7359,7 +7357,6 @@ rule_eviction_priority(struct ofproto *ofproto, struct rule *rule) idle_expiration = used + rule->idle_timeout * 1000; expiration = MIN(expiration, idle_expiration); } - if (expiration == LLONG_MAX) { return 0; } @@ -7369,10 +7366,19 @@ rule_eviction_priority(struct ofproto *ofproto, struct rule *rule) * * This should work OK for program runs that last UINT32_MAX seconds or * less. Therefore, please restart OVS at least once every 136 years. */ -expiration_offset = (expiration >> 10) - (time_boot_msec() >> 10); +uint32_t expiration_ofs = (expiration >> 10) - (time_boot_msec() >> 10); -/* Invert the expiration offset because we're using a max-heap. */ -return UINT32_MAX - expiration_offset; +/* Combine expiration time with OpenFlow "importance" to form a single + * priority value. We want flows with relatively low "importance" to be + * evicted before even considering expiration time, so put "importance" in + * the most significant bits and expiration time in the least significant + * bits. + * + * Small 'priority' should be evicted before those with large 'priority'. + * The caller expects the opposite convention (a large return value being + * more attractive for eviction) so we invert it before returning. */ +uint64_t priority = ((uint64_t) rule->importance << 32) + expiration_ofs; +return UINT64_MAX - priority; } /* Adds 'rule' to an appropriate eviction group for its oftable's diff --git a/tests/ofproto.at b/tests/ofproto.at index 08585d1..7467ca0 100644 --- a/tests/ofproto.at +++ b/tests/ofproto.at @@ -1787,6 +1787,85 @@ OFPST_FLOW reply (OF1.2): OVS_VSWITCHD_STOP AT_CLEANUP +AT_SETUP([ofproto - eviction using importance upon table overflow (OpenFlow 1.4)]) +OVS_VSWITCHD_START +# Configure a maximum of 4 flows. +AT_CHECK( + [ovs-vsctl \ + -- --id=@t0 create Flow_Table name=evict flow-limit=4 overflow-policy=evict \ + -- set bridge br0 flow_tables:0=@t0 \ + | ${PERL} $srcdir/uuidfilt.pl], + [0], [<0> +]) +# Add 4 flows. +for in_port in 4 3 2 1; do +ovs-ofctl -O Openflow14 add-flow br0 importance=$((in_port + 30)),priority=$((in_port + 5)),hard_timeout=$((in_port + 500)),actions=drop +done +AT_CHECK([ovs-ofctl -O Openflow14 dum
[ovs-dev] [PATCH 6/6] Implement OpenFlow 1.4+ OFPMP_TABLE_DESC message.
Signed-off-by: Ben Pfaff Co-authored-by: Saloni Jain Signed-off-by: Saloni Jain --- NEWS| 1 + include/openflow/openflow-1.4.h | 10 +++ lib/learning-switch.c | 2 + lib/ofp-msgs.h | 10 +++ lib/ofp-print.c | 39 lib/ofp-util.c | 136 lib/ofp-util.h | 21 +++ lib/rconn.c | 2 + ofproto/ofproto.c | 48 ++ ovn/controller/ofctrl.c | 2 + tests/ofproto.at| 22 +++ utilities/ovs-ofctl.8.in| 12 utilities/ovs-ofctl.c | 19 ++ 13 files changed, 324 insertions(+) diff --git a/NEWS b/NEWS index 8dccdcd..c2786cf 100644 --- a/NEWS +++ b/NEWS @@ -8,6 +8,7 @@ Post-v2.4.0 now supported. * OpenFlow 1.4+ "importance" is now considered for flow eviction. * OpenFlow 1.4+ OFPTC_EVICTION is now implemented. + * OpenFlow 1.4+ OFPMP_TABLE_DESC is now implemented. v2.4.0 - xx xxx diff --git a/include/openflow/openflow-1.4.h b/include/openflow/openflow-1.4.h index 7631e47..567aaae 100644 --- a/include/openflow/openflow-1.4.h +++ b/include/openflow/openflow-1.4.h @@ -155,6 +155,16 @@ struct ofp14_table_mod { }; OFP_ASSERT(sizeof(struct ofp14_table_mod) == 8); +/* Body of reply to OFPMP_TABLE_DESC request. */ +struct ofp14_table_desc { +ovs_be16 length; /* Length is padded to 64 bits. */ +uint8_t table_id; /* Identifier of table. Lower numbered tables + are consulted first. */ +uint8_t pad[1];/* Align to 32-bits. */ +ovs_be32 config; /* Bitmap of OFPTC_* values. */ +/* Followed by 0 or more OFPTMPT14_* properties. */ +}; +OFP_ASSERT(sizeof(struct ofp14_table_desc) == 8); /* ## ## */ /* ## ofp14_port_stats ## */ diff --git a/lib/learning-switch.c b/lib/learning-switch.c index 3c8536d..6e0a82a 100644 --- a/lib/learning-switch.c +++ b/lib/learning-switch.c @@ -447,6 +447,8 @@ lswitch_process_packet(struct lswitch *sw, const struct ofpbuf *msg) case OFPTYPE_METER_FEATURES_STATS_REPLY: case OFPTYPE_TABLE_FEATURES_STATS_REQUEST: case OFPTYPE_TABLE_FEATURES_STATS_REPLY: +case OFPTYPE_TABLE_DESC_REQUEST: +case OFPTYPE_TABLE_DESC_REPLY: case OFPTYPE_BUNDLE_CONTROL: case OFPTYPE_BUNDLE_ADD_MESSAGE: default: diff --git a/lib/ofp-msgs.h b/lib/ofp-msgs.h index 23c334a..d14c569 100644 --- a/lib/ofp-msgs.h +++ b/lib/ofp-msgs.h @@ -369,6 +369,12 @@ enum ofpraw { /* OFPST 1.3+ (12): struct ofp13_table_features, uint8_t[8][]. */ OFPRAW_OFPST13_TABLE_FEATURES_REPLY, +/* OFPST 1.4+ (15): void. */ +OFPRAW_OFPST14_TABLE_DESC_REQUEST, + +/* OFPST 1.4+ (15): struct ofp14_table_desc, uint8_t[8][]. */ +OFPRAW_OFPST14_TABLE_DESC_REPLY, + /* OFPST 1.0-1.4 (13): void. */ OFPRAW_OFPST10_PORT_DESC_REQUEST, /* OFPST 1.5+ (13): ovs_be32. */ @@ -602,6 +608,10 @@ enum ofptype { OFPTYPE_TABLE_FEATURES_STATS_REPLY, /* OFPRAW_OFPST13_TABLE_FEATURES_REPLY. */ +OFPTYPE_TABLE_DESC_REQUEST, /* OFPRAW_OFPST14_TABLE_DESC_REQUEST. */ + +OFPTYPE_TABLE_DESC_REPLY,/* OFPRAW_OFPST14_TABLE_DESC_REPLY. */ + OFPTYPE_PORT_DESC_STATS_REQUEST, /* OFPRAW_OFPST10_PORT_DESC_REQUEST. * OFPRAW_OFPST15_PORT_DESC_REQUEST. */ diff --git a/lib/ofp-print.c b/lib/ofp-print.c index 1a6a9d8..21f5ca7 100644 --- a/lib/ofp-print.c +++ b/lib/ofp-print.c @@ -1025,6 +1025,18 @@ ofp_print_table_mod(struct ds *string, const struct ofp_header *oh) } } +/* This function will print the Table description properties. */ +static void +ofp_print_table_desc(struct ds *string, const struct ofputil_table_desc *td) +{ +ds_put_format(string, "\n table %"PRIu8, td->table_id); +ds_put_cstr(string, ":\n"); +ds_put_format(string, " eviction=%s eviction_flags=", + ofputil_table_eviction_to_string(td->eviction)); +ofputil_put_eviction_flags(string, td->eviction_flags); +ds_put_char(string, '\n'); +} + static void ofp_print_queue_get_config_request(struct ds *string, const struct ofp_header *oh) @@ -2607,6 +2619,28 @@ ofp_print_table_features_reply(struct ds *s, const struct ofp_header *oh) } } +static void +ofp_print_table_desc_reply(struct ds *s, const struct ofp_header *oh) +{ +struct ofpbuf b; + +ofpbuf_use_const(&b, oh, ntohs(oh->length)); + +for (;;) { +struct ofputil_table_desc td; +int retval; + +retval = ofputil_decode_table_desc(&b, &td, oh->version); +if (retval) { +if (retval != EOF) { +ofp_print_error(s, retval); +} +return; +} +ofp_print_table_desc(s, &td); +} +} + static const char * bundle_flags_to_name(uint32_t bit) { @@ -2735,6 +2
[ovs-dev] [PATCH 5/6] Implement OpenFlow 1.4+ OFPTC_EVICTION.
OpenFlow 1.4 introduces the ability to turn on flow table eviction with an OFPT_TABLE_MOD message specifying OFPTC_EVICTION. It also adds related machinery to other messages that mention OFPTC_* fields. This commit adds support for the new feature, implementing it as a second, parallel way to enable flow table eviction. It takes more work than it seems like it should because there is so much weirdness with the treatment of OFPTC_* flags over the evolution of OpenFlow; please refer to the explanation in DESIGN.md for more information. This commit also adds related support to ovs-ofctl, plus tests. Signed-off-by: Ben Pfaff Co-authored-by: Saloni Jain Signed-off-by: Saloni Jain --- DESIGN.md | 85 +- NEWS| 1 + include/openflow/openflow-1.3.h | 9 +- lib/ofp-parse.c | 52 ++--- lib/ofp-parse.h | 2 +- lib/ofp-print.c | 90 ++ lib/ofp-util.c | 220 ++- lib/ofp-util.h | 52 - ofproto/ofproto-provider.h | 9 +- ofproto/ofproto.c | 164 -- ofproto/ofproto.h | 18 +-- tests/ofp-print.at | 2 +- tests/ofproto.at| 23 +--- utilities/ovs-ofctl.8.in| 29 ++--- utilities/ovs-ofctl.c | 34 +++--- vswitchd/bridge.c | 7 +- vswitchd/vswitch.xml| 252 ++-- 17 files changed, 704 insertions(+), 345 deletions(-) diff --git a/DESIGN.md b/DESIGN.md index e533b7c..38413d7 100644 --- a/DESIGN.md +++ b/DESIGN.md @@ -277,13 +277,19 @@ The table for 1.3 is the same as the one shown above for 1.2. OpenFlow 1.4 - +--- + +OpenFlow 1.4 makes these changes: + + - Adds the "importance" field to flow_mods, but it does not +explicitly specify which kinds of flow_mods set the importance. +For consistency, Open vSwitch uses the same rule for importance +as for idle_timeout and hard_timeout, that is, only an "ADD" +flow_mod sets the importance. (This issue has been filed with +the ONF as EXT-496.) -OpenFlow 1.4 adds the "importance" field to flow_mods, but it does not -explicitly specify which kinds of flow_mods set the importance. For -consistency, Open vSwitch uses the same rule for importance as for -idle_timeout and hard_timeout, that is, only an "ADD" flow_mod sets -the importance. (This issue has been filed with the ONF as EXT-496.) + - Eviction Mechanism to automatically delete entries of lower +importance to make space for newer entries. OpenFlow 1.4 Bundles @@ -606,6 +612,73 @@ Tables 128 and above are reserved for use by the switch itself. Controllers should use only tables 0 through 127. +OFPTC_* Table Configuration +=== + +This section covers the history of the OFPTC_* table configuration +bits across OpenFlow versions. + +OpenFlow 1.0 flow tables had fixed configurations. + +OpenFlow 1.1 enabled controllers to configure behavior upon flow table +miss and added the OFPTC_MISS_* constants for that purpose. OFPTC_* +did not control anything else but it was nevertheless conceptualized +as a set of bit-fields instead of an enum. OF1.1 added the +OFPT_TABLE_MOD message to set OFPTC_MISS_* for a flow table and added +the 'config' field to the OFPST_TABLE reply to report the current +setting. + +OpenFlow 1.2 did not change anything in this regard. + +OpenFlow 1.3 switched to another means to changing flow table miss +behavior and deprecated OFPTC_MISS_* without adding any more OFPTC_* +constants. This meant that OFPT_TABLE_MOD now had no purpose at all, +but OF1.3 kept it around "for backward compatibility with older and +newer versions of the specification." At the same time, OF1.3 +introduced a new message OFPMP_TABLE_FEATURES that included a field +'config' documented as reporting the OFPTC_* values set with +OFPT_TABLE_MOD; of course this served no real purpose because no +OFPTC_* values are defined. OF1.3 did remove the OFPTC_* field from +OFPMP_TABLE (previously named OFPST_TABLE). + +OpenFlow 1.4 defined two new OFPTC_* constants, OFPTC_EVICTION and +OFPTC_VACANCY_EVENTS, using bits that did not overlap with +OFPTC_MISS_* even though those bits had not been defined since OF1.2. +OFPT_TABLE_MOD still controlled these settings. The field for OFPTC_* +values in OFPMP_TABLE_FEATURES was renamed from 'config' to +'capabilities' and documented as reporting the flags that are +supported in a OFPT_TABLE_MOD message. The OFPMP_TABLE_DESC message +newly added in OF1.4 reported the OFPTC_* setting. + +OpenFlow 1.5 did not change anything in this regard. + +The following table summarizes. The columns say: + + - OpenFlow version(s). + + - The OFPTC_* flags defined in those versions. + + - Whether OFPT_TABLE_MOD can modify OFPTC_* flags. + + - Whether OFP
Re: [ovs-dev] [PATCH v11] ovs-ofctl:Implementation of eviction on the basis of Importance
I reworked this into a patch series, starting here: http://openvswitch.org/pipermail/dev/2015-June/056771.html ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] datapath-windows: Wrong cleanup of newly created multiple NBLs
On Wed, Jun 24, 2015 at 10:56:55AM +, Sorin Vinturis wrote: > Bug found in OvsPartialCopyToMultipleNBLs function in the cleanup part of > the code. Before completing the current NBL (newNbl) the NEXT link for the > following NBL (firstNbl) was broken, instead of the current one (newNbl). > > This patch is to be applied both on master and on 2.4 branch. > > Signed-off-by: Sorin Vinturis > Reported-by: Sorin Vinturis > Reported-at: https://github.com/openvswitch/ovs-issues/issues/87 Thanks, applied to master and branch-2.4. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2] dpif_probe_feature not suported in windows datapath
I guess this solution is OK with me for 2.4. Nithin (or anyone), do you agree that it's the best solution for now? On Fri, Jun 19, 2015 at 04:41:47PM +, Alin Serdean wrote: > To do it the proper way yes. > > I really hate to do it this way but given the timeframe for 2.4 we could nuke > it later on. > > Alin. > > -Mesaj original- > De la: Nithin Raju [mailto:nit...@vmware.com] > Trimis: Friday, June 19, 2015 7:36 PM > Către: Alin Serdean > Cc: dev@openvswitch.org > Subiect: Re: [ovs-dev] [PATCH v2] dpif_probe_feature not suported in windows > datapath > > hi Alin, > Do you know what is required to complete the implementation of > dpif_probe_feature() in the Windows datapath? Is it the flow validation logic? > > thanks, > -- Nithin > > > > On Jun 18, 2015, at 6:47 PM, Alin Serdean > > wrote: > > > > This patch disables features which are not currently supported in the > > windows datapath. > > > > Unfortunately we have to do it in userspace because dpif_probe_feature > > is not treated accordingly in the windows the datapath. > > > > I opened the issue to track the feature for later implementations: > > https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_openvs > > witch_ovs-2Dissues_issues_85&d=BQIGaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw- > > YihVMNtXt-uEs&r=pNHQcdr7B40b4h6Yb7FIedI1dnBsxdDuTLBYD3JqV80&m=H9ymRGwE > > SDpjH0qHp5EifPK6QimZgzvHEIyjM3QzRwg&s=-rOo7BG2BoxcGi2BlSQASDz8v5Hkhp4y > > MoGbPZ-zQus&e= > > --- > > v2: Rebase > > --- > > ofproto/ofproto-dpif.c | 17 + > > 1 file changed, 17 insertions(+) > > > > diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c index > > 0de8686..55d004e 100644 > > --- a/ofproto/ofproto-dpif.c > > +++ b/ofproto/ofproto-dpif.c > > @@ -1014,8 +1014,14 @@ check_recirc(struct dpif_backer *backer) > > > > ofpbuf_use_stack(&key, &keybuf, sizeof keybuf); > > odp_flow_key_from_flow(&odp_parms, &key); > > +#ifdef _WIN32 > > +/* XXX Force disable of datapath recirculation from userspace until the > > + * dpif_probe_feature is properly implemented in the windows datapath > > */ > > +enable_recirc = false; > > +#else > > enable_recirc = dpif_probe_feature(backer->dpif, "recirculation", &key, > >NULL); > > +#endif > > > > if (enable_recirc) { > > VLOG_INFO("%s: Datapath supports recirculation", @@ -1052,7 > > +1058,13 @@ check_ufid(struct dpif_backer *backer) > > odp_flow_key_from_flow(&odp_parms, &key); > > dpif_flow_hash(backer->dpif, key.data, key.size, &ufid); > > > > +#ifdef _WIN32 > > +/* XXX Force disable of datapath recirculation from userspace until the > > + * dpif_probe_feature is properly implemented in the windows datapath > > */ > > +enable_ufid = false; > > +#else > > enable_ufid = dpif_probe_feature(backer->dpif, "UFID", &key, > > &ufid); > > +#endif > > > > if (enable_ufid) { > > VLOG_INFO("%s: Datapath supports unique flow ids", @@ -1161,6 > > +1173,11 @@ check_max_mpls_depth(struct dpif_backer *backer) > > > > ofpbuf_use_stack(&key, &keybuf, sizeof keybuf); > > odp_flow_key_from_flow(&odp_parms, &key); > > +#ifdef _WIN32 > > +/* XXX Force disable of datapath recirculation from userspace > > until the > > + * dpif_probe_feature is properly implemented in the windows > > datapath */ > > +break; > > +#endif > > if (!dpif_probe_feature(backer->dpif, "MPLS", &key, NULL)) { > > break; > > } > > -- > > 1.9.5.msysgit.0 > > ___ > > dev mailing list > > dev@openvswitch.org > > https://urldefense.proofpoint.com/v2/url?u=http-3A__openvswitch.org_ma > > ilman_listinfo_dev&d=BQIGaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt- > > uEs&r=pNHQcdr7B40b4h6Yb7FIedI1dnBsxdDuTLBYD3JqV80&m=H9ymRGwESDpjH0qHp5 > > EifPK6QimZgzvHEIyjM3QzRwg&s=voVGofZZJTP055uwJnpQBjFDxPbFyaiHEtEtt140MO > > E&e= > > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] datapath-windows: demote some logs in flow validation
On Fri, Jun 19, 2015 at 04:14:44PM +, Nithin Raju wrote: > > > On Jun 19, 2015, at 8:59 AM, Alin Serdean > > wrote: > > > > I would rather leave them the way they are until we fix the actual problem > > (disable from userspace, treat the dpif_probe_feature correctly) > > Sure, let’s wait for reviews for the dpif_probe_feature() change. Personally, > I feel we should not make exceptions for the Windows datapath, and let the > usual mechanisms take care of this. Let’s wait for others’ reviews. It's best if we can get dpif_probe_feature() to work correctly. I'm really looking for a judgment from the Windows developers whether that can happen soon enough for 2.4. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 01/11] metaflow: Allow fields to be marked as variable length.
On Fri, Jun 19, 2015 at 04:13:15PM -0700, Jesse Gross wrote: > Until now, all fields that OVS can match against have been fixed > size (variable length headers can be skipped during parsing but > the match is fixed). However, Geneve options can vary in size > so we must not require the size of these fields to be known > at compile time. > > This allows data types to be annotated with not only their size > but whether the field can be smaller than that. The following > patches will change OpenFlow parsing based on that. > > Signed-off-by: Jesse Gross Acked-by: Ben Pfaff ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 02/11] nx-match: Support variable length header lookup.
On Fri, Jun 19, 2015 at 04:13:16PM -0700, Jesse Gross wrote: > Currently we treat the entire NXM/OXM header, including length, > as an ID to define a field. However, this does not allow for > multiple lengths of a particular field. > > If a field has been marked as variable, we should ignore the length > when looking up the field and only use the class and field. We > continue to use the length for non-variable fields to ensure that > we don't accept something that can never match. > > Signed-off-by: Jesse Gross "sparse" says: ../lib/nx-match.c:100:63: warning: constant 0xff80 is so big it is unsigned long long so I'd suggest adding a ULL suffix there (even though the warning does make me just think "duh"). While reading nxm_field_by_header() I realized that we've got a terrible existing mistake in our hashing. Ugh. I'll send a fix, which we'll want to pre-apply (and maybe backport). ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH] nx-match: Fix distribution of hash function for NXM/OXM headers.
NXM/OXM headers as represented in this file are 64-bit long and the low bits are essentially constant (almost always 0) so using hash_int(), which takes an uint32_t, is going to be a useless hash function. This commit fixes the problem. Found by inspection. CC: Jesse Gross Signed-off-by: Ben Pfaff --- lib/nx-match.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lib/nx-match.c b/lib/nx-match.c index 21f291c..0560370 100644 --- a/lib/nx-match.c +++ b/lib/nx-match.c @@ -1893,7 +1893,7 @@ nxm_init(void) for (struct nxm_field_index *nfi = all_nxm_fields; nfi < &all_nxm_fields[ARRAY_SIZE(all_nxm_fields)]; nfi++) { hmap_insert(&nxm_header_map, &nfi->header_node, -hash_int(nfi->nf.header, 0)); +hash_uint64(nfi->nf.header)); hmap_insert(&nxm_name_map, &nfi->name_node, hash_string(nfi->nf.name, 0)); list_push_back(&nxm_mf_map[nfi->nf.id], &nfi->mf_node); @@ -1912,7 +1912,7 @@ nxm_field_by_header(uint64_t header) header = nxm_make_exact_header(header); } -HMAP_FOR_EACH_IN_BUCKET (nfi, header_node, hash_int(header, 0), +HMAP_FOR_EACH_IN_BUCKET (nfi, header_node, hash_uint64(header), &nxm_header_map) { if (header == nfi->nf.header) { return &nfi->nf; -- 2.1.3 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 02/11] nx-match: Support variable length header lookup.
On Wed, Jun 24, 2015 at 11:13 AM, Ben Pfaff wrote: > On Fri, Jun 19, 2015 at 04:13:16PM -0700, Jesse Gross wrote: >> Currently we treat the entire NXM/OXM header, including length, >> as an ID to define a field. However, this does not allow for >> multiple lengths of a particular field. >> >> If a field has been marked as variable, we should ignore the length >> when looking up the field and only use the class and field. We >> continue to use the length for non-variable fields to ensure that >> we don't accept something that can never match. >> >> Signed-off-by: Jesse Gross > > "sparse" says: > > ../lib/nx-match.c:100:63: warning: constant 0xff80 is so big > it is unsigned long long > > so I'd suggest adding a ULL suffix there (even though the warning does > make me just think "duh"). Thanks, this didn't show up with my sparse. Fixed. > While reading nxm_field_by_header() I realized that we've got a terrible > existing mistake in our hashing. Ugh. I'll send a fix, which we'll want > to pre-apply (and maybe backport). I guess you mean that we hash an int instead of a uint64_t? I just saw that. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 02/11] nx-match: Support variable length header lookup.
On Wed, Jun 24, 2015 at 11:20:20AM -0700, Jesse Gross wrote: > On Wed, Jun 24, 2015 at 11:13 AM, Ben Pfaff wrote: > > While reading nxm_field_by_header() I realized that we've got a terrible > > existing mistake in our hashing. Ugh. I'll send a fix, which we'll want > > to pre-apply (and maybe backport). > > I guess you mean that we hash an int instead of a uint64_t? I just saw that. Yeah--and all the entropy is in the high bits that we discard. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] nx-match: Fix distribution of hash function for NXM/OXM headers.
On Wed, Jun 24, 2015 at 11:17 AM, Ben Pfaff wrote: > NXM/OXM headers as represented in this file are 64-bit long and the low > bits are essentially constant (almost always 0) so using hash_int(), > which takes an uint32_t, is going to be a useless hash function. This > commit fixes the problem. > > Found by inspection. > > CC: Jesse Gross > Signed-off-by: Ben Pfaff Acked-by: Jesse Gross ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] nx-match: Fix distribution of hash function for NXM/OXM headers.
On Wed, Jun 24, 2015 at 11:21:51AM -0700, Jesse Gross wrote: > On Wed, Jun 24, 2015 at 11:17 AM, Ben Pfaff wrote: > > NXM/OXM headers as represented in this file are 64-bit long and the low > > bits are essentially constant (almost always 0) so using hash_int(), > > which takes an uint32_t, is going to be a useless hash function. This > > commit fixes the problem. > > > > Found by inspection. > > > > CC: Jesse Gross > > Signed-off-by: Ben Pfaff > > Acked-by: Jesse Gross Thanks, applied to master and branch-2.4. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 03/11] nx-match: Handle receiving variable length fields.
On Fri, Jun 19, 2015 at 04:13:17PM -0700, Jesse Gross wrote: > This adds support for receiving variable length fields encoded in > NXM/OXM and mapping them into OVS internal structures. In order > for this to make sense, we need to define some semantics: > > There are three lengths that matter in this process: the maximum > size of the field (represented as the existing mf->n_bytes), the > size of the field in the incoming NXM (given by the length in the > NXM header), and the currently configured length of the field > (defined by the consumer of the field and outside the scope of > this patch). > > Fields are modeled as being their maximum length and have the > characteristics expected by exsiting code (i.e. exact match fields > have masks that are all 1's for the whole field, etc.). Incoming > NXMs are stored in the field in the least significant bits. If > the NXM length is larger than the field, is is truncated, if it > is smaller it is zero-extended. When the field is consumed, the > component that needs data picks the configured length out of the > generated field. > > In most cases, the configured and NXM lengths will be equal and > these edge cases do not matter. However, since we cannot easily > enforce that the lengths match (and might not even know what the > right length is, such as in the case of a string parsed by > ovs-ofctl), these semantics should provide deterministic results > that are easy to understand and not require most existing code > to be aware of variable length fields. > > Signed-off-by: Jesse Gross I think that this changes behavior in a corner case. Previously, if nx_pull_entry__() were passed a null 'field' parameter, then nx_pull_header__() would not check for a field at all and would never return OFPERR_OFPBMC_BAD_FIELD. Now, nx_pull_header__() always checks for a field, and although nx_pull_entry__() initially ignores OFPERR_OFPBMC_BAD_FIELD, it still eventually passes that error along to its caller where it previously returned 0. I don't know whether that's a big deal but I do recall trying to get this exactly right before. The fix seems simple, just change the last bit of the function to: if (field_) { *field_ = field; return header_error; } return 0; There's cut-and-paste copy code in this function, maybe a helper would be nice. Acked-by: Ben Pfaff ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 04/11] nx-match: Enable senders of NXM fields to specify length.
On Fri, Jun 19, 2015 at 04:13:18PM -0700, Jesse Gross wrote: > Currently when an NXM field is encoded, the caller must specify > the length of the data being provided. However, this data is > always placed into a field of standard length. In order to > support variable length options, the length field must also > alter the size in the header. The previous implementation > already required callers to pass in the exact (fixed) size of > the field or it would not work properly, so there is no danger > that this will change the behavior for non-variable length > fields. > > Signed-off-by: Jesse Gross Acked-by: Ben Pfaff ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH] nx-match: Fix typo in comment.
Signed-off-by: Ben Pfaff --- lib/nx-match.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/nx-match.c b/lib/nx-match.c index 3dcee5d..e9de0fd 100644 --- a/lib/nx-match.c +++ b/lib/nx-match.c @@ -1150,7 +1150,7 @@ oxm_put_field_array(struct ofpbuf *b, const struct field_array *fa, int i; /* Field arrays are only used with the group selection method - * property and group properties are only available in OpenFlow * 1.5+. + * property and group properties are only available in OpenFlow 1.5+. * So the following assertion should never fail. * * If support for older OpenFlow versions is desired then some care -- 2.1.3 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 03/11] nx-match: Handle receiving variable length fields.
On Wed, Jun 24, 2015 at 11:38 AM, Ben Pfaff wrote: > On Fri, Jun 19, 2015 at 04:13:17PM -0700, Jesse Gross wrote: >> This adds support for receiving variable length fields encoded in >> NXM/OXM and mapping them into OVS internal structures. In order >> for this to make sense, we need to define some semantics: >> >> There are three lengths that matter in this process: the maximum >> size of the field (represented as the existing mf->n_bytes), the >> size of the field in the incoming NXM (given by the length in the >> NXM header), and the currently configured length of the field >> (defined by the consumer of the field and outside the scope of >> this patch). >> >> Fields are modeled as being their maximum length and have the >> characteristics expected by exsiting code (i.e. exact match fields >> have masks that are all 1's for the whole field, etc.). Incoming >> NXMs are stored in the field in the least significant bits. If >> the NXM length is larger than the field, is is truncated, if it >> is smaller it is zero-extended. When the field is consumed, the >> component that needs data picks the configured length out of the >> generated field. >> >> In most cases, the configured and NXM lengths will be equal and >> these edge cases do not matter. However, since we cannot easily >> enforce that the lengths match (and might not even know what the >> right length is, such as in the case of a string parsed by >> ovs-ofctl), these semantics should provide deterministic results >> that are easy to understand and not require most existing code >> to be aware of variable length fields. >> >> Signed-off-by: Jesse Gross > > I think that this changes behavior in a corner case. Previously, if > nx_pull_entry__() were passed a null 'field' parameter, then > nx_pull_header__() would not check for a field at all and would never > return OFPERR_OFPBMC_BAD_FIELD. Now, nx_pull_header__() always checks > for a field, and although nx_pull_entry__() initially ignores > OFPERR_OFPBMC_BAD_FIELD, it still eventually passes that error along to > its caller where it previously returned 0. > > I don't know whether that's a big deal but I do recall trying to get > this exactly right before. The fix seems simple, just change the last > bit of the function to: > if (field_) { > *field_ = field; > return header_error; > } > return 0; Thanks for noticing that - the intention was to keep the behavior the same for non-variable length fields. I took your suggested code. > There's cut-and-paste copy code in this function, maybe a helper would > be nice. Agreed; it does look much nicer this way. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] nx-match: Fix typo in comment.
On Wed, Jun 24, 2015 at 11:46 AM, Ben Pfaff wrote: > Signed-off-by: Ben Pfaff > --- > lib/nx-match.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Acked-by: Jesse Gross ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 05/11] nx-match: Trim variable length fields when encoding as actions.
On Fri, Jun 19, 2015 at 04:13:19PM -0700, Jesse Gross wrote: > It is technically correct to send the entire maximum length of > a field when it is variable length. However, it is awkward to > do so and not what one would naively expect. Since receivers will > internally zero-extend fields, we can do the opposite and trim > off leading zeros. This results in encodings that are generally > sensible without specific knowledge of what is being transmitted. > (Of course, other implementations, such as controllers, may know > exactly the expected length of the field and are free to encode > it that way even if it has leading zeros.) > > Signed-off-by: Jesse Gross Acked-by: Ben Pfaff I personally find this implementation of mf_field_len() easier to read (I know that's terribly nitpicky): int len = field_len(mf, value); if (mask && !is_all_ones(mask, mf->n_bytes)) { int mask_len = field_len(mf, mask); len = MAX(len, mask_len); } return len; ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 06/11] nx-match: Enable parsing string representations of variable fields.
On Fri, Jun 19, 2015 at 04:13:20PM -0700, Jesse Gross wrote: > When reading in hex strings that form NXM fields, we don't need to > enforce size constraints if the fields are variable length. > Instead, we can set the header size based on the string length. > > Signed-off-by: Jesse Gross I think that this will give an inappropriate error message in the case where the user supplies more data than allowed in a field. It would be good to fix that. Acked-by: Ben Pfaff ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 05/11] nx-match: Trim variable length fields when encoding as actions.
On Wed, Jun 24, 2015 at 12:12 PM, Ben Pfaff wrote: > On Fri, Jun 19, 2015 at 04:13:19PM -0700, Jesse Gross wrote: >> It is technically correct to send the entire maximum length of >> a field when it is variable length. However, it is awkward to >> do so and not what one would naively expect. Since receivers will >> internally zero-extend fields, we can do the opposite and trim >> off leading zeros. This results in encodings that are generally >> sensible without specific knowledge of what is being transmitted. >> (Of course, other implementations, such as controllers, may know >> exactly the expected length of the field and are free to encode >> it that way even if it has leading zeros.) >> >> Signed-off-by: Jesse Gross > > Acked-by: Ben Pfaff > > I personally find this implementation of mf_field_len() easier to read > (I know that's terribly nitpicky): > > int len = field_len(mf, value); > if (mask && !is_all_ones(mask, mf->n_bytes)) { > int mask_len = field_len(mf, mask); > len = MAX(len, mask_len); > } > return len; That's fine with me, it doesn't bother me much either way. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2] dpif_probe_feature not suported in windows datapath
I'm wondering if we could fail this call in the datapath level by examining the DPIF_FP_PROBE bit in the flags. If it is too hard we can still live with user mode code change. Thanks, Eitan -Original Message- From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Ben Pfaff Sent: Wednesday, June 24, 2015 11:05 AM To: Alin Serdean Cc: dev@openvswitch.org Subject: Re: [ovs-dev] [PATCH v2] dpif_probe_feature not suported in windows datapath I guess this solution is OK with me for 2.4. Nithin (or anyone), do you agree that it's the best solution for now? On Fri, Jun 19, 2015 at 04:41:47PM +, Alin Serdean wrote: > To do it the proper way yes. > > I really hate to do it this way but given the timeframe for 2.4 we could nuke > it later on. > > Alin. > > -Mesaj original- > De la: Nithin Raju [mailto:nit...@vmware.com] > Trimis: Friday, June 19, 2015 7:36 PM > Către: Alin Serdean > Cc: dev@openvswitch.org > Subiect: Re: [ovs-dev] [PATCH v2] dpif_probe_feature not suported in > windows datapath > > hi Alin, > Do you know what is required to complete the implementation of > dpif_probe_feature() in the Windows datapath? Is it the flow validation logic? > > thanks, > -- Nithin > > > > On Jun 18, 2015, at 6:47 PM, Alin Serdean > > wrote: > > > > This patch disables features which are not currently supported in > > the windows datapath. > > > > Unfortunately we have to do it in userspace because > > dpif_probe_feature is not treated accordingly in the windows the datapath. > > > > I opened the issue to track the feature for later implementations: > > https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_open > > vs > > witch_ovs-2Dissues_issues_85&d=BQIGaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeA > > w- > > YihVMNtXt-uEs&r=pNHQcdr7B40b4h6Yb7FIedI1dnBsxdDuTLBYD3JqV80&m=H9ymRG > > wE > > SDpjH0qHp5EifPK6QimZgzvHEIyjM3QzRwg&s=-rOo7BG2BoxcGi2BlSQASDz8v5Hkhp > > 4y > > MoGbPZ-zQus&e= > > --- > > v2: Rebase > > --- > > ofproto/ofproto-dpif.c | 17 + > > 1 file changed, 17 insertions(+) > > > > diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c index > > 0de8686..55d004e 100644 > > --- a/ofproto/ofproto-dpif.c > > +++ b/ofproto/ofproto-dpif.c > > @@ -1014,8 +1014,14 @@ check_recirc(struct dpif_backer *backer) > > > > ofpbuf_use_stack(&key, &keybuf, sizeof keybuf); > > odp_flow_key_from_flow(&odp_parms, &key); > > +#ifdef _WIN32 > > +/* XXX Force disable of datapath recirculation from userspace until the > > + * dpif_probe_feature is properly implemented in the windows datapath > > */ > > +enable_recirc = false; > > +#else > > enable_recirc = dpif_probe_feature(backer->dpif, "recirculation", &key, > >NULL); > > +#endif > > > > if (enable_recirc) { > > VLOG_INFO("%s: Datapath supports recirculation", @@ -1052,7 > > +1058,13 @@ check_ufid(struct dpif_backer *backer) > > odp_flow_key_from_flow(&odp_parms, &key); > > dpif_flow_hash(backer->dpif, key.data, key.size, &ufid); > > > > +#ifdef _WIN32 > > +/* XXX Force disable of datapath recirculation from userspace until the > > + * dpif_probe_feature is properly implemented in the windows datapath > > */ > > +enable_ufid = false; > > +#else > > enable_ufid = dpif_probe_feature(backer->dpif, "UFID", &key, > > &ufid); > > +#endif > > > > if (enable_ufid) { > > VLOG_INFO("%s: Datapath supports unique flow ids", @@ > > -1161,6 > > +1173,11 @@ check_max_mpls_depth(struct dpif_backer *backer) > > > > ofpbuf_use_stack(&key, &keybuf, sizeof keybuf); > > odp_flow_key_from_flow(&odp_parms, &key); > > +#ifdef _WIN32 > > +/* XXX Force disable of datapath recirculation from userspace > > until the > > + * dpif_probe_feature is properly implemented in the windows > > datapath */ > > +break; > > +#endif > > if (!dpif_probe_feature(backer->dpif, "MPLS", &key, NULL)) { > > break; > > } > > -- > > 1.9.5.msysgit.0 > > ___ > > dev mailing list > > dev@openvswitch.org > > https://urldefense.proofpoint.com/v2/url?u=http-3A__openvswitch.org_ > > ma > > ilman_listinfo_dev&d=BQIGaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtX > > t- > > uEs&r=pNHQcdr7B40b4h6Yb7FIedI1dnBsxdDuTLBYD3JqV80&m=H9ymRGwESDpjH0qH > > p5 > > EifPK6QimZgzvHEIyjM3QzRwg&s=voVGofZZJTP055uwJnpQBjFDxPbFyaiHEtEtt140 > > MO > > E&e= > > ___ > dev mailing list > dev@openvswitch.org > https://urldefense.proofpoint.com/v2/url?u=http-3A__openvswitch.org_ma > ilman_listinfo_dev&d=BQIGaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt- > uEs&r=CWsgHUxi6ExLXY798tmo3LJ4e3geGYp56lkcH-5cLCY&m=VgcETFFbbuqnE2agha > rVPMWHqSyP7L2kf73r940PsMg&s=HMaPgTvdxvCtlAQc-yi3VXOLpNatDI43OgRlwifibU > M&e= ___ dev mailing list dev@openvswitch.org https://urldefense.proofpoint.com/v2/ur
Re: [ovs-dev] [PATCH 07/11] openflow: Table maintenance commands for Geneve options.
On Fri, Jun 19, 2015 at 04:13:21PM -0700, Jesse Gross wrote: > In order to work with Geneve options, we need to maintain a mapping > table between an option (defined by ) and > an NXM field that can be operated on for the purposes of matches, > actions, etc. This mapping must be explicitly specified by the > user. > > Conceptually, this table could be communicated using either OpenFlow > or OVSDB. Using OVSDB requires less code and definition of extensions > than OpenFlow but introduces the possibility that mapping table > updates and flow modifications are desynchronized from each other. > This is dangerous because the mapping table signifcantly impacts the > way that flows using Geneve options are installed and processed by > OVS. Therefore, the mapping table is maintained using OpenFlow commands > instead, which opens the possibility of using synchronization between > table changes and flow modifications through barriers, bundles, etc. > > There are two primary groups of OpenFlow messages that are introduced > as Nicira extensions: modification commands (add, modify, delete, > clear mappings) and table status request/reply to dump the current > table along with switch information. > > Note that mappings should not be changed while they are in active use by > a flow. The result of doing so is undefined. > > This only adds the OpenFlow infrastructure but doesn't actually > do anything with the information yet after the messages have been > decoded. > > Signed-off-by: Jesse Gross This is quite clean, thank you! In parse_ofp_geneve_table_mod_str(), I'd prefer to use %i instead of %x, because one of my long-term goals for OVS is to switch away from this awful scanf() stuff to a proper parser and having hex numbers without 0x prefixes will make that a lot harder. This ovs-ofctl syntax is going to trip up anyone who isn't careful about shell quoting, both for {a,b,c} and for the > in the syntax. I guess that's OK but the example in the ovs-ofctl manpage should really show single or double quotes in my opinion. I think decode_geneve_table_mappings() has a memory leak in the error case. You could fix it by doing the list_push_back() immediately instead of only following validation. It would be kind to include in the documentation the minimum required OVS version and that this is an Open vSwitch extension. Acked-by: Ben Pfaff ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 08/11] metaflow: Extend size of mf_value to 128 bytes.
On Fri, Jun 19, 2015 at 04:13:22PM -0700, Jesse Gross wrote: > Tunnel metadata can be substantially larger than our existing fields > (up to 124 bytes in a single Geneve option) so this extends the size > of the data that we can handle with metaflow fields. This also > breaks a few tests that assume that their max size is also the > maximum that can be handled in a field. > > Signed-off-by: Jesse Gross Did you look around at all to see whether this will unreasonably blow up any data or algorithms? Acked-by: Ben Pfaff ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 06/11] nx-match: Enable parsing string representations of variable fields.
On Wed, Jun 24, 2015 at 12:26 PM, Ben Pfaff wrote: > On Fri, Jun 19, 2015 at 04:13:20PM -0700, Jesse Gross wrote: >> When reading in hex strings that form NXM fields, we don't need to >> enforce size constraints if the fields are variable length. >> Instead, we can set the header size based on the string length. >> >> Signed-off-by: Jesse Gross > > I think that this will give an inappropriate error message in the case > where the user supplies more data than allowed in a field. It would be > good to fix that. Fixed, thanks. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 09/11] odp-util: Pass down flow netlink attributes when translating masks.
On Fri, Jun 19, 2015 at 04:13:23PM -0700, Jesse Gross wrote: > Sometimes we need to look at flow fields to understand how to parse > an attribute. However, masks don't have this information - just the > mask on the field. We already use the translated flow structure for > this purpose but this isn't always enough since sometimes we actually > need the raw netlink information. Fortunately, that is also readily > available so this passes it down from the appropriate callers. > > Signed-off-by: Jesse Gross I am not looking forward to finding out when we need to use this information ;-) Acked-by: Ben Pfaff ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 3/3] OVSDB: Flush JSON cache only when necessary
On Mon, Jun 22, 2015 at 9:58 AM, Ben Pfaff wrote: > On Wed, Jun 10, 2015 at 06:37:59PM -0700, Andy Zhou wrote: >> Currently, JSON cache is always flushed whenever the monitor receives >> a transaction. This patch improves the JSON cache efficiency by only >> flush the JSON cache when a transaction causes client visible >> changes, avoiding unnecessary flushes. >> >> Suggested-by: Ben Pfaff >> Signed-off-by: Andy Zhou > > Acked-by: Ben Pfaff Thanks, pushed the patch series to master with suggested changes applied. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 07/11] openflow: Table maintenance commands for Geneve options.
On Wed, Jun 24, 2015 at 1:11 PM, Ben Pfaff wrote: > On Fri, Jun 19, 2015 at 04:13:21PM -0700, Jesse Gross wrote: > In parse_ofp_geneve_table_mod_str(), I'd prefer to use %i instead of %x, > because one of my long-term goals for OVS is to switch away from this > awful scanf() stuff to a proper parser and having hex numbers without 0x > prefixes will make that a lot harder. Yeah, this is actually just a mistake - the hex values should be prefixed with 0x. > This ovs-ofctl syntax is going to trip up anyone who isn't careful about > shell quoting, both for {a,b,c} and for the > in the syntax. I guess > that's OK but the example in the ovs-ofctl manpage should really show > single or double quotes in my opinion. I added the quotes to the examples in the man page. I wouldn't mind having a syntax that doesn't require quoting but this seemed somewhat descriptive (to me at least) and consistent with our other fields. > I think decode_geneve_table_mappings() has a memory leak in the error > case. You could fix it by doing the list_push_back() immediately > instead of only following validation. Oops - thanks for catching that. > It would be kind to include in the documentation the minimum required > OVS version and that this is an Open vSwitch extension. Yes, I added that. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 07/11] openflow: Table maintenance commands for Geneve options.
On Fri, Jun 19, 2015 at 04:13:21PM -0700, Jesse Gross wrote: > In order to work with Geneve options, we need to maintain a mapping > table between an option (defined by ) and > an NXM field that can be operated on for the purposes of matches, > actions, etc. This mapping must be explicitly specified by the > user. > > Conceptually, this table could be communicated using either OpenFlow > or OVSDB. Using OVSDB requires less code and definition of extensions > than OpenFlow but introduces the possibility that mapping table > updates and flow modifications are desynchronized from each other. > This is dangerous because the mapping table signifcantly impacts the > way that flows using Geneve options are installed and processed by > OVS. Therefore, the mapping table is maintained using OpenFlow commands > instead, which opens the possibility of using synchronization between > table changes and flow modifications through barriers, bundles, etc. > > There are two primary groups of OpenFlow messages that are introduced > as Nicira extensions: modification commands (add, modify, delete, > clear mappings) and table status request/reply to dump the current > table along with switch information. > > Note that mappings should not be changed while they are in active use by > a flow. The result of doing so is undefined. > > This only adds the OpenFlow infrastructure but doesn't actually > do anything with the information yet after the messages have been > decoded. > > Signed-off-by: Jesse Gross Oh, also GCC reports: ../lib/ofp-util.c: In function 'decode_geneve_table_mappings': ../lib/ofp-util.c:8958:24: error: comparison is always true due to limited range of data type [-Werror=type-limits] if (map->index >= TUN_METADATA_NUM_OPTS) { ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 08/11] metaflow: Extend size of mf_value to 128 bytes.
On Wed, Jun 24, 2015 at 1:17 PM, Ben Pfaff wrote: > On Fri, Jun 19, 2015 at 04:13:22PM -0700, Jesse Gross wrote: >> Tunnel metadata can be substantially larger than our existing fields >> (up to 124 bytes in a single Geneve option) so this extends the size >> of the data that we can handle with metaflow fields. This also >> breaks a few tests that assume that their max size is also the >> maximum that can be handled in a field. >> >> Signed-off-by: Jesse Gross > > Did you look around at all to see whether this will unreasonably blow up > any data or algorithms? I don't believe that it should have any significant effects. Generally, code does operations on the fields based on mf->n_bytes (with the exception of some memset()s here and there). I don't think that we really store these in a large number for any real period of time. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 09/11] odp-util: Pass down flow netlink attributes when translating masks.
On Wed, Jun 24, 2015 at 1:20 PM, Ben Pfaff wrote: > On Fri, Jun 19, 2015 at 04:13:23PM -0700, Jesse Gross wrote: >> Sometimes we need to look at flow fields to understand how to parse >> an attribute. However, masks don't have this information - just the >> mask on the field. We already use the translated flow structure for >> this purpose but this isn't always enough since sometimes we actually >> need the raw netlink information. Fortunately, that is also readily >> available so this passes it down from the appropriate callers. >> >> Signed-off-by: Jesse Gross > > I am not looking forward to finding out when we need to use this > information ;-) Things get a bit more interesting in the next patch... ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 07/11] openflow: Table maintenance commands for Geneve options.
On Wed, Jun 24, 2015 at 1:45 PM, Ben Pfaff wrote: > On Fri, Jun 19, 2015 at 04:13:21PM -0700, Jesse Gross wrote: >> In order to work with Geneve options, we need to maintain a mapping >> table between an option (defined by ) and >> an NXM field that can be operated on for the purposes of matches, >> actions, etc. This mapping must be explicitly specified by the >> user. >> >> Conceptually, this table could be communicated using either OpenFlow >> or OVSDB. Using OVSDB requires less code and definition of extensions >> than OpenFlow but introduces the possibility that mapping table >> updates and flow modifications are desynchronized from each other. >> This is dangerous because the mapping table signifcantly impacts the >> way that flows using Geneve options are installed and processed by >> OVS. Therefore, the mapping table is maintained using OpenFlow commands >> instead, which opens the possibility of using synchronization between >> table changes and flow modifications through barriers, bundles, etc. >> >> There are two primary groups of OpenFlow messages that are introduced >> as Nicira extensions: modification commands (add, modify, delete, >> clear mappings) and table status request/reply to dump the current >> table along with switch information. >> >> Note that mappings should not be changed while they are in active use by >> a flow. The result of doing so is undefined. >> >> This only adds the OpenFlow infrastructure but doesn't actually >> do anything with the information yet after the messages have been >> decoded. >> >> Signed-off-by: Jesse Gross > > Oh, also GCC reports: > > ../lib/ofp-util.c: In function 'decode_geneve_table_mappings': > ../lib/ofp-util.c:8958:24: error: comparison is always true due to > limited range of data type [-Werror=type-limits] > if (map->index >= TUN_METADATA_NUM_OPTS) { Yeah, I actually knew about this one. It's because this patch only contains the OpenFlow infrastructure for mapping options but not the support for doing anything with them so the maximum number of options is zero at this point. Therefore, the check for out of range indexes is always true. It should get resolved in patch 10, I guess it didn't seem worth doing weird things to avoid a transient warning. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH] tunneling: Don't match on source IP address for native tunnels.
When doing native tunneling, we look at packets destined to the local port to see if they match tunnel protocols that we should intercept. The criteria are IP protocol, destination UDP port, etc. However, we also look at the source IP address of the packets. This should be a function of the port-based tunnel layer and not the tunnel receive code itself. For comparison, the kernel tunnel code has no idea about the IP addresses of its link partners. If port based tunnel is desired, it can be handled using the normal port tunnel layer, regardless of whether the packets originally came from userspace or the kernel. For port based tunneling, this bug has no effect - the check is simply redundant. However, it breaks flow-based native tunnels because the remote IP address is not known at port creation time. CC: Pravin Shelar Reported-by: David Griswold Signed-off-by: Jesse Gross --- lib/tnl-ports.c | 15 +-- lib/tnl-ports.h | 4 ++-- ofproto/tunnel.c | 5 ++--- 3 files changed, 9 insertions(+), 15 deletions(-) diff --git a/lib/tnl-ports.c b/lib/tnl-ports.c index a0a73c8..79c9631 100644 --- a/lib/tnl-ports.c +++ b/lib/tnl-ports.c @@ -56,7 +56,7 @@ tnl_port_free(struct tnl_port_in *p) } static void -tnl_port_init_flow(struct flow *flow, ovs_be32 ip_dst, ovs_be16 udp_port) +tnl_port_init_flow(struct flow *flow, ovs_be16 udp_port) { memset(flow, 0, sizeof *flow); flow->dl_type = htons(ETH_TYPE_IP); @@ -66,21 +66,17 @@ tnl_port_init_flow(struct flow *flow, ovs_be32 ip_dst, ovs_be16 udp_port) flow->nw_proto = IPPROTO_GRE; } flow->tp_dst = udp_port; -/* When matching on incoming flow from remove tnl end point, - * our dst ip address is source ip for them. */ -flow->nw_src = ip_dst; } void -tnl_port_map_insert(odp_port_t port, ovs_be32 ip_dst, ovs_be16 udp_port, -const char dev_name[]) +tnl_port_map_insert(odp_port_t port, ovs_be16 udp_port, const char dev_name[]) { const struct cls_rule *cr; struct tnl_port_in *p; struct match match; memset(&match, 0, sizeof match); -tnl_port_init_flow(&match.flow, ip_dst, udp_port); +tnl_port_init_flow(&match.flow, udp_port); ovs_mutex_lock(&mutex); do { @@ -97,7 +93,6 @@ tnl_port_map_insert(odp_port_t port, ovs_be32 ip_dst, ovs_be16 udp_port, match.wc.masks.nw_proto = 0xff; match.wc.masks.nw_frag = 0xff; /* XXX: No fragments support. */ match.wc.masks.tp_dst = OVS_BE16_MAX; -match.wc.masks.nw_src = OVS_BE32_MAX; cls_rule_init(&p->cr, &match, 0, CLS_MIN_VERSION); /* Priority == 0. */ ovs_refcount_init(&p->ref_cnt); @@ -123,12 +118,12 @@ tnl_port_unref(const struct cls_rule *cr) } void -tnl_port_map_delete(ovs_be32 ip_dst, ovs_be16 udp_port) +tnl_port_map_delete(ovs_be16 udp_port) { const struct cls_rule *cr; struct flow flow; -tnl_port_init_flow(&flow, ip_dst, udp_port); +tnl_port_init_flow(&flow, udp_port); cr = classifier_lookup(&cls, CLS_MAX_VERSION, &flow, NULL); tnl_port_unref(cr); diff --git a/lib/tnl-ports.h b/lib/tnl-ports.h index 37a689f..8e4911d 100644 --- a/lib/tnl-ports.h +++ b/lib/tnl-ports.h @@ -26,10 +26,10 @@ odp_port_t tnl_port_map_lookup(struct flow *flow, struct flow_wildcards *wc); -void tnl_port_map_insert(odp_port_t port, ovs_be32 ip_dst, ovs_be16 udp_port, +void tnl_port_map_insert(odp_port_t port, ovs_be16 udp_port, const char dev_name[]); -void tnl_port_map_delete(ovs_be32 ip_dst, ovs_be16 udp_port); +void tnl_port_map_delete(ovs_be16 udp_port); void tnl_port_map_init(void); diff --git a/ofproto/tunnel.c b/ofproto/tunnel.c index d2ac7c6..42f760e 100644 --- a/ofproto/tunnel.c +++ b/ofproto/tunnel.c @@ -195,8 +195,7 @@ tnl_port_add__(const struct ofport_dpif *ofport, const struct netdev *netdev, tnl_port_mod_log(tnl_port, "adding"); if (native_tnl) { -tnl_port_map_insert(odp_port, tnl_port->match.ip_dst, -cfg->dst_port, name); +tnl_port_map_insert(odp_port, cfg->dst_port, name); } return true; } @@ -263,7 +262,7 @@ tnl_port_del__(const struct ofport_dpif *ofport) OVS_REQ_WRLOCK(rwlock) netdev_get_tunnel_config(tnl_port->netdev); struct hmap **map; -tnl_port_map_delete(tnl_port->match.ip_dst, cfg->dst_port); +tnl_port_map_delete(cfg->dst_port); tnl_port_mod_log(tnl_port, "removing"); map = tnl_match_map(&tnl_port->match); hmap_remove(*map, &tnl_port->match_node); -- 2.1.0 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 07/11] openflow: Table maintenance commands for Geneve options.
On Wed, Jun 24, 2015 at 01:52:46PM -0700, Jesse Gross wrote: > On Wed, Jun 24, 2015 at 1:45 PM, Ben Pfaff wrote: > > On Fri, Jun 19, 2015 at 04:13:21PM -0700, Jesse Gross wrote: > >> In order to work with Geneve options, we need to maintain a mapping > >> table between an option (defined by ) and > >> an NXM field that can be operated on for the purposes of matches, > >> actions, etc. This mapping must be explicitly specified by the > >> user. > >> > >> Conceptually, this table could be communicated using either OpenFlow > >> or OVSDB. Using OVSDB requires less code and definition of extensions > >> than OpenFlow but introduces the possibility that mapping table > >> updates and flow modifications are desynchronized from each other. > >> This is dangerous because the mapping table signifcantly impacts the > >> way that flows using Geneve options are installed and processed by > >> OVS. Therefore, the mapping table is maintained using OpenFlow commands > >> instead, which opens the possibility of using synchronization between > >> table changes and flow modifications through barriers, bundles, etc. > >> > >> There are two primary groups of OpenFlow messages that are introduced > >> as Nicira extensions: modification commands (add, modify, delete, > >> clear mappings) and table status request/reply to dump the current > >> table along with switch information. > >> > >> Note that mappings should not be changed while they are in active use by > >> a flow. The result of doing so is undefined. > >> > >> This only adds the OpenFlow infrastructure but doesn't actually > >> do anything with the information yet after the messages have been > >> decoded. > >> > >> Signed-off-by: Jesse Gross > > > > Oh, also GCC reports: > > > > ../lib/ofp-util.c: In function 'decode_geneve_table_mappings': > > ../lib/ofp-util.c:8958:24: error: comparison is always true due to > > limited range of data type [-Werror=type-limits] > > if (map->index >= TUN_METADATA_NUM_OPTS) { > > Yeah, I actually knew about this one. It's because this patch only > contains the OpenFlow infrastructure for mapping options but not the > support for doing anything with them so the maximum number of options > is zero at this point. Therefore, the check for out of range indexes > is always true. It should get resolved in patch 10, I guess it didn't > seem worth doing weird things to avoid a transient warning. That's fine, same conclusion here, wanted to make sure you were aware. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 10/11] tunnel: Geneve TLV handling support for OpenFlow.
On Fri, Jun 19, 2015 at 04:13:24PM -0700, Jesse Gross wrote: > The current support for Geneve in OVS is exactly equivalent to VXLAN: > it is possible to set and match on the VNI but not on any options > contained in the header. This patch enables the use of options. > > The goal for Geneve support is not to add support for any particular option > but to allow end users or controllers to specify what they would like to > match. That is, the full range of Geneve's capabilities should be exposed > without modifying the code (the one exception being options that require > per-packet computation in the fast path). > > The main issue with supporting Geneve options is how to integrate the > fields into the existing OpenFlow pipeline. All existing operations > are referred to by their NXM/OXM field name - matches, action generation, > arithmetic operations (i.e. tranfer to a register). However, the Geneve > option space is exactly the same as the OXM space, so a direct mapping > is not feasible. Instead, we create a pool of 64 NXMs that are then > dynamically mapped on Geneve option TLVs using OpenFlow. Once mapped, > these fields become first-class citizens in the OpenFlow pipeline. > > An example of how to use Geneve options: > ovs-ofctl add-geneve-map br0 {class=0x,type=0,len=4}->tun_metadata0 > ovs-ofctl add-flow br0 > in_port=LOCAL,actions=set_field:0x->tun_metadata0,1 > > This will add a 4 bytes option (filled will all 1's) to all packets > coming from the LOCAL port and then send then out to port 1. > > A limitation of this patch is that although the option table is specified > for a particular switch over OpenFlow, it is currently global to all > switches. This will be addressed in a future patch. Out of curiosity, when you say that do you mean "This is left as future work" or "I'm cleaning up a patch as we speak" or somewhere in between? > Based on work originally done by Madhu Challa. > > Signed-off-by: Jesse Gross Among this series, this patch is clearly the money shot. Bugs Building on i386, this makes the compiler really angry at me because struct tun_metadata is an odd multiple of 32 bits in size, which makes struct flow an odd multiple of 32 bits in size and fails the flow.h assertion BUILD_ASSERT_DECL(sizeof(struct flow) % sizeof(uint64_t) == 0); This fixes it but obviously it's not the right fix: diff --git a/lib/tun-metadata.h b/lib/tun-metadata.h index 8ba0757..4e1e84a 100644 --- a/lib/tun-metadata.h +++ b/lib/tun-metadata.h @@ -37,6 +37,7 @@ struct tun_metadata { uint8_t opts[TUN_METADATA_TOT_OPT_SIZE]; uint64_t opt_map; struct tun_table *tab; +uint8_t pad[4]; }; BUILD_ASSERT_DECL(sizeof(((struct tun_metadata *)0)->opt_map) * 8 >= TUN_METADATA_NUM_OPTS); I see a lot of use of "1 << (something)" in tun-metadata, but I believe that these need to use 1ULL or UINT64_C(1) instead. I'm a little confused about the alloc_offset field in struct tun_metadata_allocation. It seems to be a byte offset but metadata_loc_from_match() compares it against TUN_METADATA_NUM_OPTS. Is that a typo for TUN_METADATA_TOT_OPT_SIZE? Other - tun-metadata.c has a few hard tabs. Fragmented tunnel metadata is nasty. Is there a way we can avoid it? I like the implementation approach used in ULONG_FOR_EACH_1 better than the one in PRESENT_OPT_FOR_EACH, since the user doesn't have to provide a scratch variable. Also it protects the macro arguments better. Actually ULONG_FOR_EACH_1 could be made to work with 32- or 64-bit maps just by changing "unsigned long" to "uint64_t" in its definition; maybe we should. I'm not certain whether the tun_metadata layer intends to limit itself to option values that are a multiple of 4 bytes (as Geneve has) or whether it's supposed to be an abstraction that doesn't care. tun_meta_find_key() could probably be a tiny bit faster by switching from HMAP_FOR_EACH_WITH_HASH to HMAP_FOR_EACH_IN_BUCKET. I spent some of my time studying the code by annotating it with comments. I'm appending a patch, hope you'll adopt a lot of it. Acked-by: Ben Pfaff --8<--cut here-->8-- diff --git a/lib/tun-metadata.c b/lib/tun-metadata.c index b09085e..eb7724c 100644 --- a/lib/tun-metadata.c +++ b/lib/tun-metadata.c @@ -31,16 +31,22 @@ #include "tun-metadata.h" struct tun_meta_entry { -struct hmap_node node; -uint32_t key; /* Class and Type */ +struct hmap_node node; /* In struct tun_table's key_hmap. */ +uint32_t key; /* (class << 16) | type. */ struct tun_metadata_loc loc; -bool valid; +bool valid; /* True if allocated to a class and type. */ }; +/* Maps from Geneve option class+type to positions in a struct tun_metadata's + * 'opts' array. */ struct tun_table { +/* TUN_METADATA is stored in element . */ struct tun_meta_entry entries[TUN_METADATA_NUM_OPTS]; -/* Each bit represents 4 bytes of al
Re: [ovs-dev] [PATCH 11/11] pkt-metadata: Avoid introducing overhead for userspace tunnels.
On Fri, Jun 19, 2015 at 04:13:25PM -0700, Jesse Gross wrote: > The addition of Geneve metadata requires a large amount of additional > space to handle the maximum set of options. In most cases, this is > not a big deal since it is only temporary storage on the stack or > can be automatically stripped out for miniflows. However, userspace > tunnels need to deal with this on a per-packet basis, so we should > avoid introducing additional overhead if possible. Two small changes > are aimed at this: > > * Move struct flow_tnl to the end of the packet metadata. Since >the Geneve metadata is already at the end of flow_tnl and pkt_metadata >is at the end of struct dp_packet, this avoids putting a large >amount metadata (which might be empty) in hot cache lines. > > * Only push the new metadata into a miniflow if any options are present >during miniflow_extract(). This does not necessarily provide the >most fine-grained flow generation but it is a quick check and >the userspace implementation of Geneve does not currently support >options anyways. > > Signed-off-by: Jesse Gross Acked-by: Ben Pfaff ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] nx-match: Fix typo in comment.
On Wed, Jun 24, 2015 at 11:59:03AM -0700, Jesse Gross wrote: > On Wed, Jun 24, 2015 at 11:46 AM, Ben Pfaff wrote: > > Signed-off-by: Ben Pfaff > > --- > > lib/nx-match.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > Acked-by: Jesse Gross Thanks, applied. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 1/2] Committing two issues:
Hi Sumit. Sorry it took me a while to review this--I'm catching up as best I can. Can I have a Signed-off-by for this commit? CONTRIBUTING.md says this about Signed-off-by: Signed-off-by: Author Name Informally, this indicates that Author Name is the author or submitter of a patch and has the authority to submit it under the terms of the license. The formal meaning is to agree to the Developer's Certificate of Origin (see below). If the author and submitter are different, each must sign off. If the patch has more than one author, all must sign off. Signed-off-by: Author Name Signed-off-by: Submitter Name and its meaning is: Developer's Certificate of Origin - To help track the author of a patch as well as the submission chain, and be clear that the developer has authority to submit a patch for inclusion in openvswitch please sign off your work. The sign off certifies the following: Developer's Certificate of Origin 1.1 By making a contribution to this project, I certify that: (a) The contribution was created in whole or in part by me and I have the right to submit it under the open source license indicated in the file; or (b) The contribution is based upon previous work that, to the best of my knowledge, is covered under an appropriate open source license and I have the right under that license to submit that work with modifications, whether created in whole or in part by me, under the same open source license (unless I am permitted to submit under a different license), as indicated in the file; or (c) The contribution was provided directly to me by some other person who certified (a), (b) or (c) and I have not modified it. (d) I understand and agree that this project and the contribution are public and that a record of the contribution (including all personal information I submit with it, including my sign-off) is maintained indefinitely and may be redistributed consistent with this project or the open source license(s) involved. More below. On Sat, Jun 20, 2015 at 05:09:54PM -0700, Ben Pfaff wrote: > From: Sumit Garg > > 1. A bool (has_lock) was being accessed as a function call >leading to a runtime exception. This seems obviously correct, I'll happily commit it when I have a signoff. > 2. When 'alert' was turned off on a column, the code was >erroring out when value for that column was being set >in a newly inserted row. This is because the row._data >was None at this time. > > A question related to change #2 - should a newly inserted > row get automatically intialized to default values? If so, > I'm not sure the initialization to defaults is happening > and maybe that's why I'm seeing the NULL error. Either way, > I don't see an issue with adding the additional check. I think it would be better to initialize a new row to defaults but the code doesn't do that at the moment. I'm happy to take this change. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 2/2] When opening a JSONRPC connection, the health probes are incorrectly getting turned off for connections that need probes.
On Sat, Jun 20, 2015 at 05:09:55PM -0700, Ben Pfaff wrote: > From: Sumit Garg > > In other words, when stream_or_pstream_needs_probes() > return non-zero, the probes are gettting disabled as > the probe interval is getting set to zero. This leads > to incorrect behavior such that probes are: > > - not getting turned off for unix: connections > - getting turned off for tcp:/ssl: connections > > The changes in this commit fix this issue. Seems obviously correct, I just need a sign-off as for the first patch. Thank you! ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 2/4] ovs-lib: Ability to move ip address and routes.
On Mon, Jun 22, 2015 at 02:15:48AM -0700, Gurucharan Shetty wrote: > The ability to move IP address and routes between two interfaces > is useful when we want to make a physical interface (say eth0) > as a port of OVS bridge (say breth0) with all its IP address and route > information transferred to OVS bridge. An upcoming commit > uses the new ability. > > Signed-off-by: Gurucharan Shetty Patches 1 and 2 seem fine to me. Acked-by: Ben Pfaff ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 3/4] ovn-integrate: A new OVN utility script.
On Mon, Jun 22, 2015 at 02:15:49AM -0700, Gurucharan Shetty wrote: > Signed-off-by: Gurucharan Shetty Is this intended exclusively for use with Docker, or is it meant to be more generic? Is the documentation just the Docker installation instructions in the following patch? ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [RFC 4/4] Introduce an openvswitch driver for Docker networking.
On Mon, Jun 22, 2015 at 02:15:50AM -0700, Gurucharan Shetty wrote: > Docker committed experimental support for multi-host > networking yesterday. This commit adds a driver that > works with that experimental support. Since Docker > code is not part of any official Docker releases yet, > this patch is sent as a RFC. > > Signed-off-by: Gurucharan Shetty I don't really feel qualified to review this but I will if you like. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] Makefiles: Move xml2nroff rule from ovn directory to top level.
On Mon, Jun 22, 2015 at 04:44:28PM -0700, Alex Wang wrote: > On Mon, Jun 22, 2015 at 3:58 PM, Alex Wang wrote: > > > > > > > On Mon, Jun 22, 2015 at 3:42 PM, Ben Pfaff wrote: > > > >> Originally only the OVN documentation used the XML format, but now it's > >> used outside the ovn directory (initially for ovs-sim.1) so it's more > >> logical to have the xml->nroff rule at the top level. > >> > >> Reported-by: Alex Wang > >> Signed-off-by: Ben Pfaff > >> --- > >> Makefile.am | 18 ++ > >> ovn/automake.mk | 18 -- > >> 2 files changed, 18 insertions(+), 18 deletions(-) > >> > >> diff --git a/Makefile.am b/Makefile.am > >> index 178e82c..ad536b6 100644 > >> --- a/Makefile.am > >> +++ b/Makefile.am > >> @@ -181,6 +181,24 @@ SUFFIXES += .in > >> fi > >> $(AM_V_at) mv $@.tmp $@ > >> > >> > > > > Need to put one more newline here, otherwise configure will fail > > due to: > > > > Makefile:5239: *** missing separator. Stop. > > > > Acked-by: Alex Wang > > > > > After I dist-clean and ran again, it worked. So, please ignore my comment. > > Acked-by: Alex Wang Thanks. I doubt this patch was related to the problem, because it only moves text around without changing it. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 1/3] Add OVS_VSWITCHD_STOP to bfd unit tests
On Wed, Jun 24, 2015 at 07:49:08AM -0700, Gurucharan Shetty wrote: > You can look at "commit 5e65e080ad4d57e" for more information on the > exact problem with taskkill. > > One option is to check for tskill and if it does not exist, use taskkill. Here's a concept for that, it's probably buggy because I don't have a handy Windows system for testing. (The minimal change would be to just add the new bit at the beginning but I couldn't resist "improving" the rest of the code.) diff --git a/tests/ovs-macros.at b/tests/ovs-macros.at index c583c3d..7427de0 100644 --- a/tests/ovs-macros.at +++ b/tests/ovs-macros.at @@ -37,37 +37,31 @@ if test "$IS_WIN32" = "yes"; then command diff --strip-trailing-cr "$@" } +# tskill is more effective than taskkill but it isn't always installed. +if (tskill //?) >/dev/null 2>&1; then :; else +tskill () { taskkill //F //PID $1 >/dev/null; } +fi + kill () { -case "$1" in --0) -shift -for i in $*; do -# tasklist will always have return code 0. -# If pid does exist, there will be a line with the pid. -if tasklist //fi "PID eq $i" | grep $i >/dev/null; then -: -else -return 1 -fi -done -return 0 -;; --[1-9]*) -shift -for i in $*; do -if tasklist //fi "PID eq $i" | grep $i >/dev/null; then -tskill $i -fi -done -;; + signal= + retval=0 +for arg; do + case $arg in +-*) signal=$arg ;; [1-9][0-9]*) -for i in $*; do -if tasklist //fi "PID eq $i" | grep $i >/dev/null; then -tskill $i + # tasklist always returns 0. + # If pid does exist, there will be a line with the pid. + if tasklist //fi "PID eq $arg" | grep $arg >/dev/null; then + if test "X$signal" != "X-0"; then + tskill $arg fi -done + else +retval=1 + fi ;; -esac +esac +done + return $retval } fi ] ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 3/3] Update windows test documentation
On Wed, Jun 24, 2015 at 07:43:03AM -0700, Gurucharan Shetty wrote: > On Tue, Jun 23, 2015 at 6:11 PM, Alin Serdean > wrote: > > How about the following: > > > > Before running the unit tests make sure to add the pthread libraries to your > > PATH environment variable. > The above is already part of the documentation with the following wording: > > You should add the pthread-win32's dll > path (e.g.: C:\pthread\dll\x86) to the Windows' PATH environment variable. > > So why repeat it? Can we just fix this, instead of making users deal with it, with something like this? diff --git a/m4/openvswitch.m4 b/m4/openvswitch.m4 index f89cde0..18d874a 100644 --- a/m4/openvswitch.m4 +++ b/m4/openvswitch.m4 @@ -86,9 +86,11 @@ AC_DEFUN([OVS_CHECK_WIN32], AC_MSG_ERROR([Invalid --with-pthread value]) ;; *) +PTHREAD_WIN32_DIR=$withval/lib/x86 PTHREAD_INCLUDES="-I$withval/include" PTHREAD_LDFLAGS="-L$withval/lib/x86" PTHREAD_LIBS="-lpthreadVC2" +AC_SUBST([PTHREAD_WIN32_DIR]) AC_SUBST([PTHREAD_INCLUDES]) AC_SUBST([PTHREAD_LDFLAGS]) AC_SUBST([PTHREAD_LIBS]) diff --git a/tests/automake.mk b/tests/automake.mk index 3f57114..714bc91 100644 --- a/tests/automake.mk +++ b/tests/automake.mk @@ -95,7 +95,7 @@ TESTSUITE_PATCH = $(srcdir)/tests/testsuite.patch KMOD_TESTSUITE = $(srcdir)/tests/kmod-testsuite DISTCLEANFILES += tests/atconfig tests/atlocal -AUTOTEST_PATH = utilities:vswitchd:ovsdb:vtep:tests +AUTOTEST_PATH = utilities:vswitchd:ovsdb:vtep:tests:$(PTHREAD_WIN32_DIR) check-local: tests/atconfig tests/atlocal $(TESTSUITE) $(SHELL) '$(TESTSUITE)' -C tests AUTOTEST_PATH=$(AUTOTEST_PATH) $(TESTSUITEFLAGS) ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH]: bfd: display last wall clock time of last flap
On Tue, Jun 23, 2015 at 08:11:22AM -0700, Sabyasachi Sengupta wrote: > Extend bfd to save wall clock time of the last flap in bfd_forwarding__, > and display it throught bfd/show. This information is also exported out > to ovsdb in bfd_get_status. > > Signed-off-by: Sabyasachi Sengupta This seems useful, although I could not apply the patch: Applying: bfd: display last wall clock time of last flap fatal: corrupt patch at line 55 Patch failed at 0001 bfd: display last wall clock time of last flap The copy of the patch that failed is found in: /home/blp/nicira/ovs/.git/rebase-apply/patch When you have resolved this problem, run "git am --continue". If you prefer to skip this patch, run "git am --skip" instead. To restore the original branch and stop patching, run "git am --abort". It's a little unconventional for us to use a wall clock time for this. I'd be more inclined to report it as "N seconds ago" or "N ms ago". Any particular reason to use a wall clock time? Please also document the new bfd_status feature in vswitchd/vswitch.xml near the rest of the bfd_status keys. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 1/2] ovn: Add logical port 'enabled' state.
On Tue, Jun 23, 2015 at 02:22:08PM -0400, Russell Bryant wrote: > This patch adds a new column to the Logical_Port table of the > OVN_Northbound database called 'enabled'. The purpose is to allow a > port to be administratively enabled or disabled. It is sometimes > useful to keep a port and its related configuration, but temporarily > disable it, which means no traffic is allowed in or out of the port. > > The implementation is fairly non-invasive as it only required minor > changes to the logical pipeline. > > Signed-off-by: Russell Bryant Applied, thanks! ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 0/2] ovn: Add administrative state to logical ports.
On Tue, Jun 23, 2015 at 03:18:57PM -0400, Russell Bryant wrote: > On 06/23/2015 02:22 PM, Russell Bryant wrote: > > While working on OpenStack Neutron integration for OVN, I came across a > > small > > feature gap. The Neutron API supports setting a port as administratively > > down. > > One way to implement that would be to delete ports from OVN while > > administratively down. However, it seemed to me that it would be nice to > > keep > > the list of ports that currently exist in sync between the CMS (Neutron in > > this > > case) and OVN. This patch is a first attempt at supporting this OVN by > > updating > > the Pipeline to drop packets to/from a port that is administratively down. > > > > [PATCH 1/2] ovn: Add logical port 'enabled' state. > > [PATCH 2/2] ovn: Add get/set-enabled to ovn-nbctl. > > > > northd/ovn-northd.c | 14 +--- > > ovn-nb.ovsschema|1 > > ovn-nb.xml |7 ++ > > ovn-nbctl.8.xml | 13 +++ > > ovn-nbctl.c | 59 > > > > 5 files changed, 91 insertions(+), 3 deletions(-) > > > > After some more looking around, it seems the appropriate way to do this > is with the equivalent of "ovs-ofctl mod-port". I'll look into how to > do that some more. I'm happy enough to have this concept at the OVN level. I don't want Neutron to have to talk to OpenFlow at all, if OVN is in the picture. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 2/2] ovn: Add get/set-enabled to ovn-nbctl.
On Tue, Jun 23, 2015 at 02:22:09PM -0400, Russell Bryant wrote: > This patch adds support for getting and setting the 'enabled' column > for logical ports using ovn-nbctl. > > Signed-off-by: Russell Bryant Applied, thanks! ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 1/3] ofproto/ofproto-dpif.c: fix coding style
On Tue, Jun 23, 2015 at 05:03:14PM -0300, Thadeu Lima de Souza Cascardo wrote: > Identation was one extra level at ofproto_unixctl_mcast_snooping_show. > > Signed-off-by: Thadeu Lima de Souza Cascardo Applied, thanks! ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 2/3] mcast-snooping: Use IPv6 address for MDB
On Tue, Jun 23, 2015 at 05:03:15PM -0300, Thadeu Lima de Souza Cascardo wrote: > Use IPv6 internally for storing multicast addresses. IPv4 addresses are > translated to their IPv4-mapped equivalent. > > Signed-off-by: Thadeu Lima de Souza Cascardo From Clang: ../lib/mcast-snooping.c:93:23: error: cast from 'const uint8_t *' (aka 'const unsigned char *') to 'const uint32_t *' (aka 'const unsigned int *') increases required alignment from 1 to 4 [-Werror,-Wcast-align] return hash_words((const uint32_t *) grp_addr->s6_addr, 4, ^~~~ From "sparse": ../lib/mcast-snooping.c:133:20: warning: incorrect type in assignment (different base types) ../lib/mcast-snooping.c:133:20:expected restricted ovs_be16 ../lib/mcast-snooping.c:133:20:got int ../lib/mcast-snooping.c:133:20: warning: incorrect type in assignment (different base types) ../lib/mcast-snooping.c:133:20:expected restricted ovs_be16 ../lib/mcast-snooping.c:133:20:got int ../lib/mcast-snooping.c:133:20: warning: incorrect type in assignment (different base types) ../lib/mcast-snooping.c:133:20:expected restricted ovs_be16 ../lib/mcast-snooping.c:133:20:got int Probably the appropriate fixes for the above (also, use struct assignment): diff --git a/lib/mcast-snooping.c b/lib/mcast-snooping.c index f2684f3..294534e 100644 --- a/lib/mcast-snooping.c +++ b/lib/mcast-snooping.c @@ -90,7 +90,7 @@ static uint32_t mcast_table_hash(const struct mcast_snooping *ms, const struct in6_addr *grp_addr, uint16_t vlan) { -return hash_words((const uint32_t *) grp_addr->s6_addr, 4, +return hash_bytes(grp_addr->s6_addr, 16, hash_2words(ms->secret, vlan)); } @@ -130,7 +130,7 @@ in6_addr_set_mapped_ipv4(struct in6_addr *addr, ovs_be32 ip4) { union ovs_16aligned_in6_addr *taddr = (void *) addr; memset(taddr->be16, 0, sizeof(taddr->be16)); -taddr->be16[5] = 0x; +taddr->be16[5] = OVS_BE16_MAX; put_16aligned_be32(&taddr->be32[3], ip4); } @@ -422,7 +422,7 @@ mcast_snooping_add_group(struct mcast_snooping *ms, grp = xmalloc(sizeof *grp); hmap_insert(&ms->table, &grp->hmap_node, hash); -memcpy(grp->addr.s6_addr, addr->s6_addr, sizeof(addr->s6_addr)); +grp->addr = *addr; grp->vlan = vlan; list_init(&grp->bundle_lru); learned = true; Will you roll a v2? ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 3/3] Multicast Listener Discovery support
On Tue, Jun 23, 2015 at 05:03:16PM -0300, Thadeu Lima de Souza Cascardo wrote: > Add support for MLDv1 and MLDv2. The behavior is not that different from > IGMP. Packets to all-hosts address and queries are always flooded, > reports go to routers, routers are added when a query is observed, and > all MLD packets go through slow path. > > Signed-off-by: Thadeu Lima de Souza Cascardo Thanks! This is quite nice. I think that this ought to update NEWS to mention the new feature. mld2_record should probably use ovs_16aligned_in6_addr, because corner cases can cause OVS userspace to process IPv6 packets that are on odd 16-bit boundaries. OVS coding style calls for {} around the function calls here: @@ -2073,12 +2106,17 @@ update_mcast_snooping_table(const struct xbridge *xbridge, break; } } if (!mcast_xbundle || mcast_xbundle != in_xbundle) { -update_mcast_snooping_table__(xbridge, flow, ms, flow->igmp_group_ip4, - vlan, in_xbundle, packet); +if (flow->dl_type == htons(ETH_TYPE_IP)) +update_mcast_snooping_table4__(xbridge, flow, ms, + flow->igmp_group_ip4, vlan, + in_xbundle, packet); +else +update_mcast_snooping_table6__(xbridge, flow, ms, vlan, + in_xbundle, packet); } ovs_rwlock_unlock(&ms->rwlock); } /* send the packet to ports having the multicast group learned */ Also here around the assignments: @ -2328,11 +2384,14 @@ xlate_normal(struct xlate_ctx *ctx) } } /* forwarding to group base ports */ ovs_rwlock_rdlock(&ms->rwlock); -grp = mcast_snooping_lookup4(ms, flow->nw_dst, vlan); +if (flow->dl_type == htons(ETH_TYPE_IP)) +grp = mcast_snooping_lookup4(ms, flow->nw_dst, vlan); +else if (flow->dl_type == htons(ETH_TYPE_IPV6)) +grp = mcast_snooping_lookup(ms, &flow->ipv6_dst, vlan); if (grp) { xlate_normal_mcast_send_group(ctx, ms, grp, in_xbundle, vlan); xlate_normal_mcast_send_fports(ctx, ms, in_xbundle, vlan); xlate_normal_mcast_send_mrouters(ctx, ms, in_xbundle, vlan); } else { and I see a couple more examples in mcast_snooping_add_mld(). I think the loop in mcast_snooping_add_mld() will loop forever if it hits an unknown record type. Also in mcast_snooping_add_mld(), I'd usually either write htons(0) in a case like this because GCC optimizes it better: +if (ntohs(record->nsrcs) == 0 I am looking forward to v2! ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2] datapath-windows: Remove the external/internal port only if it is removed on the Hyper-V switch
On Tue, Jun 23, 2015 at 08:16:41PM +, Nithin Raju wrote: > > On Jun 23, 2015, at 12:42 PM, Ben Pfaff wrote: > > > > On Thu, Jun 18, 2015 at 09:05:32PM +, Nithin Raju wrote: > >>> On Jun 18, 2015, at 1:57 PM, Alin Serdean > >>> wrote: > >>> > >>> I have no issue with it if it is tested. > >>> > >>> Yes Nithin you are right we agreed that you will work on it but it was in > >>> the backlog for some time and it is a show stopper because it also > >>> fixes(https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_openvswitch_ovs-2Dissues_issues_62&d=BQIGaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=pNHQcdr7B40b4h6Yb7FIedI1dnBsxdDuTLBYD3JqV80&m=mGRgSLlhBsu9jwYpHagt2xLxS1GSklQgTGXxMSuEL-g&s=YIhRLJ49Q4cetS_HYw_jccyA5VQFWGMPvnX6bq7yGKY&e= > >>> ). > >> > >> I’ll send out my fix today. I ran into a crash while testing > >> creation/deletion of vxlan port. Debugging that right now. > >> > >> Thanks for your understanding. > > > > Does this mean that you're working on a different fix? Or is this patch > > one that should be applied? > > hi Ben, > Alin, Sorin and I agreed that we’ll use my patch that I send out (later) here: > http://openvswitch.org/pipermail/dev/2015-June/056579.html > > Alin found some issues with the patch while validating it, and I am working > with him offline to get the patch validated. Once I get an ACK from him > offline, I’ll post a new revision of the patch. > > As of now, we don’t have any “datapath-windows” patches pending to be > committed. Thanks for the update! ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] Makefiles: Move xml2nroff rule from ovn directory to top level.
> > > Acked-by: Alex Wang > > Thanks. > > I doubt this patch was related to the problem, because it only moves > text around without changing it. > Yeah, it is unrelated, ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH]: bfd: display last wall clock time of last flap
Hi Ben, It's a little unconventional for us to use a wall clock time for this. I'd be more inclined to report it as "N seconds ago" or "N ms ago". Any particular reason to use a wall clock time? I've seen that all BFD other outputs use "now -/+" convention, but just that I thought wall clock time was more user readable, especially because last flap could be hrs/days ago. If we imagine that this output (last flap time) could be parsed through a remote script as to which link went down and when, its probably easier for them to see the absolute time rather than having to convert it (note that the notion of 'now' could be different in both machines?). I'd think that it makes more sense for next/last TX times (should continue to) be in milliseconds as that reflects a more ongoing activity. This is how it looked in my system (just for your reference). [root@rtr-29-225-196-232 ~]# ovs-appctl bfd/show port3.4094 Forwarding: true Detect Multiplier: 3 Concatenated Path Down: false TX Interval: Approx 500ms RX Interval: Approx 500ms Detect Time: now -1116ms Next TX Time: now -436ms Last TX Time: now +19ms Last Flap Time: 2015-06-25 00:36:22.372 Local Flags: none Local Session State: up Local Diagnostic: No Diagnostic Local Discriminator: 0x10001 Local Minimum TX Interval: 500ms Local Minimum RX Interval: 500ms Remote Flags: none Remote Session State: up Remote Diagnostic: No Diagnostic Remote Discriminator: 0x10001 Remote Minimum TX Interval: 500ms Remote Minimum RX Interval: 500ms I understand that this will be based on what ovs-vswitchd 'thinks' as to when the last flap occurred. Its still not the accurate information as ovs-vswitchd might itself have restarted in between the last flap and when the user actually reads it. This probably means that we save this info in ovsdb and read it through a separate CLI, but probably we can keep it simple for now? Please also document the new bfd_status feature in vswitchd/vswitch.xml near the rest of the bfd_status keys. I'll send across a v2. Thanks, Sabya ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 3/3] Update windows test documentation
+1. We still need to add the description for the SYSTEM one or put in another patch using the setx command (https://support.microsoft.com/en-us/kb/195050). Alin. -Mesaj original- De la: Ben Pfaff [mailto:b...@nicira.com] Trimis: Thursday, June 25, 2015 3:22 AM Către: Gurucharan Shetty Cc: Alin Serdean; dev@openvswitch.org Subiect: Re: [ovs-dev] [PATCH 3/3] Update windows test documentation On Wed, Jun 24, 2015 at 07:43:03AM -0700, Gurucharan Shetty wrote: > On Tue, Jun 23, 2015 at 6:11 PM, Alin Serdean > wrote: > > How about the following: > > > > Before running the unit tests make sure to add the pthread libraries > > to your PATH environment variable. > The above is already part of the documentation with the following wording: > > You should add the pthread-win32's dll path (e.g.: C:\pthread\dll\x86) > to the Windows' PATH environment variable. > > So why repeat it? Can we just fix this, instead of making users deal with it, with something like this? diff --git a/m4/openvswitch.m4 b/m4/openvswitch.m4 index f89cde0..18d874a 100644 --- a/m4/openvswitch.m4 +++ b/m4/openvswitch.m4 @@ -86,9 +86,11 @@ AC_DEFUN([OVS_CHECK_WIN32], AC_MSG_ERROR([Invalid --with-pthread value]) ;; *) +PTHREAD_WIN32_DIR=$withval/lib/x86 PTHREAD_INCLUDES="-I$withval/include" PTHREAD_LDFLAGS="-L$withval/lib/x86" PTHREAD_LIBS="-lpthreadVC2" +AC_SUBST([PTHREAD_WIN32_DIR]) AC_SUBST([PTHREAD_INCLUDES]) AC_SUBST([PTHREAD_LDFLAGS]) AC_SUBST([PTHREAD_LIBS]) diff --git a/tests/automake.mk b/tests/automake.mk index 3f57114..714bc91 100644 --- a/tests/automake.mk +++ b/tests/automake.mk @@ -95,7 +95,7 @@ TESTSUITE_PATCH = $(srcdir)/tests/testsuite.patch KMOD_TESTSUITE = $(srcdir)/tests/kmod-testsuite DISTCLEANFILES += tests/atconfig tests/atlocal -AUTOTEST_PATH = utilities:vswitchd:ovsdb:vtep:tests +AUTOTEST_PATH = +utilities:vswitchd:ovsdb:vtep:tests:$(PTHREAD_WIN32_DIR) check-local: tests/atconfig tests/atlocal $(TESTSUITE) $(SHELL) '$(TESTSUITE)' -C tests AUTOTEST_PATH=$(AUTOTEST_PATH) $(TESTSUITEFLAGS) ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 10/11] tunnel: Geneve TLV handling support for OpenFlow.
On Wed, Jun 24, 2015 at 3:57 PM, Ben Pfaff wrote: > On Fri, Jun 19, 2015 at 04:13:24PM -0700, Jesse Gross wrote: >> The current support for Geneve in OVS is exactly equivalent to VXLAN: >> it is possible to set and match on the VNI but not on any options >> contained in the header. This patch enables the use of options. >> >> The goal for Geneve support is not to add support for any particular option >> but to allow end users or controllers to specify what they would like to >> match. That is, the full range of Geneve's capabilities should be exposed >> without modifying the code (the one exception being options that require >> per-packet computation in the fast path). >> >> The main issue with supporting Geneve options is how to integrate the >> fields into the existing OpenFlow pipeline. All existing operations >> are referred to by their NXM/OXM field name - matches, action generation, >> arithmetic operations (i.e. tranfer to a register). However, the Geneve >> option space is exactly the same as the OXM space, so a direct mapping >> is not feasible. Instead, we create a pool of 64 NXMs that are then >> dynamically mapped on Geneve option TLVs using OpenFlow. Once mapped, >> these fields become first-class citizens in the OpenFlow pipeline. >> >> An example of how to use Geneve options: >> ovs-ofctl add-geneve-map br0 {class=0x,type=0,len=4}->tun_metadata0 >> ovs-ofctl add-flow br0 >> in_port=LOCAL,actions=set_field:0x->tun_metadata0,1 >> >> This will add a 4 bytes option (filled will all 1's) to all packets >> coming from the LOCAL port and then send then out to port 1. >> >> A limitation of this patch is that although the option table is specified >> for a particular switch over OpenFlow, it is currently global to all >> switches. This will be addressed in a future patch. > > Out of curiosity, when you say that do you mean "This is left as future > work" or "I'm cleaning up a patch as we speak" or somewhere in between? Somewhere in between. Madhu said that he would take a look at it, although I guess that it is still early. I have a list of things that I want to at least consider improving but that I deemed not critical to the initial implementation and pushed off to avoid further expanding the series, so there will be some more enhancements coming. I plan to address all of these before 2.5 is released, at a minimum. > Building on i386, this makes the compiler really angry at me because > struct tun_metadata is an odd multiple of 32 bits in size, which makes > struct flow an odd multiple of 32 bits in size and fails the flow.h > assertion > BUILD_ASSERT_DECL(sizeof(struct flow) % sizeof(uint64_t) == 0); > > This fixes it but obviously it's not the right fix: > > diff --git a/lib/tun-metadata.h b/lib/tun-metadata.h > index 8ba0757..4e1e84a 100644 > --- a/lib/tun-metadata.h > +++ b/lib/tun-metadata.h > @@ -37,6 +37,7 @@ struct tun_metadata { > uint8_t opts[TUN_METADATA_TOT_OPT_SIZE]; > uint64_t opt_map; > struct tun_table *tab; > +uint8_t pad[4]; > }; > BUILD_ASSERT_DECL(sizeof(((struct tun_metadata *)0)->opt_map) * 8 >= >TUN_METADATA_NUM_OPTS); I ended up with this: uint8_t pad[sizeof(uint64_t) - sizeof(struct tun_table *)]; /* Make 8 bytes */ Ideally I would just use __attribute__((aligned(8))) but that's not very portable. > I see a lot of use of "1 << (something)" in tun-metadata, but I believe > that these need to use 1ULL or UINT64_C(1) instead. Fixed. > I'm a little confused about the alloc_offset field in struct > tun_metadata_allocation. It seems to be a byte offset but > metadata_loc_from_match() compares it against TUN_METADATA_NUM_OPTS. Is > that a typo for TUN_METADATA_TOT_OPT_SIZE? It's a dumb typo - it should be TUN_METADATA_TOT_OPT_SIZE. > tun-metadata.c has a few hard tabs. Fixed. > Fragmented tunnel metadata is nasty. Is there a way we can avoid it? I think (hope) that it is relatively uncommon that it will come up in practice. However, I don't think that it's possible to avoid supporting it at all. The alternatives are: * Don't support it at all. I don't think this is an acceptable cop-out because it would mean that controllers have no real idea of knowing what options they can use. Setting up new options would seem to non-deterministically fail and the only mechanism that they would have to fix it is to clear both the flow table and option table and start over. * Compact the option table to remove holes. This would be a little faster on flow setup because we wouldn't have to traverse the chains of fragments but I think the code is much nastier. We would have to update all the flows in the classifier to match the new option table. It doesn't seem like it is worth it at this point in time. > I like the implementation approach used in ULONG_FOR_EACH_1 better than > the one in PRESENT_OPT_FOR_EACH, since the user doesn't have to provide > a scratch variable. Also it protects the macro arguments better. > Actua
Re: [ovs-dev] [PATCH 10/11] tunnel: Geneve TLV handling support for OpenFlow.
On Wed, Jun 24, 2015 at 06:34:05PM -0700, Jesse Gross wrote: > On Wed, Jun 24, 2015 at 3:57 PM, Ben Pfaff wrote: > > On Fri, Jun 19, 2015 at 04:13:24PM -0700, Jesse Gross wrote: > > I like the implementation approach used in ULONG_FOR_EACH_1 better than > > the one in PRESENT_OPT_FOR_EACH, since the user doesn't have to provide > > a scratch variable. Also it protects the macro arguments better. > > Actually ULONG_FOR_EACH_1 could be made to work with 32- or 64-bit maps > > just by changing "unsigned long" to "uint64_t" in its definition; maybe > > we should. > > I looked at converting ULONG_FOR_EACH_1 but decided against it since > it would introduce an additional branch (to decide whether to use > __builtin_ctz or __builtin_ctz_ll) Really? It shouldn't, "__builtin_constant_p(n <= UINT32_MAX) && n <= UINT32_MAX" should be a compile-time constant. > and it is used in places that are very careful about that kind thing, > like the DPDK datapath. I just adopted the logic instead in > PRESENT_OPT_FOR_EACH. That's fine too. > > Acked-by: Ben Pfaff > > Thanks a lot for all of the reviews (it turns out userspace has become > a lot more complicated since the last time that I did any serious work > on it...). True, and it's unfortunate, and if you have any ideas for making it simpler again I'm all ears! I like simple. > I'll wait a bit before pushing to look it over and give you a chance > to object to anything I said. I have the patch for Geneve option > support in the userspace datapath ready to go out once this in to > complete the initial needs for OVN. No objections here, apply it when you're satisfied. I'll be really happy to be able to use this in OVN. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH] tests: Automatically add pthread-win32 directory to $PATH.
This reduces the user burden for running "make check". Signed-off-by: Ben Pfaff --- I can't test this; whoever reviews it ought to. m4/openvswitch.m4 | 6 -- tests/automake.mk | 2 +- 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/m4/openvswitch.m4 b/m4/openvswitch.m4 index f89cde0..57df79e 100644 --- a/m4/openvswitch.m4 +++ b/m4/openvswitch.m4 @@ -86,9 +86,11 @@ AC_DEFUN([OVS_CHECK_WIN32], AC_MSG_ERROR([Invalid --with-pthread value]) ;; *) -PTHREAD_INCLUDES="-I$withval/include" -PTHREAD_LDFLAGS="-L$withval/lib/x86" +PTHREAD_WIN32_DIR=$withval/lib/x86 +PTHREAD_INCLUDES=-I$withval/include +PTHREAD_LDFLAGS=-L$PTHREAD_WIN32_DIR PTHREAD_LIBS="-lpthreadVC2" +AC_SUBST([PTHREAD_WIN32_DIR]) AC_SUBST([PTHREAD_INCLUDES]) AC_SUBST([PTHREAD_LDFLAGS]) AC_SUBST([PTHREAD_LIBS]) diff --git a/tests/automake.mk b/tests/automake.mk index 3f57114..714bc91 100644 --- a/tests/automake.mk +++ b/tests/automake.mk @@ -95,7 +95,7 @@ TESTSUITE_PATCH = $(srcdir)/tests/testsuite.patch KMOD_TESTSUITE = $(srcdir)/tests/kmod-testsuite DISTCLEANFILES += tests/atconfig tests/atlocal -AUTOTEST_PATH = utilities:vswitchd:ovsdb:vtep:tests +AUTOTEST_PATH = utilities:vswitchd:ovsdb:vtep:tests:$(PTHREAD_WIN32_DIR) check-local: tests/atconfig tests/atlocal $(TESTSUITE) $(SHELL) '$(TESTSUITE)' -C tests AUTOTEST_PATH=$(AUTOTEST_PATH) $(TESTSUITEFLAGS) -- 2.1.3 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 3/3] Update windows test documentation
It's a step forward, then. I sent out an official patch, I'd appreciate it if anyone out there is willing to test it: http://openvswitch.org/pipermail/dev/2015-June/056833.html On Thu, Jun 25, 2015 at 01:19:14AM +, Alin Serdean wrote: > +1. > > We still need to add the description for the SYSTEM one or put in another > patch using the setx command (https://support.microsoft.com/en-us/kb/195050). > > Alin. > > -Mesaj original- > De la: Ben Pfaff [mailto:b...@nicira.com] > Trimis: Thursday, June 25, 2015 3:22 AM > Către: Gurucharan Shetty > Cc: Alin Serdean; dev@openvswitch.org > Subiect: Re: [ovs-dev] [PATCH 3/3] Update windows test documentation > > On Wed, Jun 24, 2015 at 07:43:03AM -0700, Gurucharan Shetty wrote: > > On Tue, Jun 23, 2015 at 6:11 PM, Alin Serdean > > wrote: > > > How about the following: > > > > > > Before running the unit tests make sure to add the pthread libraries > > > to your PATH environment variable. > > The above is already part of the documentation with the following wording: > > > > You should add the pthread-win32's dll path (e.g.: C:\pthread\dll\x86) > > to the Windows' PATH environment variable. > > > > So why repeat it? > > Can we just fix this, instead of making users deal with it, with something > like this? > > diff --git a/m4/openvswitch.m4 b/m4/openvswitch.m4 index f89cde0..18d874a > 100644 > --- a/m4/openvswitch.m4 > +++ b/m4/openvswitch.m4 > @@ -86,9 +86,11 @@ AC_DEFUN([OVS_CHECK_WIN32], > AC_MSG_ERROR([Invalid --with-pthread value]) >;; > *) > +PTHREAD_WIN32_DIR=$withval/lib/x86 > PTHREAD_INCLUDES="-I$withval/include" > PTHREAD_LDFLAGS="-L$withval/lib/x86" > PTHREAD_LIBS="-lpthreadVC2" > +AC_SUBST([PTHREAD_WIN32_DIR]) > AC_SUBST([PTHREAD_INCLUDES]) > AC_SUBST([PTHREAD_LDFLAGS]) > AC_SUBST([PTHREAD_LIBS]) > diff --git a/tests/automake.mk b/tests/automake.mk index 3f57114..714bc91 > 100644 > --- a/tests/automake.mk > +++ b/tests/automake.mk > @@ -95,7 +95,7 @@ TESTSUITE_PATCH = $(srcdir)/tests/testsuite.patch > KMOD_TESTSUITE = $(srcdir)/tests/kmod-testsuite DISTCLEANFILES += > tests/atconfig tests/atlocal > > -AUTOTEST_PATH = utilities:vswitchd:ovsdb:vtep:tests > +AUTOTEST_PATH = > +utilities:vswitchd:ovsdb:vtep:tests:$(PTHREAD_WIN32_DIR) > > check-local: tests/atconfig tests/atlocal $(TESTSUITE) > $(SHELL) '$(TESTSUITE)' -C tests AUTOTEST_PATH=$(AUTOTEST_PATH) > $(TESTSUITEFLAGS) > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] tunneling: Don't match on source IP address for native tunnels.
On Wed, Jun 24, 2015 at 2:55 PM, Jesse Gross wrote: > When doing native tunneling, we look at packets destined to the > local port to see if they match tunnel protocols that we should > intercept. The criteria are IP protocol, destination UDP port, etc. > > However, we also look at the source IP address of the packets. This > should be a function of the port-based tunnel layer and not the > tunnel receive code itself. For comparison, the kernel tunnel code > has no idea about the IP addresses of its link partners. If port > based tunnel is desired, it can be handled using the normal port > tunnel layer, regardless of whether the packets originally came > from userspace or the kernel. > > For port based tunneling, this bug has no effect - the check is > simply redundant. However, it breaks flow-based native tunnels > because the remote IP address is not known at port creation time. > > CC: Pravin Shelar > Reported-by: David Griswold > Signed-off-by: Jesse Gross Can you add this test case? > --- > lib/tnl-ports.c | 15 +-- > lib/tnl-ports.h | 4 ++-- > ofproto/tunnel.c | 5 ++--- > 3 files changed, 9 insertions(+), 15 deletions(-) > > diff --git a/lib/tnl-ports.c b/lib/tnl-ports.c > index a0a73c8..79c9631 100644 > --- a/lib/tnl-ports.c > +++ b/lib/tnl-ports.c > @@ -56,7 +56,7 @@ tnl_port_free(struct tnl_port_in *p) > } > > static void > -tnl_port_init_flow(struct flow *flow, ovs_be32 ip_dst, ovs_be16 udp_port) > +tnl_port_init_flow(struct flow *flow, ovs_be16 udp_port) > { > memset(flow, 0, sizeof *flow); > flow->dl_type = htons(ETH_TYPE_IP); > @@ -66,21 +66,17 @@ tnl_port_init_flow(struct flow *flow, ovs_be32 ip_dst, > ovs_be16 udp_port) > flow->nw_proto = IPPROTO_GRE; > } > flow->tp_dst = udp_port; > -/* When matching on incoming flow from remove tnl end point, > - * our dst ip address is source ip for them. */ > -flow->nw_src = ip_dst; > } > > void > -tnl_port_map_insert(odp_port_t port, ovs_be32 ip_dst, ovs_be16 udp_port, > -const char dev_name[]) > +tnl_port_map_insert(odp_port_t port, ovs_be16 udp_port, const char > dev_name[]) > { > const struct cls_rule *cr; > struct tnl_port_in *p; > struct match match; > > memset(&match, 0, sizeof match); > -tnl_port_init_flow(&match.flow, ip_dst, udp_port); > +tnl_port_init_flow(&match.flow, udp_port); > > ovs_mutex_lock(&mutex); > do { > @@ -97,7 +93,6 @@ tnl_port_map_insert(odp_port_t port, ovs_be32 ip_dst, > ovs_be16 udp_port, > match.wc.masks.nw_proto = 0xff; > match.wc.masks.nw_frag = 0xff; /* XXX: No fragments support. */ > match.wc.masks.tp_dst = OVS_BE16_MAX; > -match.wc.masks.nw_src = OVS_BE32_MAX; > > cls_rule_init(&p->cr, &match, 0, CLS_MIN_VERSION); /* Priority == 0. > */ > ovs_refcount_init(&p->ref_cnt); > @@ -123,12 +118,12 @@ tnl_port_unref(const struct cls_rule *cr) > } > > void > -tnl_port_map_delete(ovs_be32 ip_dst, ovs_be16 udp_port) > +tnl_port_map_delete(ovs_be16 udp_port) > { > const struct cls_rule *cr; > struct flow flow; > > -tnl_port_init_flow(&flow, ip_dst, udp_port); > +tnl_port_init_flow(&flow, udp_port); > > cr = classifier_lookup(&cls, CLS_MAX_VERSION, &flow, NULL); > tnl_port_unref(cr); > diff --git a/lib/tnl-ports.h b/lib/tnl-ports.h > index 37a689f..8e4911d 100644 > --- a/lib/tnl-ports.h > +++ b/lib/tnl-ports.h > @@ -26,10 +26,10 @@ > > odp_port_t tnl_port_map_lookup(struct flow *flow, struct flow_wildcards *wc); > > -void tnl_port_map_insert(odp_port_t port, ovs_be32 ip_dst, ovs_be16 udp_port, > +void tnl_port_map_insert(odp_port_t port, ovs_be16 udp_port, > const char dev_name[]); > > -void tnl_port_map_delete(ovs_be32 ip_dst, ovs_be16 udp_port); > +void tnl_port_map_delete(ovs_be16 udp_port); > > void tnl_port_map_init(void); > > diff --git a/ofproto/tunnel.c b/ofproto/tunnel.c > index d2ac7c6..42f760e 100644 > --- a/ofproto/tunnel.c > +++ b/ofproto/tunnel.c > @@ -195,8 +195,7 @@ tnl_port_add__(const struct ofport_dpif *ofport, const > struct netdev *netdev, > tnl_port_mod_log(tnl_port, "adding"); > > if (native_tnl) { > -tnl_port_map_insert(odp_port, tnl_port->match.ip_dst, > -cfg->dst_port, name); > +tnl_port_map_insert(odp_port, cfg->dst_port, name); > } > return true; > } > @@ -263,7 +262,7 @@ tnl_port_del__(const struct ofport_dpif *ofport) > OVS_REQ_WRLOCK(rwlock) > netdev_get_tunnel_config(tnl_port->netdev); > struct hmap **map; > > -tnl_port_map_delete(tnl_port->match.ip_dst, cfg->dst_port); > +tnl_port_map_delete(cfg->dst_port); > tnl_port_mod_log(tnl_port, "removing"); > map = tnl_match_map(&tnl_port->match); > hmap_remove(*map, &tnl_port->match_node); > -- Patch looks good to me. ___ dev mailing li
Re: [ovs-dev] [PATCH] ofproto: Fix use-after-free in bridge destruction with groups.
Hi Ben, On Tue, Jun 23, 2015 at 11:38:56AM -0700, Ben Pfaff wrote: > Groups were not destroyed until after lots of other important bridge > data had been destroyed, including the connection manager. There was an > indirect dependency on the connection manager for bridge destruction > because destroying a group also destroys all of the flows that reference > the group, which in turn causes the ofmonitor to be invoked to report that > the flows had been destroyed. This commit fixes the problem by destroying > groups earlier. > > The problem can be observed by reverting the code changes in this commit > then running "make check-valgrind" with the test that this commit > introduces. > > Reported-by: Simon Horman > Signed-off-by: Ben Pfaff Thanks for following up on this; your patch looks good to me. Reviewed-by: Simon Horman ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH/RFC] connmgr: clear ofproto's pointer to connmgr when the latter is destroyed
On Tue, Jun 23, 2015 at 11:39:45AM -0700, Ben Pfaff wrote: > On Wed, Jun 17, 2015 at 02:22:32PM +0900, Simon Horman wrote: > > As per the testcase included in this patch it has been observed > > that ovs-vswtichd may segfault when deleting a bridge. > > > > Analysing the output of valgrind and gdb it appears that > > this is caused by the connmgr of a ofproto being accessed > > after the latter has been freed. > > > > It appears that this is occurring in different threads and > > is the result of the following postponement arrangement in > > ofproto_destroy(); > > > > /* We should not postpone this because it involves deleting a listening > > * socket which we may want to reopen soon. 'connmgr' should not be used > > * by other threads */ > > connmgr_destroy(p->connmgr); > > > > /* Destroying rules is deferred, must have 'ofproto' around for them. */ > > ovsrcu_postpone(ofproto_destroy__, p); > > Thanks a lot for the report and the fix. I found what I think is a > cleaner fix: > http://openvswitch.org/pipermail/dev/2015-June/056716.html > > Will you review it? Thanks! Thanks, I have done so. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev