On Thu, Jun 30, 2016 at 12:25 PM, Thadeu Lima de Souza Cascardo <casca...@redhat.com> wrote: > On Wed, Jun 29, 2016 at 09:38:00PM -0700, Jesse Gross wrote: >> On Wed, Jun 22, 2016 at 8:47 AM, Thadeu Lima de Souza Cascardo >> <casca...@redhat.com> wrote: >> > This series adds support for the creation of tunnels using the rtnetlink >> > interface. This will open the possibility for new features and flags on >> > those >> > vports without the need to change vport compatibility code. >> > >> > Support for STT and LISP have not been added because these are not >> > upstream yet, >> > so we don't know how the interface will be like upstream. And there are no >> > features in the current drivers right now we could make use of. >> >> I noticed some interesting test failures while looking at this series. >> Some of them are existing but they appear to be related to the earlier >> conversation that we had with the tunnel routing code. >> >> The background is that I normally run unit tests on my development >> machine and real tests with traffic on a different machine. In that >> case the problem does not appear. The issue came up when I happened to >> run the unit tests on a machine with a running configuration of OVS. >> In that case, it seems that something picks up the existing tunnel >> kernel netdev, opens it, and then uses that for the unit tests (which >> should be purely in userspace and theoretically not affected). >> >> Here is what I get in the above scenario (some other related tests >> appear to fail intermittently as well): >> tunnel_push_pop >> >> 750: tunnel_push_pop - action FAILED >> (tunnel-push-pop.at:154) >> 751: tunnel_push_pop - packet_out FAILED >> (tunnel-push-pop.at:201) >> >> tunnel_push_pop_ipv6 >> >> 752: tunnel_push_pop_ipv6 - action FAILED >> (tunnel-push-pop-ipv6.at:149) >> >> This is presumably caused by the wrong type being used. Wouldn't >> problems like this be avoided by using the other mechanism that you >> proposed, whereby we identify the type on netdev_open() and use that? >> I guess it doesn't seem like that big of a change to me and looks to >> be more robust (or at least easier to diagnose if things go wrong). >> > > What are the testsuite logs? If I have a vxlan_sys_4789 on my system, I can > make > at least 750 and 752 to fail immediately with: > > 2016-06-30T18:51:07.978Z|00183|netdev_linux|ERR|failed to create raw socket > (Operation not permitted) > 2016-06-30T18:51:07.978Z|00184|dpif_netdev|ERR|vxlan_sys_4789: cannot receive > packets on this network device (Operation not permitted) > > That is the same for almost every test that creates a bridge, if I have a br0 > on > my system. And this is pretty much related to this problem, of having the > route > table open the device. But setting vxlan_sys_* interfaces type to vxlan won't > fix the br0 problem. > > One of the problems here is that the route table doesn't care if the port is > in > the database or not, and which type it has in the database. The userspace > tunneling code, however, will only output to a bridge which must have been > configured from the database, and I see no other current user of tnl ports and > the routing table. So, maybe we need to ignore every route and listening to > devices we won't get any packets from anyway? But I don't want this bug to > block > this patchset.
I think this problem is too closely related to this patchset to gloss over and it affects the course of action that we take. On the previous version of the series, you gave multiple choices of how to solve the problem related to restarting ovs-vswitchd and I asked that you look at the version that ensures the open netdev is the right type. We now have an example of an issue that comes up when types are not correct, so I'm asking again that you look at the other solutions to patch #1. Even if it is preexisting, I don't want to continue to pile more complexity and workarounds on top. > It's odd that you only saw this problem with my patchset applied. I run into > it > even without my patchset. In fact, I just need to have an interface named > vxlan_sys_4789. It doesn't need to be attached to any bridge. Just make sure > it's UP and you will see this problem. As I said, they existed earlier but are related to the work you are doing here and the discussions we had earlier. >> The other area that we talked about before is how to handle backports >> in the kernel modules with this new mechanism. I'm not sure that this >> has been resolved. Do you have any more thoughts? > > Hum. Right now, ovs_vxlan has a newlink function that returns EINVAL. So, work > would be needed there in any case. Geneve returns EINVAL if there is no > REMOTE, > and there is no way to set collect_md for GRE as well. > > So only changing userspace to try to create ovs_ prefixed interface types > won't > do it, so I see no point in doing that exercise right now before much more > work > is done on the backports. Again, I don't see why block this patchset on that. We already have a proposed patch series that will need this. (It brings in some bug fixes from upstream): http://openvswitch.org/pipermail/dev/2016-June/074062.html I see ensuring that the whole system works similar to before as an essential part of this patch. Support for backports is an important part of OVS for many users and is something that we're going to need almost immediately after this goes in. Honestly, I don't think that it is very fair to others to push this work onto them. _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev