On Thu, Jun 30, 2016 at 03:44:37PM -0700, Jesse Gross wrote: > On Thu, Jun 30, 2016 at 12:25 PM, Thadeu Lima de Souza Cascardo > <casca...@redhat.com> wrote: > > On Wed, Jun 29, 2016 at 09:38:00PM -0700, Jesse Gross wrote: > >> On Wed, Jun 22, 2016 at 8:47 AM, Thadeu Lima de Souza Cascardo > >> <casca...@redhat.com> wrote: > >> > This series adds support for the creation of tunnels using the rtnetlink > >> > interface. This will open the possibility for new features and flags on > >> > those > >> > vports without the need to change vport compatibility code. > >> > > >> > Support for STT and LISP have not been added because these are not > >> > upstream yet, > >> > so we don't know how the interface will be like upstream. And there are > >> > no > >> > features in the current drivers right now we could make use of. > >> > >> I noticed some interesting test failures while looking at this series. > >> Some of them are existing but they appear to be related to the earlier > >> conversation that we had with the tunnel routing code. > >> > >> The background is that I normally run unit tests on my development > >> machine and real tests with traffic on a different machine. In that > >> case the problem does not appear. The issue came up when I happened to > >> run the unit tests on a machine with a running configuration of OVS. > >> In that case, it seems that something picks up the existing tunnel > >> kernel netdev, opens it, and then uses that for the unit tests (which > >> should be purely in userspace and theoretically not affected). > >> > >> Here is what I get in the above scenario (some other related tests > >> appear to fail intermittently as well): > >> tunnel_push_pop > >> > >> 750: tunnel_push_pop - action FAILED > >> (tunnel-push-pop.at:154) > >> 751: tunnel_push_pop - packet_out FAILED > >> (tunnel-push-pop.at:201) > >> > >> tunnel_push_pop_ipv6 > >> > >> 752: tunnel_push_pop_ipv6 - action FAILED > >> (tunnel-push-pop-ipv6.at:149) > >> > >> This is presumably caused by the wrong type being used. Wouldn't > >> problems like this be avoided by using the other mechanism that you > >> proposed, whereby we identify the type on netdev_open() and use that? > >> I guess it doesn't seem like that big of a change to me and looks to > >> be more robust (or at least easier to diagnose if things go wrong). > >> > > > > What are the testsuite logs? If I have a vxlan_sys_4789 on my system, I can > > make > > at least 750 and 752 to fail immediately with: > > > > 2016-06-30T18:51:07.978Z|00183|netdev_linux|ERR|failed to create raw socket > > (Operation not permitted) > > 2016-06-30T18:51:07.978Z|00184|dpif_netdev|ERR|vxlan_sys_4789: cannot > > receive packets on this network device (Operation not permitted) > > > > That is the same for almost every test that creates a bridge, if I have a > > br0 on > > my system. And this is pretty much related to this problem, of having the > > route > > table open the device. But setting vxlan_sys_* interfaces type to vxlan > > won't > > fix the br0 problem. > > > > One of the problems here is that the route table doesn't care if the port > > is in > > the database or not, and which type it has in the database. The userspace > > tunneling code, however, will only output to a bridge which must have been > > configured from the database, and I see no other current user of tnl ports > > and > > the routing table. So, maybe we need to ignore every route and listening to > > devices we won't get any packets from anyway? But I don't want this bug to > > block > > this patchset. > > I think this problem is too closely related to this patchset to gloss > over and it affects the course of action that we take. On the previous > version of the series, you gave multiple choices of how to solve the > problem related to restarting ovs-vswitchd and I asked that you look > at the version that ensures the open netdev is the right type. We now > have an example of an issue that comes up when types are not correct, > so I'm asking again that you look at the other solutions to patch #1. > Even if it is preexisting, I don't want to continue to pile more > complexity and workarounds on top. >
OK, so let's make sure it is preexisting and that we are talking about the same issue here. I want to fix this, but I would rather make sure any fix I come up with fails to handle the issue you saw. Can you recover the logs from that test, or can you reproduce it and check this is the same problem? > > It's odd that you only saw this problem with my patchset applied. I run > > into it > > even without my patchset. In fact, I just need to have an interface named > > vxlan_sys_4789. It doesn't need to be attached to any bridge. Just make sure > > it's UP and you will see this problem. > > As I said, they existed earlier but are related to the work you are > doing here and the discussions we had earlier. > > >> The other area that we talked about before is how to handle backports > >> in the kernel modules with this new mechanism. I'm not sure that this > >> has been resolved. Do you have any more thoughts? > > > > Hum. Right now, ovs_vxlan has a newlink function that returns EINVAL. So, > > work > > would be needed there in any case. Geneve returns EINVAL if there is no > > REMOTE, > > and there is no way to set collect_md for GRE as well. > > > > So only changing userspace to try to create ovs_ prefixed interface types > > won't > > do it, so I see no point in doing that exercise right now before much more > > work > > is done on the backports. Again, I don't see why block this patchset on > > that. > > We already have a proposed patch series that will need this. (It > brings in some bug fixes from upstream): > http://openvswitch.org/pipermail/dev/2016-June/074062.html > > I see ensuring that the whole system works similar to before as an > essential part of this patch. Support for backports is an important > part of OVS for many users and is something that we're going to need > almost immediately after this goes in. Honestly, I don't think that it > is very fair to others to push this work onto them. OK, I don't want to duplicate some of Pravin's work. I will verify what is in progress on the list, see what is missing and check again with you and Pravin. Thanks. Cascardo. _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev