On Mon, Jan 21, 2013 at 4:09 PM, Ben Pfaff <b...@nicira.com> wrote: > On Mon, Jan 21, 2013 at 04:04:22PM -0800, Jesse Gross wrote: >> On Mon, Jan 21, 2013 at 2:25 PM, Jesse Gross <je...@nicira.com> wrote: >> > On Mon, Jan 21, 2013 at 10:11 AM, Ben Pfaff <b...@nicira.com> wrote: >> >> skb_gso_segment() has the following comment: >> >> >> >> * It may return NULL if the skb requires no segmentation. This is >> >> * only possible when GSO is used for verifying header integrity. >> >> >> >> Somehow queue_gso_packets() has never hit this case before, but some >> >> failures have suddenly been reported. This commit should fix the problem. >> >> >> >> Bug #14772. >> >> Reported-by: Deepesh Govindan <dgovin...@vmware.com>. >> >> Signed-off-by: Ben Pfaff <b...@nicira.com> >> > >> > Acked-by: Jesse Gross <je...@nicira.com> >> > >> > However, I think we also potentially have a similar problem in >> > tunnel.c:handle_offloads(). >> > >> >> --- >> >> I don't know how to trigger this condition, so I haven't tested this >> >> patch beyond verifying that it compiles against the Linux 3.2 kernel >> >> on which the problem was reported. The patch was generated against >> >> branch-1.7 because that's where the problem occurred; it looks like >> >> it should apply cleanly against master also. >> > >> > We shouldn't normally be hitting this case because we're actually >> > trying to do GSO, not header validation. However, I guess the >> > guest/backend must be generating a packet with an MSS, which tricks us >> > into thinking that it's GSO, but no GSO is actually requested. In the >> > case of the bridge, header validation does take place so the situation >> > is handled already. It seems not ideal that the network backend >> > doesn't sanitize these packets but it's probably good that we handle >> > it in any case. >> >> I kept on looking into this and realized that my explanation is >> completely wrong. Both KVM and Xen do correctly sanitize the GSO >> metadata. The problem is actually on the receive side - LRO was >> somehow enabled and that creates packets of the form that are causing >> problems here. We disable LRO when interfaces are added to OVS but >> it's possible that someone turned it on by hand later on. The real >> source of the problem is that we're not checking and dropping LRO >> packets on receive. We're supposed to be doing this but when code was >> being restructured for upstreaming the check was accidentally moved >> from the receive to the transmit function where it obviously doesn't >> do anything. This is clearly a bug and moving it back should solve >> the problem (the logs clearly confirm LRO as the problem). >> >> As far as the current patch, it should also mostly avoid the problem >> but can potentially cause problems later on in the system. With this >> patch, we will send large packets up to userspace and they will come >> back down to be forwarded but without any indication that they were >> LRO packets originally. This will probably just cause them to get >> dropped later on but could cause issues with some subsystems (like >> GRO, which tries to infer MSS based on packet size). None of the >> other places in the kernel besides dev.c that use skb_gso_segment() >> have this check, so I think we probably should revert it. > > Ugh. > > Presumably you have a real fix coming up, I hope you're willing to pair > a revert and a fix patch as you commit the fix?
Yeah, I'm working on it. _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev