On Mon, Jan 21, 2013 at 04:04:22PM -0800, Jesse Gross wrote: > On Mon, Jan 21, 2013 at 2:25 PM, Jesse Gross <je...@nicira.com> wrote: > > On Mon, Jan 21, 2013 at 10:11 AM, Ben Pfaff <b...@nicira.com> wrote: > >> skb_gso_segment() has the following comment: > >> > >> * It may return NULL if the skb requires no segmentation. This is > >> * only possible when GSO is used for verifying header integrity. > >> > >> Somehow queue_gso_packets() has never hit this case before, but some > >> failures have suddenly been reported. This commit should fix the problem. > >> > >> Bug #14772. > >> Reported-by: Deepesh Govindan <dgovin...@vmware.com>. > >> Signed-off-by: Ben Pfaff <b...@nicira.com> > > > > Acked-by: Jesse Gross <je...@nicira.com> > > > > However, I think we also potentially have a similar problem in > > tunnel.c:handle_offloads(). > > > >> --- > >> I don't know how to trigger this condition, so I haven't tested this > >> patch beyond verifying that it compiles against the Linux 3.2 kernel > >> on which the problem was reported. The patch was generated against > >> branch-1.7 because that's where the problem occurred; it looks like > >> it should apply cleanly against master also. > > > > We shouldn't normally be hitting this case because we're actually > > trying to do GSO, not header validation. However, I guess the > > guest/backend must be generating a packet with an MSS, which tricks us > > into thinking that it's GSO, but no GSO is actually requested. In the > > case of the bridge, header validation does take place so the situation > > is handled already. It seems not ideal that the network backend > > doesn't sanitize these packets but it's probably good that we handle > > it in any case. > > I kept on looking into this and realized that my explanation is > completely wrong. Both KVM and Xen do correctly sanitize the GSO > metadata. The problem is actually on the receive side - LRO was > somehow enabled and that creates packets of the form that are causing > problems here. We disable LRO when interfaces are added to OVS but > it's possible that someone turned it on by hand later on. The real > source of the problem is that we're not checking and dropping LRO > packets on receive. We're supposed to be doing this but when code was > being restructured for upstreaming the check was accidentally moved > from the receive to the transmit function where it obviously doesn't > do anything. This is clearly a bug and moving it back should solve > the problem (the logs clearly confirm LRO as the problem). > > As far as the current patch, it should also mostly avoid the problem > but can potentially cause problems later on in the system. With this > patch, we will send large packets up to userspace and they will come > back down to be forwarded but without any indication that they were > LRO packets originally. This will probably just cause them to get > dropped later on but could cause issues with some subsystems (like > GRO, which tries to infer MSS based on packet size). None of the > other places in the kernel besides dev.c that use skb_gso_segment() > have this check, so I think we probably should revert it.
Ugh. Presumably you have a real fix coming up, I hope you're willing to pair a revert and a fix patch as you commit the fix? _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev