Evan and I have demonstrated this bug on Kubernetes as well, so it's not just a problem in Mesos. (See https://github.com/kubernetes/kubernetes/issues/18898)
Sorry about my email client, I've re-sent the patch in another thread from git-email as I should have initially. I'll read through the TX path again to see if we missed something, but I'd love input from anyone else! -- Vijay Pandurangan https://www.twitter.com/vijayp http://www.vijayp.ca On Fri, Dec 18, 2015 at 2:00 PM, Cong Wang <xiyou.wangc...@gmail.com> wrote: > (Cc'ing Eric B and Tom) > > On Fri, Dec 18, 2015 at 9:54 AM, Vijay Pandurangan <vij...@vijayp.ca> wrote: >> Packets that arrive from real hardware devices have ip_summed == >> CHECKSUM_UNNECESSARY if the hardware verified the checksums, or >> CHECKSUM_NONE if the packet is bad or it was unable to verify it. The >> current version of veth will replace CHECKSUM_NONE with >> CHECKSUM_UNNECESSARY, which causes corrupt packets routed from hardware to >> a veth device to be delivered to the application. This caused applications >> at Twitter to receive corrupt data when network hardware was corrupting >> packets. > > Yeah, https://reviews.apache.org/r/41158/. > > This is because normally packets to a veth device are _only_ from its pair > device, Mesos network isolator redirects packets from a hardware interface > to veth, which violates this expectation. This is also why no one else sees > this bug. ;) > >> >> We believe this was added as an optimization to skip computing and >> verifying checksums for communication between containers. However, locally >> generated packets have ip_summed == CHECKSUM_PARTIAL, so the code as >> written does nothing for them. As far as we can tell, after removing this >> code, these packets are transmitted from one stack to another unmodified >> (tcpdump shows invalid checksums on both sides, as expected), and they are >> delivered correctly to applications. We didn’t test every possible network >> configuration, but we tried a few common ones such as bridging containers, >> using NAT between the host and a container, and routing from hardware >> devices to containers. We have effectively deployed this in production at >> Twitter (by disabling RX checksum offloading on veth devices). > > > I am wondering if there is any other CHECKSUM_NONE case in the tx > path we could miss here. Mesos case is too special not only because > it redirects packets from hardware to veth, but also because it moves > packets from RX path to TX path. > > Eric? Tom? > >> >> This code dates back to the first version of the driver, commit >> <e314dbdc1c0dc6a548ecf> ("[NET]: Virtual ethernet device driver"), so I >> suspect this bug occurred mostly because the driver API has evolved >> significantly since then. Commit <0b7967503dc97864f283a> ("net/veth: Fix >> packet checksumming") (in December 2010) fixed this for packets that get >> created locally and sent to hardware devices, by not changing >> CHECKSUM_PARTIAL. However, the same issue still occurs for packets coming >> in from hardware devices. >> >> Co-authored-by: Evan Jones <e...@evanjones.ca> >> Signed-off-by: Evan Jones <e...@evanjones.ca> >> Cc: Nicolas Dichtel <nicolas.dich...@6wind.com> >> Cc: Phil Sutter <p...@nwl.cc> >> Cc: Toshiaki Makita <makita.toshi...@lab.ntt.co.jp> >> Cc: netdev@vger.kernel.org >> Cc: linux-ker...@vger.kernel.org >> Signed-off-by: Vijay Pandurangan <vij...@vijayp.ca> > > Your patch looks good to me but your email client corrupts your patch, > so please resend. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html