From: "Ilpo Järvinen" <[EMAIL PROTECTED]>
Date: Fri, 7 Dec 2007 13:05:46 +0200 (EET)
> I guess if you get a large cumulative ACK, the amount of processing is
> still overwhelming (added DaveM if he has some idea how to combat it).
>
> Even a simple scenario (this isn't anything fancy at all, will occur all
> the time): Just one loss => rest skbs grow one by one into a single
> very large SACK block (and we do that efficiently for sure) => then the
> fast retransmit gets delivered and a cumulative ACK for whole orig_window
> arrives => clean_rtx_queue has to do a lot of processing. In this case we
> could optimize RB-tree cleanup away (by just blanking it all) but still
> getting rid of all those skbs is going to take a larger moment than I'd
> like to see.
>
> That tree blanking could be extended to cover anything which ACK more than
> half of the tree by just replacing the root (and dealing with potential
> recolorization of the root).

Yes, it's the classic problem. But it ought to be at least
partially masked when TSO is in use, because we'll only process
a handful of SKBs. The more effectively TSO batches, the
less work clean_rtx_queue() will do.

When not doing TSO the behavior is super-stupid: we bump reference
counts on the same page multiple times while running over the SKBs,
since consecutive SKBs cover data in different spans of the same
page.

The core issue is that we have a poorly behaving data container,
and therefore that's obviously what we need to change.

Conceptually, what we probably need to do is separate the data
maintenance from the SKB objects themselves. There is a blob
that maintains the paged data state for everything in the
retransmit queue. SKBs are built and get the page pointers but
don't actually grab references to the pages; the blob does that,
and it keeps track of how many SKB references to each page there
are, non-atomically.

The hardest part is dealing with the page lifetime issues.
Unfortunately, when we trim the rtx queue, references to the
clones can still exist in the driver output path.
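To make the "blob" idea above a bit more concrete, here is a rough
userspace sketch, not kernel code. Every name in it (fake_page,
page_blob, blob_attach, blob_detach) is hypothetical and invented for
illustration. It assumes some outer lock, such as the socket lock, is
held so that the per-page user count can be a plain integer, and it
does nothing about the lifetime problem with clones still sitting in
the driver output path.

```c
#include <assert.h>
#include <stddef.h>

#define BLOB_MAX_PAGES 16	/* arbitrary size for the sketch */

/* Hypothetical stand-in for struct page: only the refcount matters. */
struct fake_page {
	int refcount;		/* models the real, atomic page refcount */
};

/* Hypothetical container that owns the page references for a queue. */
struct page_blob {
	struct fake_page *pages[BLOB_MAX_PAGES];
	unsigned int skb_users[BLOB_MAX_PAGES];	/* plain, non-atomic count */
	size_t nr_pages;
};

/*
 * Note that one more queued SKB points into @page.  Only the first
 * user takes a real page reference; later users just bump the plain
 * counter.  Returns 0 on success, -1 if the blob is full.
 */
static int blob_attach(struct page_blob *blob, struct fake_page *page)
{
	size_t i;

	for (i = 0; i < blob->nr_pages; i++) {
		if (blob->pages[i] == page) {
			blob->skb_users[i]++;
			return 0;
		}
	}
	if (blob->nr_pages == BLOB_MAX_PAGES)
		return -1;

	page->refcount++;	/* the single "real" reference */
	blob->pages[blob->nr_pages] = page;
	blob->skb_users[blob->nr_pages] = 1;
	blob->nr_pages++;
	return 0;
}

/*
 * Drop one SKB's use of @page.  The real reference is released only
 * when the last user goes away; the slot is then reused by moving the
 * final entry into it.
 */
static void blob_detach(struct page_blob *blob, struct fake_page *page)
{
	size_t i;

	for (i = 0; i < blob->nr_pages; i++) {
		if (blob->pages[i] != page)
			continue;
		assert(blob->skb_users[i] > 0);
		if (--blob->skb_users[i] == 0) {
			page->refcount--;
			blob->nr_pages--;
			blob->pages[i] = blob->pages[blob->nr_pages];
			blob->skb_users[i] = blob->skb_users[blob->nr_pages];
		}
		return;
	}
}
```

With this shape, freeing N non-TSO SKBs that all point into the same
page costs N plain decrements and one real reference drop, instead of
N operations on the page's own count. The linear lookup is only there
to keep the sketch short.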
It's a difficult problem to overcome in fact, so in the end my
suggestion above might not even be workable.

> No idea about what it could do, haven't yet looked web100, I was planning
> at some point of time...

Web100 just provides statistics and other kinds of connection data
to userspace; all the actual algorithm etc. modifications have been
merged upstream and yanked out of the web100 patch.

I was looking at it the other night and it's frankly totally
uninteresting these days :-)