On Wed, Feb 22, 2017 at 1:43 AM, Jesper Dangaard Brouer <bro...@redhat.com> wrote:
>
> On Tue, 21 Feb 2017 14:54:35 -0800 Tom Herbert <t...@herbertland.com> wrote:
>> On Tue, Feb 21, 2017 at 2:29 PM, Saeed Mahameed <sae...@dev.mellanox.co.il>
>> wrote:
> [...]
>> > The only complexity XDP is adding to the drivers is the constraints on
>> > RX memory management and memory model; calling the XDP program itself
>> > and handling the action is really a simple thing once you have the
>> > correct memory model.
>
> Exactly, that is why I've been looking at introducing a generic
> facility for a memory model for drivers. This should help simplify
> drivers. Due to performance needs this needs to be a very thin API layer
> on top of the page allocator. (That's why I'm working with Mel Gorman
> to get closer integration with the page allocator, e.g. a bulking
> facility).
>
>> > Who knows! Maybe someday XDP will define one unified RX API for all
>> > drivers and it will even handle normal stack delivery itself :).
>> >
>> That's exactly the point and what we need for TXDP. I'm missing why
>> doing this is such rocket science other than the fact that all these
>> drivers are vastly different and changing the existing API is
>> unpleasant. The only functional complexity I see in creating a generic
>> batching interface is handling return codes asynchronously. This is
>> entirely feasible though...
>
> I'll be happy as long as we get a batching interface, then we can
> incrementally do the optimizations later.
>
> In the future, I do hope (like Saeed) this RX API will evolve into
> delivering (a bulk of) raw-packet-pages into the netstack; this should
> simplify drivers, and we can keep the complexity and SKB allocations
> out of the drivers.
> To start with, we can play with doing this delivering (a bulk of)
> raw-packet-pages into Tom's TXDP engine/system?
>
Hi Jesper,

Maybe we can start to narrow in on what a batching API might look like. Looking at mlx5 (as a model of how XDP is implemented), the main RX loop in mlx5e_poll_rx_cq calls the backend handler in one indirect function call. The XDP path goes through mlx5e_handle_rx_cqe, skb_from_cqe, and mlx5e_xdp_handle. The first two deal a lot with building the skbuff. As a prerequisite to RX batching it would be helpful if this could be flattened so that most of the logic is visible in the main RX loop.

The model of RX batching seems straightforward enough: pull packets from the ring, save the xdp_data information in a vector, periodically call into the stack to handle a batch where one argument is the vector of packets and another is an output vector that gives return codes (XDP actions), then process each return code for each packet in the driver accordingly. Presumably there is a maximum allowed batch size that may or may not be the same as the NAPI budget, so the batching call needs to be made when that limit is reached and also before exiting NAPI. For each packet the stack can return an XDP code; XDP_PASS in this case could be interpreted as the packet being consumed by the stack, which would cover the case where the stack creates an skbuff for the packet. The stack on its part can process the batch however it sees fit: it can process each packet individually in the canonical model, or we can continue processing the batch in a VPP-like fashion. (I've put a rough sketch of this flow at the end of this mail.)

The batching API could be transparent to the stack or not. In the transparent case, the driver calls what looks like a receive function, but the stack may defer processing for batching. A callback function (that can be inlined) is used to process return codes as I mentioned previously. In the non-transparent model, the driver knowingly creates the packet vector and then explicitly calls another function to process the vector. Personally, I lean towards the transparent API; it may mean less complexity in drivers and it gives the stack more control over the parameters of batching (for instance, it may choose a batch size that optimizes its own processing instead of the driver guessing the best size).

Btw, the logic for RX batching is very similar to how we batch packets for RPS (I think you already mentioned an skb-less RPS, and that should hopefully be something that falls out of this design).
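
Here is the rough sketch I mentioned, written in the explicit (non-transparent) style just because it is easier to show. To be clear, none of these names (xdp_pkt, xdp_batch, netstack_rx_batch, driver_flush_batch, XDP_BATCH_MAX) exist anywhere today; they are hypothetical placeholders to show the shape of the calls. A real version would be built on xdp_buff, the NAPI context, and the driver's ring structures, and the stack side would run the XDP program and build skbs instead of blindly returning XDP_PASS as the stub below does.

/*
 * Rough sketch only -- all names are made up.  The driver fills a
 * vector of packets, hands it to the stack in one call, and gets back
 * a vector of verdicts (XDP actions) to act on.
 */
#include <stddef.h>

#define XDP_BATCH_MAX	64	/* assumed batch limit; not necessarily the NAPI budget */

/* Stand-in for the kernel's enum xdp_action. */
enum xdp_verdict { XDP_ABORTED, XDP_DROP, XDP_PASS, XDP_TX };

struct xdp_pkt {
	void	*data;		/* start of packet data (would be an xdp_buff in reality) */
	size_t	len;
};

struct xdp_batch {
	struct xdp_pkt	pkts[XDP_BATCH_MAX];
	enum xdp_verdict verdicts[XDP_BATCH_MAX];	/* output vector of return codes */
	unsigned int	count;
};

/*
 * Stub standing in for the stack entry point; a real implementation
 * would run the XDP program, possibly build skbs, and could process
 * the whole vector VPP-style before returning.
 */
static void netstack_rx_batch(struct xdp_batch *batch)
{
	unsigned int i;

	for (i = 0; i < batch->count; i++)
		batch->verdicts[i] = XDP_PASS;
}

/* Driver side: hand the batch to the stack, then act on each verdict. */
static void driver_flush_batch(struct xdp_batch *batch)
{
	unsigned int i;

	if (!batch->count)
		return;

	netstack_rx_batch(batch);

	for (i = 0; i < batch->count; i++) {
		switch (batch->verdicts[i]) {
		case XDP_PASS:
			/* consumed by the stack (e.g. an skb was built) */
			break;
		case XDP_TX:
			/* driver transmits the page back out */
			break;
		case XDP_DROP:
		default:
			/* recycle the page back into the RX ring */
			break;
		}
	}
	batch->count = 0;
}

/*
 * Called from the NAPI poll loop for each completed RX descriptor.
 * The driver flushes when the batch fills up, and would also call
 * driver_flush_batch() once more before exiting NAPI.
 */
static void driver_rx_one(struct xdp_batch *batch, void *data, size_t len)
{
	batch->pkts[batch->count].data = data;
	batch->pkts[batch->count].len = len;
	batch->count++;

	if (batch->count == XDP_BATCH_MAX)
		driver_flush_batch(batch);
}

The transparent variant would hide the batch behind what looks like an ordinary per-packet receive call and hand the verdicts back through an inlinable driver callback, but the bookkeeping is essentially the same.

Tom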