On Sun, Jan 24, 2016 at 6:28 AM, Jesper Dangaard Brouer <bro...@redhat.com> wrote:
> On Thu, 21 Jan 2016 10:54:01 -0800 (PST)
> David Miller <da...@davemloft.net> wrote:
>
>> From: Jesper Dangaard Brouer <bro...@redhat.com>
>> Date: Thu, 21 Jan 2016 12:27:30 +0100
>>
>> > eth_type_trans() does two things:
>> >
>> > 1) determine skb->protocol
>> > 2) set up skb->pkt_type = PACKET_{BROADCAST,MULTICAST,OTHERHOST}
>> >
>> > Could the HW descriptor deliver the "proto", or perhaps just some bits
>> > for the most common protos?
>> >
>> > The skb->pkt_type doesn't need many bits.  And I bet the HW already has
>> > the information.  The BROADCAST and MULTICAST indications are easy.  The
>> > PACKET_OTHERHOST case can be turned around by instead setting a
>> > PACKET_HOST indication if eth->h_dest matches the device's
>> > dev->dev_addr (else a SW compare is required).
>> >
>> > Is that doable in hardware?
>>
>> I feel like we've had this discussion before several years ago.
>>
>> I think having just the protocol value would be enough.
>>
>> skb->pkt_type we could deal with by using always an accessor and
>> evaluating it lazily.  Nothing needs it until we hit ip_rcv() or
>> similar.
>
> At first I thought I liked the idea of delaying the eval of skb->pkt_type.
>
> BUT then I realized, what if we take this even further?  What if we
> actually use this information for something useful at this very
> early RX stage?
>
> The information I'm interested in, from the HW descriptor, is whether
> this packet is NOT for local delivery.  If so, we can send the packet
> on a "fast-forward" code path.
>
> Think about bridging packets to a guest OS.  Because we know very
> early at RX (from the packet HW descriptor), we might even avoid
> allocating an SKB.  We could just "forward" the packet-page to the
> guest OS.
>
> Taking Eric's idea of remote CPUs, we could even send these
> packet-pages to a remote CPU (e.g. where the guest OS is running),
> without having touched a single cache-line in the packet-data.
> I would still bundle them up first, to amortize the (100-133ns) cost of
> transferring something to another CPU.
>

You mean like RPS/RFS/aRFS/flow_director already does (except for the
zero-touch part)?

> The data-cache trick would be to instruct the prefetcher to only start
> prefetching to L3 or L2 when these packets are destined for a remote
> CPU.  At least Intel CPUs have prefetch operations that specify only
> L2/L3 cache.
>
>
> Maybe we need a combined solution.  Lazy eval of skb->pkt_type for
> local delivery, but set the information if available from the HW desc.
> And fast page-forward doesn't even need an SKB.
>
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   Author of http://www.iptv-analyzer.org
>   LinkedIn: http://www.linkedin.com/in/brouer
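
For reference, the pkt_type decision discussed in the quoted text can be
sketched in plain user-space C as below.  This is only an illustration
under stated assumptions, not the kernel's eth_type_trans() code: the
enum values, classify_dest() and wants_fast_forward() are hypothetical
names made up for this sketch, and it looks at nothing but the
destination MAC address.

```c
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6  /* length of an Ethernet MAC address in octets */

/* Hypothetical stand-ins for the kernel's PACKET_* classification. */
enum pkt_type {
    PKT_HOST,       /* destination MAC equals the device's own address */
    PKT_BROADCAST,  /* destination MAC is ff:ff:ff:ff:ff:ff */
    PKT_MULTICAST,  /* group bit set, but not the broadcast address */
    PKT_OTHERHOST   /* unicast frame addressed to some other station */
};

static const uint8_t eth_broadcast[ETH_ALEN] = {
    0xff, 0xff, 0xff, 0xff, 0xff, 0xff
};

/* Software classification by destination MAC only (assumed helper). */
static enum pkt_type classify_dest(const uint8_t *h_dest,
                                   const uint8_t *dev_addr)
{
    /* Lowest bit of the first octet marks a group (multicast) address. */
    if (h_dest[0] & 0x01) {
        if (memcmp(h_dest, eth_broadcast, ETH_ALEN) == 0)
            return PKT_BROADCAST;
        return PKT_MULTICAST;
    }
    /* Unicast: this is the compare a HW "PACKET_HOST" bit could replace. */
    if (memcmp(h_dest, dev_addr, ETH_ALEN) == 0)
        return PKT_HOST;
    return PKT_OTHERHOST;
}

/*
 * Assumed policy for the "fast-forward" idea: only unicast frames that
 * are not addressed to this device are candidates.  Returns 1 or 0.
 */
static int wants_fast_forward(const uint8_t *h_dest,
                              const uint8_t *dev_addr)
{
    return classify_dest(h_dest, dev_addr) == PKT_OTHERHOST;
}
```

A caller would pass the first six bytes of the received frame as h_dest
and the interface address as dev_addr.  If the HW descriptor already
carried a "for this host" bit, the second memcmp() is the part that
could be skipped.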