> My initial reaction is negative on this. The DPDK does not need more nerd
> knobs for performance. If it is a performance win, it should be automatic and
> handled by the driver.
>
> If you absolutely have to have another flag, then it should be in existing
> config (yes, extend the ABI) rather than adding more flags and calls in
> ethdev.
Thanks, Steve, for the feedback.

My thesis is that in a DPDK-based packet processing system, the application knows more about memory buffer (packet) usage than the generic underlying hardware or the PMD (I have listed some examples below, together with the hints they would map to). Recognizing such cases, the PCI-SIG introduced TLP Processing Hints (TPH). Consequently, many interconnect designers added TPH support to their interconnects: based on steering tags that an application provides to the NIC, which places them in the TLP header, memory buffers can be steered to the desired level of a CPU's cache hierarchy.

With this proposed API, applications give cache-stashing hints to ethernet devices to reduce memory access latencies for both the CPU and the NIC, improving overall system performance. Listed below are some use cases; two illustrative sketches follow the list.

- A run-to-completion application may not need the next packet in L1D immediately. It may instead issue a prefetch and do other work with packet and application data already in L1D before it needs the next packet. A generic PMD cannot know such subtleties of the application endpoint; it would either stash buffers into L1D indiscriminately or not stash them at all. With a hint from the application, however, packet buffers can be stashed at a cache level suitable for that application (like UNIX MADV_DONTNEED, but for mbufs at cache-line granularity); see the second sketch after this list.

- Similarly, a pipelined application may use a hint advising that the buffers are needed in L1D as soon as they arrive (parallels MADV_WILLNEED).

- Let's call the time between an mbuf being allocated into an Rx queue, freed back into the mempool in the Tx path, and reallocated into the same Rx queue the "buffer recycle window". The length of the buffer recycle window is a function of the application in question; neither the PMD nor the NIC has prior knowledge of this property. If the window is short enough, a buffer may stay in a CPU's L1D throughout the entire recycle window. An application with a short buffer recycle window may therefore hint to the platform that a Tx buffer is not needed in the CPU cache anytime soon, avoiding unnecessary cache invalidations when the buffer is written again by the next Rx packet (parallels MADV_DONTNEED).
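To make the shape of the proposal concrete, here is a minimal sketch of what a queue-level stashing hint could look like. Every name in it (the enum, the function, its parameters) is a placeholder invented for this mail, not the actual API of the patch set:

    /* Hypothetical illustration only; these names are placeholders,
     * not the symbols proposed in the patches. */
    #include <stdbool.h>
    #include <stdint.h>

    /* Target level in the cache hierarchy for stashed buffers. */
    enum stash_level {
            STASH_NONE,    /* bypass CPU caches (short recycle window) */
            STASH_L1D,     /* wanted immediately (parallels WILLNEED)  */
            STASH_L2,
            STASH_LLC,     /* run-to-completion with prefetching       */
    };

    /*
     * Attach a stashing hint to one Rx or Tx queue. A PMD would
     * translate (lcore_id, level) into a platform steering tag and
     * program the NIC to carry it in the TPH field of the TLP header.
     */
    int eth_dev_stash_hint_set(uint16_t port_id, uint16_t queue_id,
                               bool is_rx, unsigned int lcore_id,
                               enum stash_level level);

An Rx queue serviced by a run-to-completion core might then be configured with STASH_LLC, while the Tx queues of an application with a short buffer recycle window could use STASH_NONE.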
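And here is the run-to-completion case from the first bullet, sketched with existing DPDK calls (rte_eth_rx_burst, rte_prefetch0, rte_pktmbuf_mtod); process_packet() is a stand-in for application work and mbuf ownership:

    #include <rte_ethdev.h>
    #include <rte_mbuf.h>
    #include <rte_prefetch.h>

    #define BURST_SIZE 32

    /* Application-defined; consumes and eventually frees the mbuf. */
    extern void process_packet(struct rte_mbuf *m);

    /*
     * Prefetch packet i+1 while packet i is being processed, hiding
     * the next packet's cache miss behind useful work. With a hint
     * that buffers are not needed in L1D immediately, the NIC can
     * stash them at a lower cache level and the prefetch lifts each
     * one into L1D just in time.
     */
    static void
    rx_loop(uint16_t port_id, uint16_t queue_id)
    {
            struct rte_mbuf *bufs[BURST_SIZE];
            uint16_t nb, i;

            for (;;) {
                    nb = rte_eth_rx_burst(port_id, queue_id, bufs,
                                          BURST_SIZE);
                    for (i = 0; i < nb; i++) {
                            if (i + 1 < nb)
                                    rte_prefetch0(rte_pktmbuf_mtod(
                                            bufs[i + 1], void *));
                            process_packet(bufs[i]);
                    }
            }
    }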