> 
> My initial reaction is negative on this. The DPDK does not need more nerd
> knobs for performance. If it is a performance win, it should be automatic and
> handled by the driver.
> 
> If you absolutely have to have another flag, then it should be in existing 
> config
> (yes, extend the ABI) rather than adding more flags and calls in ethdev.


Thanks, Steve, for the feedback. My thesis is that in a DPDK-based packet
processing system, the application knows more about memory buffer (packet)
usage than the underlying hardware or the generic PMD does (I have provided
some examples below, with the hints they would map to). Recognizing such
cases, PCI-SIG introduced TLP Processing Hints (TPH). Consequently, many
interconnect designers added support for TPH in their interconnects, so
that, based on steering tags an application provides to a NIC, which sets
them in the TLP header, memory buffers can be steered toward a CPU at the
desired level of the cache hierarchy. With this proposed API, applications
provide cache-stashing hints to Ethernet devices to reduce memory access
latencies at both the CPU and the NIC, thereby improving system
performance.

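To make this concrete, here is a minimal sketch of the shape such an API
could take. All names below are illustrative placeholders of mine, not the
actual patch:

/*
 * Hypothetical per-queue stashing-hint API. The PMD would translate a
 * hint into the TPH steering tag it places in the TLP headers of that
 * queue's DMA traffic.
 */
enum rte_eth_stash_level {
	RTE_ETH_STASH_NONE,	/* do not stash; buffer not needed soon */
	RTE_ETH_STASH_LLC,	/* shared last-level cache */
	RTE_ETH_STASH_L2,	/* private L2 of the target core */
	RTE_ETH_STASH_L1D,	/* needed immediately on arrival */
};

/* Hint for buffers the NIC writes on Rx, targeting a given lcore. */
int rte_eth_rx_queue_stash_hint(uint16_t port_id, uint16_t queue_id,
				unsigned int lcore_id,
				enum rte_eth_stash_level level);

/* Hint for buffers the NIC reads back on Tx. */
int rte_eth_tx_queue_stash_hint(uint16_t port_id, uint16_t queue_id,
				unsigned int lcore_id,
				enum rte_eth_stash_level level);
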
Listed below are some use cases.

- A run-to-completion application may not need the next packet immediately
in L1D. It may rather issue a prefetch and do other work with packet and
application data already in L1D before it needs the next packet. A generic
PMD does not know such subtleties of the application endpoint, so it would
have to stash buffers into L1D indiscriminately or not at all. With a hint
from the application, however, packet buffers can be stashed at a cache
level suitable for that application. (like UNIX MADV_DONTNEED, but for
mbufs at cache-line granularity; see the Rx loop sketch below)

- Similarly, a pipelined application may use a hint advising that the
buffers are needed in L1D as soon as they arrive. (parallels MADV_WILLNEED)

- Let's call the time between an mbuf being allocated into an Rx queue,
freed back into the mempool on the Tx path, and reallocated into the same
Rx queue the "buffer recycle window". The length of the buffer recycle
window is a property of the application in question; the PMD or the NIC
has no prior knowledge of it. If the window is short enough, a buffer may
stay in the L1D of a CPU throughout the entire recycle window. An
application with a short buffer recycle window may therefore hint to the
platform that a Tx buffer is not needed in the CPU cache anytime soon,
avoiding unnecessary cache invalidations when the buffer is next written
by an Rx packet. (parallels MADV_DONTNEED; a usage sketch follows this
list)
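
For the third use case, an application with a short recycle window could
then pair an Rx hint with a "no stash" Tx hint, reusing the illustrative
names from the sketch above (again, placeholders of mine, not the actual
proposal):

/* Rx buffers wanted close to the worker core... */
rte_eth_rx_queue_stash_hint(port_id, queue_id, lcore_id,
			    RTE_ETH_STASH_L2);

/* ...but Tx buffers kept out of the cache until they are recycled. */
rte_eth_tx_queue_stash_hint(port_id, queue_id, lcore_id,
			    RTE_ETH_STASH_NONE);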

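And here is a sketch of the run-to-completion pattern from the first use
case, using standard DPDK calls; process_packet() is a stand-in I made up
for the application's per-packet work:

#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_prefetch.h>

#define BURST_SIZE 32

/* Application-specific work on one packet (assumed to exist). */
extern void process_packet(struct rte_mbuf *m);

static void
rx_loop(uint16_t port_id, uint16_t queue_id)
{
	struct rte_mbuf *pkts[BURST_SIZE];

	for (;;) {
		uint16_t n = rte_eth_rx_burst(port_id, queue_id,
					      pkts, BURST_SIZE);

		for (uint16_t i = 0; i < n; i++) {
			/*
			 * Pull the next packet toward L1D while the
			 * current one is processed; the NIC only had
			 * to stash it as far as L2/LLC.
			 */
			if (i + 1 < n)
				rte_prefetch0(rte_pktmbuf_mtod(pkts[i + 1],
							       void *));
			process_packet(pkts[i]);
		}
	}
}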