14/10/2020 20:11, Viacheslav Ovsiienko:
> The DPDK datapath in the transmit direction is very flexible.
> An application can build a multi-segment packet and manage
> almost all data aspects: the memory pools the segments are
> allocated from, the segment lengths, and memory attributes
> such as external buffers, registration for DMA, etc.
> 
> In the receive direction the datapath is much less flexible:
> an application can only specify the memory pool used to configure
> the receive queue and nothing more. To extend the receive
> datapath capabilities, it is proposed to add a way to provide
> extended information on how to split the packets being received.
> 
> The following structure is introduced to specify the Rx packet
> segment:
> 
> struct rte_eth_rxseg {
>     struct rte_mempool *mp; /* memory pool to allocate segment from */
>     uint16_t length; /* segment maximal data length,
>                       configures "split point" */
>     uint16_t offset; /* data offset from beginning
>                       of mbuf data buffer */
>     uint32_t reserved; /* reserved field */
> };
> 
> The segment descriptions are added to the rte_eth_rxconf structure:
>    rx_seg  - pointer to the array of segment descriptions; each
>              element describes the memory pool, the maximal data
>              length, and the initial data offset from the beginning
>              of the data buffer in the mbuf. The array allows
>              different settings to be specified for each segment
>              individually.
>    rx_nseg - number of elements in the array
> 
> If the extended segment descriptions are provided through these new
> fields, the mp parameter of rte_eth_rx_queue_setup() must be
> specified as NULL to avoid ambiguity.
> 
> There are two options to specify the Rx buffer configuration:
> - mp is not NULL, rx_conf.rx_seg is NULL, rx_conf.rx_nseg is zero:
>   this is the compatible configuration; it follows the existing
>   implementation and provides a single pool with no description of
>   segment sizes and offsets.
> - mp is NULL, rx_conf.rx_seg is not NULL, rx_conf.rx_nseg is not
>   zero: this is the extended configuration, specified individually
>   for each segment.
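
To make the two options concrete, here is a minimal sketch of the check
a queue-setup routine could apply. It is self-contained and does not use
the DPDK headers: ex_rx_buf_cfg and ex_classify_rx_buf_cfg() are
hypothetical names for illustration only, not part of the ethdev API,
and the pointers are plain void * stand-ins for the mempool and the
rx_seg array.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification of the Rx buffer configuration. */
enum ex_rx_buf_cfg {
    EX_RX_BUF_INVALID,     /* ambiguous or incomplete combination */
    EX_RX_BUF_SINGLE_POOL, /* compatible mode: one pool, no descriptions */
    EX_RX_BUF_SPLIT        /* extended mode: per-segment descriptions */
};

/*
 * Classify the (mp, rx_seg, rx_nseg) triple according to the two
 * options listed in the commit log. Anything else is rejected.
 */
enum ex_rx_buf_cfg
ex_classify_rx_buf_cfg(const void *mp, const void *rx_seg, uint16_t rx_nseg)
{
    if (mp != NULL && rx_seg == NULL && rx_nseg == 0)
        return EX_RX_BUF_SINGLE_POOL;
    if (mp == NULL && rx_seg != NULL && rx_nseg != 0)
        return EX_RX_BUF_SPLIT;
    return EX_RX_BUF_INVALID;
}
```

Treating every other combination (for instance both mp and rx_seg set)
as invalid is an assumption of this sketch, based on the "to avoid
ambiguity" wording above.
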
> 
> The new offload flag RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT is introduced
> in the device capabilities so that a PMD can report to the
> application that it supports splitting Rx packets into configurable
> segments. Before invoking the rte_eth_rx_queue_setup() routine, the
> application should check the RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT flag.
> 
> If the Rx queue is configured with the new settings, the PMD will
> split each received packet into multiple segments, pushed to mbufs
> with the specified attributes, according to the description array:
> 
> - the first network buffer will be allocated from the memory pool
>   specified in the first segment description element, the second
>   network buffer from the pool in the second segment description
>   element, and so on. If there are not enough elements to describe
>   the buffers for an entire packet of maximal length, the pool from
>   the last valid element will be used to allocate the buffers for
>   the remaining segments.
> 
> - the offsets from the segment description elements will provide
>   the data offset from the beginning of the buffer, except for the
>   first mbuf: for this one the offset is added to
>   RTE_PKTMBUF_HEADROOM to get the actual offset from the beginning
>   of the buffer. If there are not enough elements to describe the
>   buffers for an entire packet of maximal length, the offsets for
>   the remaining segments are assumed to be zero.
> 
> - the data length received into each segment is limited by the
>   length specified in the segment description element. Reception
>   starts by filling the first mbuf data buffer; if the specified
>   maximal segment length is reached and there is data remaining
>   (the packet is longer than the buffer in the first mbuf), the
>   following data will be pushed to the next segment, up to its own
>   maximal length. If the first two segments are not enough to store
>   all the remaining packet data, the next (third) segment will
>   be engaged, and so on. If the length in the segment description
>   element is zero, the actual buffer size will be deduced from
>   the appropriate memory pool properties. If there are not enough
>   elements to describe the buffers for an entire packet of maximal
>   length, the buffer size for the remaining segments will be
>   deduced from the pool of the last valid element.
> 
> For example, let's suppose we configured the Rx queue with the
> following segments:
>     seg0 - pool0, len0=14B, off0=2B
>     seg1 - pool1, len1=20B, off1=128B
>     seg2 - pool2, len2=20B, off2=0B
>     seg3 - pool3, len3=512B, off3=0B
> 
> The packet 46 bytes long will look like the following:
>     seg0 - 14B long @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
>     seg1 - 20B long @ 128 in mbuf from pool1
>     seg2 - 12B long @ 0 in mbuf from pool2
> 
> The packet 1500 bytes long will look like the following:
>     seg0 - 14B @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
>     seg1 - 20B @ 128 in mbuf from pool1
>     seg2 - 20B @ 0 in mbuf from pool2
>     seg3 - 512B @ 0 in mbuf from pool3
>     seg4 - 512B @ 0 in mbuf from pool3
>     seg5 - 422B @ 0 in mbuf from pool3
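
The two layouts above can be reproduced with a small self-contained
sketch of the splitting rules. This is not the PMD implementation and
does not use the DPDK headers: ex_pool, ex_rxseg, ex_chunk and
ex_split_layout() are hypothetical stand-ins, EX_HEADROOM is assumed to
be 128 in place of RTE_PKTMBUF_HEADROOM, and "buffer size deduced from
the pool" is modelled as the pool data size minus the offset, which is
my reading of the text rather than something the commit log states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_HEADROOM 128u /* assumed stand-in for RTE_PKTMBUF_HEADROOM */

struct ex_pool {         /* stand-in for struct rte_mempool */
    int id;
    uint16_t data_size;  /* data buffer size of each element */
};

struct ex_rxseg {        /* mirrors the proposed rte_eth_rxseg */
    const struct ex_pool *mp;
    uint16_t length;     /* maximal data length, 0 = deduce from pool */
    uint16_t offset;     /* data offset from the buffer beginning */
};

struct ex_chunk {        /* one resulting segment of a packet */
    int pool_id;
    uint32_t offset;
    uint32_t len;
};

/*
 * Compute how a packet of pkt_len bytes is spread over segments.
 * Returns the number of chunks written to out; stops early if
 * out_cap is too small or a buffer has no room for data.
 */
size_t
ex_split_layout(const struct ex_rxseg *segs, uint16_t nseg,
                uint32_t pkt_len, struct ex_chunk *out, size_t out_cap)
{
    size_t n = 0;

    assert(segs != NULL && nseg > 0 && out != NULL);
    while (pkt_len > 0 && n < out_cap) {
        int described = n < nseg;
        /* Past the end of the array, reuse the last valid element. */
        const struct ex_rxseg *d = &segs[described ? n : nseg - 1u];
        /* Offsets of undescribed segments are zero. */
        uint32_t off = described ? d->offset : 0u;
        uint32_t cap, len;

        /* Only the first mbuf adds the headroom to the offset. */
        if (n == 0)
            off += EX_HEADROOM;
        if (described && d->length != 0)
            cap = d->length;
        else /* deduce the room from the pool (assumption) */
            cap = d->mp->data_size > off ? d->mp->data_size - off : 0u;
        if (cap == 0)
            break;
        len = pkt_len < cap ? pkt_len : cap;
        out[n].pool_id = d->mp->id;
        out[n].offset = off;
        out[n].len = len;
        pkt_len -= len;
        n++;
    }
    return n;
}
```

With pool3 holding 512-byte buffers and the four descriptions above,
this sketch yields three chunks for the 46-byte packet and six chunks
for the 1500-byte one, the last two coming from pool3 at offset 0.
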
> 
> The offload RTE_ETH_RX_OFFLOAD_SCATTER must be present and
> configured to support the new buffer split feature (if rx_nseg
> is greater than one).
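
As a rough illustration of the two offload checks (the capability check
before queue setup and the scatter requirement), here is a
self-contained sketch. The EX_RX_OFFLOAD_* bit values are placeholders
and ex_buffer_split_allowed() is a hypothetical helper; in a real
application the capability mask would presumably come from the
rx_offload_capa field filled in by rte_eth_dev_info_get(), and the real
RTE_ETH_RX_OFFLOAD_* flags would be used instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bits, NOT the real ethdev flag values. */
#define EX_RX_OFFLOAD_SCATTER      (UINT64_C(1) << 0)
#define EX_RX_OFFLOAD_BUFFER_SPLIT (UINT64_C(1) << 1)

/*
 * Return true if a queue with rx_nseg segment descriptions may be
 * set up, given the device capabilities and the enabled offloads.
 */
bool
ex_buffer_split_allowed(uint64_t rx_offload_capa, uint64_t enabled_offloads,
                        uint16_t rx_nseg)
{
    if (rx_nseg == 0)
        return false; /* no segment descriptions at all */
    /* The PMD must report buffer split support. */
    if ((rx_offload_capa & EX_RX_OFFLOAD_BUFFER_SPLIT) == 0)
        return false;
    /* More than one segment also requires scatter, present and enabled. */
    if (rx_nseg > 1) {
        if ((rx_offload_capa & EX_RX_OFFLOAD_SCATTER) == 0)
            return false;
        if ((enabled_offloads & EX_RX_OFFLOAD_SCATTER) == 0)
            return false;
    }
    return true;
}
```
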
> 
> The new approach allows splitting ingress packets into
> multiple parts pushed to memory with different attributes.
> For example, the packet headers can be pushed to the embedded
> data buffers within mbufs and the application data to
> external buffers attached to mbufs allocated from
> different memory pools. The memory attributes of the split
> parts may differ as well: for example, the application data
> may be pushed into external memory located on a dedicated
> physical device, say a GPU or NVMe. This improves the
> flexibility of the DPDK receive datapath while preserving
> compatibility with the existing API.
> 
> Signed-off-by: Viacheslav Ovsiienko <viachesl...@nvidia.com>

A large part of this commit log can be dropped because it is
redundant with the doxygen comments.

Acked-by: Thomas Monjalon <tho...@monjalon.net>

