On 8/3/20 7:51 PM, Slava Ovsiienko wrote:
>> -----Original Message-----
>> From: Andrew Rybchenko <arybche...@solarflare.com>
>> Sent: Monday, August 3, 2020 18:31
>> To: Slava Ovsiienko <viachesl...@mellanox.com>; dev@dpdk.org
>> Cc: Matan Azrad <ma...@mellanox.com>; Raslan Darawsheh
>> <rasl...@mellanox.com>; Thomas Monjalon <tho...@monjalon.net>;
>> ferruh.yi...@intel.com; jerinjac...@gmail.com;
>> step...@networkplumber.org; ajit.khapa...@broadcom.com;
>> maxime.coque...@redhat.com; olivier.m...@6wind.com;
>> david.march...@redhat.com
>> Subject: Re: [PATCH] doc: announce changes to ethdev rxconf structure
>>
>> Hi Slava,
>>
>> On 8/3/20 6:18 PM, Slava Ovsiienko wrote:
>>> Hi, Andrew
>>>
>>> Thanks for the comment, please see below.
>>>
>>>> -----Original Message-----
>>>> From: Andrew Rybchenko <arybche...@solarflare.com>
>>>> Sent: Monday, August 3, 2020 17:31
>>>> To: Slava Ovsiienko <viachesl...@mellanox.com>; dev@dpdk.org
>>>> Cc: Matan Azrad <ma...@mellanox.com>; Raslan Darawsheh
>>>> <rasl...@mellanox.com>; Thomas Monjalon <tho...@monjalon.net>;
>>>> ferruh.yi...@intel.com; jerinjac...@gmail.com;
>>>> step...@networkplumber.org; ajit.khapa...@broadcom.com;
>>>> maxime.coque...@redhat.com; olivier.m...@6wind.com;
>>>> david.march...@redhat.com
>>>> Subject: Re: ***Spam*** [PATCH] doc: announce changes to ethdev
>>>> rxconf structure
>>>>
>>>> On 8/3/20 1:58 PM, Viacheslav Ovsiienko wrote:
>>>>> The DPDK datapath in the transmit direction is very flexible.
>>>>> The applications can build multisegment packets and manage almost
>>>>> all data aspects - the memory pools where segments are allocated
>>>>> from, the segment lengths, the memory attributes like external,
>>>>> registered, etc.
>>>>>
>>>>> In the receiving direction, the datapath is much less flexible, the
>>>>> applications can only specify the memory pool to configure the
>>>>> receiving queue and nothing more. In order to extend the receiving
>>>>> datapath capabilities it is proposed to add new fields into the
>>>>> rte_eth_rxconf structure:
>>>>>
>>>>> struct rte_eth_rxconf {
>>>>>     ...
>>>>>     uint16_t rx_split_num; /* number of segments to split */
>>>>>     uint16_t *rx_split_len; /* array of segment lengths */
>>>>>     struct rte_mempool **mp; /* array of segment memory pools */
>>>>>     ...
>>>>> };
>>>>>
>>>>> A non-zero value of the rx_split_num field configures the receiving
>>>>> queue to split ingress packets into multiple segments in mbufs
>>>>> allocated from various memory pools according to the specified
>>>>> lengths. A zero value of the rx_split_num field provides backward
>>>>> compatibility and the queue should be configured in a regular way
>>>>> (with single/multiple mbufs of the same data buffer length allocated
>>>>> from a single memory pool).
>>>>
>>>> From the above description it is not 100% clear how it will coexist with:
>>>> - the existing mb_pool argument of rte_eth_rx_queue_setup()
>>>> - DEV_RX_OFFLOAD_SCATTER
>>>
>>> The DEV_RX_OFFLOAD_SCATTER flag is required to be reported and configured
>>> for the new feature to indicate the application is prepared for
>>> multisegment packets.
>>
>> I hope it will be mentioned in the feature documentation in the future, but
>> I'm not 100% sure that it is required. See below.
> I suppose there is a hierarchy:
> - the application configures DEV_RX_OFFLOAD_SCATTER on the port and tells in
> this way:
> "Hey, driver, I'm ready to handle multi-segment packets." Readiness in
> general.
> - the application configures BUFFER_SPLIT and tells the PMD _HOW_ it wants
> to split, in a particular way:
> "Hey, driver, please drop ten bytes here, here and here, and the rest - over
> there."
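To make the proposal concrete, here is a minimal sketch of how an application
might fill the proposed fields when setting up such a queue. It assumes the
rx_split_num / rx_split_len / mp fields land in rte_eth_rxconf exactly as
described in the quoted patch (they do not exist in released DPDK at the time
of this thread); the helper name, segment lengths, descriptor count and the
NULL mb_pool convention are illustrative assumptions only.

/* Sketch only: assumes the proposed rx_split_* / mp fields exist as
 * described in the patch above; error handling trimmed for brevity.
 */
#include <rte_ethdev.h>
#include <rte_mempool.h>

static int
setup_buffer_split_rxq(uint16_t port_id, uint16_t queue_id,
                       struct rte_mempool *hdr_pool, /* small header buffers */
                       struct rte_mempool *pay_pool) /* large payload buffers */
{
    static uint16_t seg_len[2] = { 64, 2048 };        /* fixed split lengths */
    static struct rte_mempool *seg_mp[2];
    struct rte_eth_dev_info dev_info;
    struct rte_eth_rxconf rxconf;

    if (rte_eth_dev_info_get(port_id, &dev_info) != 0)
        return -1;
    rxconf = dev_info.default_rxconf;

    seg_mp[0] = hdr_pool;
    seg_mp[1] = pay_pool;

    /* Readiness for multi-segment packets in general. */
    rxconf.offloads |= DEV_RX_OFFLOAD_SCATTER;

    /* Proposed fields: a non-zero rx_split_num enables buffer split. */
    rxconf.rx_split_num = 2;
    rxconf.rx_split_len = seg_len;
    rxconf.mp = seg_mp;

    /* The per-queue mb_pool argument is assumed to be NULL here; its exact
     * interaction with the split configuration is what this thread debates.
     */
    return rte_eth_rx_queue_setup(port_id, queue_id, 512,
                                  rte_eth_dev_socket_id(port_id),
                                  &rxconf, NULL);
}

Whether mb_pool must be NULL in this case, or instead serves as the pool for
the remainder of the packet, is exactly the open question in the exchange
below.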
My idea is to keep SCATTER and BUFFER_SPLIT independent. SCATTER is the
possibility to build multi-segment packets, taking as many mbufs from the main
RxQ mempool as required. BUFFER_SPLIT is support for many mempools and
splitting received packets as specified.

>>>
>>> But SCATTER just tells that the ingress packet length can exceed the
>>> mbuf data buffer length and a chain of mbufs must be built to store
>>> the entire packet. But there is a limitation - all mbufs are
>>> allocated from the same memory pool, and all data buffers have the same
>>> length.
>>> The new feature provides an opportunity to allocate mbufs from the
>>> desired pools and specify the length of each buffer/part.
>>
>> Yes, it is clear, but what happens if a packet does not fit into the
>> provided pools chain? Is the last one used many times? Maybe it is logical
>> to use the Rx queue setup mb_pool as well for the purpose? I.e. use the
>> pools suggested here only once and use mb_pool many times for the rest if
>> SCATTER is supported, and only once if SCATTER is not supported.
>
> It could be easily configured w/o involving the SCATTER flag - just specify
> the last pool multiple times. I.e.
> pool 0 - 14B
> pool 1 - 20B
> ...
> pool N - 512B
> pool N - 512B
> pool N - 512B, sum of lengths >= max packet size 1518

I see, but IMHO it is excessive. Pools 0 .. N-1 could be provided in the
buffer split config, the Nth should be the main RxQ mempool plus SCATTER
enabled.

> It was supposed that the sum of lengths in the array covers the maximal
> packet size. Currently there is a limitation on packet size; for example,
> the mlx5 PMD just drops packets with a length exceeding the one the queue
> is configured for.
>
>>
>>>
>>>> - DEV_RX_OFFLOAD_HEADER_SPLIT
>>> The new feature (let's name it "BUFFER_SPLIT") might be supported
>>> in conjunction with HEADER_SPLIT (say, split the rest of the data
>>> after the header) or rejected if HEADER_SPLIT is configured on the
>>> port, depending on the PMD implementation (return ENOTSUP if both
>>> features are requested on the same port).
>>
>> OK, consider making SCATTER and BUFFER_SPLIT independent as suggested
>> above.
> Sorry, do you mean HEADER_SPLIT and BUFFER_SPLIT?

See above.

>>
>>>
>>>> How will the application know that the feature is supported? Limitations?
>>> It is a subject for further discussion; I see two options:
>>> - introduce the DEV_RX_OFFLOAD_BUFFER_SPLIT flag
>>
>> +1
> OK, got it.
>
>>
>>> - return ENOTSUP/EINVAL from rx_queue_setup() if the feature is requested
>>> (the mp parameter is supposed to be NULL for the case)
>>
>> I'd say that it should be used for corner cases only which are hard to
>> formalize. It could be important to know the maximum number of buffers to
>> split, the total length which could be split from the remaining, and
>> limitations on split lengths.
> Agree, the dedicated OFFLOAD flag seems to be preferable.
>
> With best regards, Slava
>
>>
>>>
>>>> Is it always split by specified/fixed length?
>>> Yes, it is a simple feature: it splits the data into buffers with the
>>> required memory attributes provided by the specified pools according to
>>> the fixed lengths.
>>> It should be OK for protocols like eCPRI or some tunneling.
>>
>> I see. Thanks.
>>
>>>
>>>> What happens if the header length is actually different?
>>> It is per-queue configuration, packets might be sorted between the queues
>>> with the rte_flow engine.
>>> The supposed use case is to filter out specific protocol packets (say
>>> eCPRI with a fixed header length) and split them on a specific Rx queue.
>>
>> Got it.
>>
>> Thanks,
>> Andrew.
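The two ways of covering the full frame compared above can be sketched as
follows. Same assumptions as the earlier sketch: the rx_split_* / mp fields
are the proposed ones, and the pool pointers are hypothetical mempools created
elsewhere with rte_pktmbuf_pool_create().

#include <rte_ethdev.h>
#include <rte_mempool.h>

static void
fill_split_config(struct rte_eth_rxconf *rxconf,
                  struct rte_mempool *pool_14b,
                  struct rte_mempool *pool_20b,
                  struct rte_mempool *pool_512b)
{
    /* Option A (repeat the last pool): the sum of the lengths covers the
     * maximal frame size, so the split arrays alone describe the whole
     * packet and SCATTER is not involved.
     */
    static uint16_t seg_len[] = { 14, 20, 512, 512, 512 };   /* sum >= 1518 */
    static struct rte_mempool *seg_mp[5];

    seg_mp[0] = pool_14b;
    seg_mp[1] = pool_20b;
    seg_mp[2] = seg_mp[3] = seg_mp[4] = pool_512b;

    rxconf->rx_split_num = 5;
    rxconf->rx_split_len = seg_len;
    rxconf->mp = seg_mp;

    /* Option B (alternative) would list only the 14B and 20B segments here,
     * enable DEV_RX_OFFLOAD_SCATTER, and let the remainder of the packet be
     * scattered into the main RxQ mempool given to rte_eth_rx_queue_setup().
     */
}

Option B keeps the split arrays short and reuses the regular SCATTER path for
the tail of the packet, which is the trade-off argued for above.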
>>
>>>
>>>
>>> With best regards,
>>> Slava
>>>
>>>>
>>>>> The new approach would allow splitting the ingress packets into
>>>>> multiple parts pushed to the memory with different attributes.
>>>>> For example, the packet headers can be pushed to the embedded data
>>>>> buffers within mbufs and the application data into the external
>>>>> buffers attached to mbufs allocated from different memory pools.
>>>>> The memory attributes for the split parts may differ as well - for
>>>>> example, the application data may be pushed into external memory
>>>>> located on a dedicated physical device, say a GPU or NVMe. This
>>>>> would improve the DPDK receiving datapath flexibility while preserving
>>>>> compatibility with the existing API.
>>>>>
>>>>> Signed-off-by: Viacheslav Ovsiienko <viachesl...@mellanox.com>
>>>>> ---
>>>>>  doc/guides/rel_notes/deprecation.rst | 5 +++++
>>>>>  1 file changed, 5 insertions(+)
>>>>>
>>>>> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
>>>>> index ea4cfa7..cd700ae 100644
>>>>> --- a/doc/guides/rel_notes/deprecation.rst
>>>>> +++ b/doc/guides/rel_notes/deprecation.rst
>>>>> @@ -99,6 +99,11 @@ Deprecation Notices
>>>>>    In 19.11 PMDs will still update the field even when the offload is not
>>>>>    enabled.
>>>>>
>>>>> +* ethdev: add new fields to ``rte_eth_rxconf`` to configure the receiving
>>>>> +  queues to split ingress packets into multiple segments according to the
>>>>> +  specified lengths into the buffers allocated from the specified
>>>>> +  memory pools. The backward compatibility to existing API is preserved.
>>>>> +
>>>>>  * ethdev: ``rx_descriptor_done`` dev_ops and ``rte_eth_rx_descriptor_done``
>>>>>    will be deprecated in 20.11 and will be removed in 21.11.
>>>>>    Existing ``rte_eth_rx_descriptor_status`` and
>>>>>    ``rte_eth_tx_descriptor_status``
>>>
>
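Finally, the eCPRI use case discussed in the thread relies on steering: the
protocol of interest is directed to the Rx queue configured for buffer split.
A rough sketch of such a steering rule with the standard rte_flow API is
below; the UDP destination port is only a stand-in classifier for an
eCPRI-like flow, and port_id / split_queue_id are assumed to come from the
application.

#include <rte_flow.h>
#include <rte_byteorder.h>

/* Sketch: direct packets matching a fixed UDP destination port (a stand-in
 * for an eCPRI classifier) to the Rx queue configured for buffer split.
 */
static struct rte_flow *
steer_to_split_queue(uint16_t port_id, uint16_t split_queue_id,
                     struct rte_flow_error *error)
{
    struct rte_flow_attr attr = { .ingress = 1 };
    struct rte_flow_item_udp udp_spec = { .hdr.dst_port = RTE_BE16(5000) };
    struct rte_flow_item_udp udp_mask = { .hdr.dst_port = RTE_BE16(0xffff) };
    struct rte_flow_item pattern[] = {
        { .type = RTE_FLOW_ITEM_TYPE_ETH },
        { .type = RTE_FLOW_ITEM_TYPE_IPV4 },
        { .type = RTE_FLOW_ITEM_TYPE_UDP,
          .spec = &udp_spec, .mask = &udp_mask },
        { .type = RTE_FLOW_ITEM_TYPE_END },
    };
    struct rte_flow_action_queue queue = { .index = split_queue_id };
    struct rte_flow_action actions[] = {
        { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };

    return rte_flow_create(port_id, &attr, pattern, actions, error);
}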