>-----Original Message-----
>From: Trahe, Fiona <fiona.tr...@intel.com>
>Sent: 16 September 2018 15:38
>To: Verma, Shally <shally.ve...@cavium.com>; dev@dpdk.org
>Cc: Daly, Lee <lee.d...@intel.com>; Jozwiak, TomaszX
><tomaszx.jozw...@intel.com>; Akhil Goyal <akhil.go...@nxp.com>; Sahu,
>Sunila <sunila.s...@cavium.com>; Gupta, Ashish <ashish.gu...@cavium.com>;
>Trahe, Fiona <fiona.tr...@intel.com>
>Subject: RE: compressdev: append dest data in PMDs instead of in application
>
>External Email
>
>Hi Shally,
>
>> -----Original Message-----
>> From: Verma, Shally [mailto:shally.ve...@cavium.com]
>> Sent: Saturday, September 15, 2018 12:32 PM
>> To: Trahe, Fiona <fiona.tr...@intel.com>; dev@dpdk.org
>> Cc: Daly, Lee <lee.d...@intel.com>; Jozwiak, TomaszX
>> <tomaszx.jozw...@intel.com>; Akhil Goyal
>> <akhil.go...@nxp.com>; Sahu, Sunila <sunila.s...@cavium.com>; Gupta, Ashish
>> <ashish.gu...@cavium.com>
>> Subject: RE: compressdev: append dest data in PMDs instead of in application
>>
>> HI Fiona
>>
>> >-----Original Message-----
>> >From: Trahe, Fiona <fiona.tr...@intel.com>
>> >Sent: 11 September 2018 22:47
>> >To: dev@dpdk.org
>> >Cc: Trahe, Fiona <fiona.tr...@intel.com>; Verma, Shally
>> ><shally.ve...@cavium.com>; Daly, Lee
>> <lee.d...@intel.com>; Jozwiak,
>> >TomaszX <tomaszx.jozw...@intel.com>; Akhil Goyal <akhil.go...@nxp.com>
>> >Subject: RFC: compressdev: append dest data in PMDs instead of in
>> >application
>> >
>> >External Email
>> >
>> >Proposal:
>> >In compressdev, move the responsibility for appending data in the
>> >destination
>> >mbuf from the application to the PMD.
>> >
>> >Why:
>> >Compression operations are all out-of-place and the output size is unknown.
>> >Source and destination mbufs are passed to the PMDs.
>> >There's no problem with the src, but there is with the destination.
>> >The application allocates the dest mbuf from the pool, guesses how much of
>> >the buffer the PMD will write with output and calls rte_pktmbuf_append()
>> >for this amount.
>> >No data is actually copied into the dest mbuf by the application.
>> >
>> >The PMD writes output data to the destination mbuf, and returns the amount
>> >written in the
>> >op.produced parameter.
>> >
>> >Throughout this the mbuf is not consistent with expectations, i.e. a call to
>> >rte_pktmbuf_data_len() will NOT return the actual amount of data in the
>> >buffer. A call to
>> >rte_pktmbuf_append() will NOT add space at end of existing data.
>> >rte_pktmbuf_tailroom()
>> > will not return how much space is available at the end of the data.
>> >Also, some PMDs, e.g. ISA-L, need scratch space at end of the output buffer
>> >for checksum
>> >calculation. So though the appl has appended a specific amount in the
>> >expectation that it
>> >can be used for compressed data, the PMD needs to reduce this by 4 bytes to
>> >reserve
>> >space for the checksum - even though there may be space to append another
>> >4bytes.
>> >
>> Agree to this issue in first place. However before we start on this. I have
>> a scenario and question based on
>> it (probably you've already thought it but I couldn't guess limitations here)
>>
>> Given all limitations of mbuf library(especially) on SGL and compression op
>> behaviour and complexities it's
>> going to add on PMD, why we can't just input length of dst buffer from an
>> app and leave all of
>> mbuf management to app,* if it's using mbuf data buffer* i.e. consider this
>> scenario:
>>
>> App alloc rte_mbuf and call rte_pktmbuf_tailroom() to update dst.len with
>> amount of free space. PMD
>> setup HW according to offset data pointer and length given.
>> When op is done, then PMD updates op.produced. App call rte_pktmbuf_append()
>> on mbuf to update m-
>> >data/pkt_len.
>>
>> Now there're following possibilities with op.produced:
>> - if op.produced < tailroom size , app call tailroom size again to find
>> leftover free space and pass offset +=
>> op.produced and new length in next call
>> - if op.produced == tailroom size, app alloc a new mbuf and chain it to
>> previous mbuf, call tailroom size on
>> new mbuf, set offset += op.produced and length = free space in new mbuf. PMD
>> now will have a chained
>> mbuf in input and offset will now end up in new chained buffer.
>>
>> This, of course, mean PMD can have chained mbuf but only last buffer in
>> chain available for use.
>> This is a limitation but only as long as we're working with MBUFs library
>> which were originally designed for
>> other purposes and not so better suited for compression-like operations.
>[Fiona] I like this idea as it keeps all the mbuf mgmt. in the application.
>But it removes the possibility of passing more than 64k-1 in one op, which I
>think is unacceptable for storage purposes.
>As regards mbuf alternatives you mention below, yes, that's still open. The
>mbuf external memory
>feature which was recently added may suffice, we wanted to try that out before
>proposing alternatives.
>I'm focussed on other work for 18.11 at the moment, so will get back to this
>topic after that release.
[Shally] For storage, in any case mbuf data buffer doesn't seem an apt data
structure as they will always have some limitations.
External buffer attachment maybe worth considering though. I would be curious
to know if it behave stably.
Thanks
Shally
>
>>
>> Why I believe it is more simpler to manage:
>> -It keeps PMD implementation simple and lives with limitation/expectations
>> of mbuf library
>> -Having length at input will also be a requirement if we add any other
>> buffer structures types.
>>
>> Although I am not sure if it can solve checksum issue you mentioned as I
>> don't fully understood how ISAL
>> uses dst buffer for checksum o/p. Does it write checksum in dst buffer for
>> every op? Or just when last
>> chunk of data is processed?
>> We already have separate checksum variable at input/for output. So, doesn't
>> it just copy from dst buffer to
>> op at output and op.produced updated with only data length?
>>
>> I remember, initially you had started a discussion to add new and more
>> generic buffer structure type to use
>> in place of rte_mbufs for compression library , such as iovec which simply
>> input pointer to raw bufs with
>> length.
>> Aren't we thinking in that direction anymore? I believe rather than
>> reworking a library/PMD based/around
>> mbufs (unless we've any requirement to support mbufs so), we should define a
>> buffer variant to better suit
>> compression and its users need. It also allow us to define lib to input
>> multiple dst buffers either as an array
>> or SGL.
>>
>>
>> Thanks
>> Shally
>>
>> >It seems more logical that the PMD should take responsibility for appending.
>> >I.e. the application should pass in an empty mbuf, the PMD uses the
>> >rte_pktmbuf_tailroom()
>> >to know what space is available, uses this space as it needs and appends
>> >the output data to
>> >match the actual amount of data it writes.
>> >This method caters for an mbuf already containing some data, in this case
>> >the PMD will
>> >append the output AFTER the data already in the mbuf.
>> >It also needs to cater for SGL aka chained_mbuf case, though I'd propose in
>> >this case only
>> >the first mbuf in the chain is allowed to already contain data, the rest
>> >must be empty.
>> >An application can adjust a chain to satisfy this condition.
>> >
>> >Code impacts:
>> > * rte_comp_op.dst.offset should be deprecated.
>> > * comments on the m_dst would be changed to describe this usage
>> > * applications should alloc destination mbuf(s) from a pool, but not
>> > append.
>> > * PMDs should use append() to correctly reflect the amount of data
>> > returned.
>> >
>> >This turns out to be more complex than expected for SGLs as rte_mbuf chains
>> >DON'T support
>> >empty mbuf chains, i.e. several macros assume there's only tailroom
>> >available in the last
>> >segment of a chain. So essentially if an application chains a bunch of
>> >empty mbufs
>> >together the only tailroom available is the buf_len of the last mbuf and
>> >the space in
>> >all the other mbufs is "lost".
>> >We've investigated several ways around this, I think either options 1 or 2
>> >would be ok, but
>> >am not keen on option 3. I'm looking for feedback please.
>> >
>> >Option1: create an rte_comp_mbuf, identical to rte_mbuf - probably just
>> >cast to it - with its own set of
>> > macros. Most of these wrap existing mbuf macros, those that need to
>> > cater for the empty chain case.
>> >Option2: pass in a pool on the API instead and the PMD allocs dest mbufs as
>> >needed and chains them.
>> > Storage customers expect to use external buffers attached to mbufs so
>> > this may not suit their use-case.
>> > Although it may be an option if all mbufs in the pool are pre-attached
>> > to external buffers.
>> >Option3: appl does as today. PMD trims space not used. Would need changes
>> >or additions to mbuf
>> macros
>> > so it could trim from more than just the last segment. PMD would need to
>> > free unused segments.
>> > I'm not keen on this as doing the work in 2 places and there's potential
>> > to leak mbufs here as allocating them in API and freeing in PMD.
>> >
>> >Regards,
>> >Fiona
>> >