>-----Original Message-----
>From: Trahe, Fiona <fiona.tr...@intel.com>
>Sent: 16 September 2018 15:38
>To: Verma, Shally <shally.ve...@cavium.com>; dev@dpdk.org
>Cc: Daly, Lee <lee.d...@intel.com>; Jozwiak, TomaszX 
><tomaszx.jozw...@intel.com>; Akhil Goyal <akhil.go...@nxp.com>; Sahu,
>Sunila <sunila.s...@cavium.com>; Gupta, Ashish <ashish.gu...@cavium.com>; 
>Trahe, Fiona <fiona.tr...@intel.com>
>Subject: RE: compressdev: append dest data in PMDs instead of in application
>
>External Email
>
>Hi Shally,
>
>> -----Original Message-----
>> From: Verma, Shally [mailto:shally.ve...@cavium.com]
>> Sent: Saturday, September 15, 2018 12:32 PM
>> To: Trahe, Fiona <fiona.tr...@intel.com>; dev@dpdk.org
>> Cc: Daly, Lee <lee.d...@intel.com>; Jozwiak, TomaszX 
>> <tomaszx.jozw...@intel.com>; Akhil Goyal
>> <akhil.go...@nxp.com>; Sahu, Sunila <sunila.s...@cavium.com>; Gupta, Ashish
>> <ashish.gu...@cavium.com>
>> Subject: RE: compressdev: append dest data in PMDs instead of in application
>>
>> HI Fiona
>>
>> >-----Original Message-----
>> >From: Trahe, Fiona <fiona.tr...@intel.com>
>> >Sent: 11 September 2018 22:47
>> >To: dev@dpdk.org
>> >Cc: Trahe, Fiona <fiona.tr...@intel.com>; Verma, Shally 
>> ><shally.ve...@cavium.com>; Daly, Lee
>> <lee.d...@intel.com>; Jozwiak,
>> >TomaszX <tomaszx.jozw...@intel.com>; Akhil Goyal <akhil.go...@nxp.com>
>> >Subject: RFC: compressdev: append dest data in PMDs instead of in 
>> >application
>> >
>> >External Email
>> >
>> >Proposal:
>> >In compressdev, move the responsibility for appending data in the 
>> >destination
>> >mbuf from the application to the PMD.
>> >
>> >Why:
>> >Compression operations are all out-of-place and the output size is unknown.
>> >Source and destination mbufs are passed to the PMDs.
>> >There's no problem with the src, but there is with the destination.
>> >The application allocates the dest mbuf from the pool, guesses how much of
>> >the buffer the PMD will write with output and calls rte_pktmbuf_append() 
>> >for this amount.
>> >No data is actually copied into the dest mbuf by the application.
>> >
>> >The PMD writes output data to the destination mbuf, and returns the amount 
>> >written in the
>> >op.produced parameter.
>> >
>> >Throughout this the mbuf is not consistent with expectations, i.e. a call to
>> >rte_pktmbuf_data_len() will NOT return the actual amount of data in the 
>> >buffer. A call to
>> >rte_pktmbuf_append() will NOT add space at end of existing data. 
>> >rte_pktmbuf_tailroom()
>> > will not return how much space is available at the end of the data.
>> >Also, some PMDs, e.g. ISA-L, need scratch space at end of the output buffer 
>> >for checksum
>> >calculation. So though the appl has appended a specific amount in the 
>> >expectation that it
>> >can be used for compressed data, the PMD needs to reduce this by 4 bytes to 
>> >reserve
>> >space for the checksum - even though there may be space to append another 
>> >4bytes.
>> >
>> Agree to this issue in first place. However before we start on this. I have 
>> a scenario and question based on
>> it (probably you've already thought it but I couldn't guess limitations here)
>>
>> Given all limitations of mbuf library(especially) on SGL and compression op 
>> behaviour and complexities it's
>> going to add on PMD, why we can't just input length of dst buffer from an 
>> app and leave all of
>> mbuf management to app,* if it's using mbuf data buffer* i.e. consider this 
>> scenario:
>>
>> App alloc rte_mbuf and call rte_pktmbuf_tailroom() to update dst.len with 
>> amount of free space. PMD
>> setup HW according to offset data pointer and length given.
>> When op is done, then PMD updates op.produced. App call rte_pktmbuf_append() 
>> on mbuf to update m-
>> >data/pkt_len.
>>
>> Now there're following possibilities with op.produced:
>> - if  op.produced < tailroom size ,  app call tailroom size again to find 
>> leftover free space and pass offset +=
>> op.produced and new length in next call
>> - if op.produced == tailroom size, app alloc a new mbuf and chain it to 
>> previous mbuf, call tailroom size on
>> new mbuf, set offset += op.produced and length = free space in new mbuf. PMD 
>> now will have a chained
>> mbuf in input and offset will now end up in new chained buffer.
>>
>> This, of course, mean PMD can have chained mbuf but only last buffer in 
>> chain available for use.
>> This is a limitation but only as long as we're working with MBUFs library 
>> which were originally designed for
>> other purposes and not so better suited for compression-like operations.
>[Fiona] I like this idea as it keeps all the mbuf mgmt. in the application.
>But it removes the possibility of passing more than 64k-1 in one op, which I 
>think is unacceptable for storage purposes.
>As regards mbuf alternatives you mention below, yes, that's still open. The 
>mbuf external memory
>feature which was recently added may suffice, we wanted to try that out before 
>proposing alternatives.
>I'm focussed on other work for 18.11 at the moment, so will get back to this 
>topic after that release.

[Shally] For storage, in any case mbuf data buffer doesn't seem an apt data 
structure as they will always have some limitations.
External buffer attachment maybe worth considering though. I would be curious 
to know if it behave stably. 

Thanks
Shally

>
>>
>> Why I believe it is more simpler to manage:
>> -It keeps PMD implementation simple and lives with limitation/expectations 
>> of mbuf library
>> -Having length at input will also be a requirement if we add any other 
>> buffer structures types.
>>
>> Although I am not sure if it can solve checksum issue you mentioned as I 
>> don't fully understood how ISAL
>> uses dst buffer for checksum o/p. Does it write checksum in dst buffer for 
>> every op? Or just when last
>> chunk of data is processed?
>> We already have separate checksum variable at input/for output. So, doesn't 
>> it just copy from dst buffer to
>> op at output and op.produced updated with only data length?
>>
>> I remember, initially you had started a discussion to add new and more 
>> generic buffer structure type to use
>> in place of rte_mbufs for compression library , such as iovec which simply 
>> input pointer to raw bufs with
>> length.
>> Aren't we thinking in that direction anymore? I believe rather than 
>> reworking a library/PMD based/around
>> mbufs (unless we've any requirement to support mbufs so), we should define a 
>> buffer variant to better suit
>> compression and its users need. It also allow us to define lib to input 
>> multiple dst buffers either as an array
>> or SGL.
>>
>>
>> Thanks
>> Shally
>>
>> >It seems more logical that the PMD should take responsibility for appending.
>> >I.e. the application should pass in an empty mbuf, the PMD uses the 
>> >rte_pktmbuf_tailroom()
>> >to know what space is available, uses this space as it needs and appends 
>> >the output data to
>> >match the actual amount of data it writes.
>> >This method caters for an mbuf already containing some data, in this case 
>> >the PMD will
>> >append the output AFTER the data already in the mbuf.
>> >It also needs to cater for SGL aka chained_mbuf case, though I'd propose in 
>> >this case only
>> >the first mbuf in the chain is allowed to already contain data, the rest 
>> >must be empty.
>> >An application can adjust a chain to satisfy this condition.
>> >
>> >Code impacts:
>> > * rte_comp_op.dst.offset should be deprecated.
>> > * comments on the m_dst would be changed to describe this usage
>> > * applications should alloc destination mbuf(s) from a pool, but not 
>> > append.
>> > * PMDs should use append() to correctly reflect the amount of data 
>> > returned.
>> >
>> >This turns out to be more complex than expected for SGLs as rte_mbuf chains 
>> >DON'T support
>> >empty mbuf chains, i.e. several macros assume there's only tailroom 
>> >available in the last
>> >segment of a chain. So essentially if an application chains a bunch of 
>> >empty mbufs
>> >together the only tailroom available is the buf_len of the last mbuf and 
>> >the space in
>> >all the other mbufs is "lost".
>> >We've investigated several ways around this, I think either options 1 or 2 
>> >would be ok, but
>> >am not keen on option 3. I'm looking for feedback please.
>> >
>> >Option1: create an rte_comp_mbuf, identical to rte_mbuf - probably just 
>> >cast to it - with its own set of
>> >   macros.  Most of these wrap existing mbuf macros, those that need to 
>> > cater for the empty chain case.
>> >Option2: pass in a pool on the API instead and the PMD allocs dest mbufs as 
>> >needed and chains them.
>> >   Storage customers expect to use external buffers attached to mbufs so 
>> > this may not suit their use-case.
>> >   Although it may be an option if all mbufs in the pool are pre-attached 
>> > to external buffers.
>> >Option3: appl does as today. PMD trims space not used. Would need changes 
>> >or additions to mbuf
>> macros
>> >   so it could trim from more than just the last segment. PMD would need to 
>> > free unused segments.
>> >   I'm not keen on this as doing the work in 2 places and there's potential
>> >   to leak mbufs here as allocating them in API and freeing in PMD.
>> >
>> >Regards,
>> >Fiona
>> >

Reply via email to