Hi Fiona

> -----Original Message-----
> From: Trahe, Fiona [mailto:fiona.tr...@intel.com]
> Sent: 12 January 2018 00:24
> To: Verma, Shally <shally.ve...@cavium.com>; Ahmed Mansour <ahmed.mans...@nxp.com>; dev@dpdk.org
> Cc: Athreya, Narayana Prasad <narayanaprasad.athr...@cavium.com>; Gupta, Ashish <ashish.gu...@cavium.com>; Sahu, Sunila <sunila.s...@cavium.com>; De Lara Guarch, Pablo <pablo.de.lara.gua...@intel.com>; Challa, Mahipal <mahipal.cha...@cavium.com>; Jain, Deepak K <deepak.k.j...@intel.com>; Hemant Agrawal <hemant.agra...@nxp.com>; Roy Pledge <roy.ple...@nxp.com>; Youri Querry <youri.querr...@nxp.com>; Trahe, Fiona <fiona.tr...@intel.com>
> Subject: RE: [RFC v2] doc compression API for DPDK
>
> Hi Shally, Ahmed,
>
> > -----Original Message-----
> > From: Verma, Shally [mailto:shally.ve...@cavium.com]
> > Sent: Wednesday, January 10, 2018 12:55 PM
> > To: Ahmed Mansour <ahmed.mans...@nxp.com>; Trahe, Fiona <fiona.tr...@intel.com>; dev@dpdk.org
> > Cc: Athreya, Narayana Prasad <narayanaprasad.athr...@cavium.com>; Gupta, Ashish <ashish.gu...@cavium.com>; Sahu, Sunila <sunila.s...@cavium.com>; De Lara Guarch, Pablo <pablo.de.lara.gua...@intel.com>; Challa, Mahipal <mahipal.cha...@cavium.com>; Jain, Deepak K <deepak.k.j...@intel.com>; Hemant Agrawal <hemant.agra...@nxp.com>; Roy Pledge <roy.ple...@nxp.com>; Youri Querry <youri.querr...@nxp.com>
> > Subject: RE: [RFC v2] doc compression API for DPDK
> >
> > Hi Ahmed
> >
> > > -----Original Message-----
> > > From: Ahmed Mansour [mailto:ahmed.mans...@nxp.com]
> > > Sent: 10 January 2018 00:38
> > > To: Verma, Shally <shally.ve...@cavium.com>; Trahe, Fiona <fiona.tr...@intel.com>; dev@dpdk.org
> > > Cc: Athreya, Narayana Prasad <narayanaprasad.athr...@cavium.com>; Gupta, Ashish <ashish.gu...@cavium.com>; Sahu, Sunila <sunila.s...@cavium.com>; De Lara Guarch, Pablo <pablo.de.lara.gua...@intel.com>; Challa, Mahipal <mahipal.cha...@cavium.com>; Jain, Deepak K <deepak.k.j...@intel.com>; Hemant Agrawal <hemant.agra...@nxp.com>; Roy Pledge <roy.ple...@nxp.com>; Youri Querry <youri.querr...@nxp.com>
> > > Subject: Re: [RFC v2] doc compression API for DPDK
> > >
> > > Hi Shally,
> > >
> > > Thanks for the summary. It is very helpful. Please see comments below.
> > >
> > > On 1/4/2018 6:45 AM, Verma, Shally wrote:
> > > > This is an RFC v2 document to briefly describe the understanding of, and requirements for, the compression API proposal in DPDK. It is based on "[RFC v3] Compression API in DPDK http://dpdk.org/dev/patchwork/patch/32331/".
> > > > The intention of this document is to align on the concepts built into the compression API and its usage, and to identify further requirements.
> > > >
> > > > Going further, it could serve as the base for a Compression Module Programmer Guide.
> > > >
> > > > Current scope is limited to:
> > > > - definition of the terminology which makes up the foundation of the compression API
> > > > - the typical API flow expected to be used by applications
> > > > - Stateless and Stateful operation definition and usage after the RFC v1 doc review http://dev.dpdk.narkive.com/CHS5l01B/dpdk-dev-rfc-v1-doc-compression-api-for-dpdk
> > > >
> > > > 1. Overview
> > > > ~~~~~~~~~~~
> > > >
> > > > A. Compression Methodologies in compression API
> > > > ===========================================
> > > > DPDK compression supports two types of compression methodologies:
> > > > - Stateless: each data object is compressed individually, without any reference to previous data.
> > > > - Stateful: each data object is compressed with reference to the previous data object, i.e. the history of the data is needed for compression / decompression.
> > > > For more explanation, please refer to RFC 1951: https://www.ietf.org/rfc/rfc1951.txt
> > > >
> > > > To support both methodologies, DPDK compression introduces two key concepts: Session and Stream.
> > > >
> > > > B. Notion of a session in compression API
> > > > ==================================
> > > > A Session in DPDK compression is a logical entity which is set up one time with immutable parameters, i.e. parameters that don't change across operations and devices.
> > > > A session can be shared across multiple devices and multiple operations simultaneously.
> > > > Typical session parameters include info such as:
> > > > - compress / decompress
> > > > - compression algorithm and associated configuration parameters
> > > >
> > > > An application can create different sessions on a device, initialized with the same or different xforms. Once a session is initialized with one xform, it cannot be re-initialized.
> > > >
> > > > C. Notion of stream in compression API
> > > > =======================================
> > > > Unlike a session, which carries a common set of information across operations, a stream in DPDK compression is a logical entity which identifies a related set of operations and carries operation-specific information as needed by the device during its processing.
> > > > It is a device-specific data structure which is opaque to the application, and is set up and maintained by the device.
> > > >
> > > > A stream can be used with *only* one op at a time, i.e. no two operations can share the same stream simultaneously.
> > > > A stream is a *must* for stateful op processing and optional for stateless (please see the respective sections for more details).
> > > >
> > > > This enables sharing of a session by multiple threads handling different data sets, as each op carries its own context (internal states, history buffers et al.) in its attached stream.
> > > > The application should call rte_comp_stream_create() and attach the stream to the op before operation processing begins, and free it via rte_comp_stream_free() after processing is complete.
> > > >
> > > > C. Notion of burst operations in compression API
> > > > =======================================
> > > > A burst in DPDK compression is an array of operations where each op carries an independent set of data, i.e. a burst can look like:
> > > >
> > > >     ---------------------------------------------------------------------------------------------------
> > > >     enqueue_burst (| op1.no_flush | op2.no_flush | op3.flush_final | op4.no_flush | op5.no_flush |)
> > > >     ---------------------------------------------------------------------------------------------------
> > > >
> > > > Where op1 .. op5 are all independent of each other and carry entirely different sets of data.
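To make the stream rule above concrete (a stream can be used by only one op at a time, so no two ops in one burst may share a stream), here is a minimal C sketch of a check an application might run before enqueueing a burst. The `struct sketch_op` type and `burst_streams_distinct()` helper are hypothetical illustrations, not part of the proposed rte_comp API; a NULL stream is accepted because streams are optional for stateless ops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified op: only the fields this check needs. */
struct sketch_op {
    const void *session; /* may be shared by many ops */
    const void *stream;  /* optional (NULL) for stateless ops */
};

/* Returns true if no two ops in the burst reference the same non-NULL stream. */
static bool burst_streams_distinct(const struct sketch_op *ops, size_t nb_ops)
{
    assert(ops != NULL || nb_ops == 0);
    for (size_t i = 0; i < nb_ops; i++) {
        if (ops[i].stream == NULL)
            continue; /* stateless op without a stream */
        for (size_t j = i + 1; j < nb_ops; j++) {
            if (ops[j].stream == ops[i].stream)
                return false; /* same stream used twice in one burst */
        }
    }
    return true;
}
```

The quadratic scan is a deliberate simplification, assuming bursts of a few dozen ops; note that sharing a session across ops in the burst is fine, only the stream must differ.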
> > > > Each op can be attached to the same or a different session, but *must* be attached to a different stream.
> > > >
> > > > Each op (struct rte_comp_op) carries the compression/decompression operational parameters and is both an input and an output parameter.
> > > > The PMD gets source, destination and checksum information as input, and updates the op with the number of bytes consumed and produced, and the checksum, as output.
> > > >
> > > > Since each operation in a burst is independent and thus can complete out of order, applications which need ordering should set up a per-op user data area with reordering information, so that they can determine the enqueue order at dequeue.
> > > >
> > > > Also, if multiple threads call enqueue_burst() on the same queue pair, it is the application's responsibility to use a proper locking mechanism to ensure exclusive enqueuing of operations.
> > > >
> > > > D. Stateless Vs Stateful
> > > > ===================
> > > > The compression API provides the RTE_COMP_FF_STATEFUL feature flag for a PMD to reflect its support for Stateful operation. Each op carries an op type indicating whether it is to be processed stateful or stateless.
> > > >
> > > > D.1 Compression API Stateless operation
> > > > ------------------------------------------------------
> > > > An op is processed stateless if it has:
> > > > - the flush value set to RTE_FLUSH_FULL or RTE_FLUSH_FINAL (required only on the compression side),
> > > > - op_type set to RTE_COMP_OP_STATELESS,
> > > > - all of the required input, and an output buffer large enough to store the output, i.e. OUT_OF_SPACE can never occur.
> > > >
> > > > When all of the above conditions are met, the PMD initiates stateless processing and releases acquired resources after processing of the current operation is complete, i.e. full input consumed and full output written.
>
> [Fiona] I think the 3rd condition conflicts with D.1.1 below and anyway cannot be a precondition, i.e. the PMD must initiate stateless processing based on RTE_COMP_OP_STATELESS.
> It can't always know if the output buffer is big enough before processing; it must process the input data, and only when it has consumed it all can it know whether all the output data fits in the output buffer.
>
> I'd suggest rewording as follows:
> An op is processed statelessly if op_type is set to RTE_COMP_OP_STATELESS.
> In this case:
> - The flush value must be set to RTE_FLUSH_FULL or RTE_FLUSH_FINAL (required only on the compression side).
> - All of the input data must be in the src buffer.
> - The dst buffer should be large enough to hold the expected output.
> The PMD acquires the necessary resources to process the op. After processing of the current operation is complete, whether successful or not, it releases the acquired resources, and no state, history or data is held in the PMD or carried over to subsequent ops.
> In the SUCCESS case, the full input is consumed, the full output is written, and the status is set to RTE_COMP_OP_STATUS_SUCCESS.
> OUT_OF_SPACE is handled as in D.1.1 below.
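The stateless semantics in Fiona's rewording can be modelled with a small mock. This is a sketch under stated assumptions: the `sk_*` enums and struct are hypothetical stand-ins for the RFC's op fields and status codes (the invalid-argument status is not defined in this thread), and the "compression" is an identity copy so the example stays self-contained. Unlike a real PMD, which only discovers OUT_OF_SPACE while producing output, the identity transform lets the mock decide up front.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the RFC's op type, flush and status values. */
enum sk_op_type { SK_OP_STATELESS, SK_OP_STATEFUL };
enum sk_flush { SK_FLUSH_NONE, SK_FLUSH_SYNC, SK_FLUSH_FULL, SK_FLUSH_FINAL };
enum sk_status { SK_STATUS_SUCCESS, SK_STATUS_INVALID_ARGS, SK_STATUS_OUT_OF_SPACE };

struct sk_op {
    enum sk_op_type op_type;
    enum sk_flush flush;
    const unsigned char *src;
    size_t src_len;
    unsigned char *dst;
    size_t dst_len;
    size_t consumed;       /* output */
    size_t produced;       /* output */
    enum sk_status status; /* output */
};

/* Mock stateless compress using an identity transform. Nothing is kept
 * between calls: no stream, no history, no buffered data. */
static void sk_stateless_compress(struct sk_op *op)
{
    assert(op != NULL);
    op->consumed = 0;
    op->produced = 0;
    if (op->op_type != SK_OP_STATELESS ||
        (op->flush != SK_FLUSH_FULL && op->flush != SK_FLUSH_FINAL)) {
        op->status = SK_STATUS_INVALID_ARGS;
        return;
    }
    if (op->src_len > op->dst_len) {
        /* Output would not fit: OUT_OF_SPACE, nothing consumed or produced. */
        op->status = SK_STATUS_OUT_OF_SPACE;
        return;
    }
    memcpy(op->dst, op->src, op->src_len);
    op->consumed = op->src_len;
    op->produced = op->src_len;
    op->status = SK_STATUS_SUCCESS;
}
```

Because every call starts from zeroed outputs and keeps nothing afterwards, a failed op can be resubmitted unchanged (apart from a bigger dst) without any cleanup.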
[Shally] Ok. Agreed.

> > > > The application can optionally attach a stream to such ops. In that case, the application must attach a different stream to each op.
> > > >
> > > > The application can enqueue a stateless burst by making consecutive enqueue_burst() calls, i.e. the following is relevant usage:
> > > >
> > > >     enqueued = rte_compressdev_enqueue_burst(dev_id, qp_id, ops1, nb_ops);
> > > >     enqueued = rte_compressdev_enqueue_burst(dev_id, qp_id, ops2, nb_ops);
> > > >
> > > > *Note - Every call has a different ops array, i.e. the same rte_comp_op array *cannot be re-enqueued* to process the next batch of data until the previous ones are completely processed.
> > > >
> > > > D.1.1 Stateless and OUT_OF_SPACE
> > > > ------------------------------------------------
> > > > OUT_OF_SPACE is a condition where the output buffer runs out of space while the PMD still has more data to produce. If the PMD runs into such a condition, it is an error condition in stateless processing.
> > > > In such a case, the PMD resets itself and returns with status RTE_COMP_OP_STATUS_OUT_OF_SPACE and produced = consumed = 0, i.e. no input read, no output written.
> > > > The application can resubmit the full input with a larger output buffer.
> > >
> > > [Ahmed] Can we add an option to allow the user to read the data that was produced while still reporting OUT_OF_SPACE? This is mainly useful for decompression applications doing search.
> >
> > [Shally] It is there, but applicable only to the stateful operation type (please refer to handling OUT_OF_SPACE under the "Stateful" section).
> > By definition, "stateless" here means that the application (such as IPCOMP) knows the maximum output size with certainty and ensures that the uncompressed data cannot grow larger than the provided output buffer.
> > Such apps can submit an op with type = STATELESS and provide the full input; the PMD then assumes it has sufficient input and output, and thus doesn't need to maintain any context after the op is processed.
> > If the application doesn't know the maximum output size, then it should process the data as a stateful op, i.e. set up the op with type = STATEFUL and attach a stream, so that the PMD can maintain the relevant context to handle such a condition.
>
> [Fiona] There may be an alternative that's useful for Ahmed, while still respecting the stateless concept.
> In the stateless case where a PMD reports OUT_OF_SPACE in decompression, it could also return consumed = 0, produced = x, where x > 0. x indicates the amount of valid data which has been written to the output buffer. It is not complete, but if an application wants to search it, it may be sufficient.
> If the application still wants the data, it must resubmit the whole input with a bigger output buffer, and decompression will be repeated from the start; it cannot expect to continue on, as the PMD has not maintained state, history or data.
> I don't think there would be any need to indicate this in capabilities: PMDs which cannot provide this functionality would always return produced = consumed = 0, while PMDs which can could set produced > 0.
> If this works for you both, we could consider a similar case for compression.

[Shally] Sounds fine to me. Though in that case, consumed should also be updated to the amount actually consumed by the PMD. Setting consumed = 0 with produced > 0 doesn't correlate.
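On the application side, the baseline D.1.1 behaviour (OUT_OF_SPACE with produced = consumed = 0) implies a retry loop that resubmits the whole input with a larger buffer. The sketch below assumes a hypothetical mock decompressor that expands each input byte into four output bytes; `decompress_with_retry()` and its doubling policy are illustrative choices, not part of the API. A PMD following Fiona's proposal could additionally report a partial `produced` > 0 for the application to inspect before retrying; the mock does not model that.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { EXPANSION = 4 }; /* mock: each input byte decompresses to 4 output bytes */

enum ret_status { RET_SUCCESS, RET_OUT_OF_SPACE };

/* Mock stateless decompress with the baseline D.1.1 behaviour:
 * on OUT_OF_SPACE, consumed and produced are both reported as 0. */
static enum ret_status mock_stateless_decompress(const unsigned char *src, size_t src_len,
                                                 unsigned char *dst, size_t dst_len,
                                                 size_t *consumed, size_t *produced)
{
    *consumed = 0;
    *produced = 0;
    if (src_len * EXPANSION > dst_len)
        return RET_OUT_OF_SPACE;
    for (size_t i = 0; i < src_len; i++)
        memset(dst + i * EXPANSION, src[i], EXPANSION);
    *consumed = src_len;
    *produced = src_len * EXPANSION;
    return RET_SUCCESS;
}

/* Resubmit the whole input with a doubled buffer after each OUT_OF_SPACE,
 * since a stateless op carries no state between attempts. Returns a malloc'd
 * buffer (caller frees) or NULL if max_len is reached or malloc fails. */
static unsigned char *decompress_with_retry(const unsigned char *src, size_t src_len,
                                            size_t initial_len, size_t max_len,
                                            size_t *out_len)
{
    size_t cap = initial_len > 0 ? initial_len : 1;

    while (cap <= max_len) {
        size_t consumed, produced;
        unsigned char *dst = malloc(cap);

        if (dst == NULL)
            return NULL;
        if (mock_stateless_decompress(src, src_len, dst, cap,
                                      &consumed, &produced) == RET_SUCCESS) {
            assert(consumed == src_len);
            *out_len = produced;
            return dst;
        }
        free(dst); /* discard; the retry starts again from the first input byte */
        if (cap == max_len)
            break;
        cap = (cap > max_len / 2) ? max_len : cap * 2;
    }
    return NULL;
}
```

The `max_len` cap matters: without it, a corrupt or adversarial input could drive the loop to keep growing the buffer, which is one reason Shally suggests stateful processing when the maximum output size is unknown.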
> > > > D.2 Compression API Stateful operation
> > > > ----------------------------------------------------------
> > > > A Stateful operation in DPDK compression means the application invokes enqueue_burst() multiple times to process a related chunk of data, either because:
> > > > - the application broke the data into several ops, and/or
> > > > - the PMD ran into an OUT_OF_SPACE situation during input processing.
> > > >
> > > > If either or both of the above conditions apply, the PMD is required to maintain the state of the op across enqueue_burst() calls, and ops are set up with op_type RTE_COMP_OP_STATEFUL, beginning with flush value RTE_COMP_NO/SYNC_FLUSH and ending with flush value RTE_COMP_FULL/FINAL_FLUSH.
> > > >
> > > > D.2.1 Stateful operation state maintenance
> > > > ---------------------------------------------------------------
> > > > Ideally, the application should parse through all related chunks of source data, build them into an mbuf chain and enqueue it for stateless processing.
> > > > However, if it needs to break the data into several enqueue_burst() calls, then the expected call flow would be something like:
> > > >
> > > >     enqueue_burst( |op.no_flush |)
> > >
> > > [Ahmed] The work is now in flight to the PMD. The user will call dequeue burst in a loop until all ops are received. Is this correct?
> > >
> > > >     dequeue_burst(op) // should dequeue before we enqueue next
> >
> > [Shally] Yes. Ideally every submitted op needs to be dequeued. However, this illustration is specifically in the context of stateful op processing: if a stream is broken into chunks, then each chunk should be submitted as one op at a time with type = STATEFUL, and it needs to be dequeued before the next chunk is enqueued.
> > > >
> > > >     enqueue_burst( |op.no_flush |)
> > > >     dequeue_burst(op) // should dequeue before we enqueue next
> > > >     enqueue_burst( |op.full_flush |)
> > >
> > > [Ahmed] Why not allow multiple work items in flight? I understand that occasionally there will be an OUT_OF_SPACE exception. Can we just distinguish the response in exception cases?
> >
> > [Shally] Multiple ops are allowed in flight; however, the condition is that each op in such a case is independent of the others, i.e. they belong to different streams altogether.
> > Earlier (as part of the RFC v1 doc) we did consider the proposal to process all related chunks of data in a single burst by passing them as an ops array, but later found it not so useful for PMD handling, for various reasons. Please refer to the RFC v1 doc review comments for the same.
>
> [Fiona] Agree with Shally. In summary, as only one op can be processed at a time, since each needs the state of the previous, allowing more than one op to be in flight at a time would force PMDs to implement internal queueing and exception handling for the OUT_OF_SPACE conditions you mention.
> If the application has all the data, it can put it into chained mbufs in a single op rather than multiple ops, which avoids pushing all that complexity down to the PMDs.
>
> > > > Here an op *must* be attached to a stream, and every subsequent enqueue_burst() call should carry the *same* stream. Since the PMD maintains the op's state in the stream, it is mandatory for the application to attach a stream to such ops.
>
> [Fiona] I think you're referring only to a single stream above, but as there may be many different streams, maybe add the following?
> Above is simplified to show just a single stream. However, there may be many streams, and each enqueue_burst() may contain ops from different streams, as long as there is only one op in flight from any stream at a given time.

[Shally] Ok, got it.
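Fiona's rule for stateful ops (many streams per burst, but at most one op in flight per stream) can be enforced with simple per-stream bookkeeping. The sketch below is hypothetical: `struct sk_stream`, `sk_reserve_streams()` and `sk_release_stream()` are illustrative application-side helpers kept next to each stream, not API from the RFC, where the real stream is opaque and owned by the PMD.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical application-side record kept next to each (opaque) PMD stream. */
struct sk_stream {
    bool op_in_flight;
};

/* Check a burst of stateful ops, one entry per op giving its stream.
 * Succeeds (and marks the streams busy) only if no stream appears twice and
 * none already has an op in flight; on failure nothing is marked. */
static bool sk_reserve_streams(struct sk_stream *const *streams, size_t nb_ops)
{
    for (size_t i = 0; i < nb_ops; i++) {
        assert(streams[i] != NULL); /* stateful ops must carry a stream */
        if (streams[i]->op_in_flight)
            return false; /* previous chunk not dequeued yet */
        for (size_t j = 0; j < i; j++) {
            if (streams[j] == streams[i])
                return false; /* two ops of one stream in the same burst */
        }
    }
    for (size_t i = 0; i < nb_ops; i++)
        streams[i]->op_in_flight = true;
    return true;
}

/* Call for each op returned by dequeue_burst(): its stream may take the next chunk. */
static void sk_release_stream(struct sk_stream *stream)
{
    stream->op_in_flight = false;
}
```

Keeping this check in the application is exactly the trade-off Fiona describes: the PMD never has to queue a second chunk of a stream internally.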
> > > > D.2.2 Stateful and Out_of_Space
> > > > --------------------------------------------
> > > > If the PMD supports stateful operation and runs into an OUT_OF_SPACE situation, it is not an error condition for the PMD. In such a case, the PMD returns with status RTE_COMP_OP_STATUS_OUT_OF_SPACE, with consumed = number of input bytes read and produced = length of the complete output buffer.
>
> [Fiona] - produced would be <= output buffer len (typically =, but could be a few bytes less)
>
> > > > The application should re-enqueue the op with the source starting at offset consumed (i.e. the first unconsumed byte) and an output buffer with available space.
> > >
> > > [Ahmed] Related to OUT_OF_SPACE: what status does the user receive in a decompression case when the end block is encountered before the end of the input? Does the PMD continue decompression? Does it stop there and return the stop index?
> >
> > [Shally] Before I can answer this, please help me understand your use case. When you say "when the end block is encountered before the end of the input", do you mean "the decompressor processes a final block (i.e. one with BFINAL=1 in its header) and there's some footer data after that"? Or do you mean "the decompressor processes one block and has more to process until its final block"?
> > What do "end block" and "end of input" refer to here?
> >
> > > > D.2.3 Sliding Window Size
> > > > ------------------------------------
> > > > Every PMD will reflect in its algorithm capability structure the maximum length of the Sliding Window in bytes, which indicates the maximum history buffer length used by the algorithm.
> > > >
> > > > 2. Example API illustration
> > > > ~~~~~~~~~~~~~~~~~~~~~~~
>
> [Fiona] I think it would be useful to show an example of both a STATELESS flow and a STATEFUL flow.

[Shally] Ok. I can add a simplified version to illustrate API usage in both cases.
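As a rough illustration of the stateful OUT_OF_SPACE handling in D.2.2, the sketch below resubmits the op from the first unconsumed byte (offset `consumed`) with a fresh output buffer until the PMD reports success. All `sf_*` names are hypothetical, the "PMD" is a synchronous identity-copy mock standing in for an enqueue_burst()/dequeue_burst() round trip, `total_in` merely stands in for the history a real stream would carry, and flush values are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum sf_status { SF_SUCCESS, SF_OUT_OF_SPACE };

/* Hypothetical stream: context a PMD would carry between enqueues. */
struct sf_stream {
    size_t total_in; /* stand-in for the history a real stream holds */
};

struct sf_op {
    struct sf_stream *stream;
    const unsigned char *src;
    size_t src_len;
    unsigned char *dst;
    size_t dst_len;
    size_t consumed; /* output */
    size_t produced; /* output, always <= dst_len */
    enum sf_status status;
};

/* Mock stateful PMD (identity transform). OUT_OF_SPACE is not an error:
 * the op reports how much input it consumed and how much output it produced. */
static void sf_process(struct sf_op *op)
{
    size_t n = op->src_len < op->dst_len ? op->src_len : op->dst_len;

    memcpy(op->dst, op->src, n);
    op->consumed = n;
    op->produced = n;
    op->stream->total_in += n;
    op->status = (n < op->src_len) ? SF_OUT_OF_SPACE : SF_SUCCESS;
}

/* Application loop: after each OUT_OF_SPACE, resubmit from the first
 * unconsumed byte (offset `consumed`) with a fresh output buffer.
 * Returns the number of bytes written to out. */
static size_t sf_run(struct sf_stream *stream, const unsigned char *in, size_t in_len,
                     unsigned char *out, size_t out_cap, size_t chunk)
{
    unsigned char buf[64];
    size_t off = 0, written = 0;

    assert(chunk > 0 && chunk <= sizeof(buf));
    for (;;) {
        struct sf_op op = { stream, in + off, in_len - off, buf, chunk,
                            0, 0, SF_SUCCESS };

        sf_process(&op); /* real code: enqueue_burst(), then dequeue_burst() */
        assert(written + op.produced <= out_cap);
        memcpy(out + written, buf, op.produced);
        written += op.produced;
        off += op.consumed;
        if (op.status == SF_SUCCESS)
            break;
        assert(op.consumed > 0 || op.produced > 0); /* guard against no progress */
    }
    return written;
}
```

With a real compressor, `consumed` and `produced` would generally differ, and per Fiona's note `produced` may be a few bytes short of the buffer length; the loop only relies on the reported values, not on them being equal.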
> > > > Following is an illustration of API usage (this is just one flow; other variants are also possible):
> > > > 1. rte_comp_session *sess = rte_compressdev_session_create(rte_mempool *pool);
> > > > 2. rte_compressdev_session_init(int dev_id, rte_comp_session *sess, rte_comp_xform *xform, rte_mempool *sess_pool);
> > > > 3. rte_comp_op_pool_create(rte_mempool ..)
> > > > 4. rte_comp_op_bulk_alloc(struct rte_mempool *mempool, struct rte_comp_op **ops, uint16_t nb_ops);
> > > > 5. for every rte_comp_op in ops[],
> > > >     5.1 rte_comp_op_attach_session(rte_comp_op *op, rte_comp_session *sess);
> > > >     5.2 op.op_type = RTE_COMP_OP_STATELESS
> > > >     5.3 op.flush = RTE_FLUSH_FINAL
> > > > 6. [Optional] for every rte_comp_op in ops[],
> > > >     6.1 rte_comp_stream_create(int dev_id, rte_comp_session *sess, void **stream);
> > > >     6.2 rte_comp_op_attach_stream(rte_comp_op *op, void *stream);
> > >
> > > [Ahmed] What is the semantic effect of attaching a stream to every op? Will this application benefit from it, given that it is set up with op_type STATELESS?
> >
> > [Shally] By role, a stream is a data structure that holds all the information the PMD needs to maintain for an op's processing, and thus it's marked device specific. It is required for stateful processing but optional for stateless since, unlike stateful, the PMD doesn't need to maintain context once the op is processed.
> > Using a stream for stateless ops may be an advantage for some PMDs. They can be designed to do one-time per-op setup (such as mapping session params) during stream_create() in the control path rather than in the data path.
>
> [Fiona] Yes, we agreed that stream_create() should be called for every session, and if it returns non-NULL the PMD needs it, so op_attach_stream() must be called.
> However, I've just realised we don't have a STATEFUL/STATELESS param on the xform, just on the op.
> So we could either add a stateful/stateless param to stream_create(), OR add a stateful/stateless param to the xform so it would be in the session?

[Shally] No, it shouldn't be part of the session or xform, as sessions aren't stateless/stateful. So we shouldn't alter the current definition of sessions or xforms. If we need to mention it, then it could be added as part of stream_create(), as that is device specific, and depending upon the op_type the device can then set up stream resources.

> However, Shally, can you reconsider if you really need it for STATELESS, or if the data you want to store there could be stored in the session? Or if it's needed per-op, does it really need to be visible on the API as a stream, or could it be hidden within the PMD?

[Shally] I would say it is not mandatory, but a desirable feature that I am suggesting. I am only trying to enable an optimization in the data path which may help some PMD designs, as they can use stream_create() to do setup that is one-time per op regardless of op_type, such as, as I mentioned, mapping user session params to device session params. We can hide it inside the PMD; however, there may be slight overhead in the data path depending on the PMD design. But I would say it's not a blocker for us to freeze the current spec. We can revisit this feature later because it will not alter the base API functionality.

Thanks
Shally

> > > > 7. for every rte_comp_op in ops[],
> > > >     7.1 set up with src/dst buffer
> > > > 8. enq = rte_compressdev_enqueue_burst(dev_id, qp_id, ops, nb_ops);
> > > > 9. dqu = 0; do while (dqu < enq) // wait till all of the enqueued ops are dequeued
> > > >     9.1 dqu += rte_compressdev_dequeue_burst(dev_id, qp_id, &ops[dqu], enq - dqu);
> > >
> > > [Ahmed] I am assuming that waiting for all enqueued ops to be dequeued is not strictly necessary, but is just the chosen example in this case.
> >
> > [Shally] Yes. By design, for burst_size > 1 each op is independent of the others, so the app may proceed as soon as it dequeues any.
> > > >
> > > > 10. Repeat 7 for the next batch of data
> > > > 11. for every op in ops[]
> > > >     11.1 rte_comp_stream_free(op->stream);
> > > > 12. rte_comp_session_clear(sess);
> > > > 13. rte_comp_session_terminate(rte_comp_session *session)
> > > >
> > > > Thanks
> > > > Shally
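Steps 8 and 9 can be sketched with a mock queue pair that completes ops out of order and in partial batches. The point is that the dequeued count must be accumulated across calls, and that an application needing ordering can use a sequence number stored in per-op user data (as suggested in the burst section) to restore enqueue order. The `rq_*` names, the fixed queue depth and the assumption that sequence numbers run from 0 to enq - 1 are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { RQ_DEPTH = 32 }; /* hypothetical queue-pair depth */

/* Hypothetical op: `seq` plays the role of per-op user data for reordering. */
struct rq_op {
    uint32_t seq;
    int done;
};

/* Mock queue pair: completes ops in reverse order, at most 2 per dequeue call,
 * to mimic out-of-order and partial completions. */
struct rq_qp {
    struct rq_op *pending[RQ_DEPTH];
    size_t nb_pending;
};

static size_t rq_enqueue(struct rq_qp *qp, struct rq_op **ops, size_t nb_ops)
{
    size_t i;

    for (i = 0; i < nb_ops && qp->nb_pending < RQ_DEPTH; i++)
        qp->pending[qp->nb_pending++] = ops[i];
    return i;
}

static size_t rq_dequeue(struct rq_qp *qp, struct rq_op **ops, size_t max_ops)
{
    size_t n = 0;

    while (n < max_ops && n < 2 && qp->nb_pending > 0) {
        struct rq_op *op = qp->pending[--qp->nb_pending];
        op->done = 1;
        ops[n++] = op;
    }
    return n;
}

/* Steps 8-9: accumulate the dequeued count across calls (assigning it would
 * lose earlier completions), then restore enqueue order from `seq`,
 * assuming seq numbers 0 .. enq-1 were assigned at enqueue time. */
static void rq_drain_in_order(struct rq_qp *qp, size_t enq, struct rq_op **ordered)
{
    struct rq_op *tmp[RQ_DEPTH];
    size_t dqu = 0;

    assert(enq <= RQ_DEPTH);
    while (dqu < enq) /* busy-polls; real code might do other work between polls */
        dqu += rq_dequeue(qp, &tmp[dqu], enq - dqu);
    for (size_t i = 0; i < enq; i++) {
        assert(tmp[i]->seq < enq);
        ordered[tmp[i]->seq] = tmp[i];
    }
}
```

As Shally notes in reply to Ahmed, waiting for every op is only the example's choice; an application without ordering needs could handle each op as soon as it is dequeued.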