Hi Shally,

Thanks for the summary. It is very helpful. Please see my comments below.
On 1/4/2018 6:45 AM, Verma, Shally wrote:
> This is an RFC v2 document to summarise the understanding of, and requirements on,
> the compression API proposal in DPDK. It is based on "[RFC v3] Compression API in
> DPDK" http://dpdk.org/dev/patchwork/patch/32331/
> The intention of this document is to align on the concepts built into the
> compression API and its usage, and to identify further requirements.
>
> Going further, it could be a base for a Compression Module Programmer's Guide.
>
> The current scope is limited to:
> - definition of the terminology which makes up the foundation of the compression API
> - the typical API flow expected to be used by applications
> - Stateless and Stateful operation definition and usage, following the RFC v1 doc
>   review http://dev.dpdk.narkive.com/CHS5l01B/dpdk-dev-rfc-v1-doc-compression-api-for-dpdk
>
> 1. Overview
> ~~~~~~~~~~~
>
> A. Compression methodologies in the compression API
> ====================================================
> DPDK compression supports two types of compression methodologies:
> - Stateless - each data object is compressed individually without any
>   reference to previous data,
> - Stateful - each data object is compressed with reference to previous data
>   objects, i.e. a history of the data is needed for compression / decompression.
> For more explanation, please refer to RFC 1951 https://www.ietf.org/rfc/rfc1951.txt
>
> To support both methodologies, the DPDK compression API introduces two key
> concepts: Session and Stream.
>
> B. Notion of a session in the compression API
> ==============================================
> A session in DPDK compression is a logical entity which is set up once with
> immutable parameters, i.e. parameters that don't change across operations and
> devices.
> A session can be shared across multiple devices and multiple operations
> simultaneously.
> Typical session parameters include info such as:
> - compress / decompress direction
> - compression algorithm and associated configuration parameters
>
> An application can create different sessions on a device, initialized with the
> same or different xforms. Once a session is initialized with one xform it cannot
> be re-initialized.
>
> C. Notion of a stream in the compression API
> =============================================
> Unlike a session, which carries a common set of information across operations,
> a stream in DPDK compression is a logical entity which identifies a related set
> of operations and carries operation-specific information as needed by the
> device during its processing.
> It is a device-specific data structure, opaque to the application, which is set
> up and maintained by the device.
>
> A stream can be used with *only* one op at a time, i.e. no two operations can
> share the same stream simultaneously.
> A stream is a *must* for stateful op processing and optional for stateless
> processing (please see the respective sections for more details).
>
> This enables sharing of a session by multiple threads handling different data
> sets, as each op carries its own context (internal state, history buffers et al.)
> in its attached stream.
> The application should call rte_comp_stream_create() and attach the stream to
> the op before operation processing begins, and free it via rte_comp_stream_free()
> after processing is complete.
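[Ahmed] To confirm my understanding of the session/stream split, here is a rough
pseudo-C sketch of the setup described above. The API names are taken from this
RFC; the xform field names, the return-value handling and the sess/stream
mempool arguments are my assumptions rather than anything defined so far:

    /* One-time, immutable session setup (section B): direction, algorithm, params.
     * The xform layout below is assumed; the RFC only says it carries these. */
    struct rte_comp_xform xform = {
            .type = RTE_COMP_COMPRESS,              /* compress direction */
            .compress.algo = RTE_COMP_ALGO_DEFLATE, /* algorithm + its parameters */
    };

    struct rte_comp_session *sess = rte_compressdev_session_create(sess_pool);
    if (sess == NULL ||
        rte_compressdev_session_init(dev_id, sess, &xform, sess_priv_pool) < 0)
            return -1;  /* a session cannot be re-initialized with another xform */

    /* Per-data-set stream (section C): opaque, device-maintained, one op at a time */
    struct rte_comp_op *op;     /* assume allocated via rte_comp_op_bulk_alloc() */
    void *stream = NULL;
    if (rte_comp_stream_create(dev_id, sess, &stream) < 0)
            return -1;
    rte_comp_op_attach_stream(op, stream);  /* attach before processing starts */
    /* ... after processing of this data set completes ... */
    rte_comp_stream_free(stream);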
> D. Notion of burst operations in the compression API
> =====================================================
> A burst in DPDK compression is an array of operations where each op carries an
> independent set of data, i.e. a burst can look like:
>
> -----------------------------------------------------------------------------
> enque_burst (|op1.no_flush | op2.no_flush | op3.flush_final | op4.no_flush | op5.no_flush |)
> -----------------------------------------------------------------------------
>
> where op1 .. op5 are all independent of each other and carry entirely different
> sets of data.
> Each op can be attached to the same or a different session, but *must* be
> attached to a different stream.
>
> Each op (struct rte_comp_op) carries the compression/decompression operational
> parameters and is both an input and an output parameter.
> The PMD gets source, destination and checksum information as input and updates
> the op with bytes consumed, bytes produced and the checksum as output.
>
> Since each operation in a burst is independent and thus can complete
> out-of-order, applications which need ordering should set up the per-op user
> data area with reordering information so that the enqueue order can be
> determined at dequeue time.
>
> Also, if multiple threads call enqueue_burst() on the same queue pair, then it
> is the application's responsibility to use a proper locking mechanism to ensure
> exclusive enqueuing of operations.
>
> E. Stateless vs Stateful
> =========================
> The compression API provides the RTE_COMP_FF_STATEFUL feature flag for a PMD to
> reflect its support for stateful operation. Each op carries an op type
> indicating whether it is to be processed stateful or stateless.
>
> E.1 Compression API stateless operation
> ----------------------------------------
> An op is processed statelessly if:
> - its flush value is set to RTE_FLUSH_FULL or RTE_FLUSH_FINAL
>   (required only on the compression side),
> - its op_type is set to RTE_COMP_OP_STATELESS,
> - all of the required input is provided along with an output buffer large
>   enough to store the output, i.e. OUT_OF_SPACE can never occur.
>
> When all of the above conditions are met, the PMD initiates stateless
> processing and releases the acquired resources after processing of the current
> operation is complete, i.e. full input consumed and full output written.
> The application can optionally attach a stream to such ops. In that case, the
> application must attach a different stream to each op.
>
> The application can enqueue a stateless burst by making consecutive
> enque_burst() calls, i.e. the following is valid usage:
>
> enqueued = rte_comp_enque_burst(dev_id, qp_id, ops1, nb_ops);
> enqueued = rte_comp_enque_burst(dev_id, qp_id, ops2, nb_ops);
>
> *Note - every call has a different ops array, i.e. the same rte_comp_op array
> *cannot be re-enqueued* to process the next batch of data until the previous
> ops are completely processed.
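[Ahmed] For the stateless case, this is how I picture an application driving a
burst, including per-op user data to restore ordering at dequeue. The names
follow the RFC text and the section 2 illustration; the user_data field (based
on the "per-op user data area" above), the done[]/results[] arrays and the loop
structure are my assumptions:

    /* Each op is self-contained: full input, FLUSH_FINAL, large enough output */
    for (i = 0; i < nb_ops; i++) {
            ops[i]->op_type = RTE_COMP_OP_STATELESS;
            ops[i]->flush = RTE_FLUSH_FINAL;
            ops[i]->user_data = (void *)(uintptr_t)i;   /* remember enqueue order */
            /* ops[i] src/dst mbufs set up per op; ops in a burst are independent */
    }

    enq = rte_compressdev_enqueue_burst(dev_id, qp_id, ops, nb_ops);

    /* The same ops[] array must not be re-enqueued until fully dequeued */
    deq = 0;
    while (deq < enq)
            deq += rte_compressdev_dequeue_burst(dev_id, qp_id, &done[deq],
                                                 enq - deq);

    /* Completions may come back out of order; user_data gives the original slot */
    for (i = 0; i < deq; i++)
            results[(uintptr_t)done[i]->user_data] = done[i];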
> E.1.1 Stateless and OUT_OF_SPACE
> ---------------------------------
> OUT_OF_SPACE is a condition where the output buffer runs out of space while the
> PMD still has more data to produce. If the PMD runs into such a condition, it is
> an error condition in stateless processing.
> In such a case, the PMD resets itself and returns with status
> RTE_COMP_OP_STATUS_OUT_OF_SPACE with produced = consumed = 0, i.e. no input
> read, no output written.
> The application can resubmit the full input with a larger output buffer.
[Ahmed] Can we add an option to allow the user to read the data that was
produced while still reporting OUT_OF_SPACE? This is mainly useful for
decompression applications doing search.
> E.2 Compression API stateful operation
> ---------------------------------------
> A stateful operation in DPDK compression means the application invokes
> enqueue_burst() multiple times to process related chunks of data, either because
> - the application broke the data into several ops, and/or
> - the PMD ran into an out_of_space situation during input processing.
>
> In case of either one or both of the above conditions, the PMD is required to
> maintain the state of the op across enque_burst() calls; such ops are set up
> with op_type RTE_COMP_OP_STATEFUL, begin with flush value
> RTE_COMP_NO/SYNC_FLUSH and end with flush value RTE_COMP_FULL/FINAL_FLUSH.
>
> E.2.1 Stateful operation state maintenance
> -------------------------------------------
> The ideal expectation from the application is that it parses through all
> related chunks of source data, builds them into an mbuf-chain and enqueues it
> for stateless processing.
> However, if it needs to break the data into several enqueue_burst() calls, then
> an expected call flow would be something like:
>
> enqueue_burst( |op.no_flush |)
[Ahmed] The work is now in flight to the PMD. The user will call dequeue_burst
in a loop until all ops are received. Is this correct?
> deque_burst(op) // should dequeue before we enqueue next
> enqueue_burst( |op.no_flush |)
> deque_burst(op) // should dequeue before we enqueue next
> enqueue_burst( |op.full_flush |)
[Ahmed] Why not allow multiple work items in flight? I understand that
occasionally there will be an OUT_OF_SPACE exception. Can we just distinguish
the response in exception cases?
>
> Here an op *must* be attached to a stream and every subsequent enqueue_burst()
> call should carry the *same* stream. Since the PMD maintains the op's state in
> the stream, it is mandatory for the application to attach a stream to such ops.
>
> E.2.2 Stateful and OUT_OF_SPACE
> --------------------------------
> If the PMD supports stateful operation and runs into an OUT_OF_SPACE situation,
> it is not an error condition for the PMD. In such a case, the PMD returns with
> status RTE_COMP_OP_STATUS_OUT_OF_SPACE, with consumed = number of input bytes
> read and produced = length of the complete output buffer.
> The application should enqueue the op with the source starting at consumed+1
> and an output buffer with available space.
[Ahmed] Related to OUT_OF_SPACE: what status does the user receive in a
decompression case when the end block is encountered before the end of the
input? Does the PMD continue decompression? Does it stop there and return the
stop index?
>
> E.2.3 Sliding window size
> --------------------------
> Every PMD will reflect, in its algorithm capability structure, the maximum
> sliding window length in bytes, which indicates the maximum history buffer
> length used by the algorithm.
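[Ahmed] To summarise my reading of E.2 (one op in flight per stream, dequeue
before the next enqueue, OUT_OF_SPACE resumable rather than fatal), a pseudo-C
sketch. The flush values and the status/consumed/produced fields are as named
in this RFC; the chunking loop and the offset handling are my assumptions:

    void *stream = NULL;
    rte_comp_stream_create(dev_id, sess, &stream);

    op->op_type = RTE_COMP_OP_STATEFUL;
    rte_comp_op_attach_stream(op, stream);      /* same stream for every chunk */

    for (chunk = 0; chunk < nb_chunks; ) {
            op->flush = (chunk == nb_chunks - 1) ?
                            RTE_COMP_FULL_FLUSH : RTE_COMP_NO_FLUSH;
            /* point op src at this chunk and op dst at the available output space */

            rte_compressdev_enqueue_burst(dev_id, qp_id, &op, 1);
            while (rte_compressdev_dequeue_burst(dev_id, qp_id, &op, 1) == 0)
                    ;                           /* dequeue before enqueuing the next */

            if (op->status == RTE_COMP_OP_STATUS_OUT_OF_SPACE) {
                    /* Not an error when stateful: op->consumed bytes of this chunk
                     * were read and op->produced bytes written. Retry the same chunk
                     * from the first unconsumed byte with a fresh destination buffer
                     * (src/dst adjustment not shown). */
                    continue;
            }
            chunk++;                            /* this chunk is fully consumed */
    }

    rte_comp_stream_free(stream);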
> 2. Example API illustration
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Following is an illustration of API usage (this is just one flow; other
> variants are also possible):
> 1. rte_comp_session *sess = rte_compressdev_session_create(rte_mempool *pool);
> 2. rte_compressdev_session_init(int dev_id, rte_comp_session *sess,
>    rte_comp_xform *xform, rte_mempool *sess_pool);
> 3. rte_comp_op_pool_create(rte_mempool ..)
> 4. rte_comp_op_bulk_alloc(struct rte_mempool *mempool, struct rte_comp_op **ops,
>    uint16_t nb_ops);
> 5. for every rte_comp_op in ops[],
>    5.1 rte_comp_op_attach_session(rte_comp_op *op, rte_comp_session *sess);
>    5.2 op.op_type = RTE_COMP_OP_STATELESS
>    5.3 op.flush = RTE_FLUSH_FINAL
> 6. [Optional] for every rte_comp_op in ops[],
>    6.1 rte_comp_stream_create(int dev_id, rte_comp_session *sess, void **stream);
>    6.2 rte_comp_op_attach_stream(rte_comp_op *op, void *stream);
[Ahmed] What is the semantic effect of attaching a stream to every op? Will this
application benefit from it, given that it is set up with op_type STATELESS?
> 7. for every rte_comp_op in ops[],
>    7.1 set up the src/dst buffers
> 8. enq = rte_compressdev_enqueue_burst(dev_id, qp_id, &ops, nb_ops);
> 9. do while (dqu < enq) // wait till all of the enqueued ops are dequeued
>    9.1 dqu = rte_compressdev_dequeue_burst(dev_id, qp_id, &ops, enq);
[Ahmed] I am assuming that waiting for all enqueued ops to be dequeued is not
strictly necessary, but is just the chosen example in this case.
> 10. Repeat 7 for the next batch of data
> 11. for every op in ops[],
>     11.1 rte_comp_stream_free(op->stream);
> 12. rte_comp_session_clear(sess);
> 13. rte_comp_session_terminate(rte_comp_session *session)
>
> Thanks
> Shally
>
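[Ahmed] And mostly to check that I am parsing the illustration correctly, the
same flow written out as pseudo-C. The op pool arguments, the xform contents and
the buffer setup are my assumptions; everything else follows the step numbers
above:

    struct rte_comp_xform xform = { .type = RTE_COMP_COMPRESS };   /* contents assumed */
    struct rte_comp_session *sess = rte_compressdev_session_create(sess_pool);  /* 1 */
    rte_compressdev_session_init(dev_id, sess, &xform, sess_pool);              /* 2 */

    struct rte_mempool *op_pool =
            rte_comp_op_pool_create("comp_op_pool", NB_OPS, 0, 0, socket_id);   /* 3, args assumed */
    struct rte_comp_op *ops[NB_OPS];
    rte_comp_op_bulk_alloc(op_pool, ops, NB_OPS);                               /* 4 */

    for (i = 0; i < NB_OPS; i++) {
            rte_comp_op_attach_session(ops[i], sess);                           /* 5.1 */
            ops[i]->op_type = RTE_COMP_OP_STATELESS;                            /* 5.2 */
            ops[i]->flush = RTE_FLUSH_FINAL;                                    /* 5.3 */
            void *stream = NULL;
            rte_comp_stream_create(dev_id, sess, &stream);                      /* 6.1, optional */
            rte_comp_op_attach_stream(ops[i], stream);                          /* 6.2 */
            /* 7: set up src/dst mbufs for ops[i] */
    }

    enq = rte_compressdev_enqueue_burst(dev_id, qp_id, ops, NB_OPS);            /* 8 */
    dqu = 0;
    while (dqu < enq)                                                           /* 9 */
            dqu += rte_compressdev_dequeue_burst(dev_id, qp_id, &ops[dqu],
                                                 enq - dqu);
    /* 10: refill src/dst (step 7) and enqueue the next batch */

    for (i = 0; i < NB_OPS; i++)                                                /* 11 */
            rte_comp_stream_free(ops[i]->stream);
    rte_comp_session_clear(sess);                                               /* 12 */
    rte_comp_session_terminate(sess);                                           /* 13 */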