> -----Original Message----- > From: Trahe, Fiona [mailto:fiona.tr...@intel.com] > Sent: 17 October 2017 20:11 > To: shally verma <shallyvermacav...@gmail.com>; dev@dpdk.org; Athreya, > Narayana Prasad <narayanaprasad.athr...@cavium.com>; Challa, Mahipal > <mahipal.cha...@cavium.com>; Verma, Shally <shally.ve...@cavium.com> > Cc: Trahe, Fiona <fiona.tr...@intel.com> > Subject: RE: [dpdk-dev] [RFC] Compression API in DPDK > > Hi Shally, > > Thanks for your feedback. Comments below. > > Regards, > Fiona > > > -----Original Message----- > > From: shally verma [mailto:shallyvermacav...@gmail.com] > > Sent: Tuesday, October 17, 2017 7:56 AM > > To: Trahe, Fiona <fiona.tr...@intel.com>; dev@dpdk.org; > > pathr...@cavium.com; Mahipal Challa <mcha...@cavium.com>; Shally > Verma > > <sve...@cavium.com> > > Subject: Re: [dpdk-dev] [RFC] Compression API in DPDK > > > > While I am still going through RFC in details, inline are my inputs to > > some of the points. > > > > On Thu, Oct 12, 2017 at 10:34 PM, Trahe, Fiona <fiona.tr...@intel.com> > wrote: > > > > > > With the vast amounts of data being transported around networks and > > > stored in storage systems, > > reducing data size is becoming ever more important. There are both > > software libraries and hardware devices available that provide > > compression, but no common API. This RFC proposes a compression API for > DPDK to address this need. > > > > > > Features: > > > . Deflate Algorithm (https://tools.ietf.org/html/rfc1951), other > > > algorithms > can be added later. > > > . Static and Dynamic Huffman encoding. > > > . Compression levels > > > . Checksum generation > > > . Asynchronous burst API > > > . Session-based. (for stateless a session may be useable across > > > devices, for stateful it wouldn't be.) 
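To make the session-based, asynchronous burst model listed above concrete, here is a minimal runnable sketch of the intended application flow. All names here are local stand-ins, not the proposed API: `comp_op`, the stub ring, `enqueue_burst()` and `dequeue_burst()` only mirror the *shape* of the `rte_compdev_enqueue_burst()`/`rte_compdev_dequeue_burst()` calls that appear later in this RFC, which may still change.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the proposed rte_comp_op (illustrative only). */
enum op_status { STATUS_NOT_PROCESSED, STATUS_SUCCESS };

struct comp_op {
	uint8_t status;
};

/* A stub ring standing in for a PMD queue pair. */
#define QUEUE_DEPTH 8
static struct comp_op *queue[QUEUE_DEPTH];
static unsigned int q_head, q_tail, q_count;

/* Shape of the proposed enqueue burst call: returns how many ops were
 * actually queued, which may be fewer than requested. */
static uint16_t enqueue_burst(struct comp_op **ops, uint16_t nb_ops)
{
	uint16_t i;

	for (i = 0; i < nb_ops && q_count < QUEUE_DEPTH; i++) {
		queue[q_tail] = ops[i];
		q_tail = (q_tail + 1) % QUEUE_DEPTH;
		q_count++;
	}
	return i;
}

/* Shape of the proposed dequeue burst call: this stub "device" marks
 * each op complete as it is dequeued. */
static uint16_t dequeue_burst(struct comp_op **ops, uint16_t nb_ops)
{
	uint16_t i;

	for (i = 0; i < nb_ops && q_count > 0; i++) {
		ops[i] = queue[q_head];
		ops[i]->status = STATUS_SUCCESS;
		q_head = (q_head + 1) % QUEUE_DEPTH;
		q_count--;
	}
	return i;
}

/* Typical application flow: submit a burst, then poll until every op
 * has been retrieved and its status can be checked. */
static unsigned int run_burst(struct comp_op *ops, unsigned int n)
{
	struct comp_op *in[QUEUE_DEPTH], *out[QUEUE_DEPTH];
	unsigned int i, done = 0;

	for (i = 0; i < n; i++) {
		ops[i].status = STATUS_NOT_PROCESSED;
		in[i] = &ops[i];
	}
	enqueue_burst(in, (uint16_t)n);
	while (done < n)
		done += dequeue_burst(out + done, (uint16_t)(n - done));
	return done;
}
```

A real application must additionally handle an enqueue return value smaller than `nb_ops` (queue full) by retrying the remainder later; the stub omits that for brevity.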
> > > > > > > > > Note 1: Split of functionality above/below API When considering > > > whether features should be supported on the API or not, the decision > > > was based on > > the following: > > > The purpose of the API is to decouple the application from the > > > compute-intensive functions needed > > for compression by abstracting them under a common API. These can then > > be implemented by either hardware accelerators or optimised software > > libraries. Where features are not compute-intensive and unlikely to be > > offloaded or optimised, there's nothing to be gained by each PMD > > having to separately implement them, and it makes more sense for them > to be done above the API. So the following are not handled on the API and > can be done above. > > > . Prepending/appending protocol headers (gzip, zlib) . > > > File-handling, breaking files up into packets, reassembling. > > > . Synchronous API > > > > > > > > > Note 2: Stateful compression. > > > A better compression ratio can be achieved by using stateful > > > compression, allowing pattern matching > > across multiple sequential packets. This will be supported on the API, > > but in the interests of getting an initial RFC out for feedback, there > > are placeholder parameters on the API. Feedback welcome on what > parameters are needed for this feature. > > > > > > > We would like to see this as part of Specification. We will provide > > more details on what we(cavium) require. > > [Fiona] I look forward to it. > > > > > > > > > > > > Note 3: The tricky question of where the API belongs We considered > > > 1. Extending cryptodev 2. New acceldev APIs for device handling + > > > compressdev APIs for data path 3. New acceldev for all APIs 4. New > > > compressdev API > > > > > > We're proposing option 4, here's why: > > > 1. Extending cryptodev: > > > . + Nice code re-use. > > > . + Consistent with Linux kernel and BSD approaches. > > > . - But not a good name. > > > . 
- Would inevitably lead to some cryptodev API breakage - can be > minimised, but then looks hacky. > > E.g. > > > rte_cryptodev_info.max_nb_queue_pairs currently. Would need > to extend to > > > rte_cryptodev_info.max_nb_queue_pairs (Ideally rename > > > this to max_nb_sym_.... but that's an > > API breakage) > > > rte_cryptodev_info.max_nb_comp_queue_pairs > > > rte_cryptodev_info.max_nb_asym_queue_pairs > > > Similarly fn names, e.g. rte_cryptodev_queue_pair_setup. > > > Rename to rte_cryptodev_sym_queue_pair_setup, add _comp > > > and _asym fns? Or add > > parameter? > > > 2. New acceldev APIs for device handling + compressdev APIs for data > path: > > > ~70% of cryptodev code is device-handling which is common to all > accelerators. > > > Rather than providing new compressdev APIs, provide APIs which can > be used for compression > > > and also other accelerators, e.g. bbdev, FPGAs. Would look like > rte_acceldev_info_get(), > > > rte_acceldev_configure(), rte_compress_enqueue_burst(), etc > > > . + Nice future code re-use. > > > . + No cryptodev API breakage, though would be nice over time to > move to acceldev APIs for > > > cryptodev and deprecate cryptodev APIs. > > > . - acceldev either has to understand structs for all services with > resulting dependencies OR > > > it's independent with minimal type-checking, lots of void* and > casting. > > > . - potential API churn if services have different needs > > > . - usability - application has to sometimes use acceldev, > > > sometimes compressdev 3. New acceldev for all APIs: > > > . As 2 but with only the xforms and ops different for each service. > > > All > APIs are rte_acceldev_.... > > > . Similar pros/cons. > > > . - No flexibility for accelerators services to have > > > service-specific APIs. Services are more tightly > > coupled. > > > 4. New compressdev API: (possibly with an acceldev under the hood) > > > . + Better name > > > . + Usability - rte_compdev_... for everything > > > . 
+ No cryptodev API breakage > > > . + Flexibility to add any compression-specific APIs if needed. > > > . + Simpler, more decoupled code > > > . - More code duplication. > > > May in a later iteration implement an acceldev under the hood, > > > for > code re-use, but > > > this is hidden from applications. Service APIs could pick and > > > choose > whatever makes sense > > > from acceldev. POC on this in-progress, also awaiting Hemant > Agrawal's proposed > > > rte_raw_device RFC, which may fit the acceldev role. > > > > > > Note 4: Chaining of compression with other services. > > > Option 1 would make it easy to chain compression with sym > > > crypto. Though not with other non- > > crypto > > > operations if this is a future use-case. However it can be > > > achieved with any option, following the > > pattern > > > of rte_security API, so is not a factor in above decision. > > > > > > > We are in favor of solution #4. Apart from reasons as you stated, > > nature of both operations are very different and it add lots of > > complexity to both ends (crypto + compression) to co-exist with > > different behavior. > > However I would like to understand more about acceldev proposal? Is > > there any document that I can see to understands its purpose? > > [Fiona] Not at the moment. > We'd like to wait for Hemant's rte_raw_devices RFC before making a > proposal on this. >
[Shally] Ok. > > > > > > > > > > > Code extracts below: > > > > > > > > > > > > diff --git a/lib/librte_compressdev/rte_comp.h > > > b/lib/librte_compressdev/rte_comp.h > > > +/** > > > + * @file rte_comp.h > > > + * > > > + * RTE definitions for Data Compression Service > > > + * > > > + */ > > > + > > > +/** Compression Algorithms */ > > > +enum rte_comp_algorithm { > > > + RTE_COMP_NULL = 1, [Shally] What is an API expectation on seeing algorithm as NULL? > > > + /**< NULL compression algorithm. */ > > > + RTE_COMP_DEFLATE, > > > + /**< DEFLATE compression algorithm > > > + * https://tools.ietf.org/html/rfc1951 */ > > > + RTE_COMP_LIST_END > > > +}; > > > + > > > > We support LZS offload as well, need to extend with RTE_COMP_LZS. > > [Fiona] Agreed. Do you see any need for lzs-specific parameters? i.e. is there > a need for a struct rte_comp_lzs_params and, if so, what should it contain? > [Shally] No. Current exported enum/param are sufficient. > > > > > + > > > +/** Compression checksum types */ > > > +enum rte_comp_checksum_type { > > > + RTE_COMP_NONE, > > > + /**< No checksum generated */ > > > + RTE_COMP_CRC32, > > > + /**< Generates a CRC32 checksum, as used by gzip */ > > > + RTE_COMP_ADLER32, > > > + /**< Generates an Adler-32 checksum, as used by zlib */ > > > + RTE_COMP_CRC32_ADLER32, > > > + /**< Generates both Adler-32 and CRC32 checksums, concatenated. > > > + * CRC32 is in the lower 32bits, Adler-32 in the upper 32 bits. > > > + */ > > > +}; > > > + > > > > In addition to these we support SHA1 and SHA256. > > To incorporate these, we suggest the following: > > > > enum rte_comp_integrity_algo { > > RTE_COMP_NONE, > > RTE_COMP_CRC32, > > RTE_COMP_ADLER32, > > RTE_COMP_CRC32_ADLER32, > > RTE_COMP_SHA1, > > RTE_COMP_SHA256, > > }; > > [Fiona] Can you elaborate on why these are needed in the compression > context? 
[Shally] One typical use case is in data centre applications, where a hash is used as a unique key for data fingerprinting/identification and de-duplication. > With lengths of 20 and 32 bytes they add considerable overhead to smaller > packets. [Shally] Typical usage is with larger buffers (2KB and above). > How do you propose to return the checksum? The others proposed all fit into > an 8-byte field in the op. > [Shally] User can request both checksum and hash at output. I will propose additional associated changes after we freeze the current RFC version. > > > > > + > > > +/** Compression Deflate Huffman Type */ enum rte_comp_huffman { > > > + RTE_COMP_STATIC, > > > + /**< Static encoding */ > > > + RTE_COMP_DYNAMIC, > > > + /**< Dynamic encoding */ > > > +}; > > > + > > > > We would like to see a RTE_COMP_DEFAULT for situations where > > application cannot make a choice. Also, to be more consistent with the naming in the deflate RFC 1951, these should be renamed RTE_COMP_FIXED and > > RTE_COMP_DYNAMIC. > > [Fiona] Agreed > > > > > +/** Compression integrity check */ > > > +enum rte_comp_integrity_check { > > > + RTE_COMP_NO_VERIFY, > > > + /**< No integrity check on output data */ > > > + RTE_COMP_VERIFY, > > > + /**< Integrity check on output data. Only applies to compress operation. > > > + * After the compress operation is done, decompress and verify the result matches the original data. > > > + */ > > > > This looks more to me like a diagnostic tool / test app. It should be > > part of an application rather than a library. > [Fiona] > If only a diagnostic tool then yes, it needn't be on the API. > However, data integrity is a critical requirement for lossless compression. It is > extremely difficult to fully validate any compression implementation due to > the infinite variety of input data patterns and the fact that there are many > possible outputs for any given input. > So applications sometimes build verification into the system.
> This could be done by the application, above the API, but as it is compute-intensive it can also benefit from offloading to hardware. And it saves PCI > bandwidth if done in the same operation as compression. So we would like to > have this option on the API. > As with any option we should also expose it on the capabilities API, and PMDs > can signal to the application whether or not they support it. > [Shally] The capability idea looks good to me, making it optional. Here are a few more things to consider: - Memory requirement to store decompression results - should it come from the user, or be allocated internally if verification is turned on? - Its usage if burst size > 1. - Different implementations may have their own version, so the following comment could be more generic, such as /**< Integrity check on output data. Only applies to compress operation. * Verify correctness of compression operation on data. */ Just out of curiosity, does your hardware support verification offload? > > > > > > +}; > > > + > > > +enum rte_comp_flush_flag { > > > + RTE_COMP_FLUSH_NONE, > > > + /**< TODO */ > > > + RTE_COMP_FLUSH_FULL, > > > + /**< TODO */ > > > + RTE_COMP_FLUSH_FINAL > > > + /**< TODO */ > > > +}; > > > + > > > > We need to add RTE_COMP_FLUSH_SYNC. > > Also, there could be some algorithms for which these flush types may not > > be relevant, or for which not all may be supported, so what is the PMD > > expectation in such a case? > [Fiona] Let's hold the discussion of flush flag values until we talk about > stateful compression. > If a PMD doesn't support a flush flag value we should provide for this to be > signalled on the capabilities API. > [Shally] Ok. > > > > > + > > > +/** Status of compression operation */ enum rte_comp_op_status { > > > + RTE_COMP_OP_STATUS_SUCCESS, > > > + /**< Operation completed successfully */ > > > + RTE_COMP_OP_STATUS_NOT_PROCESSED, [Shally] I have some questions around this, please see my comments on the rte_comp_op structure.
> > > + /**< Operation has not yet been processed by the device */ > > > + RTE_COMP_OP_STATUS_INVALID_SESSION, > > > + /**< operation failed due to invalid session arguments */ > > > + RTE_COMP_OP_STATUS_INVALID_ARGS, > > > + /**< Operation failed due to invalid arguments in request */ > > > + RTE_COMP_OP_STATUS_ERROR, > > > + /**< Error handling operation */ > > > + RTE_COMP_OP_STATUS_OVERFLOW, > > > > If I understand the purpose correctly, this indicates a situation > > where the output buffer is exhausted during compression/decompression. > > So it should be renamed RTE_COMP_OP_STATUS_OUT_OF_SPACE > [Fiona] Agreed. > > > > > > + /**< Output buffer filled up before all data in input buffer was consumed */ > > > > This could also arise where the input buffer is fully consumed during > > decompression. So the comment could be more generic, such as > > “Output buffer exhausted during operation.” > [Fiona] In this case I would expect the status to be > RTE_COMP_OP_STATUS_SUCCESS. > The consumed parameter can be used by the application to detect that it has > come to the end of the output buffer. > [Shally] - I have a question here. Consider a scenario where: - input is fully read, i.e. consumed = length of src data, - output is written, i.e. produced = length of data written in dst buffer, and - status = OP_STATUS_SUCCESS. Then how would an application know there is more data to be flushed, and that it needs to re-enqueue the request with a larger output buffer? > > > > + RTE_COMP_OP_STATUS_VERIFY_FAILED > > > + /**< Decompression after compression failed to regenerate original data */ [Shally] It can simply be "data verification failed". > > > > I believe it isn't needed as part of the spec. > > > > > + > > > + //Note: > > > + //QAT API has 19 error types. > > > + //ISA-l has 5 inflate and 6 deflate errors. > > > + //zlib has 6 errors > > > + //Propose to only include a common subset in status - only values where the appl should have different > > behaviour.
> > > + //Add separate error field on op return which a PMD could populate > with PMD-specific debug > > info. > > > > Probably should add RTE_COMP_OP_STATUS_INVALID_STATE in case an > > API is invoked in invalid state. > [Fiona] Agreed > > > Rest Subset looks self-contained. > > > > > +}; > > > + > > > + > > > +/** Compression transform types */ > > > +enum rte_comp_xform_type { > > > + RTE_COMP_COMPRESS, > > > + /**< Compression service - compress */ > > > + RTE_COMP_DECOMPRESS, > > > + /**< Compression service - decompress */ > > > + RTE_COMP_CHECKSUM > > > + /** Compression service - generate checksum */ > > > +}; > > > + > > > > Enumeration may be renamed as rte_comp_op_type > [Fiona] See answer re xforms below. > > > > > > +/** Parameters specific to the deflate algorithm */ > > > +struct rte_comp_deflate_params { > > > + enum rte_comp_huffman huffman; > > > + /**< Compression huffman encoding type */ > > > +}; > > > + > > > +/** > > > + * Session Setup Data common to all compress transforms. > > > + * Includes parameters common to stateless and stateful > > > + */ > > > +struct rte_comp_compress_common_params { > > > + enum rte_comp_algorithm algo; > > > + /**< Algorithm to use for compress operation */ > > > + union { > > > + struct rte_comp_deflate_params deflate; > > > + /**< Parameters specific to the deflate algorithm */ > > > + }; /**< Algorithm specific parameters */ > > > + uint8_t level; > > > + /**< Compression level > > > + * Range is 1-9, where 1 is fastest compression > > > + * but worst, i.e. highest, compression ratio. > > > + * Higher numbers may give better compression ratios and are likely > slower. > > > + * The number is interpreted by each PMD differently. > > > + */ > > > > Different implementations of same algo may have different levels > > supported. 
So, we can add enum alongside this parameter such as: > > > > enum rte_comp_compression_level { > > RTE_COMP_COMPRESSION_LEVEL_DEFAULT=0, > > /** Use PMD Default */ > > RTE_COMP_COMPRESSION_LEVEL_MIN=1, > > /** Fastest Compression */ > > RTE_COMP_COMPRESSION_LEVEL_MAX=9 > > /** Maximum level supported by compression PMD */ > > } > [Fiona] Agreed. > > > > > > +}; > > > + > > > +/** > > > + * Session Setup Data for stateful compress transform. > > > + * Extra params for stateful transform > > > + */ > > > +struct rte_comp_compress_stateful_params { > > > + //TODO : add extra params just needed for stateful, e.g. > > > + // history buffer size, window size, state, state buffers, etc...? > > > > I will provide my comments on it in another email. > > > > > +}; > > > +/* Session Setup Data for compress transform. */ > > > +struct rte_comp_compress_xform { > > > + struct rte_comp_compress_common_params cmn; > > > + struct rte_comp_compress_stateful_params stateful; > > > +}; > > > + > > > +/** > > > + * Session Setup Data common to all decompress transforms. > > > + * Includes parameters common to stateless and stateful > > > + */ > > > +struct rte_comp_decompress_common_params { > > > + enum rte_comp_algorithm algo; > > > + /**< Algorithm to use for decompression */ > > > +}; > > > +/** > > > + * Session Setup Data for decompress transform. > > > + * Extra params for stateful transform > > > + */ > > > +struct rte_comp_decompress_stateful_params { > > > + //TODO : add extra params just needed for stateful, e.g. > > > + // history buffer size, window size, state, state buffers, etc...? > > > +}; > > > +/* Session Setup Data for decompress transform. */ > > > +struct rte_comp_decompress_xform { > > > + struct rte_comp_decompress_common_params cmn; > > > + struct rte_comp_decompress_stateful_params stateful; > > > +}; > > > + > > > +/** > > > + * Generate checksum on uncompressed data. > > > + * Applies to both compress and decompress operations. 
*/ > > > +/* Session Setup Data for checksum transform. */ > > > +struct rte_comp_checksum_xform { > > > + enum rte_comp_checksum_type chksum_type; > > > + /**< This parameter determines the checksum type */ > > > +}; > > > + > > > + > > > +/** > > > + * Compression transform structure. > > > + * > > > + * This is used to specify the compression transforms required, multiple > transforms > > > + * can be chained together to specify a chain of transforms such as > > > + * generate checksum, then compress. Each transform structure can > > > + * hold a single transform, the type field is used to specify which > transform > > > + * is contained within the union. > > > + * The supported combinations are: > > > + * - compress-only > > > + * - decompress-only > > > + * - generate checksum and compress input data > > > + * - decompress input data, generate checksum on output data > > > + * > > > + */ > > > +struct rte_comp_xform { > > > + struct rte_comp_xform *next; > > > + /**< next xform in chain */ > > > + enum rte_comp_xform_type type; > > > + /**< xform type */ > > > + union { > > > + struct rte_comp_compress_xform compress; > > > + /**< xform for compress operation */ > > > + struct rte_comp_decompress_xform decompress; > > > + /**< xform for decompress operation */ > > > + struct rte_comp_checksum_xform chksum; > > > + /**< xform to generate checksums */ > > > > We need compress+checksum(and decompress+checksum) as a single > xform > > instead of requiring 2 xforms. The use of xform chain will introduce > > needless overhead for the most typical use-cases of comp+checksum(and > > decomp+checksum). > > > [Fiona] ok. > I can merge so we just have compress & decompress xforms with an integrity > (checksum) param. > So now I see why you want to rename rte_comp_xform_type as > rte_comp_op_type. > Calling it ..op_type though implies a 1:1 mapping between xform and > operation, > so assumes there will never be any chained xforms. 
> If this is the case we should also remove the *next parameter. > I'd followed the crypto xform chain pattern in case it proves more > extendable in future, > but if we can't foresee any case for it, then we should simplify. > If we choose to keep the *next then we should keep the name as > rte_comp_xform_type. > [Shally] I would prefer to keep "next" for extensibility; if that requires the naming to be retained as "xform", I am fine with that. Just a note: the same proposal applies to hash as well, i.e. compress+hash or compress+checksum+hash will be one xform. To support compress+checksum+hash, the data structure would need to be modified. > > > > + }; > > > +}; > > > + > > > + > > > +struct rte_comp_session; [Shally] It is named rte_cryptodev_session in the crypto API, unlike here. Is that intentional? > > > + > > > +/** > > > + * Compression Operation. > > > + * > > > + * This structure contains data relating to performing a compression > > > + * operation on the referenced mbuf data buffers. > > > + * > > > + * All compression operations are Out-of-place (OOP) operations, > > > + * as the size of the output data is different to the size of the input > > > data. > > > + * > > > + * Comp operations are enqueued and dequeued in comp PMDs using the > > > + * rte_compdev_enqueue_burst() / rte_compdev_dequeue_burst() APIs > > > + */ > > > +struct rte_comp_op { > > > + uint8_t status; > > > + /**< > > > + * operation status - use values from enum rte_comp_status. > > > + * This is reset to > > > + * RTE_COMP_OP_STATUS_NOT_PROCESSED on allocation from mempool and [Shally] Per my understanding, RTE_COMP_OP_STATUS_NOT_PROCESSED will be set only at the return of rte_comp_op_alloc() and never after. If that understanding is correct, shouldn't it be renamed RTE_COMP_OP_STATUS_ALLOCATED?
> > > + * will be set to RTE_COMP_OP_STATUS_SUCCESS after operation > > > + * is successfully processed by a PMD > > > + */ > > > + uint16_t debug_status; > > > + /**< > > > + * Status of the operation is returned in the rte_crypto_op. [Shally] Typo correction to rte_comp_op > > > + * This field allows the PMD to pass back extra > > > + * pmd-specific debug information. Value is not defined on the API. > > > + */ > > > + struct rte_comp_session *session; > > > + /**< Handle for the initialised session context */ > > > + struct rte_mempool *mempool; > > > + /**< mempool from which operation is allocated */ > > > + phys_addr_t phys_addr; > > > + /**< physical address of this operation */ > > > + > > > + struct rte_mbuf *m_src; /**< source mbuf */ > > > + struct rte_mbuf *m_dst; /**< destination mbuf */ > > > + > > > + struct { > > > + uint32_t offset; > > > + /**< Starting point for compression or decompression, > > > + * specified as number of bytes from start of packet in > > > + * source buffer. > > > + * Starting point for checksum generation in compress > > > direction. > > > + */ > > > + uint32_t length; > > > + /**< The length, in bytes, of the data in source buffer > > > + * to be compressed or decompressed. > > > + * Also the length of the data over which the checksum > > > + * should be generated in compress direction > > > + */ > > > + } src; > > > + struct { > > > + uint32_t offset; > > > + /**< Starting point for writing output data, specified as > > > + * number of bytes from start of packet in m_dst > > > + * buffer. Starting point for checksum generation in > > > + * decompress direction. > > > + */ [Shally] Don't we need length here? indicating length (in bytes) of available dst mbuf at input? > > > + } dst; > > > + enum rte_comp_flush_flag flush_flag; > > > + /**< defines flush characteristics for the output data. > > > + * Only applicable in compress direction. 
> > > + */ > > > + enum rte_comp_integrity_check verify; > > > + /**< Specifies if the output data should be verified. > > > + Only applicable in compress direction. > > > + */ > > > > Again, believe we can get away with this by moving to application. > > > > Thanks > > Shally > > > > > + uint64_t input_chksum; > > > + /**< An input checksum can be provided to generate a > > > + * cumulative checksum across sequential blocks. > > > + * Checksum type is as specified in chksum_type in xform. > > > + */ > > > + uint64_t output_chksum; > > > + /**< If a checksum is generated it will be written in here. > > > + * Checksum type is as specified in chksum_type in xform. > > > + */ > > > + uint32_t consumed; > > > + /**< The number of bytes from the source buffer > > > + * which were compressed/decompressed. > > > + */ > > > + uint32_t produced; > > > + /**< The number of bytes written to the destination buffer > > > + * which were compressed/decompressed. > > > + */ > > > + /* > > > + TODO - Are any extra params needed on stateful op or are all in > xform? > > > + rte_comp_op_common_params/_stateful_params? > > > + extra values apply to flush flag > > > + */ > > > +}; > > > + > > > + > > > + int > > > + rte_compdev_configure(uint8_t dev_id, struct rte_compdev_config > *config); > > > + int > > > + rte_compdev_start(uint8_t dev_id); > > > + void > > > + rte_compdev_stop(uint8_t dev_id); > > > /* etc...Rest of APIs follow same pattern, similar to cryptodev, stripped > from RFC for simplicity. */ > > > > > > +/** > > > + * Compression supported feature flags > > > + * > > > + * Note: > > > + * New features flags should be added to the end of the list > > > + * > > > + * Keep these flags synchronised with > rte_compdev_get_comp_feature_name() > > > + * Note: bit numbers here are same as on cryptodev, hence the gaps. > > > + * This is an implementation detail which may change. 
> > > + */ > > > + #define RTE_COMPDEV_FF_CPU_SSE (1ULL << 3) > > > + /**< Utilises CPU SIMD SSE instructions */ > > > + #define RTE_COMPDEV_FF_CPU_AVX (1ULL << 4) > > > + /**< Utilises CPU SIMD AVX instructions */ > > > + #define RTE_COMPDEV_FF_CPU_AVX2 (1ULL << 5) > > > + /**< Utilises CPU SIMD AVX2 instructions */ > > > + #define RTE_COMPDEV_FF_HW_ACCELERATED (1ULL << 7) > > > + /**< Operations are off-loaded to an external hardware accelerator */ > > > + #define RTE_COMPDEV_FF_CPU_AVX512 (1ULL << 8) > > > + /**< Utilises CPU SIMD AVX512 instructions */ > > > + #define RTE_COMPDEV_FF_CPU_NEON (1ULL << > > > 10) > > > + /**< Utilises CPU NEON instructions */ > > > + #define RTE_COMPDEV_FF_CPU_ARM_CE (1ULL << 11) > > > + /**< Utilises ARM CPU Cryptographic Extensions */ > > > + > > > + #define RTE_COMP_FF_MBUF_SCATTER_GATHER (1ULL << > 12) > > > + /**< Scatter-gather mbufs are supported */ > > > + #define RTE_COMP_FF_STATEFUL > > > (1ULL << > 13) > > > + /**< Stateful compression is supported */ > > > + #define RTE_COMPDEV_FF_MULTI_PKT_CHECKSUM (1ULL << > 14) > > > + /**< Generation of checksum across multiple packets is supported */ > > > + [Shally] What is meaning of this feature? how it is supposed to impact burst operations. > > > + > > > + > > > +/** Compression device information */ > > > +struct rte_compdev_info { > > > + const char *driver_name; /**< Driver name. */ > > > + uint8_t driver_id; /**< Driver identifier */ > > > + struct rte_device *dev; /**< Device information. */ > > > + > > > + uint64_t feature_flags; > > > + /**< device feature flags */ > > > + const struct rte_comp_capabilities *capabilities; > > > + /**< Array of devices capabilities */ > > > > > > + unsigned max_nb_queue_pairs; > > > + /**< Maximum number of queues pairs supported by device. */ > > > + unsigned max_nb_sessions; > > > + /**< Maximum number of sessions supported by device. 
*/ > > > + unsigned int max_nb_sessions_per_qp; > > > + /**< Maximum number of sessions per queue pair. > > > + * Default 0 for infinite sessions > > > + */ > > > + /* TODO uint8_t needs_intermediate_buffer OR should this be a > feature flag? Or not necessary? > > */ > > > +}; > > > + > > > + > > > +/** Compression device statistics */ > > > +struct rte_compdev_stats { > > > + uint64_t enqueued_count; > > > + /**< Count of all operations enqueued */ > > > + uint64_t dequeued_count; > > > + /**< Count of all operations dequeued */ > > > + > > > + uint64_t enqueue_err_count; > > > + /**< Total error count on operations enqueued */ > > > + uint64_t dequeue_err_count; > > > + /**< Total error count on operations dequeued */ > > > +}; > > > + > > > + > > > +/** Compression device configuration structure */ > > > +struct rte_compdev_config { > > > + int socket_id; > > > + /**< Socket to allocate resources on */ > > > + uint16_t nb_queue_pairs; > > > + /**< Total number of queue pairs to configure on a device. */ > > > + uint32_t comp_intermediate_buf_size; > > > + /**< For the deflate algorithm, HW accelerators usually need to allocate > > > + * a pool of intermediate buffers for dynamic compression. [Shally] Do you mean dynamic Huffman coding based compression? > > > + * This indicates the buffer size and should be > > > + * set a little larger than the expected max source buffer size. > > > + * If the output of static compression doesn't fit in the > > > + * intermediate buffer, dynamic compression may not be possible; > > > + * in this case the accelerator may revert back to static > > > compression. > > > + */ > > > +}; > > > + > > > + > > > +/** Compression device queue pair configuration structure. */ > > > +struct rte_compdev_qp_conf { > > > + uint32_t nb_descriptors; /**< Number of descriptors per queue pair */ > > > +}; > > > + > > > + [Shally] Subsequent APIs and parts of the above reference cryptodev, and I have some questions on their one-to-one mapping.
I would not list them right now; I will take them up once we freeze feedback on the current RFC version so that we can have a focused discussion around the rest. Thanks Shally > > > +typedef uint16_t (*dequeue_pkt_burst_t)(void *qp, > > > + struct rte_comp_op **ops, uint16_t nb_ops); > > > +/**< Dequeue processed packets from queue pair of a device. */ > > > + > > > +typedef uint16_t (*enqueue_pkt_burst_t)(void *qp, > > > + struct rte_comp_op **ops, uint16_t nb_ops); > > > +/**< Enqueue packets for processing on queue pair of a device. */ > > > + > > > + > > > + > > > +/** The data structure associated with each compression device. */ > > > +struct rte_compdev { > > > + > > > + dequeue_pkt_burst_t dequeue_burst; > > > + /**< Pointer to PMD receive function. */ > > > + enqueue_pkt_burst_t enqueue_burst; > > > + /**< Pointer to PMD transmit function. */ > > > + > > > + struct rte_compdev_data *data; > > > + /**< Pointer to device data */ > > > + struct rte_compdev_ops *dev_ops; > > > + /**< Functions exported by PMD */ > > > + uint64_t feature_flags; > > > + /**< Supported features */ > > > + > > > + struct rte_device *device; > > > + /**< Backing device */ > > > + > > > + uint8_t driver_id; > > > + /**< driver identifier */ > > > + > > > + uint8_t attached : 1; > > > + /**< Flag indicating the device is attached */ > > > + > > > +}; > > > + > > > +/** > > > + * > > > + * The data part, with no function pointers, associated with each device. > > > + * > > > + * This structure is safe to place in shared memory to be common among > > > + * different processes in a multi-process configuration.
> > > + */ > > > +struct rte_compdev_data { > > > + uint8_t dev_id; > > > + /**< Device ID for this instance */ > > > + uint8_t socket_id; > > > + /**< Socket ID where memory is allocated */ > > > + char name[RTE_COMPDEV_NAME_MAX_LEN]; > > > + /**< Unique identifier name */ > > > + > > > + uint8_t dev_started : 1; > > > + /**< Device state: STARTED(1)/STOPPED(0) */ > > > + > > > + void **queue_pairs; > > > + /**< Array of pointers to queue pairs. */ > > > + uint16_t nb_queue_pairs; > > > + /**< Number of device queue pairs. */ > > > + > > > + void *dev_private; > > > + /**< PMD-specific private data */ > > > +}; > > > + > > > + > > > +/** > > > + * > > > + * Dequeue a burst of processed compression operations from a queue on the > > > + * device. The dequeued operations are stored in *rte_comp_op* structures > > > + * whose pointers are supplied in the *ops* array. > > > + * > > > + * The rte_compdev_dequeue_burst() function returns the number of ops > > > + * actually dequeued, which is the number of *rte_comp_op* data structures > > > + * effectively supplied into the *ops* array. > > > + * > > > + * A return value equal to *nb_ops* indicates that the queue contained > > > + * at least *nb_ops* operations, and this is likely to signify that other > > > + * processed operations remain in the device's output queue. Applications > > > + * implementing a "retrieve as many processed operations as possible" policy > > > + * can check this specific case and keep invoking the > > > + * rte_compdev_dequeue_burst() function until a value less than > > > + * *nb_ops* is returned. > > > + * > > > + * The rte_compdev_dequeue_burst() function does not provide any error > > > + * notification to avoid the corresponding overhead. > > > + * > > > + * @param dev_id The compression device identifier > > > + * @param qp_id The index of the queue pair from which to > > > + * retrieve processed packets.
The value must be > > > + * in the range [0, nb_queue_pair - 1] previously > > > + * supplied to rte_compdev_configure(). > > > + * @param ops The address of an array of pointers to > > > + * *rte_comp_op* structures that must be > > > + * large enough to store *nb_ops* pointers in it. > > > + * @param nb_ops The maximum number of operations to > dequeue. > > > + * > > > + * @return > > > + * - The number of operations actually dequeued, which is the number > > > + * of pointers to *rte_comp_op* structures effectively supplied to the > > > + * *ops* array. > > > + */ > > > +static inline uint16_t > > > +rte_compdev_dequeue_burst(uint8_t dev_id, uint16_t qp_id, > > > + struct rte_comp_op **ops, uint16_t nb_ops) > > > +{ > > > + struct rte_compdev *dev = &rte_compdevs[dev_id]; > > > + > > > + nb_ops = (*dev->dequeue_burst) > > > + (dev->data->queue_pairs[qp_id], ops, nb_ops); > > > + > > > + return nb_ops; > > > +} > > > + > > > +/** > > > + * Enqueue a burst of operations for processing on a compression > device. > > > + * > > > + * The rte_compdev_enqueue_burst() function is invoked to place > > > + * comp operations on the queue *qp_id* of the device designated by > > > + * its *dev_id*. > > > + * > > > + * The *nb_ops* parameter is the number of operations to process > which are > > > + * supplied in the *ops* array of *rte_comp_op* structures. > > > + * > > > + * The rte_compdev_enqueue_burst() function returns the number of > > > + * operations it actually enqueued for processing. A return value equal > to > > > + * *nb_ops* means that all packets have been enqueued. > > > + * > > > + * @param dev_id The identifier of the device. > > > + * @param qp_id The index of the queue pair which packets are > > > + * to be enqueued for processing. The value > > > + * must be in the range [0, nb_queue_pairs - 1] > > > + * previously supplied to > > > + * *rte_compdev_configure*. 
> > > + * @param ops The address of an array of *nb_ops* pointers > > > + * to *rte_comp_op* structures which contain > > > + * the operations to be processed. > > > + * @param nb_ops The number of operations to process. > > > + * > > > + * @return > > > + * The number of operations actually enqueued on the device. The > return > > > + * value can be less than the value of the *nb_ops* parameter when > the > > > + * device's queue is full or if invalid parameters are specified in > > > + * a *rte_comp_op*. > > > + */ > > > +static inline uint16_t > > > +rte_compdev_enqueue_burst(uint8_t dev_id, uint16_t qp_id, > > > + struct rte_comp_op **ops, uint16_t nb_ops) > > > +{ > > > + struct rte_compdev *dev = &rte_cryptodevs[dev_id]; > > > + > > > + return (*dev->enqueue_burst)( > > > + dev->data->queue_pairs[qp_id], ops, nb_ops); > > > +} > > > + > > > > > > /* All session APIs stripped from RFC for simplicity. */ > > > > > > +/** compression device operations function pointer table */ > > > +struct rte_compdev_ops { > > > + compdev_configure_t dev_configure; /**< Configure device. */ > > > + compdev_start_t dev_start; /**< Start device. */ > > > + compdev_stop_t dev_stop; /**< Stop device. */ > > > + compdev_close_t dev_close; /**< Close device. */ > > > + > > > + compdev_info_get_t dev_infos_get; /**< Get device info. */ > > > + > > > + compdev_stats_get_t stats_get; > > > + /**< Get device statistics. */ > > > + compdev_stats_reset_t stats_reset; > > > + /**< Reset device statistics. */ > > > + > > > + compdev_queue_pair_setup_t queue_pair_setup; > > > + /**< Set up a device queue pair. */ > > > + compdev_queue_pair_release_t queue_pair_release; > > > + /**< Release a queue pair. */ > > > + compdev_queue_pair_start_t queue_pair_start; > > > + /**< Start a queue pair. */ > > > + compdev_queue_pair_stop_t queue_pair_stop; > > > + /**< Stop a queue pair. */ > > > + compdev_queue_pair_count_t queue_pair_count; > > > + /**< Get count of the queue pairs. 
*/ > > > + > > > + comp_get_session_private_size_t session_get_size; > > > + /**< Return private session. */ > > > + comp_configure_session_t session_configure; > > > + /**< Configure a comp session. */ > > > + comp_free_session_t session_clear; > > > + /**< Clear a comp sessions private data. */ > > > + comp_queue_pair_attach_session_t qp_attach_session; > > > + /**< Attach session to queue pair. */ > > > + comp_queue_pair_attach_session_t qp_detach_session; > > > + /**< Detach session from queue pair. */ > > > +}; > > > > > > > > > > > >