Re: [dpdk-dev] [RFC PATCH v1] regexdev: introduce regexdev subsystem

Ori Kam Sun, 26 Jan 2020 03:55:25 -0800

Hi Jerin and Xiang,
PSB

> -----Original Message-----
> From: dev <[email protected]> On Behalf Of Wang Xiang
> Sent: Monday, October 14, 2019 4:59 PM
> To: Jerin Jacob Kollanukkaran <[email protected]>
> Cc: Thomas Monjalon <[email protected]>; [email protected]; Pavan
> Nikhilesh Bhagavatula <[email protected]>; Shahaf Shuler
> <[email protected]>; Hemant Agrawal <[email protected]>;
> Opher Reviv <[email protected]>; Alex Rosenbaum
> <[email protected]>; Dovrat Zifroni <[email protected]>; Prasun Kapoor
> <[email protected]>; Nipun Gupta <[email protected]>; Richardson,
> Bruce <[email protected]>; Hong, Yang A <[email protected]>;
> Chang, Harry <[email protected]>; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; Ni, Hongjun <[email protected]>; j.bromhead@titan-
> ic.com; [email protected]; [email protected]; [email protected]; Guy Kaneti
> <[email protected]>; Smadar Fuks <[email protected]>; Liron Himi
> <[email protected]>; [email protected]; [email protected]
> Subject: Re: [dpdk-dev] [RFC PATCH v1] regexdev: introduce regexdev
> subsystem
> 
> On Fri, Sep 27, 2019 at 02:35:00PM +0000, Jerin Jacob Kollanukkaran wrote:
> > > -----Original Message-----
> > > From: Wang Xiang <[email protected]>
> > >
> > > Hi Jerin,
> > >
> > > Thanks for your response. More comments below and inline.
> > >
> > > 1) I think the size of some varaibles (e.g. nb_matches, scan_size, 
> > > matching
> > > offset, etc) should be increased based on what Hyperscan supports.
> > >
> > >     a) struct rte_regex_ops:
> > >
> > >         uint16_t scan_size => uint32_t scan_size
> >
> > I think, packet buffers will not be > 64K and getting more than contiguous
> > 64K DMAable memory will be difficult in DPDK.
> > Other than that, rte_regex_match is 64bit now, increasing width of
> > Len could increase the size of  "rte_regex_match". i.e Need more
> > Bandwidth for response.
> > Could other HW implementations share the views on max length
> > is supported on their implementation? Based on that we can decide.
> >
> OK, let's gather ideas from HW implementation.


I agree, that 16 bit for buffer length is good, and that the size of 
rte_regex_match
should stay 64 bit, in order to have better performance. (PCI bandwidth and 
caching)

> >
> > >         uint8_t nb_actual_matches => uint64 nb_actual_matches
> > >         uint8_t nb_matches => uint64 nb__matches
> >
> > 2^64 matches will be never possible in practical system. How about 2^16.
> >
> I think the number of matches depends on the number of total rules and
> scan size. Based on the definitions (16-bit nb_rules_per_group,
> 16-bit nb_groups and 16-bit scan size), the maximum possible matches
> could exceed 2^16. Users may get partial matches in this case while
> Hyperscan doesn't make compromises. It'll also be good to check other HW
> implementation.

I think  that we can increase the number of matches to 16 bit. But in any case 
our HW can't
support working on more then 4 groups in a single search, and since we can't 
support buffer
larger then 2^16, and if we could saving that many results in in HW is not 
practical.

> >
> > >
> > >     b) struct rte_regex_match:
> > >         uint16_t offset => uint32_t offset
> > >         uint16_t len => uint32_t len
> >
> > See above.
> >
> > >
> > >     c) uint16_t
> > >         rte_regex_rule_db_update(uint8_t dev_id, const struct 
> > > rte_regex_rule
> > > *rules,
> > >                                  uint16_t nb_rules);
> > >     =>
> > >        uint32_t
> > >         rte_regex_rule_db_update(uint8_t dev_id, const struct 
> > > rte_regex_rule
> > > *rules,
> > >                                  uint32_t nb_rules);
> >
> > OK. I will change it next version.
> >
> > >
> > >     d) int
> > >     rte_regex_queue_pair_setup(uint8_t dev_id, uint8_t queue_pair_id,
> > >                     const struct rte_regex_qp_conf *qp_conf);
> > >     =>
> > >        int
> > >     rte_regex_queue_pair_setup(uint8_t dev_id, uint16_t queue_pair_id,
> > >                     const struct rte_regex_qp_conf *qp_conf);
> >
> > OK. I will change it next version.
> >
> > >
> > >     e) struct rte_regex_dev_config:
> > >         uint8_t nb_max_matches => uint64_t nb_max_matches
> >
> > 2^64 matches will be never possible in practical system. How about 2^16.
> >
> See above.

See above.

> >
> > >
> > >     f) struct rte_regex_dev_info:
> > >         uint8_t max_matches => uint64_t max_matches
> >
> > 2^64 matches will be never possible in practical system. How about 2^16.
> >
> See above.

See above.

> >
> > >
> > > 2) There are rte_regex_dev_attr_get() and rte_regex_dev_attr_set()
> defined.
> > > Are all the attributes below could be set by users? Is any of them 
> > > read-only?
> >
> > See below,
> >
> > > /** Enumerates RegEx device attribute identifier */ enum
> > > rte_regex_dev_attr_id {
> > >     RTE_REGEX_DEV_ATTR_SOCKET_ID,
> > >     /**< The NUMA socket id to which the device is connected or
> > >      * a default of zero if the socket could not be determined.
> > >      * datatype: *int*
> > >      * operation: *get*
> >
> > *get*  means read only. *get* and *set* means it support both operation
> >
> > >      */
> > >     RTE_REGEX_DEV_ATTR_MAX_MATCHES,
> > >     /**< Maximum number of matches per scan.
> > >      * datatype: *uint8_t*
> > >      * operation: *get* and *set*
> > >      *
> > >      * @see RTE_REGEX_OPS_RSP_MAX_MATCH_F
> > >      */
> > >     RTE_REGEX_DEV_ATTR_MAX_SCAN_TIMEOUT,
> > >     /**< Upper bound scan time in ns.
> > >      * datatype: *uint16_t*
> > >      * operation: *get* and *set*
> > >      *
> > >      * @see RTE_REGEX_OPS_RSP_MAX_SCAN_TIMEOUT_F
> > >      */
> > >     RTE_REGEX_DEV_ATTR_MAX_PREFIX,
> > >     /**< Maximum number of prefix detected per scan.
> > >      * This would be useful for denial of service detection.
> > >      * datatype: *uint16_t*
> > >      * operation: *get* and *set*
> > >      *
> > >      * @see RTE_REGEX_OPS_RSP_MAX_PREFIX_F
> > >      */
> > > };
> > >
> > > 3) Both RTE_REGEX_PCRE_RULE_* and
> > > RTE_REGEX_DEV_PCRE_UNSUP_* can be viewed as device capabilities. Can
> we
> > > merge them with RTE_REGEX_DEV_CAPA_RUNTIME_COMPILATION_F and
> have
> > > a unified regex_dev_capa in struct rte_regex_dev_info.
> >
> > Sure. I will fix it next version.
> >
> > >
> > >
> > > 4) It'll be good if we can also define synchronous matching API for users
> who
> > > want to have a one-off scan and wait for the results.
> >
> > Makes sense. I will add synchronous matching API in next version(I
> understand, it will be useful for SW
> > Implementations). Probably expose as INFO flag to expose the it as
> preference.
> >
> > >
> > > On Tue, Sep 10, 2019 at 08:05:39AM +0000, Jerin Jacob Kollanukkaran
> wrote:
> > > > Hi Xiang,
> > > >
> > > > Sorry for delay in response(Was busy with 19.11 proposal deadline).
> Please
> > > see inline.
> > > >
> > > > >
> > > > > Reply to Xiang's queries in main thread:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > Some questions regarding APIs. Could you please give more insights?
> > > > >
> > > > > 1) rte_regex_ops
> > > > >       a) rsp_flags
> > > > >       These two flags RTE_REGEX_OPS_RSP_PMI_SOJ_F and
> > > > > RTE_REGEX_OPS_RSP_PMI_EOJ_F are used for cross buffer scan.
> > > > >       RTE_REGEX_OPS_RSP_PMI_EOJ_F tells whether we have a partial
> > > > > match at the end of current buffer after scan.
> > > > >       What's the purpose of having RTE_REGEX_OPS_RSP_PMI_SOJ_F?
> > > > >
> > > > > [Jerin] Since we need three states to represent partial match
> > > > > buffer, RTE_REGEX_OPS_RSP_PMI_SOJ_F to represent start of the
> > > > > buffer, intermediate buffers with no flag, and end of the buffer
> > > > > with RTE_REGEX_OPS_RSP_PMI_EOJ
> > > >
> > > > > [Xiang] How could a user leverage these flags for matching? Suppose
> > > > > a large buffer is divided into multiple chunks. Will
> > > > > RTE_REGEX_OPS_RSP_PMI_SOJ_F cause an early quit once it isn't set
> > > > > after scan the first chunk. Similarly, RTE_REGEX_OPS_RSP_PMI_EOJ
> > > > > tells a user whether to stop matching future buffers after finish the 
> > > > > last
> > > chunk?
> > > >
> > > > Let me describe with an example,
> > > >
> > > > Assume,
> > > > 1) struct rte_regex_dev_info:: max_payload_size set to 1024
> > > > 2) rte_regex_dev_config:: dev_cfg_flags configured with
> > > > RTE_REGEX_DEV_CFG_CROSS_BUFFER_SCAN_F
> > > > 3) Device programmed with matching "hello\s+world" pattern
> > > > 4) user enqueue struct rte_regex_ops:: buf_addr point following "data"
> > > > and struct rte_regex_op:: scan_size = 1024
> > > >
> > > > data[0..1021] = data don???t have hello world pattern data[1022] = 'h'
> > > > data[1023] = 'e'
> > > >
> > > > 5) user enqueue struct rte_regex_ops:: buf_addr point following "data"
> > > > and struct rte_regex_op:: scan_size = 9
> > > >
> > > > data[0] = 'l'
> > > > data[1] = 'l'
> > > > data[2] = 'o'
> > > > data[3] = ' '
> > > > data[4] = 'w'
> > > > data[5] = 'o'
> > > > data[6] = 'r'
> > > > data[7] = 'l'
> > > > data[8] = 'd'
> > > >
> > > > If so,
> > > >
> > > > Response to 4) will be RTE_REGEX_OPS_RSP_PMI_SOJ_F in
> rte_regex_ops::
> > > > rsp_flags on dequeue Where rte_regex_match:: offset is 1022 and len 2
> > > >
> > > > Response to 5) will be RTE_REGEX_OPS_RSP_PMI_EOJ_F in
> rte_regex_ops::
> > > > rsp_flags on dequeue Where rte_regex_match:: offset is 0 and len 9
> > > >
> > > If the defined pattern is "hello.*world" instead of "hello\s+world", and 
> > > we
> > > enqueue following struct rte_regex_ops:
> > >
> > > 1) rte_regex_op:: scan_size = 1024
> > >
> > >    data[0..1021] = data don???t have hello world pattern
> > >    data[1022] = 'h'
> > >    data[1023] = 'e'
> > >
> > > 2) rte_regex_op:: scan_size = 9
> > >    data[0] = 'l'
> > >    data[1] = 'l'
> > >    data[2] = 'o'
> > >    data[3] = ' '
> > >    data[4] = 'w'
> > >    data[5] = 'o'
> > >    data[6] = 'r'
> > >    data[7] = 'l'
> > >    data[8] = 'd'
> > >
> > > 3) rte_regex_op:: scan_size = 5
> > >    data[0] = 'w'
> > >    data[1] = 'o'
> > >    data[2] = 'r'
> > >    data[3] = 'l'
> > >    data[4] = 'd'
> > >
> > > Will response to 3) have RTE_REGEX_OPS_RSP_PMI_EOJ_F in
> rte_regex_ops::
> > > rsp_flags on dequeue
> > > Where rte_regex_match:: offset is 0 and len 4?
> >
> > Yes.
> >
> > >
> > > I am wondering what's your expected behavior for .* or similar syntax and
> if
> > > there are syntax compatability issues. We report all matches in Hyperscan,
> e.g.
> > > report end match offsets 11 and 16 for pattern "hello.*world" and corpus
> > > "hello worldworld".
> > >
> > > BTW, not sure how other hardware devices handle cross buffer scan.
> Hyperscan
> > > doesn't reports matches for start and intermediate buffers but only 
> > > reports
> end
> > > offset if a full match is found.
> > >
> > > >
> > > > >
> > > > >       RTE_REGEX_OPS_RSP_MAX_PREFIX_F: This looks like a definition
> > > > > for a specific hardware implementation. I am wondering what this
> > > > > PREFIX refers to:)?
> > > > >
> > > > > [Jerin] Yes. Looks like it is for hardware specific implementation.
> > > > > Introduced rte_regex_dev_attr_set/get functions to make it portable
> > > > > and To add new implementation specific fields.
> > > > > For example, if a rule is
> > > > > /ABCDEF.*XYZ/, ABCD is considered the prefix, and EF.*XYZ is
> > > > > considered the factor. The prefix is a literal string, while the
> > > > > factor can contain complex regular expression constructs. As a
> > > > > result, rule matching occurs in two stages: prefix matching and
> > > > > factor matching.
> > > > >
> > > > >       b)  user_id or user_ptr
> > > > >       Under what kind of circumstances should an application pass
> > > > > value into these variables for enqueue and dequeuer operations?
> > > > >
> > > > > [Jerin] Just like rte_crypto_ops, struct rte_regex_ops also
> > > > > allocated using mempool normally, on enqueue, user can specify
> > > > > user_id If needed to in order identify the op on dequeue if
> > > > > required. The use case could be to store the sequence number from
> > > > > application POV or storing the mbuf ptr in which pattern is requested
> etc.
> > > > >
> > > > >
> > > > >  2) rte_regex_match
> > > > >       a) offset; /**< Starting Byte Position for matched rule. */
> > > > > and  uint16_t len; /**< Length of match in bytes */
> > > > >       Looks like the matching offset is defined as *starting
> > > > > matching offset* instead of *end matching offset*, e.g. report the 
> > > > > offset
> of
> > > "a" instead of "c"
> > > > > for pattern "abc".
> > > > >       If so, this makes it hard to integrate software regex
> > > > > libraries such as Hyperscan and RE2 as they only report *end
> > > > > matching offset* without length of match.
> > > > >       Although Hyperscan has API for *starting matching offset*, it
> > > > > only delivers partial syntax support. So I think we have to define
> > > > > *end of matching offset* for software solutions.
> > > > >
> > > > > [Jerin] I understand the hyperscan's HS_FLAG_SOM_LEFTMOST
> tradeoffs.
> > > > > I thought application would need always the length of the match.
> > > > > Probably we will see how other HW implementation (from Mellanox)
> > > > > etc. We will try to abstract it, probably we can make it as function
> > > > > of "user requested".
> > > > > [Xiang] Yes, it will be good to make it per user request. At least
> > > > > from Hyperscan user's point of view, start of match and match length
> > > > > are not mandatory.
> > > >
> > > > OK. I think, we can introduce RTE_REGEX_DEV_CFG_MATCH_AS_START In
> > > > device configure.
> > > >
> > > > Since offset+len == end, we can introduce following generic inline
> function.
> > > >
> > > > static inline
> > > > rte_regex_match_end(truct rte_regex_match *match) {
> > > >         match->offset + match->len;
> > > > }
> > > >
> > > > Example:  pattern to match is  "hello\s+world"  and data is following
> > > > data[4] = 'h'
> > > > data[5] = 'e'
> > > > data[6] = 'l'
> > > > data[7] = 'l'
> > > > data[8] = 'o'
> > > > data[9] = ' '
> > > > data[10] = 'w'
> > > > data[11] = 'o'
> > > > data[12] = 'r'
> > > > data[13] = 'l'
> > > > data[14] = 'd'
> > > >
> > > > if device is configured with RTE_REGEX_DEV_CFG_MATCH_AS_START
> > > > match->offset returns 4
> > > > match->len returns 11
> > > >
> > > > if device is NOT configured with
> RTE_REGEX_DEV_CFG_MATCH_AS_START
> > > > driver MAY return the following(in hyperscan case)
> > > > match->offset returns 0
> > > > match->len returns 11 + 4
> > > >
> > > > In both case(irrespective of flags, to make application life easy)
> > > rte_regex_match_end() would return 15.
> > > > If application demands for MATCH_AS_START then driver can return
> > > > match->offset returns 4 and match->len returns 11 Aka set
> > > > HS_FLAG_SOM_LEFTMOST in hyperscan driver, But application should use
> > > rte_regex_match_end() for finding the end of the match. To make, work in
> all
> > > cases.
> > > >
> > > > Is it OK?
> > > >
> > > Can we replace len with end offset? So we can change "offset" to
> "start_offset"
> > > and len to "end_ offset" in struct rte_regex_match. Users interested in 
> > > len
> > > could take "end_offset - start_offset".
> > > We may also change RTE_REGEX_DEV_CFG_MATCH_AS_START to
> > > RTE_REGEX_DEV_CFG_MATCH_START
> > >
> > > In your example,
> > > if device is configured with RTE_REGEX_DEV_CFG_MATCH_START
> > > match->start_offset returns 4
> > > match->end_offset returns 15
> > >
> > > if device is NOT configured with RTE_REGEX_DEV_CFG_MATCH_START
> > > match->start_offset returns 0
> > > match->end_offset returns 15
> >
> >
> > This part is little tricky as HW descriptions need to be rewritten on 
> > response.
> > This is a one issue, I foresee earlier, to come up with rte_regex_match
> > That's works for all implementation  without performance issue.
> >
> > We have two HW implementations, both returns start_off and len.
> > Lets get input from other HW implementation on the semantics of
> > rte_regex_match. Based on that, we can decide how to go about it?
> > Thoughts from Mellanox or other vendors?
> >
> Sure. Let's get more inputs on this.


I think Jerin approach is the better one, since at least in our case we see a 
request 
to copy the match, so it is more user friendly to give the offset and len.

> >
> >
> > >
> > > > >
> > > > > 3)  rte_regex_rule_db_update()
> > > > >     Does this mean we can dynamically add or delete rules for an
> > > > > already generated database without recompile from scratch for
> > > > > hardware Regex implementation?
> > > > >     If so, this isn't possible for software solutions as they don't
> > > > > support dynamic database update and require recompile.
> > > > >
> > > > > [Jerin] rte_regex_rule_db_update() internally it would call
> > > > > recompile function for both HW and SW.
> > > > > See rte_regex_dev_config::rule_db in rte_regex_dev_configure() for
> > > > > precompiled rule database case.
> > > > > [Xiang] OK, sounds like we have to save the original rule-set for
> > > > > the device in order to do recompile. I see both ADD and REMOVE
> > > > > operators from rte_regex_rule.
> > > > > For rules with REMOVE operator, what's the expected behavior to
> > > > > handle them for the old rule-set? Do we need to go through the old
> > > > > rule-set and remove corresponding rules before doing recompile?
> > > >
> > > > Yes.
> > > >
> > > I think it'll be better to change rte_regex_rule_db_update() to
> > > rte_regex_rule_compile() and have users to provide a full rule-set.
> > > So we don't have to maintain old rule-set and decide which one to keep and
> > > remove. We can simply recompile new rule-set and get rid of
> > > rte_regex_rule_op in this case.
> >
> >
> > On virtualized, HW implementations, The RULE database is maintained by
> single
> > body. So the above scheme, works with SW and HW implementations.
> > And It make user life easy as they don't need to maintain the rules.
> >
> > I don't have preference on the rte_regex_rule_db_update() name, I can
> change to
> > rte_regex_rule_compile() if required keeping above functionality. Let me
> know.
> >
> >
> OK, I'm good if your are willing to maintain it for users. Then both
> rte_regex_rule_db_update() and rte_regex_rule_compile() work for me.

Combining with Shahaf request. My suggestion is:
rte_regex_rule_db_update() - only insert/remove rules from the internal-set
rte_regex_rule_db_compile_activate() - compile and activate the new rule set.

> >
> >
> >
> >
> >
> >

Best,
Ori

Re: [dpdk-dev] [RFC PATCH v1] regexdev: introduce regexdev subsystem

Reply via email to