Hi Xiang, > -----Original Message----- > From: Wang Xiang <xiang.w.w...@intel.com> > Sent: Thursday, February 27, 2020 11:26 AM > To: Ori Kam <or...@mellanox.com> > Cc: Jerin Jacob <jerinjac...@gmail.com>; Jerin Jacob <jer...@marvell.com>; > dpdk-dev <dev@dpdk.org>; Pavan Nikhilesh <pbhagavat...@marvell.com>; > Shahaf Shuler <shah...@mellanox.com>; Hemant Agrawal > <hemant.agra...@nxp.com>; Opher Reviv <op...@mellanox.com>; Alex > Rosenbaum <al...@mellanox.com>; dov...@marvell.com; Prasun Kapoor > <pkap...@marvell.com>; Nipun Gupta <nipun.gu...@nxp.com>; Richardson, > Bruce <bruce.richard...@intel.com>; yang.a.h...@intel.com; > harry.ch...@intel.com; gu.ji...@zte.com.cn; shanjia...@chinatelecom.cn; > zhangy....@chinatelecom.cn; lixin...@huachentel.com; wush...@inspur.com; > yuying...@yxlink.com; fanchengg...@sunyainfo.com; > davidf...@tencent.com; liuzho...@chinaunicom.cn; > zhaoyon...@huawei.com; o...@yunify.com; j...@netgate.com; > hongjun...@intel.com; j.bromh...@titan-ic.com; d...@ntop.org; > f...@napatech.com; arthur...@lionic.com; Thomas Monjalon > <tho...@monjalon.net> > Subject: Re: [dpdk-dev] [PATCH v3] regexdev: introduce regexdev subsystem > > Hi Ori, > > Thanks for the comments. > > Hyperscan supports both start_offset and end_offset modes with most > users choosing end_offset for rule coverage and performance reasons. > I'm OK to have the default behavior with start_offset and len. > It'll be good to change RTE_REGEX_DEV_CFG_MATCH_AS_START to > RTE_REGEX_DEV_CFG_MATCH_AS_END. For users who need only end_offset, > they have to set RTE_REGEX_DEV_CFG_MATCH_AS_END bit. We may also > remove > RTE_REGEX_DEV_SUPP_MATCH_AS_START if you like.
Since you say that Hyperscan can support both modes, What about changing the cap field to RTE_REGEX_DEV_SUPP_MATCH_AS_END? > > One question is related to the consistency of start_offset definition > among different solutions, does all solutions return the leftmost > start_offset, i.e. for rule: foo.*bar and input: foofoobar, the returned > start_offset will be 0 not 3? > Yes, you are correct, In Mellanox case there will be only one result: start_offset 0 > Thanks, > Xiang > > On Wed, Feb 26, 2020 at 08:36:51AM +0000, Ori Kam wrote: > > Hi Xiang, > > > > > > > -----Original Message----- > > > From: dev <dev-boun...@dpdk.org> On Behalf Of Wang Xiang > > > Sent: Wednesday, February 26, 2020 11:03 AM > > > To: Ori Kam <or...@mellanox.com> > > > Cc: Jerin Jacob <jerinjac...@gmail.com>; Jerin Jacob > <jer...@marvell.com>; > > > dpdk-dev <dev@dpdk.org>; Pavan Nikhilesh <pbhagavat...@marvell.com>; > > > Shahaf Shuler <shah...@mellanox.com>; Hemant Agrawal > > > <hemant.agra...@nxp.com>; Opher Reviv <op...@mellanox.com>; Alex > > > Rosenbaum <al...@mellanox.com>; dov...@marvell.com; Prasun Kapoor > > > <pkap...@marvell.com>; Nipun Gupta <nipun.gu...@nxp.com>; > Richardson, > > > Bruce <bruce.richard...@intel.com>; yang.a.h...@intel.com; > > > harry.ch...@intel.com; gu.ji...@zte.com.cn; > shanjia...@chinatelecom.cn; > > > zhangy....@chinatelecom.cn; lixin...@huachentel.com; > wush...@inspur.com; > > > yuying...@yxlink.com; fanchengg...@sunyainfo.com; > > > davidf...@tencent.com; liuzho...@chinaunicom.cn; > > > zhaoyon...@huawei.com; o...@yunify.com; j...@netgate.com; > > > hongjun...@intel.com; j.bromh...@titan-ic.com; d...@ntop.org; > > > f...@napatech.com; arthur...@lionic.com; Thomas Monjalon > > > <tho...@monjalon.net> > > > Subject: Re: [dpdk-dev] [PATCH v3] regexdev: introduce regexdev subsystem > > > > > > Hi Ori and Jerin, > > > > > > One comment regarding my concern with len and end_offset problem. > > > From open source SW regex library(libpcre, re2 and Hyperscan) and > > > Intel's perspective, the matching results returned are always start > > > offset and end offset. More importantly, Hyperscan only reports end offset > > > most of the time. > > > > > > It'll be good to keep this union as an abstraction and enforce the default > > > behavior for each solution, i.e. HW solutions doesn't support > MATCH_AS_START > > > flag at rule compile time. Applications will know the meaning of variable > > > at > > > rule compile time with the flag so they don't have to do extra check at > > > fast > path > > > run-time matching. > > > Welcome for better abstraction ideas. > > > > > > > I don't mind to keep the union as it was in V3, but I would like to remove > > the > > configuration bit (RTE_REGEX_DEV_CFG_MATCH_AS_START). > > Meaning that if the device reports > RTE_REGEX_DEV_SUPP_MATCH_AS_START > > the result will always be with start_offset and len. > > > > Best, > > Ori > > > > > Thanks, > > > Xiang > > > > > > > > > > + /**< Starting Byte Position for matched > > > > > > > rule. */ > > > > > > > + RTE_STD_C11 > > > > > > > + union { > > > > > > > + uint16_t len; > > > > > > > + /**< Length of match in bytes */ > > > > > > > + uint16_t end_offset; > > > > > > > + /**< The end offset of the match. > > > > > > > In case > > > > > > > + * MATCH_AS_START configuration > > > > > > > is disabled. > > > > > > > + * @see > RTE_REGEX_DEV_CFG_MATCH_AS_START > > > > > > > + */ > > > > > > > > > > > > We have not concluded on this scheme. Have one field which has > > > > > > different meaning will be difficult > > > > > > for application. i.e fast path we need to have a check for this. > > > > > > > > > > > > > > > > This is the time to conclude . at least for the first version. > > > > > Why do we have one field with different meaning? > > > > > The result can be ether len or end_offset. > > > > > > > > > > > I think, Based on the majority of HW/SW implementation, we need to > > > > > > either go with len or > > > > > > end_offset. What Mellanox HW returns? len or end_offset? > > > > > > > > > > > > > > > > From Mellanox perspective we prefer the len approach. We also think > > > > > it is much more user oriented. > > > > > > > > > > > or We can keep it as len or end_offset based on which drivers > upstream > > > > first, > > > > > > other drivers when it comes, we can see how to abstract it? > > > > > > > > > > > > > > > > I can except that assuming we choose the start and len approach > > > > > > > > I think, we can have first version with "start and len" by removing > > > > RTE_REGEX_DEV_CFG_MATCH_AS_START. > > > > When can think, how to abstract new drivers when it upstream based on > > > > the overhead. > > > > > > > > > > > > > On Tue, Feb 25, 2020 at 07:48:54AM +0000, Ori Kam wrote: > > > > > > > > > > > > > -----Original Message----- > > > > > From: Jerin Jacob <jerinjac...@gmail.com> > > > > > Sent: Tuesday, February 25, 2020 7:57 AM > > > > > To: Ori Kam <or...@mellanox.com> > > > > > Cc: Jerin Jacob <jer...@marvell.com>; xiang.w.w...@intel.com; dpdk- > dev > > > > > <dev@dpdk.org>; Pavan Nikhilesh <pbhagavat...@marvell.com>; > Shahaf > > > > > Shuler <shah...@mellanox.com>; Hemant Agrawal > > > > > <hemant.agra...@nxp.com>; Opher Reviv <op...@mellanox.com>; > Alex > > > > > Rosenbaum <al...@mellanox.com>; dov...@marvell.com; Prasun > Kapoor > > > > > <pkap...@marvell.com>; Nipun Gupta <nipun.gu...@nxp.com>; > > > Richardson, > > > > > Bruce <bruce.richard...@intel.com>; yang.a.h...@intel.com; > > > > > harry.ch...@intel.com; gu.ji...@zte.com.cn; > > > shanjia...@chinatelecom.cn; > > > > > zhangy....@chinatelecom.cn; lixin...@huachentel.com; > > > wush...@inspur.com; > > > > > yuying...@yxlink.com; fanchengg...@sunyainfo.com; > > > > > davidf...@tencent.com; liuzho...@chinaunicom.cn; > > > > > zhaoyon...@huawei.com; o...@yunify.com; j...@netgate.com; > > > > > hongjun...@intel.com; j.bromh...@titan-ic.com; d...@ntop.org; > > > > > f...@napatech.com; arthur...@lionic.com; Thomas Monjalon > > > > > <tho...@monjalon.net> > > > > > Subject: Re: [dpdk-dev] [PATCH v3] regexdev: introduce regexdev > subsystem > > > > > > > > > > > > 4) app/test/test_regexdev.c like app/test/test_eventdev.c > > > > > > > > > > > > We started to create a super basic app, after the API will be > > > > > > finalized > and > > > we > > > > > will have HW > > > > > > we can push it. (if you need it faster than feel free) > > > > > > > > > > A simple Unit test case needs to be present for the APIs. On the > > > > > course of developing common code, > > > > > it can be developed to test the common code with dummy/skeleton > driver. > > > > > > > > > > > > > Agree this is what we are currently have. > > > > > > > > > > > > > > > > > 5) Need a maintainer for maintaining the regex subsystem > > > > > > > > > > > > > We wish to maintain it if you agree. > > > > > > > > > > Yes. Please. > > > > > > > > > > > > > Great. > > > > > > > > > > > > > > > > > > > > One more thing, regarding the ops structure, I think it is > > > > > > > > better to > split > > > it > > > > > to 2 > > > > > > > different > > > > > > > > structures one enque and one for dequeue, since there are no > > > > > > > > real > > > shared > > > > > > > data and we will > > > > > > > > be able to save memory, what do you think? > > > > > > > > > > > > > > Ops are allocated from mempool so it will be overhead to manage > both. > > > > > > > moreover, some > > > > > > > of the fields added in req can be used for resp as info. cryptodev > > > > > > > follows the similar concept, > > > > > > > I think, we can have symmetry with cryptodev wherever is possible > to > > > avoid > > > > > > > end-user to learn new API models. > > > > > > > > > > > > True that there will be overhead with 2 mempools (small one) > > > > > > but lets assume 255 results. This means that the buffer should be > > > > > > 255 > * > > > > > sizeof(rte_regex_match) = 2K > > > > > > also this will enable us to replace groupX with group[] which will > > > > > > allow > > > even > > > > > more groups. > > > > > > In addition don't think that crypto is a good example. > > > > > > The main difference is that in RegEx the output is different format > then > > > the > > > > > input. > > > > > > > > > > # IMO, Some of the fields may be useful for a response as well. I > > > > > think application may be interested in following > > > > > req filed in the response. > > > > > a) buf_addr > > > > > > > > I don't see how this can be used in the response. since if working in > > > > out of > > > order result. > > > > you don’t know which result will be returned. > > > > I also think it is error prone to use the same op for the enqueue and > dequeue. > > > > > > > > > b) scan_size > > > > > > > > Please see above. > > > > > > > > > c) user_id (This would be main one) > > > > > > > > Agree > > > > > > > > > > > > > > # Having two mempools adds overhead per lcore L1 cache usage and > extra > > > > > complexity to the application. > > > > > > > > > > # IMO, From a performance perspective, one mempool is good due to > less > > > > > stress on the cache and it is costly to > > > > > add new mempool for HW mempool implementations. > > > > > > > > > > # I think, group[] use case we can add it when it required by > > > > > introducing "matches_start_offset" field, which will > > > > > tell the req, where is the end of group[] and where "matches" start > > > > > with single mempool scheme also. > > > > > > > > > > # I think, one of the other use case for "matches_start_offset" that, > > > > > It may possible to put vendor-specific > > > > > opaque data. It will be filled by driver on response. The application > > > > > can reference the matches as > > > > > > > > > > struct rte_regex_match *matches = RTE_PTR_ADD(ops, ops- > > > > > > > > > >matches_start_offset); > > > > > > > > > > > > > O.K for now we will keep it as is, and we will see what will be in the > future. > > > > > > > > > > > > > > > > > I assume you will send the v4 with these comments. I think, with > > > > > > > v4 > we > > > > > > > can start implementing common library code. > > > > > > > > > > > > Just need to agree on the split (one more iteration ) > > > > > > and I will start working on the common code. > > > > > > > > > > Ack. > > > > > > > > Great, > > > > I'm starting to work on V4 with all comments so the RFC will be acked > > > > and > > > then will start > > > > coding the rest of the common code. > > > >