On 5/23/2017 2:30 PM, Adrien Mazarguil wrote: > On Tue, May 23, 2017 at 01:58:44PM +0100, Ferruh Yigit wrote: >> On 5/23/2017 1:26 PM, Adrien Mazarguil wrote: >>> On Mon, May 22, 2017 at 02:53:28PM +0100, Ferruh Yigit wrote: >>>> On 5/19/2017 5:30 PM, Iremonger, Bernard wrote: >>>>> Hi Ferruh, >>>>> >>>>>> -----Original Message----- >>>>>> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Ferruh Yigit >>>>>> Sent: Thursday, May 18, 2017 7:12 PM >>>>>> To: dev@dpdk.org >>>>>> Cc: Yigit, Ferruh <ferruh.yi...@intel.com>; Mcnamara, John >>>>>> <john.mcnam...@intel.com>; Tahhan, Maryam >>>>>> <maryam.tah...@intel.com> >>>>>> Subject: [dpdk-dev] [RFC v2] Flow classification library >>>>>> >>>>>> DPDK works with packets, but some network administration tools works >>>>>> based on flow information. >>>>>> >>>>>> This library is suggested to provide helper API to convert packet based >>>>>> information to the flow records. >>>>>> >>>>>> Basically the library consist of a single API that gets packets, flow >>>>>> definition >>>>>> and action as parameter and provides flow stats based on action. >>>>>> Application >>>>>> should call the API for all received packets. >>>>>> >>>>>> Library header file has more comments on how library works and provided >>>>>> APIs. >>>>>> >>>>>> Packets to flow conversion will cause performance drop, that is why >>>>>> conversion done on demand by an API call provided by this library. >>>>>> >>>>>> Initial implementation in mind is to provide support for IPFIX metering >>>>>> process but library planned to be as generic as possible. And flow >>>>>> information >>>>>> provided by this library is missing to implement full IPFIX features, >>>>>> but this is >>>>>> planned to be initial step. >>>>>> >>>>>> Flows are defined using rte_flow, also measurements (actions) are >>>>>> provided >>>>>> by rte_flow. To support more IPFIX measurements, the implementation may >>>>>> require extending rte_flow addition to implementing this library. >>>>> >>>>> Do you know what extensions are needed to the rte_flow code? >>>> >>>> The extension may be required on two fields: >>>> 1- Defining the flow >>>> 2- Available actions >>>> >>>> For defining the flow, an update may not be required, specially at first >>>> version of the library. >>>> >>>> But for action, there may be some updates. >>>> >>>> IPFIX RFC defines Metering process as [1], (in [2]). This library should >>>> provide helper APIs to metering process. >>>> >>>> Currently only action can be used in rte_flow is COUNT, more actions can >>>> be added to help "packet header capturing, timestamping, sampling, >>>> classifying" tasks of the metering process. >>>> >>>> The exact list depends on the what will be implemented in this release. >>>> >>>> >>>> [1] >>>> Metering Process >>>> >>>> The Metering Process generates Flow Records. Inputs to the >>>> process are packet headers, characteristics, and Packet Treatment >>>> observed at one or more Observation Points. >>>> >>>> The Metering Process consists of a set of functions that includes >>>> packet header capturing, timestamping, sampling, classifying, and >>>> maintaining Flow Records. >>>> >>>> The maintenance of Flow Records may include creating new records, >>>> updating existing ones, computing Flow statistics, deriving >>>> further Flow properties, detecting Flow expiration, passing Flow >>>> Records to the Exporting Process, and deleting Flow Records. >>>> >>>> [2] >>>> https://tools.ietf.org/html/rfc7011 >>> >>> Since I did not take this into account in my previous answer [3], I now >>> understand several of these requirements cannot be met by hardware (at least >>> in the near future). Therefore I think it makes sense to leave IPFIX and >>> more generally the maintenance of software data associated with flows to >>> separate libraries, instead of adding many new rte_flow actions that can >>> only be implemented in software. >>> >>> A hybrid solution as described in [3] is still needed regardless to offload >>> flow recognition, so that only flows of interest are reprocessed in software >>> to compute IPFIX and other data. >>> >>> You suggested at one point to take flow rules in addition to mbufs as input >>> to handle that. Well, that's actually a nice approach. >>> >>> For this to work, rte_flow_classify would have to use opaque handles like >>> rte_flow, provided back by the application when attempting to classify >>> traffic. If the handle is not known (e.g. MARK is unsupported), a separate >>> API function could take a mbuf as input and spit the related >>> rte_flow_classify object if any. >>> >>> To be clear: >>> >>> 1. Create classifier object: >>> >>> classify = rte_flow_classify_create([some rte_flow pattern], >>> [classify-specific actions list, associated resources]); >>> >>> 2. Create some flow rule with a MARK action to identify it uniquely. This >>> step might fail and flow can be NULL, that's not an issue: >>> >>> flow = rte_flow_create([the same pattern], MARK with id 42) >>> >>> 3. For each received packet: >>> >>> /* >>> * Attempt HW and fall back on SW for flow identification in order to >>> * update classifier flow-related data. >>> */ >>> if (flow) { >>> if (mbuf->ol_flags & PKT_RX_FDIR_ID && mbuf->hash.fdir.hi == 42) >>> tmp_classify = classify; >>> } else { >>> tmp_classify = rte_flow_classify_lookup([classifier candidates], >>> mbuf); >>> } >>> if (tmp_classify) >>> rte_flow_classify_update(tmp_classify, mbuf); >>> >>> 4. At some point, retrieve computed data from the classifier object itself: >>> >>> rte_flow_classify_stats_get(classify, [output buffer(s)]) >>> >>> On the RX path, the MARK action can be enough to implement the above. When >>> not supported, it could also be emulated through the "sw_fallback" bit >>> described in [3] however if the above approach is fine, no need for that. >>> >>> It's a bit more complicated to benefit from rte_flow on the TX path since no >>> MARK data can be returned. There is currently no other solution than doing >>> it all in software anyway. >>> >>> [3] http://dpdk.org/ml/archives/dev/2017-May/066177.html >>> >> >> Using MARK action is good idea indeed, but I believe the software >> version needs to be implemented anyway, so the MARK action can be next >> step optimization. > > Does it mean you are you fine with the separation of RFCv2's > rte_flow_classify_stats_get() into a sort of rte_flow_classify_lookup() and > rte_flow_classify_update() as described?
No indeed, I was planning to postpone that step and keep continue with single rte_flow_classify_stats_get() > > There is otherwise no change needed to benefit from MARK optimization > beside providing the ability to not perform the lookup step when not > necessary. > >> And another thing is, having 3. and 4. as separate steps means >> rte_flow_classify to maintain the flow_records in the library for an >> unknown time. > > I think it should be an application's problem. The library does not > necessarily allocate anything since the related temporary storage can be > provided as arguments to rte_flow_classify_create(). > > Expiration could be handled by making rte_flow_classify_lookup() return some > error code when the flow has expired. Application would then be free to call > rte_flow_classify_destroy() directly or retrieve stats one last time through > rte_flow_classify_stats_get() before that. > >> Application needs to explicitly do the call for 3., why not return the >> output buffers immediately so the application do the required >> association of data to the ports. > > 3. is typically done in the data path and must be as fast as possible, > right? The way I see it, in many cases if the flow is already identified > (classifier object provided) the mbuf does not even need to be parsed for > associated counters to be incremented. > > Also because rte_flow_classify_stats_get() could perform extra computation > to convert internal data to an application-friendly format. > >> Mainly, I would like to keep same API logic in RFC v2, but agree to add >> two new APIs: >> rte_flow_classify_create() >> rte_flow_classify_destroy() >> >> Create will return and opaque rte_flow object (as done in rte_flow), >> meanwhile it can create some masks in rte_flow object to help parsing mbuf. > > OK, I suggest naming that object "struct rte_flow_classify" to avoid > confusion though. OK, makes sense. > >> And rte_flow_classify_stats_get() will get rte_flow object and stats as >> parameter. Also instead of getting an array of filters, to simplify the >> logic, it will get a rte_flow object at a time. > > Perfect. > >> I am planning to send an updated RFC (v3) as above, if there is no major >> objection, we can continue to discuss on that RFC. > > Except for my above comments, looks like we're converging. >