Hi William,
Thanks for your interest in this idea. I'm currently working on a patch; once it is ready I'll be glad to share it.
In order to compare the throughput between the classifier and the acl I changed the code to bypass the EMC table. So in both cases any packet was hitting the classifier/acl. That's just for my tests, of course.

I installed flows like

sudo ./utilities/ovs-ofctl add-flow br0 dl_type=0x0800,nw_dst=34.34.34.34,action=output:2
sudo ./utilities/ovs-ofctl add-flow br0 idle_timeout=0,dl_type=0x0800,nw_dst=34.34.34.35,dl_src=01:02:03:04:05:08,action=output:2
sudo ./utilities/ovs-ofctl add-flow br0 dl_type=0x0800,nw_proto=17,udp_dst=64,action=output:2
sudo ./utilities/ovs-ofctl add-flow br0 dl_src=01:02:03:04:05:06,action=output:2
sudo ./utilities/ovs-ofctl add-flow br0 dl_type=0x0800,nw_proto=17,actions=output:2
sudo ./utilities/ovs-ofctl add-flow br0 dl_type=0x0800,nw_tos=2,actions=output:2
sudo ./utilities/ovs-ofctl add-flow br0 vlan_tci=0,actions=output:2

so that the classifier was using 6 Subtables.

Antonio

From: William Tu [mailto:u9012...@gmail.com]
Sent: Wednesday, February 3, 2016 11:52 PM
To: Fischetti, Antonio
Cc: dev@openvswitch.org
Subject: Re: [ovs-dev] Wildcard Matching optimization idea

Hi Fischetti,

I'm interested in trying your experiment with 2 DPDK ACL table swapping. Do you have a patch for me to start with? Or any comments about how to reproduce your experiment results? (the one comparing classifier and DPDK ACL)

>                  +------------+------------+
>                  | Classifier |   2 ACLs   |
> +----------------+------------+------------+
> | Max Throughput |    2.2     |    5.4     |
> |     [Mpps]     |            |            |
> +----------------+------------+------------+

Thank you
William

On Thu, Jan 14, 2016 at 7:28 AM, Fischetti, Antonio <antonio.fische...@intel.com> wrote:

Thanks Wei Li, please see my replies inline.
If I missed something or for further detail, just let me know.

> -----Original Message-----
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Wei Li
> Sent: Thursday, January 14, 2016 6:38 AM
> To: dev@openvswitch.org
> Subject: Re: [ovs-dev] Wildcard Matching optimization idea
>
> insert a rule to "mirror" and swap "mirror" and "real"
>
> then real does not contain the new inserted rule

Correct, it needs to be updated as well.

>
> how to sync? take the "long" time as inserting to "mirror"?
>

Yes, it will take the same long time. The thread that has previously inserted the rule into what was the Shadow (or "mirror") will repeat the same work on the other table.

Below is the sequence with a few more details to show the latency for a new rule.

1. Insert the new rule into the background table
----------------------------------------------------------
Please Note: while this is happening the new rule will not affect the lookups, as they're always carried out on the foreground Table.
Background table updated? Go to Step #2.

2. Swap tables
------------------
The updated table is 'moved' to the foreground so that lookups are now affected by the new rule. Anyway we're not done yet, we updated just one of the 2 tables. In case further new rules need to be added at this stage, they will get buffered in a 'waiting list' to be processed later.
Background table updated? Go to Step #3.

3. Done
----------
Both tables are synced up.
Is there a new rule to insert and/or anything in the 'waiting list'? Go to Step #1.

> On 2016/1/12 18:37, Fischetti, Antonio wrote:
> > Thanks Zoltan for your questions.
> >
> > Anyone else has any new feedback?
> >
> >> -----Original Message-----
> >> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Fischetti,
> >> Antonio
> >> Sent: Friday, December 18, 2015 3:44 PM
> >> To: Zoltan Kiss; dev@openvswitch.org
> >> Subject: Re: [ovs-dev] Wildcard Matching optimization idea
> >>
> >>
> >>> -----Original Message-----
> >>> From: Zoltan Kiss [mailto:zoltan.k...@linaro.org]
> >>> Sent: Friday, December 18, 2015 12:38 PM
> >>> To: Fischetti, Antonio; dev@openvswitch.org
> >>> Subject: Re: [ovs-dev] Wildcard Matching optimization idea
> >>>
> >>>
> >>>
> >>> On 17/12/15 16:23, Fischetti, Antonio wrote:
> >>>> Hi Zoltan, thanks for your questions.
> >>>> Please find below my answers inline.
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Zoltan Kiss [mailto:zoltan.k...@linaro.org]
> >>>>> Sent: Thursday, December 17, 2015 2:33 PM
> >>>>> To: Fischetti, Antonio; dev@openvswitch.org
> >>>>> Subject: Re: [ovs-dev] Wildcard Matching optimization idea
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 17/12/15 10:41, Fischetti, Antonio wrote:
> >>>>>> Hi All,
> >>>>>> Here's an optimization idea for the datapath classifier table.
> >>>>>> I'd like to get some feedback.
> >>>>>>
> >>>>>> I used the DPDK ACL tables. They can perform a wildcarded matching and each
> >>>>>> lookup requires less CPU cycles than the Classifier.
> >>>>>> Anyway there's a negative aspect with ACLs. They take a very long time to
> >>>>>> insert a new Rule.
> >>>>>> It can be 50 times greater than an insertion into the Classifier. See Note below
> >>>>>> for further details.
> >>>>>>
> >>>>>> So a simple 1:1 replacement of the Classifier with an ACL table is not a viable
> >>>>>> solution.
> >>>>>>
> >>>>>> The idea described below is instead to replace the Classifier with 2 ACL
> >>>>>> tables. One is the 'Operating', while the other is a 'Shadow' table.
> >>>>>>
> >>>>>> Any lookup will be performed on the Operating table.
> >>>>>>
> >>>>>> Instead any new insertion will be executed on the Shadow table by means of a
> >>>>>> separate thread.
> >>>>>> After the insertion is done, the 2 tables will be swapped.
> >>> Are you swapping after each insertion, or in batches?
> >> In batches.
> >>
> >>> The new shadow table needs to get updated first to be in sync with Operating, does it take
> >>> a similar amount of time?
> >> No, the shadow table acts as a 'mirror' of the operating. So the 2
> >> tables are supposed to contain exactly the same entries.
> >> An exception is during the transient insertion procedure. But after it
> >> is completed the 2 tables will again contain the same entries.
> >>
> >>> Instead of having these 2 tables, how about having one, and making it possible
> >>> to look up while an insertion is in progress? Something in an RCU
> >>> fashion?
> >> Unluckily not. While an insertion is taking place it is not possible to access the ACL
> >> to read its entries.
> >> More precisely, an ACL insertion means 2 actions: add + rebuild. The rebuild takes
> >> 95% of the CPU cycles of the whole insertion.
> >> You could read while the 'add' is in progress. Instead you can't read while the
> >> 'rebuild' is still happening.
> >> That's why I'm using 2 ACL tables.
> >>
> >>>>> So while this insertion happens, you still look up in the actual
> >>>>> Operating table.
> >>>> Yes, while insertion is in progress any lookup will still be carried out on the
> >>>> Operating table.
> >>> I don't know how the classifier works exactly, but is the following
> >>> scenario possible?:
> >>>
> >>> Rule A matches a flow and specifies an action. A new insertion would
> >>> essentially remove Rule A and add B which matches the same flow but
> >>> specifies a different action. While that happens, packets would still
> >>> match A, while the expectation probably would be to match B.
> >>>
> >> The same issue can happen with the Classifier, in this case it will be worse.
> >> The solution with 2 ACLs has a much longer insertion latency.
> >> That is because an ACL insertion can be about 50 times greater than an
> >> insertion into the Classifier.
> >>
> >>>>> What happens if you have a new insertion in the meantime?
> >>>> The new Rule gets buffered into a 'wait' queue.
> >>>>
> >>>>> Especially, what happens if your lookup yields the same rule
> >>>>> which is inserted at the moment?
> >>>> That's a good point. At the current stage it is simply added into the wait queue.
> >>>> So I could potentially have duplications where different rules in the ACL are
> >>>> referring to the same netdev-flow.
> >>>> To avoid these duplications there could be 2 approaches.
> >>>> One option would be to check that the rule is not already present in the wait queue.
> >>>> Another option would be to store it into the wait queue anyway and then check
> >>>> that the ACL does not already contain that rule.
> >>>
> >>>
> >>>
> >>>>>> Thus the Shadow table will now become the Operating one, and vice versa.
> >>>>>>
> >>>>>> Is the following ok with real use cases?
> >>>>>> ========================================
> >>>>>> An Assumption was made: new sets of Rules arrive with a frequency lower
> >>>>>> than 1 (Rule Sets)/sec.
> >>>>>> Would this be ok with real use cases?
> >>>>>>
> >>>>>>
> >>>>>> Performance Figures
> >>>>>> ===================
> >>>>>> The table below refers to a mono-directional test where the performance is
> >>>>>> compared between the 2 implementations.
> >>>>>> Some Flows were installed so that the Classifier was using 7 SubTables.
> >>>>>> The ACL Rule format was {Protocol, IPdest, MACsrc, UdpPortDest, ToS, VlanTci}.
> >>>>>> The performance figures are expressed in Mpps.
> >>>>>>
> >>>>>>                  +------------+------------+
> >>>>>>                  | Classifier |   2 ACLs   |
> >>>>>> +----------------+------------+------------+
> >>>>>> | Max Throughput |    2.2     |    5.4     |
> >>>>>> |     [Mpps]     |            |            |
> >>>>>> +----------------+------------+------------+
> >>>>>>
> >>>>>>
> >>>>>> Conclusions
> >>>>>> ===========
> >>>>>> At this stage it would really be helpful to have an initial feedback from the
> >>>>>> Community. Any comment or suggestion will be useful to drive further
> >>>>>> developments.
> >>>>>>
> >>>>>>
> >>>>>> References
> >>>>>> ==========
> >>>>>> DPDK ACL Rules, how to:
> >>>>>> http://dpdk.org/doc/guides/prog_guide/packet_classif_access_ctrl.html
> >>>>>>
> >>>>>> Notes
> >>>>>> =====
> >>>>>> When an ACL table contains about 2000 Rules with a structure like
> >>>>>> {Protocol, IPsource, IPdest, PortSource, PortDest}
> >>>>>> a new insertion costs about 69000 CPUcycles/Rule.
> >>>>>> Instead under similar operating conditions the Classifier would require about
> >>>>>> 1300 CPUcycles/Rule.
> >>>>>>
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Antonio
> >>>>>>
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> dev mailing list
> >>>>>> dev@openvswitch.org
> >>>>>> http://openvswitch.org/mailman/listinfo/dev
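
The Operating/Shadow update sequence discussed in this thread (insert into the background table, swap, repeat the insertion on the other table, with late arrivals held in a waiting list) can be sketched as a small single-threaded Python model. This is only an illustration under stated assumptions: plain dicts stand in for the DPDK ACL tables, the slow add + rebuild is reduced to a dict update, and the names DoubleBufferedRules, submit, step and lookup are hypothetical and not taken from any OVS patch.

```python
from collections import deque


class DoubleBufferedRules:
    """Toy model of the 2-table scheme; dicts stand in for the ACL tables."""

    def __init__(self):
        self.foreground = {}    # 'Operating' table: every lookup goes here
        self.background = {}    # 'Shadow' table: insertions land here first
        self.waiting = deque()  # rules that arrived while an update is in flight
        self.batch = []         # rules currently being propagated to both tables
        self.state = "idle"

    def lookup(self, key):
        # Lookups never touch the background table.
        return self.foreground.get(key)

    def submit(self, key, action):
        # New rules are always buffered; step() picks them up in batches.
        self.waiting.append((key, action))

    def step(self):
        """Advance the update procedure by one stage and return the new state."""
        if self.state == "idle":
            if not self.waiting:
                return self.state
            # Step 1: insert the batch into the background table
            # (in a real ACL this is the expensive add + rebuild).
            self.batch = list(self.waiting)
            self.waiting.clear()
            self.background.update(self.batch)
            self.state = "background_updated"
        elif self.state == "background_updated":
            # Step 2: swap, so lookups now see the new rules.
            self.foreground, self.background = self.background, self.foreground
            self.state = "swapped"
        elif self.state == "swapped":
            # Repeat the same insertion on the other table; then Step 3: done.
            self.background.update(self.batch)
            self.batch = []
            self.state = "idle"
        return self.state
```

In this model a rule becomes visible to lookups only after the second call to step(), which mirrors the insertion latency described above; a rule submitted during an update simply waits for the next batch.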