Hi all,

Here's an optimization idea for the datapath classifier table. I'd like to get some feedback.
I used the DPDK ACL tables. They can perform wildcarded matching, and each lookup requires fewer CPU cycles than the Classifier. However, ACLs have a drawback: inserting a new Rule takes a very long time, roughly 50 times longer than an insertion into the Classifier (see the Notes below for further details). So a simple 1:1 replacement of the Classifier with an ACL table is not a viable solution.

The idea described below is instead to replace the Classifier with 2 ACL tables. One is the 'Operating' table, while the other is the 'Shadow' table. Every lookup is performed on the Operating table. Any new insertion, instead, is executed on the Shadow table by a separate thread. After the insertion is done, the 2 tables are swapped: the Shadow table becomes the Operating one, and vice versa.

Is the following ok with real use cases?
========================================
One assumption was made: new sets of Rules arrive at a rate lower than 1 Rule Set/sec. Would this be ok with real use cases?

Performance Figures
===================
The table below refers to a mono-directional test comparing the performance of the 2 implementations. Some Flows were installed so that the Classifier was using 7 SubTables. The ACL Rule format was {Protocol, IPdest, MACsrc, UdpPortDest, ToS, VlanTci}. The performance figures are expressed in Mpps.

                 +------------+------------+
                 | Classifier |   2 ACLs   |
+----------------+------------+------------+
| Max Throughput |    2.2     |    5.4     |
| [Mpps]         |            |            |
+----------------+------------+------------+

Conclusions
===========
At this stage it would really be helpful to get some initial feedback from the Community. Any comment or suggestion will be useful to drive further developments.
References
==========
DPDK ACL Rules, how to:
http://dpdk.org/doc/guides/prog_guide/packet_classif_access_ctrl.html

Notes
=====
When an ACL table contains about 2000 Rules with a structure like {Protocol, IPsource, IPdest, PortSource, PortDest}, a new insertion costs about 69000 CPU cycles/Rule. Under similar operating conditions, the Classifier would require about 1300 CPU cycles/Rule.

Thanks,
Antonio
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev