Hi All,
Here's an optimization idea for the datapath classifier table.
I'd like to get some feedback.
I used the DPDK ACL tables. They support wildcard matching, and each
lookup requires fewer CPU cycles than the Classifier.
ACLs have a drawback, however: inserting a new Rule takes a very long time.
An insertion can take about 50 times longer than an insertion into the
Classifier. See the Notes section below for further details.
So a simple 1:1 replacement of the Classifier with an ACL table is not a viable
solution.
The idea described below is instead to replace the Classifier with 2 ACL
tables. One is the 'Operating', while the other is a 'Shadow' table.
Any lookup will be performed on the Operating table.
Any new insertion will instead be executed on the Shadow table by a
separate thread.
After the insertion is done, the 2 tables will be swapped.
Thus the Shadow table becomes the Operating one, and vice versa.
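To make the swap scheme concrete, here is a minimal sketch in Python. It is
not the actual patch: a plain dict stands in for a DPDK ACL table, and the
class and method names are hypothetical. It also assumes that, after the swap,
the new Rule is replayed into the new Shadow table so both copies stay in
sync, which the description above does not spell out.

```python
import threading


class DoubleBufferedTable:
    """Two rule tables: lookups read 'operating', inserts go to 'shadow'.

    A dict is only a stand-in for an ACL table (no wildcard matching here).
    """

    def __init__(self):
        self._operating = {}  # read by the lookup (datapath) threads
        self._shadow = {}     # written only by the updater thread
        self._writer_lock = threading.Lock()  # one updater at a time

    def lookup(self, key):
        # Readers take the current Operating table and never lock.
        table = self._operating
        return table.get(key)

    def insert(self, key, action):
        with self._writer_lock:
            # 1. The slow insertion happens off the lookup path.
            self._shadow[key] = action
            # 2. Swap the roles; readers see the new table from now on.
            self._operating, self._shadow = self._shadow, self._operating
            # 3. Assumption: replay the Rule into the new Shadow table.
            #    A real datapath would first have to wait until no lookup
            #    is still using the old Operating table.
            self._shadow[key] = action


if __name__ == "__main__":
    table = DoubleBufferedTable()
    # The insertion runs in a separate thread, as in the proposal.
    updater = threading.Thread(
        target=table.insert, args=(("udp", 4789), "forward"))
    updater.start()
    updater.join()
    print(table.lookup(("udp", 4789)))
```

The point of the sketch is that a lookup never waits on an insertion; the cost
is that every Rule is inserted twice and memory use doubles.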
Is the following ok with real use cases?
========================================
One assumption was made: new Rule Sets arrive at a rate lower than
1 Rule Set/sec.
Would this be ok with real use cases?
Performance Figures
===================
The table below refers to a unidirectional test that compares the
performance of the 2 implementations.
Some Flows were installed so that the Classifier was using 7 SubTables.
The ACL Rule format was {Protocol, IPdest, MACsrc, UdpPortDest, ToS, VlanTci}.
The performance figures are expressed in Mpps.
                 +------------+------------+
                 | Classifier |   2 ACLs   |
+----------------+------------+------------+
| Max Throughput |    2.2     |    5.4     |
|     [Mpps]     |            |            |
+----------------+------------+------------+
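For a quick reading of the table, the arithmetic below derives the speedup and
the average time available per packet at each rate. It uses only the two
throughput figures above; the function names are just for illustration.

```python
CLASSIFIER_MPPS = 2.2  # max throughput with the Classifier, from the table
TWO_ACLS_MPPS = 5.4    # max throughput with the 2 ACL tables, from the table


def speedup():
    """Throughput ratio of the 2-ACL implementation over the Classifier."""
    return TWO_ACLS_MPPS / CLASSIFIER_MPPS


def ns_per_packet(mpps):
    """Average time per packet, in nanoseconds, at a given rate in Mpps."""
    return 1e9 / (mpps * 1e6)


if __name__ == "__main__":
    print(f"speedup: {speedup():.2f}x")
    print(f"Classifier: {ns_per_packet(CLASSIFIER_MPPS):.1f} ns/packet")
    print(f"2 ACLs:     {ns_per_packet(TWO_ACLS_MPPS):.1f} ns/packet")
```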
Conclusions
===========
At this stage it would be really helpful to have some initial feedback from
the Community. Any comment or suggestion will be useful to drive further
development.
References
==========
DPDK ACL Rules, how to:
http://dpdk.org/doc/guides/prog_guide/packet_classif_access_ctrl.html
Notes
=====
When an ACL table contains about 2000 Rules with a structure like
{Protocol, IPsource, IPdest, PortSource, PortDest},
a new insertion costs about 69000 CPU cycles/Rule.
By contrast, under similar operating conditions the Classifier would require
about 1300 CPU cycles/Rule.
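As a rough check of these numbers against the 1 Rule Set/sec assumption, the
sketch below converts the cycle counts into wall-clock time. The 2 GHz clock
is an assumption, not a measured value, and so is the linear scaling with the
number of Rules (the per-Rule cost was measured with about 2000 Rules already
in the table and may differ at other sizes).

```python
ACL_INSERT_CYCLES = 69_000        # cycles per Rule for an ACL table, from the note
CLASSIFIER_INSERT_CYCLES = 1_300  # cycles per Rule for the Classifier, from the note
ASSUMED_CLOCK_HZ = 2_000_000_000  # assumption: a 2 GHz core


def insert_cost_ratio():
    """How many times more expensive an ACL insertion is."""
    return ACL_INSERT_CYCLES / CLASSIFIER_INSERT_CYCLES


def acl_insert_seconds(num_rules, clock_hz=ASSUMED_CLOCK_HZ):
    """Time to insert num_rules Rules, assuming a constant per-Rule cost."""
    return num_rules * ACL_INSERT_CYCLES / clock_hz


if __name__ == "__main__":
    print(f"ACL/Classifier insert cost: {insert_cost_ratio():.1f}x")
    print(f"1 Rule:     {acl_insert_seconds(1) * 1e6:.1f} us")
    print(f"2000 Rules: {acl_insert_seconds(2000) * 1e3:.1f} ms")
```

Under these assumptions a Rule Set of a few thousand Rules would fit well
within one second on the updater thread.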
Thanks,
Antonio