Hi Neale,
   I am working on optimizing the "ipsec4-output-feature" node on ARM-based 
systems. I see an opportunity to supplement the SPD table lookup (a linear 
search) with a Bihash-based flow cache.
   This approach is similar to the stateful mode implementation in the ACL 
plugin. It consumes extra memory for the Bihash, but the lookup cost becomes 
O(1) regardless of the index at which the matching SPD rule was added.
   I built a very basic prototype and got good results. Please find the 
prototype patch attached.
   Before I start the actual implementation, I would like to get your feedback 
and comments.
  
    Following is the idea at a high level. The existing linear-search-based SPD 
table lookup will be augmented with a flow cache.
    Enhanced SPD table lookup logic:
    ---------------------------------------------
    On every packet arrival, the following lookup will be done in the 
"ipsec4-output-feature" node:
   1. found = Lookup <5 tuple: Bihash based flow cache>
   2. if (!found) {
          found = Lookup <5 tuple: Linear search>
          if (found) {
              Add an entry into <5 tuple: Bihash based flow cache>
          }
      }

    The linear search happens only for the first packet in a flow; from the 
second packet onwards, the match succeeds in the Bihash table.
    In the basic prototype, throughput stayed constant, as expected for an O(1) 
lookup, when an IPv4 5-tuple rule was added at indices 1, 10, 100 and 1000 in 
the SPD table.
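
    To make the lookup flow above concrete, here is a minimal, self-contained 
sketch in plain C. It is not the prototype patch and does not use VPP's Bihash 
API: the direct-mapped cache, the exact-match rule comparison, and all names 
(flow_key_t, spd_t, spd_lookup, and so on) are hypothetical simplifications 
chosen for illustration. Real SPD rules match on address and port ranges.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 5-tuple key; not VPP's actual structure. */
typedef struct {
    uint32_t src, dst;
    uint16_t sport, dport;
    uint8_t proto;
} flow_key_t;

/* Simplified rule: exact 5-tuple match (real SPD rules use ranges). */
typedef struct { flow_key_t key; int action; } spd_rule_t;

#define CACHE_SLOTS 1024 /* must be a power of two */

typedef struct { flow_key_t key; int rule_index; uint8_t valid; } cache_slot_t;

typedef struct {
    const spd_rule_t *rules;
    size_t n_rules;
    cache_slot_t cache[CACHE_SLOTS]; /* stand-in for the Bihash flow cache */
    uint64_t linear_searches;        /* counts slow-path lookups */
} spd_t;

static void spd_init(spd_t *spd, const spd_rule_t *rules, size_t n_rules) {
    assert(rules != NULL || n_rules == 0);
    memset(spd, 0, sizeof *spd);
    spd->rules = rules;
    spd->n_rules = n_rules;
}

/* Compare field by field; memcmp would also compare struct padding. */
static int key_eq(const flow_key_t *a, const flow_key_t *b) {
    return a->src == b->src && a->dst == b->dst && a->sport == b->sport &&
           a->dport == b->dport && a->proto == b->proto;
}

/* Simple multiplicative hash over the 5-tuple fields. */
static uint32_t key_hash(const flow_key_t *k) {
    uint32_t h = 2166136261u;
    h = (h ^ k->src) * 16777619u;
    h = (h ^ k->dst) * 16777619u;
    h = (h ^ (((uint32_t)k->sport << 16) | k->dport)) * 16777619u;
    h = (h ^ k->proto) * 16777619u;
    return h ^ (h >> 16);
}

/* Slow path: O(n) walk over the rule table. Returns rule index or -1. */
static int spd_linear_search(spd_t *spd, const flow_key_t *k) {
    spd->linear_searches++;
    for (size_t i = 0; i < spd->n_rules; i++)
        if (key_eq(&spd->rules[i].key, k))
            return (int)i;
    return -1;
}

/* Step 1: try the flow cache. Step 2: on a miss, fall back to the
 * linear search and cache a successful result for later packets. */
static int spd_lookup(spd_t *spd, const flow_key_t *k) {
    cache_slot_t *slot = &spd->cache[key_hash(k) & (CACHE_SLOTS - 1)];
    if (slot->valid && key_eq(&slot->key, k))
        return slot->rule_index;
    int idx = spd_linear_search(spd, k);
    if (idx >= 0) {
        slot->key = *k;
        slot->rule_index = idx;
        slot->valid = 1;
    }
    return idx;
}
```

    In this sketch only the first packet of a flow pays for the linear walk. 
Misses are deliberately not cached, so a packet that matches no rule goes 
through the linear search every time.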

    Following are the per-core performance numbers with an IPsec NULL-encryption 
configuration in ESP tunnel mode, on an ARM CPU based system, at MRR with 64B 
packets:

    Baseline based on existing linear search
    ========================================
    SPD index       Performance
    ----------------------------------------
    1st match       5.2  MPPS
    10th match      4.51 MPPS
    100th match     2.05 MPPS
    1000th match    266  KPPS

    With Bihash based flow cache (basic prototype results)
    ======================================================
    SPD index       Performance
    ----------------------------------------
    1st match       4.88 MPPS
    10th match      4.88 MPPS
    100th match     4.88 MPPS
    1000th match    4.88 MPPS

   As you can see, with the flow cache the throughput stays constant at 
4.88 MPPS regardless of the index at which the matching rule was added.
   If you are fine with this approach, I would like to proceed with the actual 
implementation. 

   I am assuming that the application will not update the SPD table 
frequently. Please correct me if I am wrong.
   Whenever the application adds, deletes or modifies an entry in the SPD 
table, the flow cache will be purged in the data plane through an 
interface-level flag. I will work on this case and send another update.
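
   One possible shape for the purge described above, as a hedged sketch in 
plain C11. The details are assumptions, since this case is not implemented 
yet: the per-interface structure, the atomic flag, and the function names 
(if_flow_cache_t, spd_mark_changed, flow_cache_sync) are all hypothetical. 
The control plane sets the flag after any SPD change, and the data plane 
checks it before using the cache and clears every entry if it was set.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CACHE_SLOTS 256

/* Hypothetical cache entry; the key is reduced to a hash for brevity. */
typedef struct { uint32_t key_hash; int rule_index; bool valid; } cache_slot_t;

/* Hypothetical per-interface state. */
typedef struct {
    atomic_bool purge_pending; /* set by control plane, consumed by data plane */
    cache_slot_t cache[CACHE_SLOTS];
} if_flow_cache_t;

static void flow_cache_init(if_flow_cache_t *fc) {
    assert(fc != NULL);
    atomic_init(&fc->purge_pending, false);
    memset(fc->cache, 0, sizeof fc->cache);
}

/* Control plane: call after any SPD add/delete/modify. */
static void spd_mark_changed(if_flow_cache_t *fc) {
    atomic_store(&fc->purge_pending, true);
}

/* Data plane: call before any cache lookup (for example once per frame).
 * Atomically reads and clears the flag; purges the cache if it was set.
 * Returns true if a purge happened. */
static bool flow_cache_sync(if_flow_cache_t *fc) {
    if (!atomic_exchange(&fc->purge_pending, false))
        return false;
    memset(fc->cache, 0, sizeof fc->cache);
    return true;
}
```

   After a purge, the next packet of each flow takes the linear search once 
and repopulates the cache, which is why this scheme relies on SPD updates 
being infrequent.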
       
Thanks
Govind

Attachment: spd_with_flow_cache_prototype.diff
