Hi Neale,
   I have completed the flow cache implementation for SPD lookup in the 
IPv4/IPSec outbound direction.  Performance numbers are good with null 
encryption on a single core with 64B packets @ MRR. Please provide your comments.
https://gerrit.fd.io/r/c/vpp/+/31694

  Summary of flow cache implementation:

  1.  Based on Bihash without collision handling. This avoids the overhead 
of recycling/ageing out old entries in the flow cache. Whenever a collision 
occurs, the old entry is overwritten by the new entry in the data plane.
  2.  The size of the flow cache is fixed. It is currently set to handle 1 
million flows. This can be made configurable as a next step.
  3.  Whenever an SPD rule is added or deleted from the control plane, the 
flow cache entries are flushed from the control plane. Before flushing, the 
data plane is put in fallback mode to bypass the flow cache and do a linear 
lookup. Flushing is done only after the in-flight packets have been sent out 
from all the worker cores.

Thanks
Govind

From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of Govindarajan 
Mohandoss via lists.fd.io
Sent: Wednesday, March 3, 2021 7:55 PM
To: Neale Ranns <ne...@graphiant.com>; vpp-dev <vpp-dev@lists.fd.io>
Cc: nd <n...@arm.com>; nd <n...@arm.com>
Subject: Re: [vpp-dev] IPSec proposal to improve "ipsec4-output-feature" node 
performance

Hi Neale,
  Thank you for your comments. I know you would have thought about it already. 
I can work with you to implement the right solution to improve performance.
  Please see my response inline.

Thanks
Govind

From: Neale Ranns <ne...@graphiant.com>
Sent: Wednesday, March 3, 2021 8:45 AM
To: Govindarajan Mohandoss <govindarajan.mohand...@arm.com>; vpp-dev 
<vpp-dev@lists.fd.io>
Cc: nd <n...@arm.com>
Subject: Re: [vpp-dev] IPSec proposal to improve "ipsec4-output-feature" node 
performance


Hi Govind,

Flow caches always perform well, but they are more difficult to use than they 
first appear. Consider asking yourself these questions:
1 - how many entries can the cache contain?
>> This can be made configurable as per the system need. By default, we can 
>> allocate the hash table size to hold 10K entries.

2 - what do you do when the cache is full? How do you age or recycle old flows?
>> If the flow cache is implemented using a hash table without collision 
>> handling, then an age-out mechanism is not needed. Whenever a collision 
>> occurs, the old entry can be overwritten with the new entry. The worst case 
>> is 255 overwrites, if all 256 packets in a batch produce the same hash 
>> value.
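
The overwrite-on-collision behaviour described in this answer can be sketched as a direct-mapped table. This is a minimal illustration only, not the prototype's code and not the VPP bihash API: the names `fc_t`, `fc_add`, `fc_lookup` and `fc_slot_of` are hypothetical, the 64-bit key stands in for a packed 5-tuple, and the table is kept tiny so collisions are easy to provoke.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative direct-mapped flow cache: one slot per hash bucket,
 * no chaining and no ageing. All names here are hypothetical. */
#define FC_SLOTS 8u /* tiny on purpose */
static_assert((FC_SLOTS & (FC_SLOTS - 1)) == 0, "FC_SLOTS must be a power of two");

typedef struct {
  uint64_t key;    /* stand-in for a packed 5-tuple */
  uint32_t policy; /* index of the matching SPD policy */
  uint8_t valid;
} fc_slot_t;

typedef struct {
  fc_slot_t slots[FC_SLOTS];
} fc_t;

/* Multiplicative hash reduced to a slot index; any decent hash would do. */
static uint32_t fc_slot_of(uint64_t key)
{
  return (uint32_t)((key * 0x9E3779B97F4A7C15ull) >> 32) & (FC_SLOTS - 1);
}

/* Insert never fails and never probes: a colliding entry is overwritten. */
static void fc_add(fc_t *fc, uint64_t key, uint32_t policy)
{
  fc_slot_t *s = &fc->slots[fc_slot_of(key)];
  s->key = key;
  s->policy = policy;
  s->valid = 1;
}

/* Returns 1 and fills *policy on a hit, 0 on a miss. The full key is
 * compared, so an overwritten flow misses rather than returning the
 * policy of the flow that replaced it. */
static int fc_lookup(const fc_t *fc, uint64_t key, uint32_t *policy)
{
  const fc_slot_t *s = &fc->slots[fc_slot_of(key)];
  if (!s->valid || s->key != key)
    return 0;
  *policy = s->policy;
  return 1;
}
```

With this layout an evicted flow simply takes the linear-search path again on its next packet and is re-inserted, which is why no explicit ageing is required.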

3 - how do you flush the cache when the policy set changes?
>> Whenever an SPD rule is deleted, the flow cache will be flushed completely 
>> in the control plane. An IPSec module-level flag will be introduced and set 
>> by the control plane to put the data plane in fallback mode, using linear 
>> search. This flag will be reset once the control plane has flushed the flow 
>> cache and deleted the SPD rule from the SPD table. Also, the data plane will 
>> not add new entries to the flow cache during SPD rule deletion.
>> I have added this logic in my prototype. Please find the changes attached.
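
The flag-and-flush sequence described in this answer could look roughly like the sketch below. It is written under stated assumptions and is not the attached prototype: the SPD is reduced to a toy array in which rule i matches exactly one flow key, `wait_for_inflight_packets()` is an empty placeholder for whatever worker synchronisation the real code uses, and all type and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FC_SLOTS 256u /* power of two */
#define MAX_RULES 16u

typedef struct {
  uint64_t key;
  int policy;
  bool valid;
} fc_slot_t;

typedef struct {
  uint64_t rules[MAX_RULES]; /* toy SPD: rule i matches exactly one flow key */
  unsigned n_rules;
  fc_slot_t cache[FC_SLOTS];
  atomic_bool fallback;      /* true while the control plane edits the SPD */
} ipsec_ctx_t;

/* Slow path: index of the first matching rule, or -1. */
static int linear_search(const ipsec_ctx_t *c, uint64_t key)
{
  for (unsigned i = 0; i < c->n_rules; i++)
    if (c->rules[i] == key)
      return (int)i;
  return -1;
}

/* Data plane: in fallback mode the cache is neither read nor written. */
static int dp_lookup(ipsec_ctx_t *c, uint64_t key)
{
  bool fb = atomic_load_explicit(&c->fallback, memory_order_acquire);
  fc_slot_t *s = &c->cache[key & (FC_SLOTS - 1)];
  if (!fb && s->valid && s->key == key)
    return s->policy;
  int policy = linear_search(c, key);
  if (!fb && policy >= 0) {
    s->key = key;
    s->policy = policy;
    s->valid = true;
  }
  return policy;
}

/* Placeholder (assumption): a real system would block here until every
 * worker has finished the packets it had in flight. */
static void wait_for_inflight_packets(void) {}

/* Control plane: enter fallback, flush, change the SPD, leave fallback. */
static void cp_delete_rule(ipsec_ctx_t *c, unsigned idx)
{
  if (idx >= c->n_rules)
    return;
  atomic_store_explicit(&c->fallback, true, memory_order_release);
  wait_for_inflight_packets();
  memset(c->cache, 0, sizeof(c->cache));
  memmove(&c->rules[idx], &c->rules[idx + 1],
          (c->n_rules - idx - 1) * sizeof(c->rules[0]));
  c->n_rules--;
  atomic_store_explicit(&c->fallback, false, memory_order_release);
}
```

In this toy model, deleting a rule shifts the indices of the rules after it, so every cached policy index may be stale. That is one reason a complete flush is simpler than trying to invalidate individual entries.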

>> In general, what is the rate at which SPD rules will be deleted by the 
>> application? If the deletion rate is low, then we can take the hit of 
>> flushing the flow cache in the control plane.

I had considered in the past changing an SPD definition to use IP subnets 
(rather than IP ranges) and then re-use the tuple-sort/merge algorithm used by 
ACLs. This approach would not need you to answer the awkward questions about a 
cache and it would break the linear dependent lookup (it has other 
dependencies, but they are much better). Two reasons I didn't do this: 1) no 
time; 2) ipsec is a vnet component and ACL is a plugin, and a vnet -> plugin 
dependency is a no-no. If you're lucky someone might volunteer to make IPsec a 
plugin and this will go away...
>> Please correct my understanding.
>> In this method, a mask has to be created for every SPD rule and stored in 
>> an array. On every packet arrival, the masks are picked up in linear 
>> fashion and a hash is computed from each mask and the packet header fields. 
>> The bihash is then looked up with that hash value. This removes the 
>> overhead of comparing ranges during the linear search, but the mask lookup 
>> is still linear. I agree that there will be a performance improvement 
>> because the range comparison is avoided for every SPD entry.
>> Is there a way to implement it without creating an IPSec plugin and 
>> without depending on the ACL plugin?
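
The mask-per-rule lookup as understood in the reply above can be sketched as follows. This only illustrates that description: it is not the ACL plugin's tuple-merge implementation, it matches on source and destination address only, and every name (`tss_t`, `tss_add_rule`, `tss_lookup`) is hypothetical. A small open-addressing table stands in for the bihash.

```c
#include <stdint.h>

#define MAX_MASKS 8u
#define TBL_SLOTS 64u /* power of two */

typedef struct {
  uint32_t src_mask, dst_mask;
} mask_t;

typedef struct {
  uint32_t src, dst; /* stored already masked */
  uint8_t mask_idx;
  int rule;          /* rule index; lower index wins */
  uint8_t valid;
} entry_t;

typedef struct {
  mask_t masks[MAX_MASKS]; /* one entry per distinct mask pair */
  unsigned n_masks;
  entry_t tbl[TBL_SLOTS];  /* open addressing with linear probing */
} tss_t;

static uint32_t tss_hash(unsigned m, uint32_t src, uint32_t dst)
{
  uint32_t h = 2166136261u;
  h = (h ^ m) * 16777619u;
  h = (h ^ src) * 16777619u;
  h = (h ^ dst) * 16777619u;
  return h ^ (h >> 16);
}

/* Registers the rule's mask pair if new, then stores the masked key.
 * Returns 0 on success, -1 if the mask array or the table is full. */
static int tss_add_rule(tss_t *t, uint32_t src, uint32_t src_mask,
                        uint32_t dst, uint32_t dst_mask, int rule)
{
  unsigned m;
  for (m = 0; m < t->n_masks; m++)
    if (t->masks[m].src_mask == src_mask && t->masks[m].dst_mask == dst_mask)
      break;
  if (m == t->n_masks) {
    if (t->n_masks == MAX_MASKS)
      return -1;
    t->masks[t->n_masks++] = (mask_t){ src_mask, dst_mask };
  }
  src &= src_mask;
  dst &= dst_mask;
  uint32_t h = tss_hash(m, src, dst);
  for (unsigned i = 0; i < TBL_SLOTS; i++) {
    entry_t *e = &t->tbl[(h + i) & (TBL_SLOTS - 1)];
    if (e->valid && e->mask_idx == m && e->src == src && e->dst == dst) {
      if (rule < e->rule)
        e->rule = rule; /* keep the higher-priority duplicate */
      return 0;
    }
    if (!e->valid) {
      *e = (entry_t){ src, dst, (uint8_t)m, rule, 1 };
      return 0;
    }
  }
  return -1;
}

/* One hash lookup per mask; returns the lowest matching rule index or -1. */
static int tss_lookup(const tss_t *t, uint32_t src, uint32_t dst)
{
  int best = -1;
  for (unsigned m = 0; m < t->n_masks; m++) {
    uint32_t s = src & t->masks[m].src_mask;
    uint32_t d = dst & t->masks[m].dst_mask;
    uint32_t h = tss_hash(m, s, d);
    for (unsigned i = 0; i < TBL_SLOTS; i++) {
      const entry_t *e = &t->tbl[(h + i) & (TBL_SLOTS - 1)];
      if (!e->valid)
        break;
      if (e->mask_idx == m && e->src == s && e->dst == d) {
        if (best < 0 || e->rule < best)
          best = e->rule;
        break;
      }
    }
  }
  return best;
}
```

In this sketch the per-packet cost grows with the number of distinct mask pairs rather than with the number of rules, so rules that share a prefix length share one hash lookup.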


/neale


From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> on behalf of Govindarajan 
Mohandoss via lists.fd.io <Govindarajan.mohandoss=arm....@lists.fd.io>
Date: Wednesday, 3 March 2021 at 06:57
To: vpp-dev <vpp-dev@lists.fd.io>
Cc: nd <n...@arm.com>
Subject: [vpp-dev] IPSec proposal to improve "ipsec4-output-feature" node 
performance
Hi Neale,
   I am working on optimizing "ipsec4-output-feature" node on ARM based 
systems. Towards that, I saw an opportunity to supplement SPD table lookup 
(linear search) with Bihash based flow cache.
   This approach is similar to ACL plugin stateful mode implementation. This 
approach will consume extra memory for Bihash and provide O(1) performance for 
SPD rules added at different indices.
   I did a very basic prototype and got good results. Please find the prototype 
patch attached.
   Before I start the actual implementation, I would like to get your feedback. 
It will be great if you can give your comments.

    The following is the idea at a high level. The existing linear-search-based 
SPD table lookup will be augmented with a flow cache.

    Enhanced SPD Table lookup logic:
    ---------------------------------------------
    On every packet arrival, the following lookup will be done in the 
"ipsec4-output-feature" node:
   1. found = Lookup <5 tuple: Bihash based flow cache>
   2. if (!found) {
          found = Lookup <5 tuple: Linear search>
          if (found) {
              Add an entry into <5 tuple: Bihash based flow cache>
          }
      }

    Linear search will happen only for the 1st packet in a flow; from the 2nd 
packet onwards, the match will succeed in the bihash table.
    I did a basic prototype and got O(1) performance as expected, when an IPv4 
5-tuple rule is added at different indices <1, 10, 100, 1000> in the SPD table.
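
The two-step lookup above can be sketched in C as follows. This is a minimal illustration, not the prototype patch: a small direct-mapped array stands in for the bihash, the SPD is a plain array of inclusive ranges, and all type and function names (`tuple5_t`, `spd_rule_t`, `spd_lookup`, and so on) are hypothetical rather than VPP's.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration only; not VPP's structures. */
typedef struct {
  uint32_t src, dst;      /* IPv4 addresses, host byte order */
  uint16_t sport, dport;
  uint8_t proto;
} tuple5_t;

/* SPD rule as inclusive ranges, matched in table order. */
typedef struct {
  uint32_t src_lo, src_hi, dst_lo, dst_hi;
  uint16_t sport_lo, sport_hi, dport_lo, dport_hi;
  uint8_t proto;
} spd_rule_t;

typedef struct {
  tuple5_t key;
  int rule;
  uint8_t valid;
} slot_t;

#define N_SLOTS 1024u /* power of two, so "& (N_SLOTS - 1)" selects a slot */

typedef struct {
  const spd_rule_t *rules;
  size_t n_rules;
  slot_t cache[N_SLOTS];
  unsigned long linear_searches; /* number of slow-path lookups taken */
} spd_t;

static int tuple_eq(const tuple5_t *a, const tuple5_t *b)
{
  return a->src == b->src && a->dst == b->dst && a->sport == b->sport &&
         a->dport == b->dport && a->proto == b->proto;
}

/* FNV-style mixing over the tuple fields; any reasonable hash would do. */
static uint32_t tuple_hash(const tuple5_t *t)
{
  const uint32_t w[4] = { t->src, t->dst,
                          ((uint32_t)t->sport << 16) | t->dport, t->proto };
  uint32_t h = 2166136261u;
  for (int i = 0; i < 4; i++)
    h = (h ^ w[i]) * 16777619u;
  return h ^ (h >> 16);
}

/* Slow path: first rule whose ranges contain the tuple, or -1. */
static int spd_linear_search(const spd_t *spd, const tuple5_t *t)
{
  for (size_t i = 0; i < spd->n_rules; i++) {
    const spd_rule_t *r = &spd->rules[i];
    if (t->proto == r->proto &&
        t->src >= r->src_lo && t->src <= r->src_hi &&
        t->dst >= r->dst_lo && t->dst <= r->dst_hi &&
        t->sport >= r->sport_lo && t->sport <= r->sport_hi &&
        t->dport >= r->dport_lo && t->dport <= r->dport_hi)
      return (int)i;
  }
  return -1;
}

/* Step 1: try the cache. Step 2: on a miss, search linearly and, if a
 * rule is found, remember it for the following packets of the flow. */
static int spd_lookup(spd_t *spd, const tuple5_t *t)
{
  slot_t *s = &spd->cache[tuple_hash(t) & (N_SLOTS - 1)];
  if (s->valid && tuple_eq(&s->key, t))
    return s->rule;
  spd->linear_searches++;
  int rule = spd_linear_search(spd, t);
  if (rule >= 0) {
    s->key = *t;
    s->rule = rule;
    s->valid = 1;
  }
  return rule;
}
```

Calling `spd_lookup()` twice with the same tuple increments `linear_searches` only once, which mirrors the "first packet only" behaviour described above. Misses are not cached here, following the `if (found)` guard in the pseudocode.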

    Following are the per core performance numbers with IPSec NULL encryption 
configuration in ESP Tunnel mode, in ARM CPU based system @MRR with 64B packets:

    Baseline based on existing linear search
    ========================================
    SPD index       Performance
    ----------------------------------------
    1st match       5.2  MPPS
    10th match      4.51 MPPS
    100th match     2.05 MPPS
    1000th match    266  KPPS

   With Bihash based flow cache (Basic prototype results):
   =======================================================
   SPD index       Performance
   -------------------------------------------------------
   1st match       4.88 MPPS
   10th match      4.88 MPPS
   100th match     4.88 MPPS
   1000th match    4.88 MPPS

   As you can see, we are getting constant performance numbers even when rules 
are added at different indices.
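
To put the two tables side by side, the short sketch below computes the ratio of cached to baseline throughput for each row. The numbers are copied from the tables above (266 KPPS written as 0.266 MPPS); the helper names are hypothetical. The ratio comes out at roughly 18x for the 1000th-rule case, while the 1st-rule case is about 6% slower than the baseline because of the extra hash lookup.

```c
#include <stdio.h>

/* Throughput in MPPS, copied from the two tables above. */
static const char *const row_name[4] = { "1st", "10th", "100th", "1000th" };
static const double baseline_mpps[4] = { 5.2, 4.51, 2.05, 0.266 };
static const double cached_mpps = 4.88;

/* Ratio of flow-cache throughput to linear-search throughput for a row. */
static double speedup(int row)
{
  return cached_mpps / baseline_mpps[row];
}

/* Prints one line per table row. */
static void print_speedups(void)
{
  for (int i = 0; i < 4; i++)
    printf("%-7s match: %.2fx\n", row_name[i], speedup(i));
}
```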
   If you are fine with this approach, I would like to proceed with actual 
implementation.

   I am making an assumption that the SPD table will not be populated 
frequently by the application. Please correct me if I am wrong.
   Whenever the application adds/deletes/modifies an entry in the SPD table, 
the flow cache will be purged in the data plane through an interface-level 
flag. I will work on this case and send another update.

Thanks
Govind



