On 3/26/2026 10:13 AM, Konstantin Ananyev wrote:

Add VRF (Virtual Routing and Forwarding) support to the IPv4
FIB library, allowing multiple independent routing tables
within a single FIB instance.

Introduce max_vrfs and vrf_default_nh fields in rte_fib_conf
to configure the number of VRFs and per-VRF default nexthops.
Thanks Vladimir, allowing multiple VRFs per same LPM table will
definitely be a useful thing to have.
Though, I have the same concern as Maxime:
memory requirements are just overwhelming.
Stupid q - why not just store a pointer to a vector of next-hops
within the table entry?
Do I understand correctly: a vector with max_number_of_vrfs entries,
using the vrf id to address a nexthop?
Yes.
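
For reference, the idea confirmed above could be sketched roughly as follows. This is a minimal illustration, not code from the patch: MAX_VRFS, NH_INVALID, struct nh_vec and both helper functions are hypothetical names, and plain malloc() stands in for whatever allocator the library would actually use.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_VRFS   1024        /* hypothetical limit, taken from the 1024-VRF example */
#define NH_INVALID UINT64_MAX  /* hypothetical "no route in this VRF" marker */

/* One vector per table entry; slot i holds the nexthop for VRF i. */
struct nh_vec {
	uint64_t nh[MAX_VRFS];
};

/* Allocate a vector with every VRF slot marked as "no route". */
static struct nh_vec *
nh_vec_create(void)
{
	struct nh_vec *v = malloc(sizeof(*v));

	if (v == NULL)
		return NULL;
	for (uint32_t i = 0; i < MAX_VRFS; i++)
		v->nh[i] = NH_INVALID;
	return v;
}

/*
 * The table entry is a pointer to the vector, so resolving the VRF
 * costs one extra dereference. Falls back to def_nh on a miss.
 */
static uint64_t
nh_vec_lookup(const struct nh_vec *entry, uint16_t vrf_id, uint64_t def_nh)
{
	if (entry == NULL || vrf_id >= MAX_VRFS ||
	    entry->nh[vrf_id] == NH_INVALID)
		return def_nh;
	return entry->nh[vrf_id];
}
```

With 8-byte nexthops each such vector takes 8 KiB in this sketch, which is where the sparsity concern raised below comes from.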
Here I can see 2 problems:

1. tbl entries must be the size of a pointer, so no way to use smaller sizes
Yes, but as we are talking about storing nexthops for multiple VRFs anyway,
I don't think it is a big deal.

2. those vectors will be sparsely populated and, depending on the
runtime configuration, may consume a lot of memory too (as Robin
mentioned they may have 1024 VRFs)
Yes, each VRF vector can become really sparse and we waste a lot of memory.
If that's an issue, we probably can think about something smarter
than a simple flat array indexed by vrf-id: something like a 2-level B-tree or so.
The main positives that I see in that approach:
- low extra overhead at lookup - one/two extra pointer dereferences.
I'm afraid the overhead will be comparatively large just because the current implementation is fast and most likely hits with a single memory access. However, for a low number of VRFs, a B-tree may be a good solution.
- it allows CP to allocate/free space for each such vector separately,
   so we don't need to pre-allocate memory for max possible entries at startup.

Yes, this may work.
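
To make the "something smarter than a flat array" option concrete, here is a rough sketch of a two-level array indexed by vrf-id, where leaves are allocated only when a VRF in that range gets a route. It is a simplification (a fixed two-level radix split rather than a real B-tree), and every name and constant in it is hypothetical; malloc()/free() stand in for the custom alloc/free callbacks discussed in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define L2_BITS    5               /* 32 VRFs per leaf (hypothetical split) */
#define L2_SIZE    (1u << L2_BITS)
#define L1_SIZE    32              /* 32 leaves * 32 slots = 1024 VRFs */
#define NH_INVALID UINT64_MAX      /* hypothetical "no route" marker */

struct vrf_leaf {
	uint64_t nh[L2_SIZE];
};

/* Top level: 32 pointers; a NULL pointer means no VRF in that range is used. */
struct vrf_tree {
	struct vrf_leaf *leaf[L1_SIZE];
};

/* Control-plane side: allocate the leaf lazily on first use. */
static int
vrf_tree_set(struct vrf_tree *t, uint16_t vrf_id, uint64_t nh)
{
	uint32_t hi = vrf_id >> L2_BITS;
	uint32_t lo = vrf_id & (L2_SIZE - 1);

	if (hi >= L1_SIZE)
		return -1;
	if (t->leaf[hi] == NULL) {
		struct vrf_leaf *l = malloc(sizeof(*l));

		if (l == NULL)
			return -1;
		for (uint32_t i = 0; i < L2_SIZE; i++)
			l->nh[i] = NH_INVALID;
		t->leaf[hi] = l;
	}
	t->leaf[hi]->nh[lo] = nh;
	return 0;
}

/* Data-path side: two dependent loads (leaf pointer, then slot). */
static uint64_t
vrf_tree_get(const struct vrf_tree *t, uint16_t vrf_id, uint64_t def_nh)
{
	uint32_t hi = vrf_id >> L2_BITS;
	uint32_t lo = vrf_id & (L2_SIZE - 1);

	if (hi >= L1_SIZE || t->leaf[hi] == NULL ||
	    t->leaf[hi]->nh[lo] == NH_INVALID)
		return def_nh;
	return t->leaf[hi]->nh[lo];
}

/* Release every allocated leaf. */
static void
vrf_tree_free(struct vrf_tree *t)
{
	for (uint32_t i = 0; i < L1_SIZE; i++) {
		free(t->leaf[i]);
		t->leaf[i] = NULL;
	}
}
```

A prefix used by a handful of VRFs then costs one 256-byte top level plus one 256-byte leaf per populated range, instead of a full max_vrfs-sized vector, at the price of the second dereference mentioned above.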
But, if we are going to do an extra memory access, I'd rather
maintain an internal hash table with 5 byte keys {24_bits_from_LPM,
16_bits_vrf_id} to retrieve a nexthop.
Hmm... and what to do with entries in tbl8, I mean what will be the key for
them?
Or you don't plan to put entries from tbl8 to that hash table?
The idea is to have a single LPM struct with a joint superset of all
prefixes existing in all VRFs. Each prefix in this LPM struct has its
own unique "nexthop", which is not the final next hop, but an
intermediate metadata defining this unique prefix. Then, the following
search is performed with the key containing this intermediate metadata +
vrf_id in some exact match database like hash table. This approach is
the most memory friendly, since there is only one LPM data struct (which
scales well with number of prefixes it has) with intermediate entries
only 4b long.
On the other hand it requires an extra search, so lookup will be slower.
Also, some current LPM optimizations, like tbl8 collapsing if all tbl8
entries have a similar value, will be gone.
Yes, and yes :)
Yes it would help to save memory, and yes lookup will most likely be slower.
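
The second stage of that scheme (exact match on intermediate metadata + vrf_id) might look roughly like the sketch below. It only illustrates the 5-byte key layout; the table is a toy open-addressing hash with linear probing, not rte_hash and not the perfect hash hinted at later in this thread. All names (mk_key, em_tbl, em_add, em_lookup, prefix_id) are hypothetical, and how a miss for a given VRF is resolved is not covered here.

```c
#include <assert.h>
#include <stdint.h>

#define TBL_SIZE  1024u        /* hypothetical capacity, must be a power of two */
#define KEY_EMPTY UINT64_MAX   /* never a valid key: real keys use only 40 bits */

struct em_entry {
	uint64_t key;
	uint64_t nh;
};

struct em_tbl {
	struct em_entry e[TBL_SIZE];
};

/* Pack the 24-bit id returned by the LPM stage and the 16-bit VRF id. */
static uint64_t
mk_key(uint32_t prefix_id, uint16_t vrf_id)
{
	return ((uint64_t)(prefix_id & 0xFFFFFF) << 16) | vrf_id;
}

/* Multiplicative hash reduced to a table index. */
static uint32_t
hash40(uint64_t key)
{
	return (uint32_t)((key * 0x9E3779B97F4A7C15ull) >> 32) & (TBL_SIZE - 1);
}

static void
em_init(struct em_tbl *t)
{
	for (uint32_t i = 0; i < TBL_SIZE; i++)
		t->e[i].key = KEY_EMPTY;
}

/* Insert or update; returns -1 when the table is full. */
static int
em_add(struct em_tbl *t, uint32_t prefix_id, uint16_t vrf_id, uint64_t nh)
{
	uint64_t key = mk_key(prefix_id, vrf_id);
	uint32_t pos = hash40(key);

	for (uint32_t i = 0; i < TBL_SIZE; i++) {
		struct em_entry *e = &t->e[(pos + i) & (TBL_SIZE - 1)];

		if (e->key == KEY_EMPTY || e->key == key) {
			e->key = key;
			e->nh = nh;
			return 0;
		}
	}
	return -1;
}

/* Returns 0 and fills *nh on a hit, -1 on a miss. */
static int
em_lookup(const struct em_tbl *t, uint32_t prefix_id, uint16_t vrf_id,
	  uint64_t *nh)
{
	uint64_t key = mk_key(prefix_id, vrf_id);
	uint32_t pos = hash40(key);

	for (uint32_t i = 0; i < TBL_SIZE; i++) {
		const struct em_entry *e = &t->e[(pos + i) & (TBL_SIZE - 1)];

		if (e->key == key) {
			*nh = e->nh;
			return 0;
		}
		if (e->key == KEY_EMPTY)
			return -1;
	}
	return -1;
}
```

In this sketch the same prefix_id shared by two VRFs simply yields two distinct keys, so memory grows with the number of (prefix, VRF) pairs actually installed rather than with max_vrfs.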
The other thing that I consider as a possible drawback here - with current rte_hash
implementation we still need to allocate space for all possible max entries at startup.
I don't think this is a big problem, since the size of this memory will be reasonable and will not grow linearly with the number of VRFs. So I agree it is an acceptable trade-off.
But that's not new in DPDK, and for most cases it is considered an acceptable trade-off.
Overall, it seems like a possible approach to me, I suppose the main question is:
what will be the price of that extra hash-lookup here.
And this is the key problem. I don't think rte_hash is well suited here, at best we need some kind of a perfect hash. I have a few ideas on this, stay tuned :)
Again, there is a bulk version of hash lookup, and in theory it might be
improved further (avx512 version on x86?).

And we can provide the user with the ability to specify custom
alloc/free functions for these vectors.
That would help to avoid allocating huge chunks of memory at startup.
I understand that it will be one extra memory dereference,
but probably it will not be that critical in terms of performance.
Again, for the bulk function we might be able to pipeline lookups and
de-references and hide that extra load latency.
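
A rough sketch of what such pipelining could look like for the pointer-to-vector layout: the first pass resolves the address of each per-VRF slot and issues a prefetch, the second pass reads the slots. This is only an illustration under stated assumptions: lookup_bulk_pipelined, BURST and the argument layout are hypothetical, every table entry is assumed to point to a valid vector, and __builtin_prefetch (GCC/Clang) is used in place of whatever prefetch helper the library would pick.

```c
#include <assert.h>
#include <stdint.h>

#define BURST 32  /* hypothetical number of lookups kept in flight */

/*
 * tbl[i] points to the per-VRF nexthop vector of table entry i,
 * idx[] holds the table index already computed for each packet,
 * vrf_ids[] selects the slot inside the vector.
 */
static void
lookup_bulk_pipelined(const uint64_t *const *tbl, const uint32_t *idx,
		      const uint16_t *vrf_ids, uint64_t *nh, uint32_t n)
{
	const uint64_t *slot[BURST];

	for (uint32_t base = 0; base < n; base += BURST) {
		uint32_t cnt = (n - base < BURST) ? n - base : BURST;

		/* Stage 1: compute each slot address and start loading it. */
		for (uint32_t i = 0; i < cnt; i++) {
			slot[i] = &tbl[idx[base + i]][vrf_ids[base + i]];
			__builtin_prefetch(slot[i], 0, 3);
		}

		/* Stage 2: read the slots once the loads have been issued. */
		for (uint32_t i = 0; i < cnt; i++)
			nh[base + i] = *slot[i];
	}
}
```

Whether this actually hides the latency depends on burst size and cache behaviour, so it would need measuring against the current single-access lookup.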

Add four new experimental APIs:
- rte_fib_vrf_add() and rte_fib_vrf_delete() to manage routes
     per VRF
- rte_fib_vrf_lookup_bulk() for multi-VRF bulk lookups
- rte_fib_vrf_get_rib() to retrieve a per-VRF RIB handle

Signed-off-by: Vladimir Medvedkin <[email protected]>
---
    lib/fib/dir24_8.c        | 241 ++++++++++++++++------
    lib/fib/dir24_8.h        | 255 ++++++++++++++++--------
    lib/fib/dir24_8_avx512.c | 420 +++++++++++++++++++++++++++++++--------
    lib/fib/dir24_8_avx512.h |  80 +++++++-
    lib/fib/rte_fib.c        | 158 ++++++++++++---
    lib/fib/rte_fib.h        |  94 ++++++++-
    6 files changed, 988 insertions(+), 260 deletions(-)

<snip>

--
Regards,
Vladimir
