Hi Ori,
On Sun, 9 Jan 2022, Ori Kam wrote:
Hi Stephen and Ivan
-----Original Message-----
From: Stephen Hemminger <step...@networkplumber.org>
Sent: Tuesday, January 4, 2022 11:56 PM
Subject: Re: Understanding Flow API action RSS
On Tue, 4 Jan 2022 21:29:14 +0300 (MSK)
Ivan Malov <ivan.ma...@oktetlabs.ru> wrote:
Hi Stephen,
On Tue, 4 Jan 2022, Stephen Hemminger wrote:
On Tue, 04 Jan 2022 13:41:55 +0100
Thomas Monjalon <tho...@monjalon.net> wrote:
+Cc Ori Kam, rte_flow maintainer
29/12/2021 15:34, Ivan Malov:
Hi all,
In 'rte_flow.h', there is 'struct rte_flow_action_rss'. In it, 'queue' is
to provide "Queue indices to use". But it is unclear whether the order of
elements is meaningful or not. Does that matter? Can queue indices repeat?
The order probably doesn't matter, it is like the RSS indirection table.
Sorry, but RSS indirection table (RETA) assumes some structure. In it,
queue indices can repeat, and the order is meaningful. In DPDK, RETA
may comprise multiple "groups", each one comprising 64 entries.
This 'queue' array in flow action RSS does not stick with the same
terminology, it does not reuse the definition of RETA "group", etc.
Just "queue indices to use". No definition of order, no structure.
The API contract is not clear. Neither to users, nor to PMDs.
From the API point of view, the queues in RSS are simply queue IDs; order doesn't matter.
Duplicating a queue may affect the spread, depending on the HW/PMD.
In the common case, each queue should appear only once, and the PMD may duplicate
entries to get the best performance.
Look. In a DPDK PMD, one has "global" RSS table. Consider the following
example: 0, 0, 1, 1, 2, 2, 3, 3 ... and so on. As you may see, queue
indices may repeat. They may have different order: 1, 1, 0, 0, ... .
The order is of great importance. If you send a packet to a
DPDK-powered server, you can know in advance its hash value.
Hence, you may strictly predict which RSS table entry this
hash will point at. That predicts the target Rx queue.
So the questions that one should attempt to clarify are as follows:
1) Is the 'queue' array ordered? (Does the order of elements matter?)
2) Can its elements repeat? (*allowed* or *not allowed*?)
rx queue = RSS_indirection_table[RSS_hash_value % RSS_indirection_table_size]
So you could play with multiple queues matching same hash value, but that
would be uncommon.
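
To make the formula above concrete, here is a minimal C sketch. The helper names 'reta_lookup' and 'order_changes_queue' and the two sample tables are hypothetical, chosen for illustration only; none of this is DPDK API. It shows that two tables holding the same set of queues in a different order can send the same hash to different queues.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a DPDK API: map a hash value to an Rx queue. */
static uint16_t
reta_lookup(const uint16_t *reta, uint32_t reta_size, uint32_t hash)
{
	assert(reta != NULL && reta_size != 0);
	return reta[hash % reta_size];
}

/* Two sample tables: same set of queues, different order. */
static const uint16_t reta_a[8] = { 0, 0, 1, 1, 2, 2, 3, 3 };
static const uint16_t reta_b[8] = { 1, 1, 0, 0, 2, 2, 3, 3 };

/* Returns 1 if the two sample tables pick different queues for 'hash'. */
static int
order_changes_queue(uint32_t hash)
{
	return reta_lookup(reta_a, 8, hash) != reta_lookup(reta_b, 8, hash);
}
```

For instance, a hash of 0x10 selects entry 0, which holds queue 0 in the first table and queue 1 in the second, whereas a hash of 5 selects entry 5 and lands on queue 2 in both.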
An ethdev may have "global" RSS setting with an indirection table of some
fixed size (say, 512). When it comes to flow rules, does that size matter?
Global RSS is only used if the incoming packet does not match any rte_flow
action. If there is an RTE_FLOW_ACTION_TYPE_QUEUE or RTE_FLOW_ACTION_TYPE_RSS,
these take precedence.
Yes, I know all of that. The question is how does the PMD select the RETA size
for this action? Can it select an arbitrary value? Or should it stick with
the "global" one (e.g. 512)? How does the user know the table size?
If the user simply wants to spread traffic across the given queues,
the effective table size is a don't care to them, and the existing
API contract is fine. But if the user expects that certain packets
hit some precise queues, they need to know the table size for that.
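
A small C sketch of why the table size matters for such a user. It rests on a loud ASSUMPTION: that the PMD expands the 'queue' array into its internal table by repeating it round-robin. Nothing in the API says so, and a real PMD may fill the table differently. The function name 'queue_for_hash' is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * ASSUMPTION: the PMD fills an internal table of 'table_size' entries by
 * repeating the 'queues' array round-robin. Real PMDs may do otherwise.
 */
static uint16_t
queue_for_hash(const uint16_t *queues, uint32_t nb_queues,
	       uint32_t table_size, uint32_t hash)
{
	uint32_t entry;

	assert(nb_queues != 0 && table_size != 0);
	entry = hash % table_size;        /* table entry the hash points at */
	return queues[entry % nb_queues]; /* queue stored in that entry */
}
```

Under that assumption, with queues { 5, 6, 7 } and a hash of 10, an 8-entry table yields queue 7 while a 512-entry table yields queue 6, so the sender cannot predict the queue without knowing the size.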
Just like you said, RSS simply spreads the traffic to the given queues.
Yes, to the given queues. The question is whether the 'queue' array
has RETA properties (order matters; elements can repeat) or not.
If the application wants to send traffic to a specific queue, it should use the
queue action.
Yes, but that's not what I mean. Consider the following example. The user
generates packets with random IP addresses at machine A. These packets
hit DPDK at machine B. For a given *packet*, the sender (A) can
compute its RSS hash in software. This will point out the RETA
entry index. But, in order to predict the exact *queue* index,
the sender has to know the table (its contents, its size).
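
For reference, computing the hash in software on the sender side could look roughly like the C sketch below, assuming the device uses Toeplitz. The function name 'toeplitz_hash' is hypothetical; the caller is assumed to supply the same key the device uses and to lay out the input tuple (addresses, ports) in the byte order the device hashes. DPDK also ships its own software helper, rte_softrss(), in rte_thash.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Software Toeplitz hash over 'len' input bytes.
 * 'key' must hold at least len + 4 bytes (a 40-byte key covers
 * up to 36 bytes of input).
 */
static uint32_t
toeplitz_hash(const uint8_t *key, size_t key_len,
	      const uint8_t *data, size_t len)
{
	uint32_t hash = 0;
	uint32_t window; /* 32-bit window sliding over the key */
	size_t i;
	int bit;

	assert(key_len >= len + 4);
	window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
		 ((uint32_t)key[2] << 8) | (uint32_t)key[3];

	for (i = 0; i < len; i++) {
		/* Walk the input bits from most to least significant. */
		for (bit = 7; bit >= 0; bit--) {
			if (data[i] & (1u << bit))
				hash ^= window;
			/* Slide the window left by one key bit. */
			window = (window << 1) |
				 ((uint32_t)(key[i + 4] >> bit) & 1u);
		}
	}
	return hash;
}
```

The resulting 32-bit value is only half of the prediction: it still has to be reduced modulo the table size and looked up in the table, which is exactly the information that is missing for the RSS action.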
For a "global" DPDK RSS setting, the table can be easily obtained with
an ethdev callback / API. Very simple. Fixed-size table, and it can
be queried. But how does one obtain similar knowledge for RSS action?
So, the question is whether the users should or should not build
any expectations of the effective table size and, if they should,
are they supposed to use the "global" table size for that?
You are right, this area is completely undocumented. Personally, I would really like
it if rte_flow had a reference software implementation and all the HW vendors
had to make sure their HW matched the SW reference version. But this is a case
where the funding is all on the HW side, and no one has time or resources
to do a complete SW version.
A sane implementation would configure the RSS indirection table across all
Rx queues that were available when the device was started, i.e. all queues
that did not have deferred start set. Then the application would start/stop
queues and use rte_flow to reach them.
But it doesn't appear the HW follows that model.
When the user selects 'RTE_ETH_HASH_FUNCTION_DEFAULT' in action RSS, does
that allow the PMD to configure an arbitrary, non-Toeplitz hash algorithm?
No, the default is always Toeplitz. This goes back to the original definition
of RSS, which is in Microsoft NDIS and uses Toeplitz.
Then why have a dedicated enum named TOEPLITZ? Also, once again, the
documentation should be more specific to say which algorithm exactly
this DEFAULT choice provides. Otherwise, it is very vague.
DPDK should have more examples of using rte_flow, I have some samples
but they aren't that useful.
I could not agree more.
Feel free to add/suggest which examples are missing.
Thanks,
Ivan M.
Best,
Ori