Hi Ian,

> -----Original Message-----
> From: Ivan Malov <ivan.ma...@oktetlabs.ru>
> Subject: RE: Understanding Flow API action RSS
> 
> Hi Ori,
> 
> Many-many thanks for your commentary.
> 
> The nature of 'queue' array in flow action RSS is clear now.
> I hope PMD vendors and API users share this vision, too.
> Probably, this should be properly documented.
> We'll see what we can do in that direction.
> 
> Please see one more question below.
> 
> On Mon, 10 Jan 2022, Ori Kam wrote:
> 
> > Hi Ivan,
> >
> >> -----Original Message-----
> >> From: Ivan Malov <ivan.ma...@oktetlabs.ru>
> >> Sent: Sunday, January 9, 2022 3:03 PM
> >> Subject: RE: Understanding Flow API action RSS
> >>
> >> Hi Ori,
> >>
> >> On Sun, 9 Jan 2022, Ori Kam wrote:
> >>
> >>> Hi Stephen and Ivan
> >>>
> >>>> -----Original Message-----
> >>>> From: Stephen Hemminger <step...@networkplumber.org>
> >>>> Sent: Tuesday, January 4, 2022 11:56 PM
> >>>> Subject: Re: Understanding Flow API action RSS
> >>>>
> >>>> On Tue, 4 Jan 2022 21:29:14 +0300 (MSK)
> >>>> Ivan Malov <ivan.ma...@oktetlabs.ru> wrote:
> >>>>
> >>>>> Hi Stephen,
> >>>>>
> >>>>> On Tue, 4 Jan 2022, Stephen Hemminger wrote:
> >>>>>
> >>>>>> On Tue, 04 Jan 2022 13:41:55 +0100
> >>>>>> Thomas Monjalon <tho...@monjalon.net> wrote:
> >>>>>>
> >>>>>>> +Cc Ori Kam, rte_flow maintainer
> >>>>>>>
> >>>>>>> 29/12/2021 15:34, Ivan Malov:
> >>>>>>>> Hi all,
> >>>>>>>>
> >>>>>>>> In 'rte_flow.h', there is 'struct rte_flow_action_rss'. In it, 'queue' is
> >>>>>>>> to provide "Queue indices to use". But it is unclear whether the order of
> >>>>>>>> elements is meaningful or not. Does that matter? Can queue indices repeat?
> >>>>>>
> >>>>>> The order probably doesn't matter, it is like the RSS indirection table.
> >>>>>
> >>>>> Sorry, but RSS indirection table (RETA) assumes some structure. In it,
> >>>>> queue indices can repeat, and the order is meaningful. In DPDK, RETA
> >>>>> may comprise multiple "groups", each one comprising 64 entries.
> >>>>>
> >>>>> This 'queue' array in flow action RSS does not stick with the same
> >>>>> terminology, it does not reuse the definition of RETA "group", etc.
> >>>>> Just "queue indices to use". No definition of order, no structure.
> >>>>>
> >>>>> The API contract is not clear. Neither to users, nor to PMDs.
> >>>>>
> >>>> From API in RSS the queues are simply the queue ID, order doesn't matter,
> >>> Duplicating the queue may affect the spread based on the HW/PMD.
> >>> In the common case each queue should appear only once and the PMD may
> >>> duplicate entries to get the best performance.
> >>
> >> Look. In a DPDK PMD, one has "global" RSS table. Consider the following
> >> example: 0, 0, 1, 1, 2, 2, 3, 3 ... and so on. As you may see, queue
> >> indices may repeat. They may have different order: 1, 1, 0, 0, ... .
> >> The order is of great importance. If you send a packet to a
> >> DPDK-powered server, you can know in advance its hash value.
> >> Hence, you may strictly predict which RSS table entry this
> >> hash will point at. That predicts the target Rx queue.
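
To make the point about order and repetition concrete, here is a minimal
Python sketch. It assumes the lookup is "table[hash % table size]", as in
Stephen's formula quoted further down; real hardware may index the table
differently. The helper name rx_queue and the hash value are made up for
illustration and are not DPDK API.

```python
def rx_queue(reta, rss_hash):
    """Pick the Rx queue: index the table by the hash modulo the table size."""
    return reta[rss_hash % len(reta)]


# Same set of queues, same repetition, different order of entries.
reta_a = [0, 0, 1, 1, 2, 2, 3, 3]
reta_b = [1, 1, 0, 0, 2, 2, 3, 3]

# Arbitrary example value (assumption), not computed from a real packet.
rss_hash = 0x12345678

print(rx_queue(reta_a, rss_hash))  # 0
print(rx_queue(reta_b, rss_hash))  # 1
```

The same hash lands on a different queue once the entries are reordered,
which is why the order of the 'queue' array has to be part of the contract
if the sender is expected to predict the target queue.
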
> >>
> >> So the questions which one should attempt to clarify, are as follows:
> >> 1) Is the 'queue' array ordered? (Does the order of elements matter?)
> >> 2) Can its elements repeat? (*allowed* or *not allowed*?)
> >>
> >> From the API point of view the array is ordered, and may have repeating elements.
> >
> >>>
> >>>>>>
> >>>>>>    rx queue = RSS_indirection_table[ RSS_hash_value % RSS_indirection_table_size ]
> >>>>>>
> >>>>>> So you could play with multiple queues matching same hash value, but that
> >>>>>> would be uncommon.
> >>>>>>
> >>>>>>>> An ethdev may have "global" RSS setting with an indirection table of some
> >>>>>>>> fixed size (say, 512). When it comes to flow rules, does that size matter?
> >>>>>>
> >>>>>> Global RSS is only used if the incoming packet does not match any rte_flow
> >>>>>> action. If there is a RTE_FLOW_ACTION_TYPE_QUEUE or RTE_FLOW_ACTION_TYPE_RSS
> >>>>>> these take precedence.
> >>>>>
> >>>>> Yes, I know all of that. The question is how does the PMD select RETA size
> >>>>> for this action? Can it select an arbitrary value? Or should it stick with
> >>>>> the "global" one (e.g. 512)? How does the user know the table size?
> >>>>>
> >>>>> If the user simply wants to spread traffic across the given queues,
> >>>>> the effective table size is a don't care to them, and the existing
> >>>>> API contract is fine. But if the user expects that certain packets
> >>>>> hit some precise queues, they need to know the table size for that.
> >>>>>
> >>> Just like you said, RSS simply spreads the traffic to the given queues.
> >>
> >> Yes, to the given queues. The question is whether the 'queue' array
> >> has RETA properties (order matters; elements can repeat) or not.
> >>
> >
> > Yes, order matters and elements can repeat.
> >
> >>> If the application wants to send traffic to some queue it should use the queue action.
> >>
> >> Yes, but that's not what I mean. Consider the following example. The user
> >> generates packets with random IP addresses at machine A. These packets
> >> hit DPDK at machine B. For a given *packet*, the sender (A) can
> >> compute its RSS hash in software. This will point out the RETA
> >> entry index. But, in order to predict the exact *queue* index,
> >> the sender has to know the table (its contents, its size).
> >>
> > Why does the application need this info?
> >
> >> For a "global" DPDK RSS setting, the table can be easily obtained with
> >> an ethdev callback / API. Very simple. Fixed-size table, and it can
> >> be queried. But how does one obtain similar knowledge for RSS action?
> >>
> > The RSS action was designed to allow a balanced traffic spread.
> > The size of the RETA is PMD dependent: in some PMDs the size will be
> > the number of queues, in others it will be the number of queues rounded
> > up to a power of 2, so if the app requested 8 queues the RETA will also
> > have 8 entries. In any case the PMD should use the given order; if the
> > PMD needs to expand the table, it should cycle through the
> > application-requested queues in the order they were given.
> >
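
The expansion rule described above can be sketched in a few lines of
Python. This is only a model of the described behaviour, not what any
particular PMD does: the names next_pow2 and expand_reta are hypothetical,
and rounding up to the next power of two is one reading of "in power of 2".

```python
def next_pow2(n):
    """Smallest power of two that is >= n (n must be positive)."""
    if n < 1:
        raise ValueError("n must be positive")
    return 1 << (n - 1).bit_length()


def expand_reta(queues, size=None):
    """Fill a table of `size` entries by cycling through `queues` in order.

    When `size` is omitted, the queue count is rounded up to a power of two
    (an assumption about what a PMD might choose).
    """
    if not queues:
        raise ValueError("queue list must not be empty")
    if size is None:
        size = next_pow2(len(queues))
    return [queues[i % len(queues)] for i in range(size)]


print(expand_reta([4, 5, 6, 7, 0, 1, 2, 3]))  # [4, 5, 6, 7, 0, 1, 2, 3]
print(expand_reta([0, 1, 2]))                 # [0, 1, 2, 0]
print(expand_reta([0, 1, 2], size=8))         # [0, 1, 2, 0, 1, 2, 0, 1]
```

With 8 queues the table is the array as given. With 3 queues the effective
size starts to matter: using a "hash % size" lookup, a hash value of 6
selects queue 2 in the 4-entry table and queue 0 in the 8-entry one, so a
sender cannot predict the queue without knowing which size the PMD picked.
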
> >
> >>>
> >>>>> So, the question is whether the users should or should not build
> >>>>> any expectations of the effective table size and, if they should,
> >>>>> are they supposed to use the "global" table size for that?
> >>>>
> >>>> You are right, this area is completely undocumented. Personally, I would
> >>>> really like it if rte_flow had a reference software implementation and all
> >>>> the HW vendors had to make sure their HW matched the SW reference version.
> >>>> But this is a case where the funding is all on the HW side, and no one has
> >>>> time or resources to do a complete SW version.
> >>>>
> >>>> A sane implementation would configure RSS indirection across all
> >>>> rx queues that were available when the device was started, i.e. all queues
> >>>> that did not have deferred start set. Then the application would start/stop
> >>>> queues and use rte_flow to reach them.
> >>>>
> >>>> But it doesn't appear the HW follows that model.
> >>>>
> >>>>
> >>>>>>>> When the user selects 'RTE_ETH_HASH_FUNCTION_DEFAULT' in action RSS, does
> >>>>>>>> that allow the PMD to configure an arbitrary, non-Toeplitz hash algorithm?
> 
> What do you think about the above question? In my opinion, DEFAULT should
> let the PMD select whatever hash function / algorithm it may want to
> select. Just some vendor-specific optimal choice.
> 
> If the user wants exactly Toeplitz / "standard RSS hash" behaviour,
> they can always specify enum TOEPLITZ. And the PMD must either
> comply or reject. What do you think? Are we on the same page?
> 

Fully agree with you.
The same goes if the user doesn't supply the key: the PMD should select some
default value.
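
For reference, this is roughly what a sender would have to compute in
software when TOEPLITZ is requested with an explicit key. It is a plain
Python sketch of the Toeplitz hash, not DPDK code. EXAMPLE_KEY is a
made-up 40-byte key, rss_input_ipv4_tcp is a hypothetical helper, and the
input layout (source address, destination address, source port,
destination port, all in network byte order) is the usual IPv4/TCP tuple
and should be treated as an assumption for any given NIC.

```python
import ipaddress


def toeplitz_hash(key, data):
    """Toeplitz hash: for every set bit of `data` (MSB first), XOR in the
    32-bit window of `key` that starts at the same bit offset."""
    if len(key) < len(data) + 4:
        raise ValueError("key must be at least 4 bytes longer than the input")
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i in range(len(data) * 8):
        byte, bit = divmod(i, 8)
        if data[byte] & (0x80 >> bit):
            # Window covering key bits i .. i+31, counted from the MSB.
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def rss_input_ipv4_tcp(src, dst, sport, dport):
    """Build the 12-byte hash input for an IPv4/TCP flow (assumed layout)."""
    return (ipaddress.IPv4Address(src).packed
            + ipaddress.IPv4Address(dst).packed
            + sport.to_bytes(2, "big")
            + dport.to_bytes(2, "big"))


# Hypothetical key for illustration only; a real setup would use the key
# passed in the RSS action (or whatever default the PMD selected).
EXAMPLE_KEY = bytes(range(1, 41))

flow_hash = toeplitz_hash(
    EXAMPLE_KEY, rss_input_ipv4_tcp("192.0.2.1", "198.51.100.2", 12345, 80))
print(hex(flow_hash))
```

If the PMD is free to pick both the function (DEFAULT) and the key, this
computation cannot be reproduced on the sender side, which is exactly why
an explicit TOEPLITZ request must either be honoured or rejected.
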

> >>>>>>
> >>>>>> No, the default is always Toeplitz.  This goes back to the original definition
> >>>>>> of RSS which is in Microsoft NDIS and uses Toeplitz.
> >>>>>
> >>>>> Then why have a dedicated enum named TOEPLITZ? Also, once again, the
> >>>>> documentation should be more specific to say which algorithm exactly
> >>>>> this DEFAULT choice provides. Otherwise, it is very vague.
> >>>>>
> >>>>>>
> >>>>>> DPDK should have more examples of using rte_flow, I have some samples
> >>>>>> but they aren't that useful.
> >>>>>>
> >>>>>
> >>>>> I could not agree more.
> >>>
> >>> Feel free to add/suggest which examples are missing.
> >>>
> >>>>>
> >>>>> Thanks,
> >>>>> Ivan M.
> >>>
> >>> Best,
> >>> Ori
> >>>
> > Best,
> > Ori
> >
> 
> Best regards,
> Ivan M.
