Hi Ivan,

> -----Original Message-----
> From: Ivan Malov <ivan.ma...@oktetlabs.ru>
> Subject: RE: Understanding Flow API action RSS
>
> Hi Ori,
>
> Many-many thanks for your commentary.
>
> The nature of 'queue' array in flow action RSS is clear now.
> I hope PMD vendors and API users share this vision, too.
> Probably, this should be properly documented.
> We'll see what we can do in that direction.
>
> Please see one more question below.
>
> On Mon, 10 Jan 2022, Ori Kam wrote:
>
> > Hi Ivan,
> >
> >> -----Original Message-----
> >> From: Ivan Malov <ivan.ma...@oktetlabs.ru>
> >> Sent: Sunday, January 9, 2022 3:03 PM
> >> Subject: RE: Understanding Flow API action RSS
> >>
> >> Hi Ori,
> >>
> >> On Sun, 9 Jan 2022, Ori Kam wrote:
> >>
> >>> Hi Stephen and Ivan
> >>>
> >>>> -----Original Message-----
> >>>> From: Stephen Hemminger <step...@networkplumber.org>
> >>>> Sent: Tuesday, January 4, 2022 11:56 PM
> >>>> Subject: Re: Understanding Flow API action RSS
> >>>>
> >>>> On Tue, 4 Jan 2022 21:29:14 +0300 (MSK)
> >>>> Ivan Malov <ivan.ma...@oktetlabs.ru> wrote:
> >>>>
> >>>>> Hi Stephen,
> >>>>>
> >>>>> On Tue, 4 Jan 2022, Stephen Hemminger wrote:
> >>>>>
> >>>>>> On Tue, 04 Jan 2022 13:41:55 +0100
> >>>>>> Thomas Monjalon <tho...@monjalon.net> wrote:
> >>>>>>
> >>>>>>> +Cc Ori Kam, rte_flow maintainer
> >>>>>>>
> >>>>>>> 29/12/2021 15:34, Ivan Malov:
> >>>>>>>> Hi all,
> >>>>>>>>
> >>>>>>>> In 'rte_flow.h', there is 'struct rte_flow_action_rss'. In it,
> >>>>>>>> 'queue' is to provide "Queue indices to use". But it is unclear
> >>>>>>>> whether the order of elements is meaningful or not. Does that
> >>>>>>>> matter? Can queue indices repeat?
> >>>>>>
> >>>>>> The order probably doesn't matter, it is like the RSS indirection
> >>>>>> table.
> >>>>>
> >>>>> Sorry, but RSS indirection table (RETA) assumes some structure. In it,
> >>>>> queue indices can repeat, and the order is meaningful.
> >>>>> In DPDK, RETA may comprise multiple "groups", each one comprising 64
> >>>>> entries.
> >>>>>
> >>>>> This 'queue' array in flow action RSS does not stick with the same
> >>>>> terminology, it does not reuse the definition of RETA "group", etc.
> >>>>> Just "queue indices to use". No definition of order, no structure.
> >>>>>
> >>>>> The API contract is not clear. Neither to users, nor to PMDs.
> >>>>>
> >>> From API in RSS the queues are simply the queue ID, order doesn't matter,
> >>> Duplicating the queue may affect the spread based on the HW/PMD.
> >>> In common case each queue should appear only once and the PMD may
> >>> duplicate entries to get the best performance.
> >>
> >> Look. In a DPDK PMD, one has "global" RSS table. Consider the following
> >> example: 0, 0, 1, 1, 2, 2, 3, 3 ... and so on. As you may see, queue
> >> indices may repeat. They may have different order: 1, 1, 0, 0, ... .
> >> The order is of great importance. If you send a packet to a
> >> DPDK-powered server, you can know in advance its hash value.
> >> Hence, you may strictly predict which RSS table entry this
> >> hash will point at. That predicts the target Rx queue.
> >>
> >> So the questions which one should attempt to clarify, are as follows:
> >> 1) Is the 'queue' array ordered? (Does the order of elements matter?)
> >> 2) Can its elements repeat? (*allowed* or *not allowed*?)
> >>
> > From API point of view the array is ordered, and may have repeating
> > elements.
> >
> >>>
> >>>>>>
> >>>>>> rx queue = RSS_indirection_table[ RSS_hash_value %
> >>>>>> RSS_indirection_table_size ]
> >>>>>>
> >>>>>> So you could play with multiple queues matching same hash value, but
> >>>>>> that would be uncommon.
> >>>>>>
> >>>>>>>> An ethdev may have "global" RSS setting with an indirection table
> >>>>>>>> of some fixed size (say, 512). In what comes to flow rules, does
> >>>>>>>> that size matter?
> >>>>>>
> >>>>>> Global RSS is only used if the incoming packet does not match any
> >>>>>> rte_flow action. If there is a RTE_FLOW_ACTION_TYPE_QUEUE or
> >>>>>> RTE_FLOW_ACTION_TYPE_RSS these take precedence.
> >>>>>
> >>>>> Yes, I know all of that. The question is how does the PMD select RETA
> >>>>> size for this action? Can it select an arbitrary value? Or should it
> >>>>> stick with the "global" one (eg. 512)? How does the user know the
> >>>>> table size?
> >>>>>
> >>>>> If the user simply wants to spread traffic across the given queues,
> >>>>> the effective table size is a don't care to them, and the existing
> >>>>> API contract is fine. But if the user expects that certain packets
> >>>>> hit some precise queues, they need to know the table size for that.
> >>>>>
> >>> Just like you said RSS simply spreads the traffic to the given queues.
> >>
> >> Yes, to the given queues. The question is whether the 'queue' array
> >> has RETA properties (order matters; elements can repeat) or not.
> >>
> >
> > Yes order matters and elements can repeat.
> >
> >>> If application wants to send traffic to some queue it should use the
> >>> queue action.
> >>
> >> Yes, but that's not what I mean. Consider the following example. The user
> >> generates packets with random IP addresses at machine A. These packets
> >> hit DPDK at machine B. For a given *packet*, the sender (A) can
> >> compute its RSS hash in software. This will point out the RETA
> >> entry index. But, in order to predict the exact *queue* index,
> >> the sender has to know the table (its contents, its size).
> >>
> > Why does the application need this info?
> >
> >> For a "global" DPDK RSS setting, the table can be easily obtained with
> >> an ethdev callback / API. Very simple. Fixed-size table, and it can
> >> be queried. But how does one obtain similar knowledge for RSS action?
> >>
> > The RSS action was designed to allow balanced traffic spread.
> > The size of the RETA is PMD dependent: in some PMDs the size will be
> > the number of queues, in others it will be the number of queues but as a
> > power of 2, so if the app requested 8 queues the RETA will also be 8.
> > In any case the PMD should use the given order; if the PMD needs to
> > expand it, it should cycle over the application requested queues in the
> > order they were given.
> >
> >
> >>>
> >>>>> So, the question is whether the users should or should not build
> >>>>> any expectations of the effective table size and, if they should,
> >>>>> are they supposed to use the "global" table size for that?
> >>>>
> >>>> You are right this area is completely undocumented. Personally, I would
> >>>> really like it if rte_flow had a reference software implementation and
> >>>> all the HW vendors had to make sure their HW matched the SW reference
> >>>> version. But this is a case where the funding is all on the HW side,
> >>>> and no one has time or resources to do a complete SW version.
> >>>>
> >>>> A sane implementation would configure RSS indirection across all
> >>>> rx queues that were available when the device was started; ie all
> >>>> queues that did not have deferred start set. Then the application
> >>>> would start/stop queues and use rte_flow to reach them.
> >>>>
> >>>> But it doesn't appear the HW follows that model.
> >>>>
> >>>>
> >>>>>>>> When the user selects 'RTE_ETH_HASH_FUNCTION_DEFAULT' in action
> >>>>>>>> RSS, does that allow the PMD to configure an arbitrary,
> >>>>>>>> non-Toeplitz hash algorithm?
>
> What do you think about the above question? In my opinion, DEFAULT should
> let the PMD select whatever hash function / algorithm it may want to
> select. Just some vendor-specific optimal choice.
>
> If the user wants exactly Toeplitz / "standard RSS hash" behaviour,
> they can always specify enum TOEPLITZ. And the PMD must either
> comply or reject. What do you think? Are we on the same page?
>

Fully agree with you. The same goes if the user doesn't supply the key:
the PMD should select some default value.

> >>>>>>
> >>>>>> No the default is always Toeplitz. This goes back to the original
> >>>>>> definition of RSS which is in Microsoft NDIS and uses Toeplitz.
> >>>>>
> >>>>> Then why have a dedicated enum named TOEPLITZ? Also, once again, the
> >>>>> documentation should be more specific to say which algorithm exactly
> >>>>> this DEFAULT choice provides. Otherwise, it is very vague.
> >>>>>
> >>>>>>
> >>>>>> DPDK should have more examples of using rte_flow, I have some samples
> >>>>>> but they aren't that useful.
> >>>>>>
> >>>>>
> >>>>> I could not agree more.
> >>>
> >>> Feel free to add/suggest which examples are missing.
> >>>
> >>>>>
> >>>>> Thanks,
> >>>>> Ivan M.
> >>>
> >>> Best,
> >>> Ori
> >>>
> > Best,
> > Ori
> >
>
> Best regards,
> Ivan M.
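
For illustration only, here is a minimal sketch of the behaviour discussed in this thread: a PMD that needs a larger table cycles over the application's 'queue' array in the given order, and the Rx queue is then picked as table[hash % size]. It uses no DPDK calls. The helper names (next_pow2, expand_queues, pick_queue) are hypothetical, and rounding the table size up to a power of 2 is an assumption about one possible PMD, not a documented rule.

```c
#include <stdint.h>

/* Hypothetical helper: smallest power of two that is >= n (n >= 1).
 * Assumption: the PMD sizes its table this way; others may not. */
static uint32_t next_pow2(uint32_t n)
{
    uint32_t p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/* Hypothetical helper: fill a table of reta_size entries by cycling over
 * the application-provided queue indices in the order they were given.
 * Repeated indices in 'queue' are kept as they are. */
static void expand_queues(const uint16_t *queue, uint32_t queue_num,
                          uint16_t *reta, uint32_t reta_size)
{
    uint32_t i;

    for (i = 0; i < reta_size; i++)
        reta[i] = queue[i % queue_num];
}

/* Lookup quoted earlier in the thread:
 * rx queue = RSS_indirection_table[RSS_hash_value % table_size] */
static uint16_t pick_queue(const uint16_t *reta, uint32_t reta_size,
                           uint32_t hash)
{
    return reta[hash % reta_size];
}
```

With queue = {3, 1, 1}, next_pow2(3) gives a table size of 4 and expand_queues() produces {3, 1, 1, 3}, so a hash value of 6 selects entry 2, which is queue 1. A sender can only make that prediction if it knows both the table size and the expansion rule, which is the point raised above.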