> On Wed, 2021-09-29 at 10:20 +0000, Ananyev, Konstantin wrote:
> > > > > > > > > > > > > > > > > > In current DPDK framework, each RX queue is pre-loaded with mbufs for incoming packets. When the number of representors scales out in a switch domain, the memory consumption becomes significant. Most importantly, polling all ports leads to high cache miss rates, high latency and low throughput.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > This patch introduces shared RX queue. Ports with the same configuration in a switch domain could share an RX queue set by specifying a sharing group. Polling any queue using the same shared RX queue receives packets from all member ports. The source port is identified by mbuf->port.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Port queue numbers in a shared group should be identical. Queue index is 1:1 mapped in the shared group.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > A shared RX queue is supposed to be polled on the same thread.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Multiple groups are supported by group ID.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Is this offload specific to the representor? If so, can this name be changed specifically to representor?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Yes, PF and representor in a switch domain could take advantage.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > If it is for a generic case, how will the flow ordering be maintained?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Not quite sure that I understood your question. The control path is almost the same as before: PF and representor ports are still needed, and rte flows are not impacted. Queues are still needed for each member port; descriptors (mbufs) will be supplied from the shared Rx queue in my PMD implementation.
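To make the proposal concrete for other readers, below is a rough sketch of how an application might request a shared Rx queue with such an offload. The RTE_ETH_RX_OFFLOAD_SHARED_RXQ name is the one used later in this thread; the share_group field of rte_eth_rxconf is only an assumption based on the cover letter, not necessarily the final API.

/* Sketch only: put this port's Rx queue 0 into shared group 1.
 * Repeat for every member port with the same queue index and group id.
 * share_group is an assumed field name, see note above. */
#include <errno.h>
#include <string.h>
#include <rte_ethdev.h>
#include <rte_lcore.h>
#include <rte_mempool.h>

static int
setup_shared_rxq(uint16_t port_id, struct rte_mempool *mp)
{
    struct rte_eth_conf port_conf;
    struct rte_eth_dev_info dev_info;
    struct rte_eth_rxconf rxq_conf;
    int ret;

    memset(&port_conf, 0, sizeof(port_conf));

    ret = rte_eth_dev_info_get(port_id, &dev_info);
    if (ret != 0)
        return ret;

    /* The PMD must report the shared-Rxq capability. */
    if (!(dev_info.rx_offload_capa & RTE_ETH_RX_OFFLOAD_SHARED_RXQ))
        return -ENOTSUP;

    port_conf.rxmode.offloads |= RTE_ETH_RX_OFFLOAD_SHARED_RXQ;
    ret = rte_eth_dev_configure(port_id, 1, 1, &port_conf);
    if (ret != 0)
        return ret;

    rxq_conf = dev_info.default_rxconf;
    rxq_conf.offloads = port_conf.rxmode.offloads;
    rxq_conf.share_group = 1;    /* same id on all member ports (assumed field) */

    /* Queue index must be identical across all member ports. */
    return rte_eth_rx_queue_setup(port_id, 0, 1024, rte_socket_id(),
                                  &rxq_conf, mp);
}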
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > My question was: if we create a generic RTE_ETH_RX_OFFLOAD_SHARED_RXQ offload and multiple ethdev receive queues land into the same receive queue, in that case, how is the flow order maintained for the respective receive queues?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I guess the question is about the testpmd forward stream? The forwarding logic has to be changed slightly in case of shared rxq: basically, for each packet in the rx_burst result, look up the source stream according to mbuf->port and forward to the target fs. Packets from the same source port could be grouped as a small burst to process; this will accelerate the performance if traffic comes from a limited number of ports. I'll introduce some common API to do shared rxq forwarding, called with a packet-handling callback, so it suits all forwarding engines. Will send patches soon.
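Just to illustrate the dispatch logic described above - a sketch of the idea only, not the actual testpmd patch. lookup_stream_by_port() and the fwd_cb callback are hypothetical stand-ins for the stream lookup and the per-engine packet handler.

/* Sketch: split one rx_burst() result from a shared Rx queue into
 * per-source-port sub-bursts and hand each sub-burst to the handler of
 * the stream owning that source port. Helpers below are hypothetical. */
#include <stdint.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define PKT_BURST 32

typedef void (*fwd_cb_t)(void *stream, struct rte_mbuf **pkts, uint16_t nb);

void *lookup_stream_by_port(uint16_t src_port);   /* hypothetical */

static void
shared_rxq_fwd(uint16_t port_id, uint16_t queue_id, fwd_cb_t fwd_cb)
{
    struct rte_mbuf *pkts[PKT_BURST];
    struct rte_mbuf *sub[PKT_BURST];
    uint16_t nb_rx, nb_sub, i, cur_port;

    nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts, PKT_BURST);
    if (nb_rx == 0)
        return;

    /* Group consecutive packets with the same mbuf->port into one
     * sub-burst so the per-stream handler still sees small bursts. */
    cur_port = pkts[0]->port;
    nb_sub = 0;
    for (i = 0; i < nb_rx; i++) {
        if (pkts[i]->port != cur_port) {
            fwd_cb(lookup_stream_by_port(cur_port), sub, nb_sub);
            cur_port = pkts[i]->port;
            nb_sub = 0;
        }
        sub[nb_sub++] = pkts[i];
    }
    fwd_cb(lookup_stream_by_port(cur_port), sub, nb_sub);
}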
> > > > > > > > > > > > >
> > > > > > > > > > > > > All ports will put the packets into the same queue (share queue), right? Does this mean only a single core will poll? What will happen if there are multiple cores polling, won't it cause problems?
> > > > > > > > > > > > >
> > > > > > > > > > > > > And if this requires specific changes in the application, I am not sure about the solution; can't this work in a way that is transparent to the application?
> > > > > > > > > > > >
> > > > > > > > > > > > Discussed with Jerin, a new API is introduced in v3 2/8 that aggregates ports in the same group into one new port. Users could schedule polling on the aggregated port instead of all member ports.
> > > > > > > > > > >
> > > > > > > > > > > The v3 still has testpmd changes in fastpath. Right? IMO, for this feature, we should not change the fastpath of the testpmd application. Instead, testpmd can use aggregated ports, probably as a separate fwd_engine, to show how to use this feature.
> > > > > > > > > >
> > > > > > > > > > Good point to discuss :) There are two strategies to poll a shared Rxq:
> > > > > > > > > > 1. polling each member port
> > > > > > > > > >    All forwarding engines can be reused to work as before.
> > > > > > > > > >    My testpmd patches are efforts towards this direction.
> > > > > > > > > >    Does your PMD support this?
> > > > > > > > >
> > > > > > > > > Not unfortunately. More than that, every application needs to change to support this model.
> > > > > > > >
> > > > > > > > Both strategies need the user application to resolve the port ID from the mbuf and process accordingly. This one doesn't demand an aggregated port and needs no polling schedule change.
> > > > > > >
> > > > > > > I was thinking the mbuf will be updated from the driver/aggregator port as it comes to the application.
> > > > > > > > > >
> > > > > > > > > > 2. polling aggregated port
> > > > > > > > > >    Besides the forwarding engine, more work is needed to demo it.
> > > > > > > > > >    This is an optional API, not supported by my PMD yet.
> > > > > > > > >
> > > > > > > > > We are thinking of implementing this in the PMD when it comes to it, i.e. without application changes in the fastpath logic.
> > > > > > > >
> > > > > > > > Fastpath has to resolve the port ID anyway and forward according to its logic. Forwarding engines need to adapt to support shared Rxq. Fortunately, in testpmd, this can be done with an abstract API.
> > > > > > > >
> > > > > > > > Let's defer part 2 until some PMD really supports it and it is tested, what do you think?
> > > > > > >
> > > > > > > We are not planning to use this feature, so either way it is OK to me. I leave it to the ethdev maintainers to decide between 1 vs 2.
> > > > > > >
> > > > > > > I do have a strong opinion on not changing the testpmd basic forward engines for this feature. I would like to keep it simple as fastpath optimized and would like to add a separate forwarding engine as a means to verify this feature.
> > > > > >
> > > > > > +1 to that.
> > > > > > I don't think it is a 'common' feature.
> > > > > > So a separate FWD mode seems like the best choice to me.
> > > > >
> > > > > -1 :)
> > > > > There was an internal requirement from the test team: they need to verify that all features like packet content, rss, vlan, checksum, rte_flow... keep working based on the shared rx queue.
> > > >
> > > > Then I suppose you'll need to write a really comprehensive fwd-engine to satisfy your test team :)
> > > > Speaking seriously, I still don't understand why you need all available fwd-engines to verify this feature.
> > >
> > > The shared Rxq is a low level feature; we need to make sure the driver's higher level features keep working properly. fwd-engines like csum check the input packet and enable L3/L4 checksum and tunnel offloads accordingly; other engines do their own feature verification. All test automation could be reused with these engines supported seamlessly.
> > >
> > > > From what I understand, the main purpose of your changes to test-pmd is to allow forwarding packets through different fwd_streams (TX through different HW queues).
> > >
> > > Yes, each mbuf in a burst comes from a different port. Testpmd's current fwd-engines rely heavily on the source forwarding stream; that's why the patch divides the burst result mbufs into sub-bursts and uses the original fwd-engine callback to handle them. How they are handled is not changed.
> > >
> > > > In theory, if implemented in a generic and extendable way, that might be a useful add-on to testpmd fwd functionality. But the current implementation looks very case specific. And as I don't think it is a common case, I don't see much point in polluting the basic fwd cases with it.
> > >
> > > Shared Rxq is an ethdev feature that impacts how packets get handled. It's natural to update the forwarding engines to avoid breaking them.
> >
> > Why is that?
> > All it affects is the way you RX the packets.
> > So why do *all* FWD engines have to be updated?
>
> People will ask why some FWD engines can't work.
It can be documented: which fwd engines are supposed to work properly with this feature, and which are not.
BTW, as I understand it, as long as RX queues are properly assigned to lcores, any fwd engine will continue to work. It is just that, for engines which are not aware of this feature, packets can be sent out via the wrong TX queue.

> > Let's say, what specifically are you going to test with macswap vs macfwd mode for that feature?
>
> If people want to test the NIC with a real switch, or make sure the L2 layer does not get corrupted.

I understand that; what I am saying is that you probably don't need both to test this feature.

> > I still think one specific FWD engine is enough to cover the majority of test cases.
>
> Yes, rxonly should be sufficient to verify the fundamentals, but to verify csum, timing, we need others. Some back-to-back test systems need io forwarding, real switch deployments need macswap...

Ok, but nothing stops you from picking for your purposes a FWD engine with the most comprehensive functionality: macswap, 5tswap, even 5tswap + csum update ....

> > > The new macro is introduced to minimize the performance impact; I'm also wondering whether there is a more elegant solution :)
> >
> > I think Jerin suggested a good alternative with eventdev.
> > As another approach - one might consider adding an RX callback that will return packets only for one particular port (while keeping packets for other ports cached internally).
>
> This and the aggregate port API could be options in the ethdev layer later. It can't be the fundamental approach due to the performance loss and potential cache misses.
>
> > As a 'wild' thought - change testpmd fwd logic to allow multiple TX queues per fwd_stream and add a function to do the TX switching logic.
> > But that's probably quite a big change that needs a lot of work.
>
> > > The current performance penalty is one "if unlikely" per burst.
> >
> > It is not only about the performance impact.
> > It is about keeping the test-pmd code simple and maintainable.
> >
> > > Think in the reverse direction: if we don't update the fwd-engines here, they all malfunction when shared rxq is enabled and users can't verify driver features. Are you expecting this?
> >
> > I expect developers not to rewrite the whole test-pmd fwd code for each new ethdev feature.
>
> Here we just abstract duplicated code from the fwd-engines, an improvement, keeping the test-pmd code simple and maintainable.
>
> > Especially for a feature that is not widely used.
>
> Based on the huge memory savings and the performance and latency gains, it will be popular with users.
>
> But test-pmd is not critical to this feature; I'm ok to drop the fwd-engine support if you agree.

Not sure I fully understand you here...
If you are saying that you decided to demonstrate this feature via some other app/example and prefer to abandon these changes in testpmd - then yes, I don't see any problems with that.
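FWIW, the per-port Rx callback idea mentioned above could be prototyped along these lines. A rough sketch only: the shared_rxq_cache structure and its rings are invented here for illustration, and it assumes the whole group is polled from a single thread, as the cover letter requires.

/* Sketch of the Rx-callback idea: keep only packets whose mbuf->port
 * matches the port being polled, park the rest in per-port rings and
 * drain them on the owning port's next poll. Types below are made up. */
#include <rte_common.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_ring.h>

struct shared_rxq_cache {
    struct rte_ring *per_port[RTE_MAX_ETHPORTS]; /* parked packets */
};

static uint16_t
shared_rxq_filter_cb(uint16_t port_id, uint16_t queue __rte_unused,
                     struct rte_mbuf *pkts[], uint16_t nb_pkts,
                     uint16_t max_pkts, void *user_param)
{
    struct shared_rxq_cache *c = user_param;
    uint16_t i, kept = 0;

    /* Keep packets that belong to this port, park the others. */
    for (i = 0; i < nb_pkts; i++) {
        if (pkts[i]->port == port_id)
            pkts[kept++] = pkts[i];
        else if (rte_ring_enqueue(c->per_port[pkts[i]->port],
                                  pkts[i]) != 0)
            rte_pktmbuf_free(pkts[i]); /* ring full: drop */
    }

    /* Top up with packets parked for this port on earlier polls. */
    if (kept < max_pkts)
        kept += rte_ring_dequeue_burst(c->per_port[port_id],
                                       (void **)&pkts[kept],
                                       max_pkts - kept, NULL);
    return kept;
}

Each member port would install the callback on the shared queue with rte_eth_add_rx_callback(port, queue, shared_rxq_filter_cb, &cache); with the whole group polled from one lcore, single-producer/single-consumer rings would be enough.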