> Subject: Re: [PATCH v2 net-next 0/7] dpaa2-eth: add support for Rx
> traffic classes
>
> On Wed, 20 May 2020 20:24:43 +0000 Ioana Ciornei wrote:
> > > Subject: Re: [PATCH v2 net-next 0/7] dpaa2-eth: add support for Rx
> > > traffic classes
> > >
> > > On Wed, 20 May 2020 15:10:42 +0000 Ioana Ciornei wrote:
> > > > DPAA2 has frame queues per Rx traffic class and the decision from
> > > > which queue to pull frames is made by the HW based on the queue
> > > > priority within a channel (there is one channel per CPU).
> > >
> > > IOW you're reading the descriptor from the device memory/iomem
> > > address and the HW will return the next descriptor based on
> > > configured priority?
> >
> > That's the general idea, but the decision is not made on a
> > frame-by-frame basis but rather per dequeue operation, which can
> > return at most 16 frame descriptors at a time.
>
> I see!
>
> > > Presumably strict priority?
> >
> > Only the two highest traffic classes are in strict priority, while
> > the other 6 TCs form two priority tiers - medium (4 TCs) and low
> > (the last two TCs).
> >
> > > > If this should be modeled in software, then I assume there should
> > > > be a NAPI instance for each traffic class and the stack should
> > > > know in which order to call the poll() callbacks so that the
> > > > priority is respected.
> > >
> > > Right, something like that. But IMHO not needed if HW can serve the
> > > right descriptor upon poll.
> >
> > After thinking this through, I don't actually believe that multiple
> > NAPI instances would solve this in any circumstance at all:
> >
> > - If you have hardware prioritization with full scheduling on
> >   dequeue, then the job on the driver side is already done.
> > - If you only have hardware assist for prioritization (ie hardware
> >   gives you multiple rings but doesn't tell you from which one to
> >   dequeue), then you can still use a single NAPI instance just fine
> >   and basically pick the highest priority non-empty ring on the fly.
> >
> > What I am having trouble understanding is how the fully software
> > implementation of this possible new Rx qdisc should work. Somehow
> > skb->priority should be taken into account while the skb is passing
> > through the stack (ie a higher priority skb should overtake another
> > skb that was received earlier but whose priority queue is congested).
>
> I'd think the SW implementation would come down to which ring to
> service first. If there are multiple rings on the host, NAPI can try
> to read from the highest priority ring first and then move on to the
> next prio. Not sure if there would be a use case for multiple NAPIs
> for busy polling or not.
>
> I was hoping we could solve this with the new ring config API (which
> is coming any day now, ehh) - in which I hope user space will be able
> to assign rings to NAPI instances; all we would have needed on top
> would be controlling the querying order. But that doesn't really work
> for you, it seems, since the selection is offloaded to HW :S

Yes, I would need only the configuration of traffic classes and their
priorities, not the software prioritization. I'll keep a close eye on
the mailing list to see what the new ring config API you're referring
to is adding.
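To make that concrete, the priority layout I would need to be able to
express is roughly the following. This is an illustrative sketch only -
the names are made up (not the actual MC firmware interface) and it
assumes TC 0 is the highest priority class:

/* Illustrative sketch only: 8 Rx TCs, the two highest in strict
 * priority, then a medium tier of 4 TCs and a low tier of 2 TCs.
 * Assumes TC 0 is the highest priority class; names are made up.
 */
enum rx_tc_tier {
	RX_TC_TIER_STRICT,	/* always served before lower tiers */
	RX_TC_TIER_MEDIUM,
	RX_TC_TIER_LOW,
};

static const enum rx_tc_tier rx_tc_tier_map[8] = {
	[0] = RX_TC_TIER_STRICT,
	[1] = RX_TC_TIER_STRICT,
	[2 ... 5] = RX_TC_TIER_MEDIUM,
	[6 ... 7] = RX_TC_TIER_LOW,
};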
> > I don't have a very deep understanding of the stack, but I am
> > thinking that the enqueue_to_backlog()/process_backlog() area could
> > be a candidate place for sorting out bottlenecks. In case we do
> > that, I don't see why a qdisc would be necessary at all, rather than
> > having everybody benefit from prioritization based on skb->priority.
>
> I think once the driver picks the frame up it should run with it to
> completion (+/- GRO). We have natural batching with NAPI processing.
> Every NAPI budget, high priority rings get a chance to preempt lower
> ones.
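That makes sense to me. For the hardware-assist case (multiple rings,
but no HW scheduling on dequeue), the single-NAPI poll would look
roughly like the sketch below. The types and helpers (prio_channel,
pull_ring(), enable_ring_irqs(), NUM_RX_TCS) are hypothetical, just to
illustrate restarting the scan from the highest priority ring on every
budget:

#include <linux/netdevice.h>

#define NUM_RX_TCS	8	/* hypothetical: number of Rx rings/TCs */

struct prio_channel {			/* hypothetical per-CPU channel */
	struct napi_struct napi;
	/* per-TC ring state would live here */
};

/* hypothetical: pull up to 'limit' frames from the TC's ring and push
 * them up the stack; returns the number of frames processed (the HW
 * dequeue itself returns at most 16 frame descriptors per operation)
 */
static int pull_ring(struct prio_channel *ch, int tc, int limit);
static void enable_ring_irqs(struct prio_channel *ch);

static int prio_napi_poll(struct napi_struct *napi, int budget)
{
	struct prio_channel *ch = container_of(napi, struct prio_channel,
					       napi);
	int work_done = 0;
	int tc;

	/* Restart from the highest priority ring on every poll, so
	 * high priority traffic preempts lower tiers at each budget.
	 */
	for (tc = 0; tc < NUM_RX_TCS && work_done < budget; tc++) {
		while (work_done < budget) {
			int cleaned = pull_ring(ch, tc,
						budget - work_done);

			if (!cleaned)	/* ring empty, next priority */
				break;
			work_done += cleaned;
		}
	}

	if (work_done < budget && napi_complete_done(napi, work_done))
		enable_ring_irqs(ch);

	return work_done;
}

The plain linear scan gives strict priority across all rings; the
medium/low tiers would need a slightly smarter selection than that, but
the preempt-per-budget property you describe stays the same.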