On 08/30/2017 04:37 PM, Cong Wang wrote:
> On Tue, Aug 29, 2017 at 8:49 PM, Florian Fainelli <f.faine...@gmail.com>
> wrote:
>> On 08/07/17 at 15:26, Florian Fainelli wrote:
>>> Hi,
>>>
>>> Most DSA-supported Broadcom switches have multiple queues per port
>>> (usually 8) and each of these queues can be configured with different
>>> pause, drop, hysteresis thresholds and so on in order to make use of the
>>> switch's internal buffering scheme and have some queues achieve some
>>> kind of lossless behavior (e.g. LAN to LAN traffic for Q7 has a higher
>>> priority than LAN to WAN for Q0).
>>>
>>> This is obviously very workload specific, so I'd want as much
>>> programmability as possible.
>>>
>>> This brings me to a few questions:
>>>
>>> 1) If we have the DSA slave network devices currently flagged with
>>> IFF_NO_QUEUE becoming multi-queue (on TX) aware such that an application
>>> can control exactly which switch egress queue is used on a per-flow
>>> basis, would that be a problem (this is the dynamic selection of the TX
>>> queue)?
>>
>> So I have this part figured out: with a bunch of changes, network devices
>> created by DSA are now multiqueue aware and the Broadcom tag layer is
>> capable of extracting the queue index, passing it in the tag where
>> expected and having the switch forward to the appropriate switch port
>> and queue within that port. It also sets the queue mapping in the SKB
>> for later consumption by the master network device driver: bcmsysport.c
>> because of 2).
>>
>>>
>>> 2) The conduit interface (CPU) port network interface has a congestion
>>> control scheme which requires each of its TX queues (32 or 16) to be
>>> statically mapped to each of the underlying switch port queues because
>>> the congestion/ HW needs to inspect the queue depths of the switch to
>>> accept/reject a packet at the CPU's TX ring level.
>>> Do we have a good way
>>> with tc to map a virtual/stacked device's queue(s) on-top of its
>>> physical/underlying device's queues (this is the static queue mapping
>>> necessary for congestion to work)?
>>
>> That part I have not figured out yet, with some static mapping I can
>> obtain the results that I want and was even considering the possibility
>> of doing something like this:
>>
>> - register a network device notifier with bcmsysport.c (master network
>>   device) for this setup
>> - expose a helper function allowing me to obtain a given DSA network
>>   device port index
>> - whenever DSA creates network devices reconfigure the ring and queue
>>   mapping of the TX queues managed by bcmsysport.c with the DSA network
>>   device port index that has just been registered and just do a 1-1
>>   mapping of the 8 queues
>>
>> You would end-up with something like:
>>
>> gphy    (port 0) queues 0-7 mapped to systemport queues 0-7
>> rgmii_1 (port 1) queues 0-7 mapped to systemport queues 8-15
>> rgmii_2 (port 2) queues 0-7 mapped to systemport queues 16-23
>> moca    (port 7) queues 0-7 mapped to systemport queues 24-31
>>
>> This should be working because bcmsysport's TX queues are not under
>> direct control by the user, they are used via DSA created network
>> devices which indicate the queue they want to use. When the DSA
>> interfaces are brought down, their respective systemport queues now
>> become unused. This also works because the number of physical ports of
>> the switch times the number of queues is matching the number of TX
>> queues from systemport (like if someone designed it with that exact
>> purpose in mind ;)).
>>
>> The only problem with that approach of course is that it embeds a policy
>> within the systemport driver.
>>
>> Ideally I would really like to configure this via tc by setting up a
>> mapping between queues of one network devices to queues of another
>> network device, is that a possible thing, Jamal, Cong, Jiri, do you know?
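
The static mapping proposed above can be sketched in plain C. This is only an illustration under stated assumptions, not code from bcmsysport.c or DSA: the names struct queue_map, queue_map_register(), queue_map_unregister() and queue_map_lookup() are hypothetical, and the constants (8 queues per port, 32 systemport TX queues) are taken from the example in the thread. Each DSA port claims the next free block of 8 TX queues when it is registered and releases it when it goes away, which is why port 7 (moca) lands on queues 24-31 rather than 56-63.

```c
#define QUEUES_PER_PORT 8   /* from the thread: 8 egress queues per switch port */
#define NUM_TX_QUEUES   32  /* from the thread: systemport exposes 32 TX queues */
#define NUM_SLOTS       (NUM_TX_QUEUES / QUEUES_PER_PORT)
#define SLOT_FREE       (-1)

/* Hypothetical bookkeeping: which switch port owns each block of 8 TX queues. */
struct queue_map {
    int slot_port[NUM_SLOTS];
};

/* Mark every block of TX queues as unused. */
static void queue_map_init(struct queue_map *m)
{
    for (int i = 0; i < NUM_SLOTS; i++)
        m->slot_port[i] = SLOT_FREE;
}

/*
 * Called when a DSA network device appears (e.g. from a netdevice notifier).
 * Returns the block index assigned to the port, or -1 if no block is free.
 */
static int queue_map_register(struct queue_map *m, int port)
{
    int free_slot = -1;

    if (port < 0)
        return -1;

    for (int i = 0; i < NUM_SLOTS; i++) {
        if (m->slot_port[i] == port)
            return i;                       /* already registered */
        if (m->slot_port[i] == SLOT_FREE && free_slot < 0)
            free_slot = i;                  /* remember the first free block */
    }
    if (free_slot >= 0)
        m->slot_port[free_slot] = port;
    return free_slot;
}

/* Called when the DSA network device goes away: its block becomes unused. */
static void queue_map_unregister(struct queue_map *m, int port)
{
    if (port < 0)
        return;

    for (int i = 0; i < NUM_SLOTS; i++)
        if (m->slot_port[i] == port)
            m->slot_port[i] = SLOT_FREE;
}

/*
 * 1-1 mapping of a port's queue onto a systemport TX queue.
 * Returns -1 for an unregistered port or an out-of-range queue.
 */
static int queue_map_lookup(const struct queue_map *m, int port, int queue)
{
    if (port < 0 || queue < 0 || queue >= QUEUES_PER_PORT)
        return -1;

    for (int i = 0; i < NUM_SLOTS; i++)
        if (m->slot_port[i] == port)
            return i * QUEUES_PER_PORT + queue;
    return -1;
}
```

Registering ports 0, 1, 2 and 7 in that order reproduces the table above: for instance queue_map_lookup(&m, 7, 0) gives 24 and queue_map_lookup(&m, 1, 7) gives 15. The policy concern raised above still applies to a sketch like this, since the allocation order is fixed inside the driver rather than configured through tc.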
>
> I am not sure if I understand the mapping you are talking about here.
>
> TC layer rarely deals with hardware queues directly (except probably mq),
> so this question probably doesn't belong to TC.
>
> OTOH, TC can modify skb->hash, so you can redirect packets to a specific
> queue, but this doesn't sound like what you are looking for.

I am actually building on TC being able to influence the value of
skb->queue_mapping, but that is just for the stacked devices, not the
underlying conduit device that does the actual transmission.

>
> Maybe Jiri has more thoughts here since he works on TC offloading things.
>

Patches with explanations and context (hopefully clearer) here:

http://patchwork.ozlabs.org/project/netdev/list/?series=728

Thanks!
--
Florian