Quoting Alexander Duyck (2021-01-08 17:33:01) > On Fri, Jan 8, 2021 at 1:07 AM Antoine Tenart <aten...@kernel.org> wrote: > > > > Quoting Alexander Duyck (2021-01-07 17:38:35) > > > On Thu, Jan 7, 2021 at 12:54 AM Antoine Tenart <aten...@kernel.org> wrote: > > > > > > > > Quoting Alexander Duyck (2021-01-06 20:54:11) > > > > > On Wed, Jan 6, 2021 at 10:04 AM Antoine Tenart <aten...@kernel.org> > > > > > wrote: > > > > > > > > That would require to hold rcu_read_lock in the caller and I'd like to > > > > keep it in that function. > > > > > > Actually you could probably make it work if you were to pass a pointer > > > to the RCU pointer. > > > > That should work but IMHO that could be easily breakable by future > > changes as it's a bit tricky. > > > > > > > > if (dev->num_tc) { > > > > > > /* Do not allow XPS on subordinate device directly > > > > > > */ > > > > > > num_tc = dev->num_tc; > > > > > > - if (num_tc < 0) { > > > > > > - ret = -EINVAL; > > > > > > - goto err_rtnl_unlock; > > > > > > - } > > > > > > + if (num_tc < 0) > > > > > > + return -EINVAL; > > > > > > > > > > > > /* If queue belongs to subordinate dev use its map > > > > > > */ > > > > > > dev = netdev_get_tx_queue(dev, index)->sb_dev ? : > > > > > > dev; > > > > > > > > > > > > tc = netdev_txq_to_tc(dev, index); > > > > > > - if (tc < 0) { > > > > > > - ret = -EINVAL; > > > > > > - goto err_rtnl_unlock; > > > > > > - } > > > > > > + if (tc < 0) > > > > > > + return -EINVAL; > > > > > > } > > > > > > > > > > > > > > > > So if we store the num_tc and nr_ids in the dev_maps structure then we > > > > > could simplify this a bit by pulling the num_tc info out of the > > > > > dev_map and only asking the Tx queue for the tc in that case and > > > > > validating it against (tc <0 || num_tc <= tc) and returning an error > > > > > if either are true. > > > > > > > > > > This would also allow us to address the fact that the rxqs feature > > > > > doesn't support the subordinate devices as you could pull out the bit > > > > > above related to the sb_dev and instead call that prior to calling > > > > > xps_queue_show so that you are operating on the correct device map. > > > > > > It probably would be necessary to pass dev and index if we pull the > > > netdev_get_tx_queue()->sb_dev bit out and performed that before we > > > called the xps_queue_show function. Specifically as the subordinate > > > device wouldn't match up with the queue device so we would be better > > > off pulling it out first. > > > > While I agree moving the netdev_get_tx_queue()->sb_dev bit out of > > xps_queue_show seems like a good idea for consistency, I'm not sure > > it'll work: dev->num_tc is not only used to retrieve the number of tc > > but also as a condition on not being 0. We have things like: > > > > // Always done with the original dev. > > if (dev->num_tc) { > > > > [...] > > > > // Can be a subordinate dev. > > tc = netdev_txq_to_tc(dev, index); > > } > > > > And after moving num_tc in the map, we'll have checks like: > > > > if (dev_maps->num_tc != dev->num_tc) > > return -EINVAL; > > We shouldn't. That defeats the whole point and you will never be able > to rely on the num_tc in the device to remain constant. If we are > moving the value to an RCU accessible attribute we should just be > using that value. We can only use that check if we are in an rtnl_lock > anyway and we won't need that if we are just displaying the value. > > The checks should only be used to verify the tc of the queue is within > the bounds of the num_tc of the xps_map.
Right. So that would mean we have to choose between: - Removing the rtnl lock but with the understanding that the value we get when reading the maps might be invalid and not make sense with the current dev->num_tc configuration. - Keeping the rtnl lock (which, I mean, I'd like to remove). My first goal for embedding num_tc in the maps was to prevent accessing the maps out-of-bound when dev->num_tc is updated after the maps are allocated. That's a possibility (I could produce such behaviour with KASAN enabled) even when taking the rtnl lock in the show/store helpers. We're now talking of also removing the rtnl lock, which is fine, I just want to make those two different goals explicit as they're not interdependent. > > > > > I think Jakub had mentioned earlier the idea of possibly moving some > > > > > fields into the xps_cpus_map and xps_rxqs_map in order to reduce the > > > > > complexity of this so that certain values would be protected by the > > > > > RCU lock. > > > > > > > > > > This might be a good time to look at encoding things like the number > > > > > of IDs and the number of TCs there in order to avoid a bunch of this > > > > > duplication. Then you could just pass a pointer to the map you want to > > > > > display and the code should be able to just dump the values.: > > > > > > > > 100% agree to all the above. That would also prevent from making out of > > > > bound accesses when dev->num_tc is increased after dev_maps is > > > > allocated. I do have a series ready to be send storing num_tc into the > > > > maps, and reworking code to use it instead of dev->num_tc. The series > > > > also adds checks to ensure the map is valid when we access it (such as > > > > making sure dev->num_tc == map->num_tc). I however did not move nr_ids > > > > into the map yet, but I'll look into it. > > > > > > > > The idea is to send it as a follow up series, as this one is only moving > > > > code around to improve maintenance and readability. Even if all the > > > > patches were in the same series that would be a prerequisite. > > > > > > Okay, so if we are going to do it as a follow-up that is fine I > > > suppose, but one of the reasons I brought it up is that it would help > > > this patch set in terms of readability/maintainability. An additional > > > change we could look at making would be to create an xps_map pointer > > > array instead of having individual pointers. Then you could simply be > > > passing an index into the array to indicate if we are accessing the > > > CPUs or the RXQs map. > > > > Merging the two maps and embedding an offset in the struct? With the > > upcoming changes embedding information in the map themselves we should > > have a single check to know what map to use. Using a single array with > > offsets would not improve that. Also maps maintenance when num_tc > > is updated would need to take care of both maps, having side effects > > such as removing the old rxqs map when allocating the cpus one (or the > > opposite). > > Sorry, I didn't mean to merge the two maps. Just go from two pointers > to an array containing two pointers. Right now them sitting right next > to each other it becomes pretty easy to just convert them so that > instead of accessing them as: > > dev->xps_rxqs_map > dev->xps_cpus_map > > You could instead access them as: > dev->xps_map[XPS_RXQ]; > dev->xps_map[XPS_CPU]; > > Then instead of all the if/else logic we have in the code you just are > passing the index of the xps_map you want to access and we have the > nr_ids and num_tc all encoded in the map so the code itself. Then for > displaying we are just using the fields from the map to validate. > > We will still end up needing to take the rtnl_lock for the > __netif_set_xps_queue case, however that should be the only case where > we really need it as we have to re-read dev->num_tc and the > netdev_txq_to_tc and guarantee they don't change while we are > programming the map. Thanks for the detailed explanations. That indeed would be good. > That reminds me we may want to add an ASSERT_RTNL to the start of > __netif_set_xps_queue and a comment indicating that we need to hold > the rtnl lock to guarantee that num_tc and the Tx queue to TC mapping > cannot change while we are programming the value into the map. Good idea! Thanks, Antoine