rxqs retrieval in a common function

Antoine Tenart Fri, 08 Jan 2021 11:00:19 -0800

Quoting Alexander Duyck (2021-01-08 17:33:01)
> On Fri, Jan 8, 2021 at 1:07 AM Antoine Tenart <aten...@kernel.org> wrote:
> >
> > Quoting Alexander Duyck (2021-01-07 17:38:35)
> > > On Thu, Jan 7, 2021 at 12:54 AM Antoine Tenart <aten...@kernel.org> wrote:
> > > >
> > > > Quoting Alexander Duyck (2021-01-06 20:54:11)
> > > > > On Wed, Jan 6, 2021 at 10:04 AM Antoine Tenart <aten...@kernel.org> 
> > > > > wrote:
> > > >
> > > > That would require to hold rcu_read_lock in the caller and I'd like to
> > > > keep it in that function.
> > >
> > > Actually you could probably make it work if you were to pass a pointer
> > > to the RCU pointer.
> >
> > That should work but IMHO that could be easily breakable by future
> > changes as it's a bit tricky.
> >
> > > > > >         if (dev->num_tc) {
> > > > > >                 /* Do not allow XPS on subordinate device directly 
> > > > > > */
> > > > > >                 num_tc = dev->num_tc;
> > > > > > -               if (num_tc < 0) {
> > > > > > -                       ret = -EINVAL;
> > > > > > -                       goto err_rtnl_unlock;
> > > > > > -               }
> > > > > > +               if (num_tc < 0)
> > > > > > +                       return -EINVAL;
> > > > > >
> > > > > >                 /* If queue belongs to subordinate dev use its map 
> > > > > > */
> > > > > >                 dev = netdev_get_tx_queue(dev, index)->sb_dev ? : 
> > > > > > dev;
> > > > > >
> > > > > >                 tc = netdev_txq_to_tc(dev, index);
> > > > > > -               if (tc < 0) {
> > > > > > -                       ret = -EINVAL;
> > > > > > -                       goto err_rtnl_unlock;
> > > > > > -               }
> > > > > > +               if (tc < 0)
> > > > > > +                       return -EINVAL;
> > > > > >         }
> > > > > >
> > > > >
> > > > > So if we store the num_tc and nr_ids in the dev_maps structure then we
> > > > > could simplify this a bit by pulling the num_tc info out of the
> > > > > dev_map and only asking the Tx queue for the tc in that case and
> > > > > validating it against (tc <0 || num_tc <= tc) and returning an error
> > > > > if either are true.
> > > > >
> > > > > This would also allow us to address the fact that the rxqs feature
> > > > > doesn't support the subordinate devices as you could pull out the bit
> > > > > above related to the sb_dev and instead call that prior to calling
> > > > > xps_queue_show so that you are operating on the correct device map.
> > >
> > > It probably would be necessary to pass dev and index if we pull the
> > > netdev_get_tx_queue()->sb_dev bit out and performed that before we
> > > called the xps_queue_show function. Specifically as the subordinate
> > > device wouldn't match up with the queue device so we would be better
> > > off pulling it out first.
> >
> > While I agree moving the netdev_get_tx_queue()->sb_dev bit out of
> > xps_queue_show seems like a good idea for consistency, I'm not sure
> > it'll work: dev->num_tc is not only used to retrieve the number of tc
> > but also as a condition on not being 0. We have things like:
> >
> >   // Always done with the original dev.
> >   if (dev->num_tc) {
> >
> >       [...]
> >
> >       // Can be a subordinate dev.
> >       tc = netdev_txq_to_tc(dev, index);
> >   }
> >
> > And after moving num_tc in the map, we'll have checks like:
> >
> >   if (dev_maps->num_tc != dev->num_tc)
> >       return -EINVAL;
> 
> We shouldn't. That defeats the whole point and you will never be able
> to rely on the num_tc in the device to remain constant. If we are
> moving the value to an RCU accessible attribute we should just be
> using that value. We can only use that check if we are in an rtnl_lock
> anyway and we won't need that if we are just displaying the value.
> 
> The checks should only be used to verify the tc of the queue is within
> the bounds of the num_tc of the xps_map.


Right. So that would mean we have to choose between:

- Removing the rtnl lock but with the understanding that the value we
  get when reading the maps might be invalid and not make sense with
  the current dev->num_tc configuration.

- Keeping the rtnl lock (which, I mean, I'd like to remove).

My first goal for embedding num_tc in the maps was to prevent accessing
the maps out-of-bound when dev->num_tc is updated after the maps are
allocated. That's a possibility (I could produce such behaviour with
KASAN enabled) even when taking the rtnl lock in the show/store helpers.

We're now talking of also removing the rtnl lock, which is fine, I just
want to make those two different goals explicit as they're not
interdependent.

> > > > > I think Jakub had mentioned earlier the idea of possibly moving some
> > > > > fields into the xps_cpus_map and xps_rxqs_map in order to reduce the
> > > > > complexity of this so that certain values would be protected by the
> > > > > RCU lock.
> > > > >
> > > > > This might be a good time to look at encoding things like the number
> > > > > of IDs and the number of TCs there in order to avoid a bunch of this
> > > > > duplication. Then you could just pass a pointer to the map you want to
> > > > > display and the code should be able to just dump the values.:
> > > >
> > > > 100% agree to all the above. That would also prevent from making out of
> > > > bound accesses when dev->num_tc is increased after dev_maps is
> > > > allocated. I do have a series ready to be send storing num_tc into the
> > > > maps, and reworking code to use it instead of dev->num_tc. The series
> > > > also adds checks to ensure the map is valid when we access it (such as
> > > > making sure dev->num_tc == map->num_tc). I however did not move nr_ids
> > > > into the map yet, but I'll look into it.
> > > >
> > > > The idea is to send it as a follow up series, as this one is only moving
> > > > code around to improve maintenance and readability. Even if all the
> > > > patches were in the same series that would be a prerequisite.
> > >
> > > Okay, so if we are going to do it as a follow-up that is fine I
> > > suppose, but one of the reasons I brought it up is that it would help
> > > this patch set in terms of readability/maintainability. An additional
> > > change we could look at making would be to create an xps_map pointer
> > > array instead of having individual pointers. Then you could simply be
> > > passing an index into the array to indicate if we are accessing the
> > > CPUs or the RXQs map.
> >
> > Merging the two maps and embedding an offset in the struct? With the
> > upcoming changes embedding information in the map themselves we should
> > have a single check to know what map to use. Using a single array with
> > offsets would not improve that. Also maps maintenance when num_tc
> > is updated would need to take care of both maps, having side effects
> > such as removing the old rxqs map when allocating the cpus one (or the
> > opposite).
> 
> Sorry, I didn't mean to merge the two maps. Just go from two pointers
> to an array containing two pointers. Right now them sitting right next
> to each other it becomes pretty easy to just convert them so that
> instead of accessing them as:
> 
> dev->xps_rxqs_map
> dev->xps_cpus_map
> 
> You could instead access them as:
> dev->xps_map[XPS_RXQ];
> dev->xps_map[XPS_CPU];
> 
> Then instead of all the if/else logic we have in the code you just are
> passing the index of the xps_map you want to access and we have the
> nr_ids and num_tc all encoded in the map so the code itself. Then for
> displaying we are just using the fields from the map to validate.
> 
> We will still end up needing to take the rtnl_lock for the
> __netif_set_xps_queue case, however that should be the only case where
> we really need it as we have to re-read dev->num_tc and the
> netdev_txq_to_tc and guarantee they don't change while we are
> programming the map.

Thanks for the detailed explanations. That indeed would be good.

> That reminds me we may want to add an ASSERT_RTNL to the start of
> __netif_set_xps_queue and a comment indicating that we need to hold
> the rtnl lock to guarantee that num_tc and the Tx queue to TC mapping
> cannot change while we are programming the value into the map.

Good idea!

Thanks,
Antoine

Re: [PATCH net 3/3] net-sysfs: move the xps cpus/rxqs retrieval in a common function

Reply via email to