> -----Original Message----- > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Ananyev, Konstantin > Sent: Wednesday, January 31, 2018 9:39 PM > To: Chas Williams; dev@dpdk.org > Cc: Lu, Wenzhuo > Subject: Re: [dpdk-dev] [PATCH] net/ixgbe: fix reconfiguration of rx queues > > > > > > > > > > > > > > > > From: "Charles (Chas) Williams" <ch...@att.com> > > > > > > > > > > .dev_configure() may be called again after RX queues have been setup. > > > > > This has the effect of clearing whatever setting the RX queues > > > > > made for rx_bulk_alloc_allowed or rx_vec_allowed. Only reset > > > > > this configuration is there aren't any currently allocated queues. > > > > > > > > > > Fixes: 01fa1d6215fa ("ixgbe: unify Rx setup") > > > > > > > > > > Signed-off-by: Chas Williams <ch...@att.com> > > > > > --- > > > > > drivers/net/ixgbe/ixgbe_ethdev.c | 18 ++++++++++++++++-- > > > > > 1 file changed, 16 insertions(+), 2 deletions(-) > > > > > > > > > > diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c > > > > > b/drivers/net/ixgbe/ixgbe_ethdev.c > > > > > index 37eb668..b39249a 100644 > > > > > --- a/drivers/net/ixgbe/ixgbe_ethdev.c > > > > > +++ b/drivers/net/ixgbe/ixgbe_ethdev.c > > > > > @@ -2348,6 +2348,7 @@ ixgbe_dev_configure(struct rte_eth_dev > > > > >*dev) > > > > > struct ixgbe_adapter *adapter = > > > > > (struct ixgbe_adapter *)dev->data->dev_private; > > > > > int ret; > > > > > + uint16_t i; > > > > > > > > > > PMD_INIT_FUNC_TRACE(); > > > > > /* multipe queue mode checking */ @@ -2363,11 +2364,17 @@ > > > > >ixgbe_dev_configure(struct rte_eth_dev *dev) > > > > > > > > > > /* > > > > > * Initialize to TRUE. If any of Rx queues doesn't meet > > > > >the bulk > > > > > - * allocation or vector Rx preconditions we will reset it. > > > > > + * allocation or vector Rx preconditions we will reset it. > > > > > +We > > > > > + * can only do this is there aren't any existing RX queues. > > > > > */ > > > > > + for (i = 0; i < dev->data->nb_rx_queues; i++) { > > > > > + if (dev->data->rx_queues[i]) > > > > > + goto out; > > > > > + } > > > > I don't see why this is needed. > > > > It surely should be possible to reconfigure device with different > > > > number of queues. > > > > Konstantin > > > > > > > > Yes, you can add new queues but you shouldn't reset the bulk and > > > > vec settings that have already been chosen by the previously allocated > queues. > > > Why is that? Might be in new queue setup user will change settings? > > > > > > There is no requirement that the user allocates all the RX queues in the > same way. > > > Some could have a different numbers of descs which is one of the > > > checks in > > > check_rx_burst_bulk_alloc_preconditions() > > Exactly. That's why after dev_configure() user has to call > > queue_setup() for *all* queues he plans to use. > > > > Where the API or documentation does it say that this is necessary? If > > this was a requirement then rte_eth_dev_configure() should drop every > > allocated queue. Since it doesn't do this I can only assume that you > > are allowed to keep using queues after calling rte_eth_dev_configure() > without having to set them up again. > > > > > > > > > > > > > > > > > If those queues > > > > set rx_bulk_alloc_allowed to be false, then this is going to cause > > > > an issue with queue release later on. > > > > > > Could you be a bit more specific here: > > > What you think will be broken in ixgbe_rx_queue_release() in that case? > > > > > > Sorry, I mispoke. It's this function, ixgbe_reset_rx_queue(), > > > > > > /* > > > * By default, the Rx queue setup function allocates enough > > > memory for > > > * IXGBE_MAX_RING_DESC. The Rx Burst bulk allocation > > > function requires > > > * extra memory at the end of the descriptor ring to be zero'd > > > out. > > > */ > > > if (adapter->rx_bulk_alloc_allowed) > > > /* zero out extra memory */ > > > len += RTE_PMD_IXGBE_RX_MAX_BURST; > > > > > > /* > > > * Zero out HW ring memory. Zero out extra memory at the end > > > of > > > * the H/W ring so look-ahead logic in Rx Burst bulk alloc > > > function > > > * reads extra memory as zeros. > > > */ > > > for (i = 0; i < len; i++) { > > > rxq->rx_ring[i] = zeroed_desc; > > > } > > > > > > So you potentially write past the rx_ring[] you allocated. > > > > We always allocate rx_ring[] to maximum possible size plus space for fake > descriptors: > > > > drivers/net/ixgbe/ixgbe_rxtx.h: > > #define RX_RING_SZ ((IXGBE_MAX_RING_DESC + > RTE_PMD_IXGBE_RX_MAX_BURST) > > * \ > > sizeof(union ixgbe_adv_rx_desc)) > > > > then at ixgbe_dev_rx_queue_setup(...): > > > > /* > > * Allocate RX ring hardware descriptors. A memzone large enough to > > * handle the maximum ring size is allocated in order to allow for > > * resizing in later calls to the queue setup function. > > */ > > rz = rte_eth_dma_zone_reserve(dev, "rx_ring", queue_idx, > > RX_RING_SZ, IXGBE_ALIGN, > > socket_id); > > ... > > rxq->rx_ring = (union ixgbe_adv_rx_desc *) rz->addr; > > > > What about here? > > > > len = nb_desc; > > if (adapter->rx_bulk_alloc_allowed) > > len += RTE_PMD_IXGBE_RX_MAX_BURST; > > > > rxq->sw_ring = rte_zmalloc_socket("rxq->sw_ring", > > sizeof(struct > > ixgbe_rx_entry) * len, > > RTE_CACHE_LINE_SIZE, > > socket_id); > > > > This is later walked and reset in ixgbe_reset_rx_queue() > > > > if (adapter->rx_bulk_alloc_allowed) > > /* zero out extra memory */ > > len += RTE_PMD_IXGBE_RX_MAX_BURST; > > > > /* > > * Zero out HW ring memory. Zero out extra memory at the end > > of > > * the H/W ring so look-ahead logic in Rx Burst bulk alloc > > function > > * reads extra memory as zeros. > > */ > > for (i = 0; i < len; i++) { > > rxq->rx_ring[i] = zeroed_desc; > > } > > > > /* > > * initialize extra software ring entries. Space for these > > extra > > * entries is always allocated > > */ > > memset(&rxq->fake_mbuf, 0x0, sizeof(rxq->fake_mbuf)); > > for (i = rxq->nb_rx_desc; i < len; ++i) { > > rxq->sw_ring[i].mbuf = &rxq->fake_mbuf; > > } > > > > As I can see, right now well behaving applications (one that always call > queue_setup() after dev_configure) wouldn't be affected. > But you right - it is a potential issue and better be addressed. > Though to fix it all we need is either always allocate space for fake_mbufs > inside sw_ring, or just store number of fake mbufs inside rxq. > > > Clearly, rx_bulk_alloc_allowed must remain consistent here. > > I don't think so, see above. > > > > > > > > > > > > > > > You can't change rx_bulk_alloc_allowed once it has been set and you have > allocated queues. > > > In fact, you can't really let some later queue change this setting > > > after the first queue decides what It should be. > > > > See above. > > > > Again, I have pointed out where this is a problem. > > > > Again, this is based on your assumption (wrong one) that it is ok not to call > queue_setup() after dev_configure(). > > > > > > > > > > > > > > > > This breaks: > > > > > > > > rte_eth_dev_configure(..., 1, 1, ...); > > > > rx_queue_setup(1) > > > > [rx_queue_setup decides that it can't support > > > > rx_bulk_alloc_allowed] .. > > > > > > > > Later, you want to add some more queues, you call > > > > > > > > eth_eth_dev_configure(..., 2, 2, ...); > > > > > > After you call dev_configure, you'll have to do queue_setup() for all your > queues. > > > dev_configure() can changes some global device settings, so each > > > queue has to be reconfigured. > > > In your example It should be: > > > eth_eth_dev_configure(..., 2, 2, ...); rx_queue_setup(...,0, ...); > > > rx_queue_setup(...,1, ...); > > > > > > Konstantin > > > > > > > > > Nothing in the API says this must happen. If there where true, > > > shouldn't rte_eth_dev_configure() automatically drop any existing queues? > > > > One of the fists things that ixgbe_dev_rx_queue_setup() - calls > ixgbe_rx_queue_release(). > > In theory that probably could be moved to rte_eth_dev_configure(), but I > think current model is ok too. > > > > >There is existing code that doesn't do this. > > > > Then I think it is a bug in that code. > > > > > Show me > > > in the API where I am required to setup all my queues _again_ after > > > calling rte_eth_dev_configure() > > > > If you think the documentation doesn't explain that properly - feel > > free to raise a bug agains doc and/or provide a patch for it. > > As I explained above - dev_configure() might change some global > > settings (max_rx_pkt, flow-director settings, etc.) that would affect each > queue in that device. > > So queue_setup() is necessary after dev_configure() to make sure it will > operate properly. > > > > If you think the documentation is unclear, then *you* feel free to > > open a bug to clarify what you believe is the correct behavior. Yes, > rte_eth_dev_configure() is free to change globals. > > Any of the API routines are free to change globals. It is up to your > > PMD to deal with the fallout from changing such globals. > > It is, but for that application have to follow the defined procedure. > Current procedure is: dev_configure(); queue_setup(); dev_start(). > You can't skip second step. > Most drivers allocate and fill their private queue structures here. > Some of the values inside these structures are calculated based on dev_conf > parameters. > Obviously if device configuration might been changed - you need to > reconfigure your queues too. > > > Bonding and vmxnet3 are just two examples of this. > > If some PMDs for some cases can work without queue reconfiguration - great. > But it doesn't mean that all PMDs would. > Most Intel PMDs (ixgbe, i40e, fm10k) do expect that behavior. > From my code readings - mlx and qede too. > Probably some other PMDs - I didn't check all of them. > > > The "final" queue "setup" isn't done until .dev_start(). At that point, the > queues are updated. > > In theory, it is probably possible to do things in a way you suggest: > make queue_setup() just to store input queue parameters and move all the > allocation/configuration job into dev_start. > Though it would require quite a lot of changes in many PMDs and might > complicate things. > Personally I don't see big advantage of such change, while the required effort > would be substantial. > Especially if we going to move to the model with a separate rx/tx function per > queue anyway. > > > Again, there are existing examples in the existing code base that do > > not follow your idea of what the API should do. Are they wrong as well? > > I think so. > As I said it might work for some particular PMD, but if it is a generic app, > then it > sounds like a bug to me. > What particular app we are talking about? > > > The preponderance of the evidence seems to be against your idea of > >what the API should be doing. > > > > Queue setup is an expensive operation. If some PMD doesn't need to > > drop and setup queues again after an rte_eth_dev_configure() why are you > forcing it to do so? > > I don't think queue_setup() is that expensive to worry about the price. > We are in control path here. > But It is up to PMD what to put inside that routine. > If PMD is smart enough to detect that nothing really changed from last > invocation of queue_setup() and no extra work required - then it can just > return success straightway. > Again, nothing prevents user to make some stub layer on top of current > rte_ethdev API to minimize number of eth_queue_setup() invocations. > > > It was the choice > > of the author in the ixgbe driver to use globals the way they did. > >The current code is still > > somewhat wrong even after the patch I propose. The configuration of > >each queue depends on what the previous queues did. Since there isn't > >a burst per queue routine, there needs to be a single decision made > >after _all_ of the queues are configured. That is the only time when you > have complete knowledge to make your decisions about which RX mode to use. > > > > > > Konstantin > > > > > > > > > > > > rx_queue_setup(2) > > > > [rx_queue_setup hopefully makes the same choice as rxqid = 1?] > > > > ... > > > > > > > > Is one supposed to release all queues before calling > > > > rte_eth_dev_configure()? If that is true, it seems like the > > > > change_mtu examples I see are possibly wrong. As suggested in > kenrel_nic_interface.rst: > > > > > > > > > > > > ret = rte_eth_dev_configure(port_id, 1, 1, &conf); > > > > if (ret < 0) { > > > > RTE_LOG(ERR, APP, "Fail to reconfigure port %d\n", > > > > port_id); > > > > return ret; > > > > } > > > > > > > > /* Restart specific port */ > > > > > > > > ret = rte_eth_dev_start(port_id); > > > > if (ret < 0) { > > > > RTE_LOG(ERR, APP, "Fail to restart port %d\n", > > > > port_id); > > > > return ret; > > > > } > > > > > > > > This is will obviously reset the rx_bulk_alloc_allowed and not > > > > reallocated > the RX queues. > > > > > > > > > > > > > adapter->rx_bulk_alloc_allowed = true; > > > > > adapter->rx_vec_allowed = true; > > > > > > > > > > +out: > > > > > return 0; > > > > > } > > > > > > > > > > @@ -4959,6 +4966,7 @@ ixgbevf_dev_configure(struct rte_eth_dev > > > > >*dev) > > > > > struct rte_eth_conf *conf = &dev->data->dev_conf; > > > > > struct ixgbe_adapter *adapter = > > > > > (struct ixgbe_adapter > > > > >*)dev->data->dev_private; > > > > > + uint16_t i; > > > > > > > > > > PMD_INIT_LOG(DEBUG, "Configured Virtual Function port id: > > > > >%d", > > > > > dev->data->port_id); @@ -4981,11 +4989,17 @@ > > > > >ixgbevf_dev_configure(struct rte_eth_dev *dev) > > > > > > > > > > /* > > > > > * Initialize to TRUE. If any of Rx queues doesn't meet > > > > >the bulk > > > > > - * allocation or vector Rx preconditions we will reset it. > > > > > + * allocation or vector Rx preconditions we will reset it. > > > > > +We > > > > > + * can only do this is there aren't any existing RX queues. > > > > > */ > > > > > + for (i = 0; i < dev->data->nb_rx_queues; i++) { > > > > > + if (dev->data->rx_queues[i]) > > > > > + goto out; > > > > > + } > > > > > adapter->rx_bulk_alloc_allowed = true; > > > > > adapter->rx_vec_allowed = true; > > > > > > > > > > +out: > > > > > return 0; > > > > > } > > > > > > > > > > -- > > > > > 2.9.5
I'd reject this patch for now, according to the discussion above. Feel free to work our new versions with some agreement. Sorry and thanks! /Helin