> -----Original Message-----
> From: Zoltan Kiss [mailto:zoltan.kiss at linaro.org]
> Sent: Tuesday, June 09, 2015 4:08 PM
> To: Ananyev, Konstantin; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
>
> On 09/06/15 12:18, Ananyev, Konstantin wrote:
> >
> >> -----Original Message-----
> >> From: Zoltan Kiss [mailto:zoltan.kiss at linaro.org]
> >> Sent: Wednesday, June 03, 2015 6:47 PM
> >> To: Ananyev, Konstantin; dev at dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
> >>
> >> On 02/06/15 18:35, Ananyev, Konstantin wrote:
> >>>
> >>>> -----Original Message-----
> >>>> From: Zoltan Kiss [mailto:zoltan.kiss at linaro.org]
> >>>> Sent: Tuesday, June 02, 2015 4:08 PM
> >>>> To: Ananyev, Konstantin; dev at dpdk.org
> >>>> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
> >>>>
> >>>> On 02/06/15 14:31, Ananyev, Konstantin wrote:
> >>>>> Hi Zoltan,
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Zoltan Kiss
> >>>>>> Sent: Monday, June 01, 2015 5:16 PM
> >>>>>> To: dev at dpdk.org
> >>>>>> Subject: Re: [dpdk-dev] [PATCH] ixgbe: fix checking for tx_free_thresh
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> Would anyone like to review this patch? Venky sent a NAK, but I've
> >>>>>> explained to him why it is a bug.
> >>>>>
> >>>>> Well, I think Venky is right here.
> >>>>
> >>>> I think the comments above the rte_eth_tx_burst() definition are quite
> >>>> clear about what tx_free_thresh means; e1000 and i40e use it that way,
> >>>> but ixgbe does not.
> >>>>
> >>>>> Indeed, that fix will cause more frequent unsuccessful checks for DD
> >>>>> bits and might cause a slowdown on the TX fast-path.
> >>>>
> >>>> Not if the applications set tx_free_thresh according to the definition
> >>>> of this value. But we can change the default value from 32 to something
> >>>> higher, e.g. I'm using nb_desc/2, and it works out well.
> >>>
> >>> Sure we can; as I said below, we can unify it one way or another.
> >>> One way would be to make fast-path TX free TXDs when the number of
> >>> occupied TXDs rises above tx_free_thresh
> >>> (what the rte_ethdev.h comments say and what full-featured TX is doing).
> >>> Though in that case we have to change the default value for
> >>> tx_free_thresh, and all existing apps using tx_free_thresh==32 with
> >>> fast-path TX will probably experience a slowdown.
> >>
> >> They are in trouble already, because i40e and e1000 use it as defined.
> >
> > In fact, i40e has exactly the same problem as ixgbe:
> > the fast-path and full-featured TX code treat tx_free_thresh in
> > different ways.
> > igb just ignores the input tx_free_thresh, while em has only the
> > full-featured path.
> >
> > What I am saying is that an existing app that uses TX fast-path and sets
> > tx_free_thresh=32 (as we did in our examples in previous versions) will
> > experience a slowdown if we make all TX functions behave like the
> > full-featured ones
> > (txq->nb_tx_desc - txq->nb_tx_free > txq->tx_free_thresh).
> >
> > From the other side, if an app uses full-featured TX and sets
> > tx_free_thresh=32, then it already has a possible slowdown, because of
> > too-frequent TXD checking.
> > So, if we change the tx_free_thresh semantics to what the fast-path uses,
> > it shouldn't see any slowdown; in fact it might see some improvement.
> >
> >> But I guess most apps are going with 0, which sets the driver's default.
> >> Others would have to change the value to nb_txd - curr_value to keep
> >> the same behaviour.
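To make the two semantics concrete, here are the two checks side by side
(field names taken from the ixgbe code quoted at the bottom of this
thread; this only illustrates the difference, it is not new driver code):

    /* fast-path semantics: start reclaiming once the number of FREE
     * TXDs drops below the threshold (current ixgbe fast-path) */
    if (txq->nb_tx_free < txq->tx_free_thresh)
            ixgbe_tx_free_bufs(txq);

    /* rte_ethdev.h / full-featured semantics: start reclaiming once the
     * number of USED (in-flight) TXDs exceeds the threshold */
    if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh)
            ixgbe_tx_free_bufs(txq);

With tx_free_thresh=32 the first check fires only when the ring is almost
full, while the second fires as soon as more than 32 descriptors are in
flight; hence the slowdown concern discussed above.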
> >>> Another way would be to make all TX functions treat
> >>> tx_conf->tx_free_thresh as the fast-path TX functions do
> >>> (free TXDs when the number of free TXDs drops below tx_free_thresh)
> >>> and update the rte_ethdev.h comments.
> >>
> >> And the i40e and e1000e code as well. I don't see what difference it
> >> makes which definition you use; what I care about is that it is used
> >> consistently.
> >
> > Yes, both ways are possible; the concern is how to minimise the impact
> > on existing apps.
> > That's why I am leaning towards the fast-path way.
>
> Makes sense to favour the fast-path way. I'll look into that and try to
> come up with a patch.
>
> >>>
> >>> Though, I am not sure it is really worth all these changes.
> >>> From one side, whatever tx_free_thresh is,
> >>> the app should still assume that the worst case might happen,
> >>> and up to nb_tx_desc mbufs can be consumed by the queue.
> >>> From the other side, I think the default value should work well for
> >>> most cases.
> >>> So I am still for graceful deprecation of that config parameter, see
> >>> below.
> >>>
> >>>>> Anyway, with the current PMD implementation, you can't guarantee
> >>>>> that the TX queue will never use more than tx_free_thresh mbufs at
> >>>>> any given moment.
> >>>>> There could be situations (low speed, or the link being down for
> >>>>> some short period, etc.) when many more than tx_free_thresh TXDs
> >>>>> are in use and none of them can be freed by the HW right now.
> >>>>> So your app had better be prepared for up to (nb_tx_desc *
> >>>>> num_of_TX_queues) mbufs to be in use by the TX path at any given
> >>>>> moment.
> >>>>>
> >>>>> Though yes, there is an inconsistency in how different ixgbe TX
> >>>>> functions treat the tx_conf->tx_free_thresh parameter.
> >>>>> That probably creates wrong expectations and confusion.
> >>>>
> >>>> Yes, ixgbe_xmit_pkts() uses it the way it's defined; these two
> >>>> functions don't.
> >>>>
> >>>>> We might try to unify its usage one way or another, but I personally
> >>>>> don't see much point in it.
> >>>>> After all, tx_free_thresh seems like a driver-internal choice (based
> >>>>> on nb_tx_desc and other parameters).
> >>>>> So I think a better way would be:
> >>>>> 1. Deprecate tx_conf->tx_free_thresh (and remove it in later
> >>>>> releases) and make each driver use what it thinks is the best value.
> >>>>
> >>>> But how does the driver know what's best for the application's
> >>>> traffic pattern? I think it's better to leave the possibility for
> >>>> the app to fine-tune it.
> >>>
> >>> My understanding is that for most cases the default value should do
> >>> pretty well.
> >>> That default value shouldn't be too small, so we avoid unnecessary &
> >>> unsuccessful checks, and probably shouldn't be too big, to prevent
> >>> unnecessary mbuf consumption
> >>> (something between nb_tx_desc / 2 and 3 * nb_tx_desc / 4, probably).
> >>
> >> I agree.
> >>
> >>> But maybe you have a good example of when such tuning is needed?
> >>> For what traffic patterns would you set tx_free_thresh to different
> >>> values, and how would it impact performance?
> >>
> >> I don't have an actual example, but I think it's worth keeping this
> >> tuning option if we already have it. Most people probably wouldn't
> >> use it, but I can imagine that the very enthusiastic want to try out
> >> different settings to find the best one.
> >> E.g. I was testing odp_l2fwd when I came across the problem, and I
> >> found it useful to have this option. With its traffic pattern (receive
> >> a batch of packets, then send them out on another interface) it can
> >> happen that with different clock speeds you find different optimums.
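As a sketch of what such a driver-chosen default could look like
(illustrative only; the exact fraction and the fallback logic are
assumptions based on the ranges mentioned above, not existing DPDK code):

    /* at TX queue setup: if the app passes tx_free_thresh == 0, pick a
     * default that is large enough to avoid frequent unsuccessful DD-bit
     * checks, yet small enough to bound mbuf consumption, e.g. somewhere
     * in [nb_tx_desc / 2, 3 * nb_tx_desc / 4] */
    uint16_t tx_free_thresh = tx_conf->tx_free_thresh;
    if (tx_free_thresh == 0)
            tx_free_thresh = nb_tx_desc / 2;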
> > Actually, after thinking about it a bit more -
> > I think it would depend more on how many RX/TX queues a particular
> > lcore has to manage.
> > So, as you said, yes, we'd probably better leave it, at least for now.
> >
> >>> Again, if there were a tx_free_pkts(), why would someone also need
> >>> tx_conf->tx_free_thresh?
> >>
> >> I think of tx_free_pkts as a rainy-day option, for when you want ALL
> >> TX-completed packets to be released because you are out of buffers.
> >
> > What do you mean by 'rainy day' here?
> > The app running short of mbufs?
> > As I said before, it can be an absolutely valid situation when
> > up to nb_tx_desc mbufs per TX queue are in use.
> > So the app had better be prepared for such a situation anyway:
> > either make sure there are enough mbufs in the pool,
> > or somehow gracefully degrade the service.
> > Again, if nb_tx_desc is big (let's say for ixgbe it can be up to 4K),
> > freeing all TXDs at once could take a long time.
> > So I think it should be possible to specify a maximum number of mbufs
> > to free per call.
> >
> > My thought is that people will use it to release mbufs when there is
> > an idle period. Something like:
> >
> > n = rx_burst(...);
> > if (n == 0) { ...; tx_free_mbufs(...); ... }
> > else {...}
>
> Yes, this is similar to what I'm doing in odp-dpdk:
>
> https://git.linaro.org/lng/odp-dpdk.git/commitdiff/1a8df254e18bb50dbd835729bc3d01fcb87ebc6b
>
> The only problem is that when tx_free_thresh is not reached, the
> interface doesn't even start releasing the mbufs. And when you can't
> receive anything, it's likely you won't send anything out either, which
> would normally trigger the releasing of the completed buffers.

Yep, that's why I think it is good to add this function:
If you are going to TX something, then you'll probably hit tx_free_thresh
and the PMD will release unused mbufs 'automatically'.
If you have nothing to TX (an idle period), you can try to release unused
mbufs manually, by calling tx_free_mbufs().

Konstantin
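Fleshing out that pseudo-code, a minimal sketch of the idle-period
pattern (rte_eth_rx_burst(), rte_eth_tx_burst() and rte_pktmbuf_free()
are the real DPDK calls; rte_eth_tx_free_pkts() is the hypothetical
function proposed earlier in this thread and does not exist yet):

    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    #define BURST_SIZE 32

    static void
    lcore_fwd_once(uint8_t rx_port, uint8_t tx_port)
    {
            struct rte_mbuf *pkts[BURST_SIZE];
            uint16_t nb_rx, nb_tx;

            nb_rx = rte_eth_rx_burst(rx_port, 0, pkts, BURST_SIZE);
            if (nb_rx == 0) {
                    /* idle: reclaim up to 64 TX-completed mbufs
                     * (hypothetical API, per the proposal above) */
                    rte_eth_tx_free_pkts(tx_port, 0, 64);
                    return;
            }
            nb_tx = rte_eth_tx_burst(tx_port, 0, pkts, nb_rx);
            /* drop whatever the TX ring couldn't accept */
            while (nb_tx < nb_rx)
                    rte_pktmbuf_free(pkts[nb_tx++]);
    }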
> >
> > Konstantin
> >
> >> While tx_free_thresh is the fast-path way of TX completion, for when
> >> you have the room to wait for more packets to be gathered.
> >>
> >>> Konstantin
> >>>
> >>>> In the meantime we can improve the default selection as well, as I
> >>>> suggested above.
> >>>>
> >>>>> 2. As you suggested in another mail, introduce a new function:
> >>>>> uint16_t rte_eth_tx_free_pkts(port_id, queue_id, nb_to_free).
> >>>>> That would give the upper layer better control of memory usage, and
> >>>>> it might be called by the upper layer at idle time, so further
> >>>>> tx_burst calls don't need to spend time on freeing TXDs/packets.
> >>>>
> >>>> I agree.
> >>>>
> >>>>> Konstantin
> >>>>>
> >>>>>> Regards,
> >>>>>>
> >>>>>> Zoltan
> >>>>>>
> >>>>>> On 27/05/15 21:12, Zoltan Kiss wrote:
> >>>>>>> This check doesn't do what's required by rte_eth_tx_burst:
> >>>>>>> "When the number of previously sent packets reached the "minimum
> >>>>>>> transmit packets to free" threshold"
> >>>>>>>
> >>>>>>> This can cause problems when txq->tx_free_thresh + [number of
> >>>>>>> elements in the pool] < txq->nb_tx_desc.
> >>>>>>>
> >>>>>>> Signed-off-by: Zoltan Kiss <zoltan.kiss at linaro.org>
> >>>>>>> ---
> >>>>>>>  drivers/net/ixgbe/ixgbe_rxtx.c     | 4 ++--
> >>>>>>>  drivers/net/ixgbe/ixgbe_rxtx_vec.c | 2 +-
> >>>>>>>  2 files changed, 3 insertions(+), 3 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
> >>>>>>> index 4f9ab22..b70ed8c 100644
> >>>>>>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
> >>>>>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
> >>>>>>> @@ -250,10 +250,10 @@ tx_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
> >>>>>>>
> >>>>>>>  	/*
> >>>>>>>  	 * Begin scanning the H/W ring for done descriptors when the
> >>>>>>> -	 * number of available descriptors drops below tx_free_thresh. For
> >>>>>>> +	 * number of in flight descriptors reaches tx_free_thresh. For
> >>>>>>>  	 * each done descriptor, free the associated buffer.
> >>>>>>>  	 */
> >>>>>>> -	if (txq->nb_tx_free < txq->tx_free_thresh)
> >>>>>>> +	if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh)
> >>>>>>>  		ixgbe_tx_free_bufs(txq);
> >>>>>>>
> >>>>>>>  	/* Only use descriptors that are available */
> >>>>>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> >>>>>>> index abd10f6..f91c698 100644
> >>>>>>> --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> >>>>>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
> >>>>>>> @@ -598,7 +598,7 @@ ixgbe_xmit_pkts_vec(void *tx_queue, struct rte_mbuf **tx_pkts,
> >>>>>>>  	if (unlikely(nb_pkts > RTE_IXGBE_VPMD_TX_BURST))
> >>>>>>>  		nb_pkts = RTE_IXGBE_VPMD_TX_BURST;
> >>>>>>>
> >>>>>>> -	if (txq->nb_tx_free < txq->tx_free_thresh)
> >>>>>>> +	if ((txq->nb_tx_desc - txq->nb_tx_free) > txq->tx_free_thresh)
> >>>>>>>  		ixgbe_tx_free_bufs(txq);
> >>>>>>>
> >>>>>>>  	nb_commit = nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);