> > On 10/20/2020 10:07 AM, Ananyev, Konstantin wrote: > > > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> testpmd will initialize default max packet length to 1518 which > >>>>>>>>>>>>>> doesn't include vlan tag size in ether overheader. Once, send > >>>>>>>>>>>>>> the > >>>>>>>>>>>>>> max mtu length packet with vlan tag, the max packet length will > >>>>>>>>>>>>>> exceed 1518 that will cause packets dropped directly from NIC > >>>>>>>>>>>>>> hw > >>>>>>>>>> side. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> ice can support dual vlan tags that need more 8 bytes for max > >>>>>>>>>>>>>> packet size, so, configures the correct max packet size in > >>>>>>>>>>>>>> dev_config > >>>>>>>>>>> ops. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Fixes: 50cc9d2a6e9d ("net/ice: fix max frame size") > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Signed-off-by: SteveX Yang <stevex.y...@intel.com> > >>>>>>>>>>>>>> --- > >>>>>>>>>>>>>> drivers/net/ice/ice_ethdev.c | 11 +++++++++++ > >>>>>>>>>>>>>> 1 file changed, 11 insertions(+) > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> diff --git a/drivers/net/ice/ice_ethdev.c > >>>>>>>>>>>>>> b/drivers/net/ice/ice_ethdev.c index > >>>>>>>>>>>>>> cfd357b05..6b7098444 100644 > >>>>>>>>>>>>>> --- a/drivers/net/ice/ice_ethdev.c > >>>>>>>>>>>>>> +++ b/drivers/net/ice/ice_ethdev.c > >>>>>>>>>>>>>> @@ -3146,6 +3146,7 @@ ice_dev_configure(struct rte_eth_dev > >>>>>>>> *dev) > >>>>>>>>>>>>>> struct ice_adapter *ad = > >>>>>>>>>>>>>> ICE_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private); > >>>>>>>>>>>>>> struct ice_pf *pf = > >>>>>>>>>>>>>> ICE_DEV_PRIVATE_TO_PF(dev->data->dev_private); > >>>>>>>>>>>>>> +uint32_t frame_size = dev->data->mtu + ICE_ETH_OVERHEAD; > >>>>>>>>>>>>>> int ret; > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> /* Initialize to TRUE. If any of Rx queues doesn't meet > >>>>>>>>>>>>>> the @@ > >>>>>>>>>>>>>> -3157,6 > >>>>>>>>>>>>>> +3158,16 @@ ice_dev_configure(struct rte_eth_dev *dev) > >>>>>>>>>>>>>> if (dev->data->dev_conf.rxmode.mq_mode & > >>>>>>>> ETH_MQ_RX_RSS_FLAG) > >>>>>>>>>>>>>> dev->data->dev_conf.rxmode.offloads |= > >>>>>>>>>>> DEV_RX_OFFLOAD_RSS_HASH; > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> +/** > >>>>>>>>>>>>>> + * Considering QinQ packet, max frame size should be equal or > >>>>>>>>>>>>>> + * larger than total size of MTU and Ether overhead. > >>>>>>>>>>>>>> + */ > >>>>>>>>>>>>> > >>>>>>>>>>>>>> +if (frame_size > dev->data->dev_conf.rxmode.max_rx_pkt_len) { > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> Why we need this check? > >>>>>>>>>>>>> Can we just call ice_mtu_set directly > >>>>>>>>>>>> > >>>>>>>>>>>> I think that without that check we can silently overwrite > >>>>>>>>>>>> provided > >>>>>>>>>>>> by user dev_conf.rxmode.max_rx_pkt_len value. > >>>>>>>>>>> > >>>>>>>>>>> OK, I see > >>>>>>>>>>> > >>>>>>>>>>> But still have one question > >>>>>>>>>>> dev->data->mtu is initialized to 1518 as default , but if > >>>>>>>>>>> dev->data->application set > >>>>>>>>>>> dev_conf.rxmode.max_rx_pkt_len = 1000 in dev_configure. > >>>>>>>>>>> does that mean we will still will set mtu to 1518, is this > >>>>>>>>>>> expected? > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> max_rx_pkt_len should be larger than mtu at least, so we should > >>>>>>>>>> raise > >>>>>>>>>> the max_rx_pkt_len (e.g.:1518) to hold expected mtu value (e.g.: > >>>>>>>>>> 1500). > >>>>>>>>> > >>>>>>>>> Ok, this describe the problem more general and better to replace > >>>>>>>>> exist > >>>>>>>> code comment and commit log for easy understanding. > >>>>>>>>> Please send a new version for reword > >>>>>>>>> > >>>>>>>> > >>>>>>>> I didn't really get this set. > >>>>>>>> > >>>>>>>> Application explicitly sets 'max_rx_pkt_len' to '1518', and a frame > >>>>>>>> bigger than > >>>>>>>> this size is dropped. > >>>>>>> > >>>>>>> Sure, it is normal case for dropping oversize data. > >>>>>>> > >>>>>>>> Isn't this what should be, why we are trying to overwrite user > >>>>>>>> configuration > >>>>>>>> in PMD to prevent this? > >>>>>>>> > >>>>>>> > >>>>>>> But it is a confliction that application/user sets mtu & > >>>>>>> max_rx_pkt_len at the same time. > >>>>>>> This fix will make a decision when confliction occurred. > >>>>>>> MTU value will come from user operation (e.g.: port config mtu 0 > >>>>>>> 1500) directly, > >>>>>>> so, the max_rx_pkt_len will resize itself to adapt expected MTU value > >>>>>>> if its size is smaller than MTU + Ether overhead. > >>>>>>> > >>>>>>>> During eth_dev allocation, mtu set to default '1500', by ethdev > >>>>>>>> layer. > >>>>>>>> And testpmd sets 'max_rx_pkt_len' by default to '1518'. > >>>>>>>> I think Qi's concern above is valid, what is user set > >>>>>>>> 'max_rx_pkt_len' to '1000' > >>>>>>>> and mean it? PMD will not honor the user config. > >>>>>>> > >>>>>>> I'm not sure when set 'mtu' to '1500' and 'max_rx_pkt_len' to '1000', > >>>>>>> what's the behavior expected? > >>>>>>> If still keep the 'max_rx_pkt_len' value, that means the larger 'mtu' > >>>>>>> will be invalid. > >>>>>>> > >>>>>>>> > >>>>>>>> Why not simply increase the default 'max_rx_pkt_len' in testpmd? > >>>>>>>> > >>>>>>> The default 'max_rx_pkt_len' has been initialized to generical value > >>>>>>> (1518) and default 'mtu' is '1500' in testpmd, > >>>>>>> But it isn't suitable to those NIC drivers which Ether overhead is > >>>>>>> larger than 18. (e.g.: ice, i40e) if 'mtu' value is preferable. > >>>>>>> > >>>>>>>> And I guess even better what we need is to tell to the application > >>>>>>>> what the > >>>>>>>> frame overhead PMD accepts. > >>>>>>>> So the application can set proper 'max_rx_pkt_len' value per port > >>>>>>>> for a > >>>>>>>> given/requested MTU value. > >>>>>>>> @Ian, cc'ed, was complaining almost same thing years ago, these PMD > >>>>>>>> overhead macros and 'max_mtu'/'min_mtu' added because of that, > >>>>>>>> perhaps > >>>>>>>> he has a solution now? > >>>>>> > >>>>>> From my perspective the main problem here: > >>>>>> We have 2 different variables for nearly the same thing: > >>>>>> rte_eth_dev_data.mtu and rte_eth_dev_data.dev_conf.max_rx_pkt_len. > >>>>>> and 2 different API to update them: dev_mtu_set() and dev_configure(). > >>>>> > >>>>> According API 'max_rx_pkt_len' is 'Only used if JUMBO_FRAME enabled' > >>>>> Although not sure that is practically what is done for all drivers. > >>>> > >>>> I think most of Intel PMDs use it unconditionally. > >>>> > >>>>> > >>>>>> And inside majority of Intel PMDs we don't keep these 2 variables in > >>>>>> sync: > >>>>>> - mtu_set() will update both variables. > >>>>>> - dev_configure() will update only max_rx_pkt_len, but will keep mtu > >>>>>> intact. > >>>>>> > >>>>>> This patch fixes this inconsistency, which I think is a good thing. > >>>>>> Though yes, it introduces change in behaviour. > >>>>>> > >>>>>> Let say the code: > >>>>>> rte_eth_dev_set_mtu(port, 1500); > >>>>>> dev_conf.max_rx_pkt_len = 1000; > >>>>>> rte_eth_dev_configure(port, 1, 1, &dev_conf); > >>>>>> > >>>>> > >>>>> 'rte_eth_dev_configure()' is one of the first APIs called, it is called > >>>>> before > >>>>> 'rte_eth_dev_set_mtu(). > >>>> > >>>> Usually yes. > >>>> But you can still do sometimes later: dev_mtu_set(); ...; dev_stop(); > >>>> dev_configure(); dev_start(); > >>>> > >>>>> > >>>>> When 'rte_eth_dev_configure()' is called, MTU is set to '1500' by > >>>>> default by > >>>>> ethdev layer, so it is not user configuration, but 'max_rx_pkt_len' is. > >>>> > >>>> See above. > >>>> PMD doesn't know where this MTU value came from (default ethdev value or > >>>> user specified value) > >>>> and probably it shouldn't care. > >>>> > >>>>> > >>>>> And later, when 'rte_eth_dev_set_mtu()' is called, but MTU and > >>>>> 'max_rx_pkt_len' > >>>>> are updated (mostly). > >>>> > >>>> Yes, in mtu_set() we update both. > >>>> But we don't update MTU in dev_configure(), only max_rx_pkt_len. > >>>> That what this patch tries to fix (as I understand it). > >>> > >>> To be more precise - it doesn't change MTU value in dev_configure(), > >>> but instead doesn't allow max_rx_pkt_len to become smaller > >>> then MTU + OVERHEAD. > >>> Probably changing MTU value instead is a better choice. > >>> > >> > >> +1 to change mtu for this case. > >> And this is what happens in practice when there is no > >> 'rte_eth_dev_set_mtu()' > >> call, since PMD is using ('max_rx_pkt_len' - OVERHEAD) to set MTU. > > > > Hmm, I don't see that happens within Intel PMDs. > > As I can read the code: if user never call mtu_set(), then MTU value is > > left intact. > > > > I was checking ice, > in 'ice_dev_start()', 'rxmode.max_rx_pkt_len' is used to configure the device.
Yes, I am not arguing with that. What I am saying - dev_config() doesn't update MTU based on max_rx_pkt_len. While it probably should. > > >> But this won't solve the problem Steve is trying to solve. > > > > You mean we still need to update test-pmd code to calculate max_rx_pkt_len > > properly for default mtu value? > > > > Yes. > Because target of this set is able to receive packets with payload size > 'RTE_ETHER_MTU', if MTU is updated according to the provided 'max_rx_pkt_len', > device still won't able to receive those packets. Agree. > > >>>>> > >>>>> > >>>>>> Before the patch will result: > >>>>>> mtu==1500, max_rx_pkt_len=1000; //out of sync looks wrong to me > >>>>>> > >>>>>> After the patch: > >>>>>> mtu=1500, max_rx_ptk_len=1518; // in sync, change in behaviour. > >>>>>> > >>>>>> If you think we need to preserve current behaviour, > >>>>>> then I suppose the easiest thing would be to change dev_config() code > >>>>>> to update mtu value based on max_rx_pkt_len. > >>>>>> I.E: dev_configure {...; mtu_set(max_rx_pkt_len - OVERHEAD); ...} > >>>>>> So the code snippet above will result: > >>>>>> mtu=982,max_rx_pkt_len=1000; > >>>>>> > >>>>> > >>>>> The 'max_rx_ptk_len' is annoyance for a long time, what do you think to > >>>>> just > >>>>> drop it? > >>>>> > >>>>> By default device will be up with default MTU (1500), later > >>>>> 'rte_eth_dev_set_mtu' can be used to set the MTU, no frame size setting > >>>>> at all. > >>>>> > >>>>> Will this work? > >>>> > >>>> I think it might, but that's a big change, probably too risky at that > >>>> stage... > >>>> > >> > >> Defintely, I was thinking for 21.11. Let me send a deprecation notice and > >> see > >> what happens. > >> > >>>> > >>>>> > >>>>> > >>>>> And for short term, for above Intel PMDs, there must be a place this > >>>>> 'max_rx_pkt_len' value taken into account (mostly 'start()' dev_ops), > >>>>> that > >>>>> function can be updated to take 'max_rx_pkt_len' only if JUMBO_FRAME > >>>>> set, > >>>>> otherwise use the 'MTU' value. > >>>> > >>>> Even if we'll use max_rx_pkt_len only when if JUMBO_FRAME is set, > >>>> I think we still need to keep max_rx_pkt_len and MTU values in sync. > >>>> > >>>>> > >>>>> Without 'start()' updated the current logic won't work after stop & > >>>>> start anyway. > >>>>> > >>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>>> > >>>>>>>> > >>>>>>>> And why this same thing can't happen to other PMDs? If this is a > >>>>>>>> problem for > >>>>>>>> all PMDs, we should solve in other level, not for only some PMDs. > >>>>>>>> > >>>>>>> No, all PMDs exist the same issue, another proposal: > >>>>>>> - rte_ethdev provides the unique resize 'max_rx_pkt_len' in > >>>>>>> rte_eth_dev_configure(); > >>>>>>> - provide the uniform API for fetching the NIC's supported Ether > >>>>>>> Overhead size; > >>>>>>> Is it feasible? > >>>>>>> > >>>>>>>>> > >>>>>>>>>> Generally, the mtu value can be adjustable from user (e.g.: ip link > >>>>>>>>>> set ens801f0 mtu 1400), hence, we just adjust the max_rx_pkt_len to > >>>>>>>>>> satisfy mtu requirement. > >>>>>>>>>> > >>>>>>>>>>> Should we just call ice_mtu_set(dev, > >>>>>>>>>>> dev_conf.rxmode.max_rx_pkt_len) > >>>>>>>>>>> here? > >>>>>>>>>> ice_mtu_set(dev, mtu) will append ether overhead to > >>>>>>>>>> frame_size/max_rx_pkt_len, so we need pass the mtu value as the 2nd > >>>>>>>>>> parameter, or not the max_rx_pkt_len. > >>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>>> And please remove above comment, since ether overhead is already > >>>>>>>>>>>> considered in ice_mtu_set. > >>>>>>>>>> Ether overhead is already considered in ice_mtu_set, but it also > >>>>>>>>>> should be considered as the adjustment condition that if > >>>>>>>>>> ice_mtu_set > >>>>>>>> need be invoked. > >>>>>>>>>> So, it perhaps should remain this comment before this if() > >>>>>>>>>> condition. > >>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>>> +ret = ice_mtu_set(dev, dev->data->mtu); if (ret != 0) return > >>>>>>>>>>>>>> +ret; } > >>>>>>>>>>>>>> + > >>>>>>>>>>>>>> ret = ice_init_rss(pf); > >>>>>>>>>>>>>> if (ret) { > >>>>>>>>>>>>>> PMD_DRV_LOG(ERR, "Failed to enable rss for PF"); > >>>>>>>>>>>>>> -- > >>>>>>>>>>>>>> 2.17.1 > >>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>> > >>> > >