Hi Konstantin,
Please see my comments inline
Thanks.
Trevor
At 2023-09-20 16:04:41, "Konstantin Ananyev" <konstantin.v.anan...@yandex.ru>
wrote:
>
>Hi Trevor,
>
>
>>
>> At 2023-09-18 02:04:19, "Konstantin Ananyev"
>> <konstantin.v.anan...@yandex.ru> wrote:
>>>03/09/2023 05:01, Trevor Tao пишет:
>>>> Now the port Rx mq_mode had been set to RTE_ETH_MQ_RX_RSS, and offload
>>>> mode set to RTE_ETH_RX_OFFLOAD_CHECKSUM by default, but some hardware
>>>> and/or virtual interface does not support the RSS and offload mode
>>>> presupposed, e.g., some virtio interfaces in the cloud don't support
>>>> RSS and may only partly support RTE_ETH_RX_OFFLOAD_UDP_CKSUM/
>>>> RTE_ETH_RX_OFFLOAD_TCP_CKSUM,
>>>> but not RTE_ETH_RX_OFFLOAD_IPV4_CKSUM, and the error msg here:
>>>>
>>>> virtio_dev_configure(): RSS support requested but not supported by
>>>> the device
>>>> Port0 dev_configure = -95
>>>>
>>>> and:
>>>> Ethdev port_id=0 requested Rx offloads 0xe does not match Rx offloads
>>>> capabilities 0x201d in rte_eth_dev_configure()
>>>>
>>>> So to enable the l3fwd running in that environment, the Rx mode requirement
>>>> can be relaxed to reflect the hardware feature reality here, and the l3fwd
>>>> can run smoothly then.
>>>> A warning msg would be provided to user in case it happens here.
>>>>
>>>> On the other side, enabling the software cksum check in case the
>>>> hw support missing.
>>>>
>>>> Fixes: af75078fece3 ("first public release")
>>>> Cc: sta...@dpdk.org
>>>
>>>I don't think there was abug here.
>>>We are talking about changing current requirements for the app.
>>>So not sure it is a real fix and that such change can be
>>
>>>propagated to stable releases.
>> Trevor: I think it's not a bug fix but a feature enhancement, it would
>> enable l3fwd to work smoothly on the HW/virtual interfaces which don't
>> support RSS and/or cksum offloading.
>
>
>Yes. it seems like sort of an enhancement.
>While 'Fixes: ...' are for bugs only.
>AFAIK, only bug-fixes are take for backporting by stable releases.
>That's why there seems no point to add CC: sta...@dpdk.org
>
>Another generic things:
>- l3fwd doc and release notes probably need to be updated
Trevor>>I think it's ok to update the l3fwd doc and release notes, but I would
like to know which part of the doc/notes is approriate to add the enhancement
declaration.
>- as you areintroducing 2 distinct features: no-rss and no-ipv4-cksum
> it is probably better to split it into 2 different patches (in the
>same series).
Trevor>>I think it's ok to split it into 2 patches here in the same series, if
you would like to.
Thanks.
>
>>
>>
>>>
>>>>
>>>> Signed-off-by: Trevor Tao <taozj...@163.com>
>>>> ---
>>>> examples/l3fwd/l3fwd.h | 12 +++++++++++-
>>>> examples/l3fwd/main.c | 21 +++++++++++++++++++--
>>>> 2 files changed, 30 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/examples/l3fwd/l3fwd.h b/examples/l3fwd/l3fwd.h
>>>> index b55855c932..cc10643c4b 100644
>>>> --- a/examples/l3fwd/l3fwd.h
>>>> +++ b/examples/l3fwd/l3fwd.h
>>>> @@ -115,6 +115,8 @@ extern struct acl_algorithms acl_alg[];
>>>>
>>>> extern uint32_t max_pkt_len;
>>>>
>>>> +extern struct rte_eth_conf port_conf;
>>>> +
>>>> /* Send burst of packets on an output interface */
>>>> static inline int
>>>> send_burst(struct lcore_conf *qconf, uint16_t n, uint16_t port)
>>>> @@ -170,7 +172,15 @@ is_valid_ipv4_pkt(struct rte_ipv4_hdr *pkt, uint32_t
>>>> link_len)
>>>> return -1;
>>>>
>>>> /* 2. The IP checksum must be correct. */
>>>> - /* this is checked in H/W */
>>>> + /* if this is not checked in H/W, check it. */
>>>> + if ((port_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_IPV4_CKSUM) == 0) {
>>>
>>>Might be better to check particular mbuf flag:
>>>if ((mbuf->ol_flags & RTE_MBUF_F_RX_IP_CKSUM_MASK) ==
>>
>>>TE_MBUF_F_RX_IP_CKSUM_UNKNOWN) {...}
>> Trevor: the utility function is_valid_ipv4_pkt is just against an IPv4 pkt,
>> and there's no mbuf information, and if needed, there would be an extra
>> ol_flags added here to check if it was already done by the ethernet device,
>> but look for a sample in:
>> https://github.com/DPDK/dpdk/blob/main/examples/l3fwd-power/main.c#L487
>> so I think it's ok to just use the port_conf here. If you still think it's
>> better to use m->ol_flags, please tell me.
>
>
>Yep, passing ol_flags, or mbuf itself seems like a proper way to do it.
>Aproach taken in l3fwd-power doesn't look right to me, see below.
>
>>>
>>>> + uint16_t actual_cksum, expected_cksum;
>>>> + actual_cksum = pkt->hdr_checksum;
>>>> + pkt->hdr_checksum = 0;
>>>> + expected_cksum = rte_ipv4_cksum(pkt);
>>>> + if (actual_cksum != expected_cksum)
>>>> + return -2;
>>>> + }
>>>>
>>>> /*
>>>> * 3. The IP version number must be 4. If the version number is not 4
>>>> diff --git a/examples/l3fwd/main.c b/examples/l3fwd/main.c
>>>> index 6063eb1399..37aec64718 100644
>>>> --- a/examples/l3fwd/main.c
>>>> +++ b/examples/l3fwd/main.c
>>>> @@ -117,7 +117,7 @@ static struct lcore_params * lcore_params =
>>>> lcore_params_array_default;
>>>> static uint16_t nb_lcore_params = sizeof(lcore_params_array_default) /
>>>> sizeof(lcore_params_array_default[0]);
>>>>
>>>> -static struct rte_eth_conf port_conf = {
>>>> +struct rte_eth_conf port_conf = {
>>>> .rxmode = {
>>>> .mq_mode = RTE_ETH_MQ_RX_RSS,
>>>> .offloads = RTE_ETH_RX_OFFLOAD_CHECKSUM,
>>>> @@ -1257,8 +1257,12 @@ l3fwd_poll_resource_setup(void)
>>>> local_port_conf.rx_adv_conf.rss_conf.rss_hf &=
>>>> dev_info.flow_type_rss_offloads;
>>>>
>>>> - if (dev_info.max_rx_queues == 1)
>>>> + /* relax the rx rss requirement */
>>>> + if (dev_info.max_rx_queues == 1 ||
>>>> !local_port_conf.rx_adv_conf.rss_conf.rss_hf) {
>>>> + printf("warning: modified the rx mq_mode to
>>>> RTE_ETH_MQ_RX_NONE base on"
>>>> + " device capability\n");
>>>> local_port_conf.rxmode.mq_mode = RTE_ETH_MQ_RX_NONE;
>>>
>>>Should we probably instead have a new commnad-line option to explicitly
>>>disable RSS?
>>
>>>Something like: '--no-rss' or so?
>> Trevor: the RSS capability for a certain port was got by the
>> rte_eth_dev_info_get() automatically, and we think the user should not care
>> about its status beforehand, but if it's missing, a warning notification for
>> the degrade here would be proposed to make it run smoothly.
>
>Personally, I still think it would be better the user will
>have an ability to disable it explicitly.
>Same as l3fwd does now with 'parse-ptype'.
>
>>>
>>>> + }
>>>>
>>>> if (local_port_conf.rx_adv_conf.rss_conf.rss_hf !=
>>>> port_conf.rx_adv_conf.rss_conf.rss_hf) {
>>>> @@ -1269,6 +1273,19 @@ l3fwd_poll_resource_setup(void)
>>>> local_port_conf.rx_adv_conf.rss_conf.rss_hf);
>>>> }
>>>>
>>>> + /* relax the rx offload requirement */
>>>> + if ((local_port_conf.rxmode.offloads &
>>>> dev_info.rx_offload_capa) !=
>>>> + local_port_conf.rxmode.offloads) {
>>>> + printf("Port %u requested Rx offloads 0x%"PRIx64" does
>>>> not"
>>>> + " match Rx offloads capabilities 0x%"PRIx64"\n",
>>>> + portid, local_port_conf.rxmode.offloads,
>>>> + dev_info.rx_offload_capa);
>>>> + local_port_conf.rxmode.offloads &=
>>>> dev_info.rx_offload_capa;
>>>> + port_conf.rxmode.offloads =
>>>> local_port_conf.rxmode.offloads;
>>>
>>>Why to remove offloads in port_conf?
>>>There could be multiple ports, and on others desired HW offloads might
>>
>>>be supported.
>> Trevor: Yes, there would be multiple ports, so if one of the ports lack HW
>> offload, it would be ok to just use the relaxed requirement here, like we
>> previously talked in
>> https://github.com/DPDK/dpdk/blob/main/examples/l3fwd-power/main.c#L487, if
>> you still think it's needed to use the per-port case, it would be ok to use
>> the ol_flags as talked previously.
>
>
>But then, depending on the ports order you can end-up with IP_CKSUM
>offload enabled on some ports (but not used), while completely disable
>on other ports - even if these ports do support IP_CKSUM.
>I think the better way would be not to touch port_conf here, and above
>use ol_flags to decide should we compute cksum in SW or not.
>
>
>>>
>>>> + printf("warning: modified the rx offload to 0x%"PRIx64"
>>>> based on device"
>>>> + " capability\n",
>>>> local_port_conf.rxmode.offloads);
>>>> + }
>>>> +
>>>> ret = rte_eth_dev_configure(portid, nb_rx_queue,
>>>> (uint16_t)n_tx_queue, &local_port_conf);
>>>> if (ret < 0)