> -----Original Message-----
> From: Olivier Matz <olivier.m...@6wind.com>
> Sent: Friday, December 27, 2019 1:54 PM
> To: Andrew Rybchenko <arybche...@solarflare.com>
> Cc: Yigit, Ferruh <ferruh.yi...@intel.com>; Shahaf Shuler
> <shah...@mellanox.com>; dev@dpdk.org; Ananyev, Konstantin
> <konstantin.anan...@intel.com>; Thomas Monjalon <tho...@monjalon.net>;
> Richardson, Bruce <bruce.richard...@intel.com>; Matan
> Azrad <ma...@mellanox.com>; Jerin Jacob Kollanukkaran <jer...@marvell.com>
> Subject: Re: [dpdk-dev] questions about new offload ethdev api
>
> Hi,
>
> Few comments below.
>
> On Mon, Dec 16, 2019 at 11:39:05AM +0300, Andrew Rybchenko wrote:
> > On 12/10/19 9:07 PM, Ferruh Yigit wrote:
> > > On 1/23/2018 2:34 PM, Shahaf Shuler wrote:
> > >> Tuesday, January 23, 2018 3:53 PM, Olivier Matz:
> > >
> > > <...>
> > >
> > >>>
> > >>> 2/ meaning of rxmode.jumbo_frame, rxmode.enable_scatter,
> > >>> rxmode.max_rx_pkt_len
> > >>>
> > >>> While it's not related to the new API, it is probably a good
> > >>> opportunity to clarify the meaning of these flags. I'm not able to
> > >>> find good documentation about them.
> > >>>
> > >>> Here is my understanding: the configuration only depends on:
> > >>> - the maximum rx frame length
> > >>> - the amount of data available in a mbuf (minus headroom)
> > >>>
> > >>> Flags to set in rxmode (example):
> > >>> +---------------+----------------+----------------+-----------------+
> > >>> |               |mbuf_data_len=1K|mbuf_data_len=2K|mbuf_data_len=16K|
> > >>> +---------------+----------------+----------------+-----------------+
> > >>> |max_rx_len=1500|enable_scatter  |                |                 |
> > >>> +---------------+----------------+----------------+-----------------+
> > >>> |max_rx_len=9000|enable_scatter, |enable_scatter, |jumbo_frame      |
> > >>> |               |jumbo_frame     |jumbo_frame     |                 |
> > >>> +---------------+----------------+----------------+-----------------+
>
> Due to successive quotes, the table was not readable in my mail client,
> here it is again (narrower):
>
> +------------+---------------+---------------+---------------+
> | |mbuf_data_len= |mbuf_data_len= |mbuf_data_len= |
> | |1K |2K |16K |
> +------------+---------------+---------------+---------------+
> |max_rx_len= |enable_scatter | | |
> |1500 | | | |
> +------------+---------------+---------------+---------------+
> |max_rx_len= |enable_scatter,|enable_scatter,|jumbo_frame |
> |9000 |jumbo_frame |jumbo_frame | |
> +------------+---------------+---------------+---------------+
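>
> In code, the table boils down to two comparisons. A minimal sketch,
> assuming the new offload API (the helper name and the use of
> RTE_ETHER_MAX_LEN as the jumbo threshold are my own):
>
>     #include <rte_ethdev.h>
>     #include <rte_ether.h>
>
>     /* Derive rxmode settings from max_rx_len and the mbuf data size. */
>     static void
>     setup_rx_len(struct rte_eth_conf *conf, uint32_t max_rx_len,
>                  uint16_t mbuf_data_len)
>     {
>         conf->rxmode.max_rx_pkt_len = max_rx_len;
>         /* jumbo = larger than a standard 1518-byte frame */
>         if (max_rx_len > RTE_ETHER_MAX_LEN)
>             conf->rxmode.offloads |= DEV_RX_OFFLOAD_JUMBO_FRAME;
>         /* scatter = a single mbuf cannot hold the whole packet */
>         if (max_rx_len > mbuf_data_len)
>             conf->rxmode.offloads |= DEV_RX_OFFLOAD_SCATTER;
>     }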
>
> > >>> If this table is correct, the jumbo_frame flag would be equivalent
> > >>> to checking whether max_rx_pkt_len is above a threshold.
> > >>>
> > >>> And enable_scatter could be deduced from the mbuf size of the given
> > >>> rxq (which is a bit harder but maybe doable).
> > >>
> > >> I'm glad you raised this subject. We had a lot of discussion about it
> > >> internally at Mellanox.
> > >>
> > >> I fully agree. All the application needs is to specify the maximum
> > >> packet size it wants to receive.
> > >>
> > >> I also think the lack of documentation is causing PMDs to use those
> > >> flags wrongly. For example, some PMDs set the jumbo_frame flag
> > >> internally without it being set by the application.
> > >>
> > >> I would like to add one more item: MTU. What is the relation (if any)
> > >> between setting the MTU and max_rx_len? I know MTU stands for Maximum
> > >> Transmission Unit; however, at least in Linux, it is the same for both
> > >> send and receive.
> > >>
> > >>
> > >
> > > (Resurrecting the thread after two years, I will reply again with latest
> > > understanding.)
> > >
> > > Thanks Olivier for the above summary and table; unfortunately, usage
> > > is still not consistent between PMDs. According to my understanding:
> > >
> > > 'max_rx_pkt_len' is a user configuration value that limits the size of
> > > the packet shared with the host, but it doesn't limit the size of the
> > > packet that the NIC receives.
>
> When you say the size of the packet shared with the host, do you mean,
> for instance, that the NIC will receive a 1500B packet and will only
> write 128 bytes of data into the mbuf?
>
> If yes, this was not my understanding. I suppose it could be used for
> monitoring. What should be the value of Rx offload info such as the
> checksum or the packet type if the packet (or the header) is truncated?
>
> > Also, the comment in lib/librte_ethdev/rte_ethdev.h says that the
> > rxmode field is used if (and I think only if) JUMBO_FRAME is
> > enabled. So, if the user wants to set it at the device configure
> > stage, the device *must* support the JUMBO_FRAME offload, which
> > means that the driver code handles rxmode.max_rx_pkt_len and either
> > accepts it and configures the HW appropriately, or returns an error
> > if the specified value is wrong. Basically, this is written in the
> > jumbo frame feature definition in features.rst. The user has
> > max_rx_pktlen in dev_info to find out the maximum supported value
> > for max_rx_pkt_len.
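> >
> > A minimal sketch of that contract from the application side
> > (configure_jumbo() is a hypothetical helper; error handling of
> > rte_eth_dev_info_get() is elided):
> >
> >     #include <rte_ethdev.h>
> >
> >     static int
> >     configure_jumbo(uint16_t port_id, struct rte_eth_conf *conf)
> >     {
> >         struct rte_eth_dev_info dev_info;
> >
> >         rte_eth_dev_info_get(port_id, &dev_info);
> >         if (conf->rxmode.max_rx_pkt_len > dev_info.max_rx_pktlen)
> >             return -EINVAL; /* larger than the device can accept */
> >         conf->rxmode.offloads |= DEV_RX_OFFLOAD_JUMBO_FRAME;
> >         return rte_eth_dev_configure(port_id, 1, 1, conf);
> >     }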
> >
> > > For example, if the mbuf size of the mempool used by a queue is 1024
> > > bytes, we don't want packets bigger than the buffer size; but if the
> > > NIC supports it, it is possible to receive a 6000-byte packet, split
> > > the data into multiple buffers (ceil(6000/1024) = 6 of them), and use
> > > a multi-segment packet to represent it.
> > > So what we need is the NIC's ability to limit the size of the data
> > > shared with the host, plus scattered Rx support (device + driver).
> >
> > It requires the RX_SCATTER offload to be enabled, and it must be
> > controlled by the user only (not the PMD), since it basically
> > indicates whether the application is ready to handle multi-segment
> > packets (i.e. has code which looks at the number of segments,
> > next pointers, etc.). Moreover, the application may disable the
> > MULTI_SEG Tx offload (and drivers may then ignore the number of
> > segments and next pointers as well).
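> >
> > For illustration, "ready" means the Rx path walks the chain,
> > something like this sketch (not taken from any driver):
> >
> >     #include <rte_mbuf.h>
> >
> >     /* Total data length of a possibly multi-segment mbuf. */
> >     static uint32_t
> >     chain_data_len(const struct rte_mbuf *m)
> >     {
> >         uint32_t total = 0;
> >
> >         for (; m != NULL; m = m->next)
> >             total += rte_pktmbuf_data_len(m);
> >         /* equals rte_pktmbuf_pkt_len() across nb_segs segments */
> >         return total;
> >     }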
>
> Agreed, I think it is important that the application can control the
> enabling of Rx scatter, either via a flag, or simply by passing
> max_rx_len <= mbuf_data_len.
>
> > > But the MTU limits the size of the packet that the NIC receives.
> >
> > Personally, I've never treated it this way. For me, the only
> > difference between max_rx_pkt_len and MTU is:
> > - max_rx_pkt_len is the entire packet with all L2 headers and
> >   even the FCS (basically everything which could be provided
> >   to the application in the mbuf)
> > - MTU does not cover L2 (nor VLANs; I'm not sure about MPLS)
> >
> > > Assuming the above are correct :),
> > >
> > > Using the mbuf data size as 'max_rx_pkt_len' without asking the user
> > > is an option, but perhaps the user has a different reason to limit the
> > > packet size, so I think it is better to keep it as a separate config
> > > option.
> > >
> > > I think the PMD itself enabling the "jumbo frame" offload is not too
> > > bad, and acceptable, since providing a large MTU already implies it.
> >
> > Yes
>
> +1
>
> > > But I am not sure about the PMD enabling scattered Rx; the application
> > > may want to force receiving only single-segment mbufs, and in that
> > > case the PMD enabling this config on its own looks like a problem.
> >
> > Yes
>
> +1
>
> > > But the user really needs this when a packet doesn't fit into the
> > > mbuf, so providing an MTU larger than 'max_rx_pkt_len' _may_ imply
> > > enabling scattered Rx; I assume this is the logic in some PMDs, which
> > > looks acceptable.
> >
> > I don't think so. IMO, auto-enabling Rx scatter from the PMD is a
> > breach of the contract between application and driver.
> > As stated above, the application may simply not be ready to
> > handle multi-segment packets correctly.
> >
> > I think that providing an MTU larger than 'max_rx_pkt_len' is simply
> > a change to max_rx_pkt_len = (MTU plus space for L2+ headers).
>
> As VLAN(s) are not taken into account in the MTU, it means that if the
> MTU is 1500, the max Ethernet length is 1500 + 14 (eth hdr) + 4 (vlan) +
> 4 (2nd vlan / qinq) + 4 (crc) = 1526.
>
> Shouldn't we only use L2 lengths instead of MTU? I don't know what is
> usually expected by different hardware (MTU or L2 length).
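>
> For reference, a sketch of the conversion, where the overhead is an
> explicit assumption (whether VLAN tags and CRC are counted differs
> between NICs):
>
>     #include <rte_ether.h>
>
>     /* 14 (eth hdr) + 4 (crc) + 2 * 4 (two VLAN tags) = 26 */
>     #define L2_OVERHEAD \
>         (RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN + 2 * RTE_VLAN_HLEN)
>
>     static inline uint32_t
>     mtu_to_frame_len(uint16_t mtu)
>     {
>         return (uint32_t)mtu + L2_OVERHEAD; /* 1500 -> 1526 */
>     }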
>
> > > And the PMD behavior should be as follows for the mentioned configs:
> > >
> > > 1) Don't change the user-provided 'max_rx_pkt_len' value
> > I have no strong opinion. However, it is important to clarify
> > which understanding of max_rx_pkt_len vs MTU is the right one.
> >
> > > 2) If jumbo frame is not enabled, don't limit the size of packets to
> > > the host (I think this is based on the assumption that the mbuf size
> > > will always be > 1514)
> >
> > I think that JUMBO_FRAME is not relevant here. It is just a
> > promise to take a look at max_rx_pkt_len at the configure or
> > start stage.
> >
> > > 3) When the user requests to set the MTU bigger than ETH_MAX, the PMD
> > > enables jumbo frame support (if it is not already enabled by the user
> > > and is supported by the HW). If the HW doesn't support it, of course
> > > it should fail.
> >
> > I'm not sure which ETH_MAX is mentioned above.
> > #define ETH_MAX_MTU 0xFFFFU /* 65535, same as IP_MAX_MTU */
> > or do you mean
> > #define ETH_FRAME_LEN 1514 /* Max. octets in frame sans FCS */
> > or even
> > #define ETH_DATA_LEN 1500 /* Max. octets in payload */
> >
> > We should be careful when we talk about Ethernet lengths and
> > MTU.
> >
> > > 4) When the user requests to set an MTU bigger than 'max_rx_pkt_len'
> >
> > I think the second parameter to consider here is not
> > max_rx_pkt_len, but the amount of space for data in a single
> > mbuf (for all Rx queues).
> >
> > > 4a) if "scattered Rx" is enabled, configure the MTU and limit the
> > > packet size to the host to 'max_rx_pkt_len'
> >
> > Yes and no. IMO configure the MTU and bump max_rx_pkt_len.
> >
> > > 4b) if "scattered Rx" is not enabled but the HW supports it, enable
> > > "scattered Rx" in the PMD, configure the MTU, and limit the packet
> > > size to the host to 'max_rx_pkt_len'
> >
> > No, I think it is wrong to enable Rx scatter from the PMD.
> >
> > > 4c) if "scattered Rx" is not enabled and not supported by the HW,
> > > fail the MTU set.
> >
> > Yes, regardless of support in the HW.
> >
> > > 4d) if the HW doesn't support limiting the packet size to the host,
> > > but the requested MTU is bigger than 'max_rx_pkt_len', it should fail.
> >
> > I would rephrase it as the impossibility of disabling Rx scatter.
> > If so, it must be the driver's responsibility to drop scattered
> > packets if the Rx scatter offload is not enabled.
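> >
> > Putting 4a-4d together with my objection to 4b, a hypothetical
> > mtu_set callback could look like this (names are illustrative and
> > mtu_to_frame_len() is the conversion sketched earlier, not an
> > existing driver API):
> >
> >     static int
> >     example_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
> >     {
> >         uint32_t frame_len = mtu_to_frame_len(mtu);
> >         uint64_t offloads = dev->data->dev_conf.rxmode.offloads;
> >
> >         /* 4c: the frame does not fit in one mbuf and the application
> >          * did not enable Rx scatter -> refuse, never auto-enable. */
> >         if (frame_len > dev->data->min_rx_buf_size &&
> >             !(offloads & DEV_RX_OFFLOAD_SCATTER))
> >             return -EINVAL;
> >         /* 4a: program the HW limit (hypothetical helper). */
> >         return example_hw_set_max_frame_len(dev, frame_len);
> >     }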
> >
> > > Btw, I am aware that some PMDs have a larger MTU by default and can't
> > > limit the packet size to the host to the 'max_rx_pkt_len' value. I
> > > don't know what to do in that case: fail in configure? Or at least
> > > make sure the configured mempool's mbuf size is big enough?
> >
> > See above.
> >
> > Thanks for the reminder about the topic.
>
> I have the impression that what we want can be done with these 3
> values:
>
> - max_rx_pkt_size: maximum size of a received packet, larger ones are
>   dropped
> - max_rx_data_size: maximum size of the data copied into a mbuf chain,
>   larger packets are truncated
Do we really need 'max_rx_data_size'?
Can't it just always be equal to max_rx_pkt_size (as we have it right now)?
> - max_rx_seg_size: maximum size written in a segment (this can be
>   retrieved from the pool private info: rte_pktmbuf_data_room_size() -
>   RTE_PKTMBUF_HEADROOM)
>
> I think the first 2 values should be L2 lengths, including CRC if
> CRC-receive is enabled.
>
> if max_rx_data_size <= max_rx_seg_size, scatter is disabled
> if max_rx_data_size < max_rx_pkt_size, packets can be truncated
>
> In case a PMD is not able to limit the packet size, this could be
> advertised via a capability, and it would be up to the application to do
> it in software.
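>
> As a sketch, the two checks above in code (the three variables are the
> proposed values, not an existing API; assumes <stdbool.h> and a mempool
> "pool"):
>
>     uint16_t max_rx_seg_size =
>         rte_pktmbuf_data_room_size(pool) - RTE_PKTMBUF_HEADROOM;
>     /* scatter is needed only when one segment cannot hold the data */
>     bool need_scatter = max_rx_data_size > max_rx_seg_size;
>     /* truncation can happen when the data limit is below the pkt limit */
>     bool may_truncate = max_rx_data_size < max_rx_pkt_size;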
>
>
> Olivier