> Subject: RE: [PATCH v2] app/testpmd: use Tx preparation in txonly engine > > > > >>>>>>> TSO breaks when MSS spans more than 8 data fragments. Those > > > >>>>>>> packets will be dropped by Tx preparation API, but it will > > > >>> cause > > > >>>>>>> MDD event if txonly forwarding engine does not call the Tx > > > >>>>> preparation > > > >>>>>>> API before transmitting packets. > > > >>>>>>> > > > >>>>>> > > > >>>>>> txonly is used commonly, adding Tx prepare for a specific case > > > >>> may > > > >>>>>> impact performance for users. > > > >>>>>> > > > >>>>>> What happens when driver throws MDD (Malicious Driver > > Detection) > > > >>>>> event, > > > >>>>>> can't it be ignored? As you are already OK to drop the packet, > > > >>> can > > > >>>>>> device be configured to drop these packages? > > > >>>>>> > > > >>>>>> > > > >>>>>> Or as Jerin suggested adding a new forwarding engine is a > > > >>> solution, > > > >>>>> but > > > >>>>>> that will create code duplication, I prefer to not have it if > > > >>> this > > > >>>>> can > > > >>>>>> be handled in device level. > > > >>>>> > > > >>>>> Actually I am agree with the author of the patch - when TX > > offloads > > > >>>>> and/or multisegs are enabled, > > > >>>>> user supposed to invoke eth_tx_prepare(). > > > >>>>> Not doing that seems like a bug to me. > > > >>>> > > > >>>> I strongly disagree with that statement, Konstantin! > > > >>>> It is not documented anywhere that using TX offloads and/or > > multisegs > > > >>> requires calling rte_eth_tx_prepare() before > > > >>>> rte_eth_tx_burst(). And none of the examples do it. > > > >>> > > > >>> In fact, we do use it for test-pmd/csumonly.c. > > > >>> About other sample apps: > > > >>> AFAIK, not many of other DPDK apps do use L4 offloads. > > > >>> Right now special treatment (pseudo-header cksum calculation) is > > needed > > > >>> only for L4 offloads (CKSUM, SEG). > > > >>> So, majority of our apps who rely on other TX offloads (multi-seg, > > ipv4 > > > >>> cksum, vlan insertion) happily run without > > > >>> calling tx_prepare(), even though it is not the safest way. > > > >>> > > > >>>> > > > >>>> In my opinion: > > > >>>> If some driver has limitations for a feature, e.g. max 8 > > fragments, > > > >>> it should be documented for that driver, so the application > > > >>>> developer can make the appropriate decisions when designing the > > > >>> application. > > > >>>> Furthermore, we have APIs for the drivers to expose to the > > > >>> applications what the driver supports, so the application can > > configure > > > >>>> itself optimally at startup. Perhaps those APIs need to be > > expanded. > > > >>>> And if a feature limitation is common across the majority of > > drivers, > > > >>> that limitation should be mentioned in the documentation of the > > > >>>> feature itself. > > > >>> > > > >>> Many of such limitations *are* documented and in fact we do have > > an API > > > >>> to check max segments that each driver support, > > > >>> see struct rte_eth_desc_lim. > > > >> > > > >> Yes, this is the kind of API we should provide, so the application > > can configure itself appropriately. > > > >> > > > >>> The problem is: > > > >>> - none of our sample app does proper check on these values, so > > users > > > >>> don't have a good example how to do it. > > > >> > > > >> Agreed. > > > >> Adding an example showing how to do it properly would be the best > > solution. > > > >> Calling tx_prepare() in the examples is certainly not the solution. > > > >> > > > >>> - with current DPDK API not all of HW/PMD requirements could be > > > >>> extracted programmatically: > > > >>> let say majority of Intel PMDs for TCP offloads expect pseudo- > > header > > > >>> cksum to be pre-calculated by the SW. > > > >> > > > >> I hope this requirement is documented somewhere. > > > >> > > > >>> another example, some HW expects pkt_len to be bigger then some > > > >>> threshold value, otherwise HW hang may appear. > > > >> > > > >> I hope this requirement is also documented somewhere. > > > > > > > > No idea, I found it only in the code. > > > > > > IMHO Tx burst must check such limitations. If you made your HW simpler > > > (or just lost it on initial testing), pay in your drivers (or your > > > HW+driver will be unusable because of such problems). > > > > > > >> Generally, if the requirements cannot be extracted > > programmatically, they must be prominently documented, like this note to > > > >> rte_eth_rx_burst(): > > > > > > > > Obviously, more detailed documentation is always good, but... > > > > Right now we have 50+ different PMDs from different vendors. > > > > Even if each and every of them will carefully document all possible > > limitations and necessary preparation steps, > > > > how DPDK app developer supposed to deal with all that? > > > > Do you expect everyone, to read carefully through all of them, and > > handle all of them properly oh his own > > > > in each and every DPDK app he is going to write? > > > > That seems unrealistic. > > > > Again what to do with backward compatibility: when new driver (with > > new limitations) will arise > > > > *after* your app is already written and tested? > > > > > > +1 > > > > > > >> > > > >> * @note > > > >> * Some drivers using vector instructions require that *nb_pkts* > > is > > > >> * divisible by 4 or 8, depending on the driver implementation. > > > > > > I'm wondering what application should do if it needs to send just one > > > packet and do it now. IMHO, such limitations are not acceptable. > > This common limitation for vector drivers is for RX burst, not TX burst. > I agree such a limitation would be unacceptable for TX. > > > > > > > >> > > > >>> - As new HW and PMD keep appearing it is hard to predict what > > extra > > > >>> limitations/requirements will arise, > > > >>> that's why tx_prepare() was introduced as s driver op. > > > >>> > > > >>>> > > > >>>> We don't want to check in the fast path what can be checked at > > > >>> startup or build time! > > > >>> > > > >>> If your app supposed to work with just a few, known in advance, > > NIC > > > >>> models, then sure, you can do that. > > > >>> For apps that supposed to work 'in general' with any possible > > PMDs > > > >>> that DPDK supports - that might be a problem. > > > >>> That's why tx_prepare() was introduced and it is strongly > > recommended > > > >>> to use it by the apps that do use TX offloads. > > > >>> Probably tx_prepare() is not the best possible approach, but right > > now > > > >>> there are not many alternatives within DPDK. > > > >> > > > >> What exactly is an application supposed to do if tx_prepare() > > doesn't accept the full burst? It doesn't return information about > > > what is > > > >> wrong. > > > > > > > > It provides some information, but it is *very* limited: just index > > of 'bad' packet and error code. > > > > In theory app can try to handle it in a 'smart' way: let say if > > ENOTSUP is returned, then try to disable all HW offloads > > > > and do all in SW. But again, it is much better to do so *before* > > submitting packets for TX, so in practice everyone > > > > just drop such 'bad' packets. > > > > > > > > Dropping the packets might not be an option, e.g. for applications > > used in life support or tele-medicine. > > > > > > Critical applications should be able to do all Tx offloads in SW and > > > retry. Of course, various statistics and logs should help to improve > > the > > > application. > > > > > > > If the packet is 'bad', then it is much better to drop it, then TX > > corrupted packet or even hang NIC HW completely. > > > > > > IMHO Tx burst must drop packet which could hang NIC HW completely. I > > > realize that it is an extra checks and performance drop, but vendor > > > should pay in performance if HW is not good enough. > > > > > > > Though off-course it is much better to have an app that would check > > for limitations that can be checked by API provided > > > > and enable only supported offloads. > > > > > > Yes, that's why API to get limitation is much better than > > documentation. > > > > > > >> If limitations are documented, an application can use the lowest > > common denominator of the NICs it supports. And if the > > > application is > > > >> supposed to work in general, that becomes the lowest common > > denominator of all NICs. > > > > > > > > I agree: for limitations that can be extracted with generic API, > > like: > > > > number of segments per packet, supported TX offloads, mbuf fileds > > that must be provided for each TX offload, etc. - > > > > it is responsibility of well-written application to obey all of > > them. > > > > Yes, many tx_prepare() implementations do such checks anyway, but > > from my perspective it is sort of last-line of defense. > > > > For well written application that should just never happen. > > > > But there is one more important responsibility of tx_prepare() - it > > performs PMD specific packet modifications for requested > > > offloads. > > > > As I already mentioned for Intel NICs - it does pseudo-header cksum > > calcucation, for packets with size less then minimal, it can > > > > probably do padding (even if doesn't do it right now), for some > > other PMDs - might be something else, I didn't check. > > > > Obviously it saves app developer from a burden to do all these > > things on his own. > > > > > > > >> It looks like tx_prepare() has become a horrible workaround for > > undocumented limitations. > > > > > > I strongly disagree. Documentation is never a solution for a generic > > > application which is intended to work on any HW and IMHO it is the > > > goal to have more and more applications which work on any HW. > > > > > > >> Limitations due to hardware and/or software tradeoffs are > > unavoidable, so we have to live with them; but we should not accept > > > >> PMDs with undocumented limitations. > > > > > > > > As I already said, more documentation never hurts, but for that > > case, I think it is not enough. > > > > I expect PMD to provide a tx_prepare() implementation that would > > deal with such specific things. > > > > > > +1 > > > > Anyway, back to the original patch - I looked at it once again, and > > realized that the problem > > > > is just in the unsupported number of segments. > > > > As we discussed above - such limitations should be handled by well > > written app, > > > > but none of DPDK apps does it right now. > > > > So probably it is good opportunity to do things in a proper way and > > introduce such checks > > > > in testpmd ;) > > > > > > +1 since we already have fields in device information to report such > > > limitations, but it does not say that Tx prepare should be dropped. > > > Drivers which don't need Tx prepare keep it NULL and it returns > > > immediately from ethdev. Since it is done per-burst, it should not > > > affect performance a lot. > > > > > > > 100% agree. I think tx_prepare needs to stay, and in general has to be > > strongly recommended > > for apps that do use TX offloads/multi-segs. > > <irony> > Then tx_prepare should randomly reject packets and segments, to ensure the > application has sufficient SW fallback implemented. > </irony> > > No, seriously, considering the above arguments, I think PMD conformance > requirements would be a better solution. > If a PMD claims support for some feature, it must conform to some minimum > requirements for that feature. > > E.g. support for multi-seg TX must be able to handle min. 8 segments (or some > other reasonable number).
Hmm... why is that? I thought we had a consensus here that good behaving app should use existing API (rte_eth_desc_lim) to deduce min common denominator, and that we need some code example, or even better a lib function that would do such thing. > > Common limitations and preconditions, such as minimum RX burst size for > vector drivers and pseudo-header checksum precalculation > for TCP offload, should be accepted and prominently documented. I still don't understand why these preconditions can't be done by tx_prepare(). > Unusual HW limitations, such as inability to pad short Ethernet frames, > should be handled by the driver's TX function as workarounds > for unacceptable limitations (in DPDK API context) in the HW. With current design, rte_eth_tx_burst() is not allowed to modify mbufs whose refcnt > 1. That's one of the reason why tx_prepare() was introduced.