Great, thanks!

On Wed, Mar 17, 2021 at 6:03 PM Florin Coras <fcoras.li...@gmail.com> wrote:
> Infra woes solved so finally merged!
>
> Florin
>
> On Mar 17, 2021, at 6:13 AM, Ivan Shvedunov <ivan...@gmail.com> wrote:
>
> I see CI keeps failing for my patch despite rebase... I have run the tests that are failing locally with my patch applied (test_map, test_flowprobe, test_memif) and they do pass there. So indeed it must be something related to the test environment.
>
> On Tue, Mar 16, 2021 at 7:02 PM Florin Coras <fcoras.li...@gmail.com> wrote:
>
>> Hi Ivan,
>>
>> On Mar 16, 2021, at 8:51 AM, Ivan Shvedunov <ivan...@gmail.com> wrote:
>>
>> Thanks for clearing things up :)
>>
>> Thanks for bringing this up ;-)
>>
>> Concerning our use case: it's about the PFCP protocol [1] that's used in the 5G / LTE CUPS setting. High throughput is usually not important there, as it's a control protocol, but it sometimes happens to produce rather large UDP packets. We use PFCP in our VPP-based project [2]; it's implemented on top of the VPP session layer there.
>>
>> Got it and cool!
>>
>> BTW, we've been hitting the hard-coded IP reassembly limit (3 fragments) with it too; mind if I post a patch that makes it configurable?
>>
>> CC'ed Klement for his opinion on this.
>>
>> Cheers,
>> Florin
>>
>> [1] https://en.wikipedia.org/wiki/PFCP
>> [2] https://github.com/travelping/upg-vpp
>>
>> On Tue, Mar 16, 2021 at 6:19 PM Florin Coras <fcoras.li...@gmail.com> wrote:
>>
>>> Hi Ivan,
>>>
>>> On Mar 16, 2021, at 6:40 AM, Ivan Shvedunov <ivan...@gmail.com> wrote:
>>>
>>> Hi Florin,
>>>
>>> (inline)
>>>
>>> On Mon, Mar 15, 2021 at 10:27 PM Florin Coras <fcoras.li...@gmail.com> wrote:
>>>
>>>> On Mar 15, 2021, at 9:08 AM, Ivan Shvedunov <ivan...@gmail.com> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I've noticed a problem in the VPP session layer and posted a patch to fix it [1]. Namely, UDP datagrams with length > 1908 that require buffer chaining get corrupted, because there's a bug in session_tx_fifo_chain_tail(), which is responsible for the buffer chaining (it doesn't account for SESSION_CONN_HDR_LEN). The bug only affects datagram transports and doesn't break TCP.
>>>>
>>>> Looks good. Thanks! More below.
>>>
>>> There seems to be an issue with CI, but I'm not sure if it's related to the patch in any way.
>>>
>>> Most probably not. Rebased.
>>>
>>>> There's another problem with the UDP code: "udp mtu". As far as I understand, the plan there [2] is to use the MTU of the output interface for each datagram being sent, and the "udp { mtu ... }" setting is only used as a temporary measure. Yet, it seems to me that what happens when a datagram exceeds that MTU value is not exactly correct: instead of undergoing IP fragmentation as one would expect, the datagrams get split into multiple UDP datagrams. Most apps that use UDP do not handle this correctly, and it cost me some hours of debugging strange app behavior. Wouldn't failing to send such datagrams be more correct?
>>>>
>>>> The thing to be aware of here is that the session layer offers as a service the sending of datagrams written by apps to the network. If the datagrams are larger than the mtu (an mss of sorts, actually) configured for udp, the session layer chops the datagrams to mtu size. The network layer can then fragment the resulting udp datagrams in accordance with the output interface's mtu.
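For reference, the "udp mtu" discussed here is the session-layer chop size and is configured in VPP's startup.conf. Below is a minimal sketch, assuming the udp { } stanza parsed by udp.c (whose 1500-byte default is what the thread refers to); the 9000 value is purely illustrative and should match the interface MTU and what the peer can actually reassemble:

    # startup.conf fragment (sketch, values illustrative):
    # raise the session-layer UDP "mtu" so that application datagrams up to
    # ~9000 bytes reach the network layer as a single UDP datagram instead
    # of being chopped into several smaller UDP datagrams.
    udp {
      mtu 9000
    }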
>>>>
>>>> It's not recommended to use this feature because the original datagram borders are lost. Moreover, losses will lead to even more problems. The option is there for historical reasons, so we could consider removing it at some point, once we're fully convinced it's not useful.
>>>
>>> Problem is, the UDP fragmentation feature is on by default, with the default size limit being 1500 [1]. So, if one ever needs to send UDP datagrams bigger than that, they will be split, which may be rather surprising if one is unaware of this UDP splitting feature. Maybe it's worth changing the default value to some large number so that the splitting does not happen unexpectedly?
>>>
>>> The goal there is to avoid exercising the ip fragmentation code for the reasons you discovered below, i.e., the default max of 3 fragments. Splitting udp datagrams will always lead to some sort of problems because of loss, so for now the value used is 1.5k just because it's the common interface mtu. Those interested in using larger datagrams (8-9kB) will hopefully find this or older threads instead of going through the pain you had to go through!
>>>
>>>> Another thing I've noticed is that if the UDP MTU is high enough, the datagrams being sent are still sometimes split into smaller IP fragments than necessary. E.g. I have MTU 9000 on my interfaces and the UDP MTU is also 9000; an 8000-byte UDP datagram is sent in one piece, while the IP packets carrying an 11000-byte UDP datagram are split into ~2kB IP fragments. Any ideas why this could happen?
>>>>
>>>> Hm, have you tried running an iperf3 test like the one that's part of make test [1] to see if the issue is still reproducible? Also, I'd avoid sending from the app dgrams larger than udp's mtu.
>>>
>>> In this case the datagrams were smaller than the "udp mtu" value, but larger than the interface's MTU.
>>> Looking at the ip_frag code [2], it seems like it just can't create fragments bigger than the buffer size (2048 bytes by default), that is, it can't chain buffers for the fragments it creates, unless I'm somehow mistaken. With the IP reassembly limit in VPP itself being just 3, this may indeed be an issue. E.g. one has an MTU of 9000 and expects a packet of length 10000 to be split into 2 and not 5 fragments; these 5 fragments will not be reassembled if received by another VPP's session layer further along the line. But perhaps one is expected to increase the buffer size when using higher interface MTU values?
>>>
>>> I've never tried that but fair enough. Klement might be able to shed some light here.
>>>
>>> Out of curiosity, are you trying to send such large datagrams to improve efficiency/throughput? Going above 8kB in a sustained fashion and without some form of pacing will probably lead to the udp/session layer overwhelming the nic, unless you're using 100Gbps nics.
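On the ip_frag / buffer-size question above, a rough worked example: with the default 2048-byte buffers, an IPv4 fragment can carry at most about 2048 - 20 = 2028 bytes of payload, rounded down to a multiple of 8 (~2024 bytes), so a 10000-byte packet becomes ceil(9980 / 2024) = 5 fragments rather than the 2 that a 9000-byte MTU would allow, and trips the 3-fragment reassembly default. If ip_frag is indeed bounded by the buffer data size, one untested workaround, assuming the buffers { default data-size } startup.conf option of recent VPP releases, is to size buffers for the jumbo MTU:

    # startup.conf fragment (sketch, untested assumption):
    # make each buffer large enough to hold a full 9000-byte frame so that
    # ip_frag can emit MTU-sized fragments instead of ~2 kB ones;
    # 9216 leaves some headroom for headers.
    buffers {
      default data-size 9216
    }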
>>>
>>> Regards,
>>> Florin
>>>
>>> [1] https://github.com/FDio/vpp/blob/0ac5782/src/vnet/udp/udp.c#L522
>>> [2] https://github.com/FDio/vpp/blob/0ac5782e600097b66e6b06e0b9edc79651f3a4bd/src/vnet/ip/ip_frag.c#L107-L109

--
Ivan Shvedunov <ivan...@gmail.com>
;; My GPG fingerprint is: 2E61 0748 8E12 BB1A 5AB9 F7D0 613E C0F8 0BC5 2807