Re: [vpp-dev] UDP data corruption and MTU issues

Ivan Shvedunov Wed, 17 Mar 2021 06:13:56 -0700

I see CI keeps failing for my patch despite rebase... I have run the tests
that are failing locally with my patch applied (test_map, test_flowprobe,
test_memif) and they do pass there. So indeed must be something related to
the test environment


On Tue, Mar 16, 2021 at 7:02 PM Florin Coras <[email protected]> wrote:

> Hi Ivan,
>
>
> On Mar 16, 2021, at 8:51 AM, Ivan Shvedunov <[email protected]> wrote:
>
> Thanks for clearing things up :)
>
>
> Thanks for bringing this up ;-)
>
> Concerning our use case: it's about PFCP protocol [1] that's used in the
> 5G / LTE CUPS setting. High throughput is usually not important there as
> that's a control protocol but sometimes it happens to produce rather large
> UDP packets. We use PFCP in our VPP-based project [2], it's implemented on
> top of the VPP session layer there.
>
>
> Got it and cool!
>
> BTW we've been hitting the hard-coded IP reassembly limit (3 fragments)
> with it too, mind if I post a patch that makes it configurable?
>
>
> CC’ed Klement for his opinion on this.
>
> Cheers,
> Florin
>
>
> [1] https://en.wikipedia.org/wiki/PFCP
> [2] https://github.com/travelping/upg-vpp
>
>
> On Tue, Mar 16, 2021 at 6:19 PM Florin Coras <[email protected]>
> wrote:
>
>> Hi Ivan,
>>
>>
>> On Mar 16, 2021, at 6:40 AM, Ivan Shvedunov <[email protected]> wrote:
>>
>> Hi Florin,
>>
>> (inline)
>>
>> On Mon, Mar 15, 2021 at 10:27 PM Florin Coras <[email protected]>
>> wrote:
>>
>>> On Mar 15, 2021, at 9:08 AM, Ivan Shvedunov <[email protected]> wrote:
>>>
>>>   Hello,
>>>
>>>   I've noticed a problem in the VPP session layer and posted a patch to
>>> fix it [1].
>>> Namely, UDP datagrams with length > 1908 that require buffer chaining
>>> get corrupted, because there's a bug in session_tx_fifo_chain_tail() that's
>>> responsible for the buffer chaining (it doesn't account for
>>> SESSION_CONN_HDR_LEN). The bug only affects datagram transports and doesn't
>>> break TCP.
>>>
>>>
>>> Looks good. Thanks! More lower.
>>>
>>
>> There seems to be an issue with CI, but I'm not sure if it's related to
>> the patch in any way.
>>
>>
>> Most probably not. Rebased.
>>
>>
>>
>>>
>>>
>>>   There's another problem with the UDP code: "udp mtu". As far as I
>>> understand, the plan there [2] is to use the MTU of the output interface
>>> for each datagram being sent, and "udp { mtu ... }" setting is only used as
>>> a temporary measure. Yet, it seems to me that what is happening when a
>>> datagram exceeds that MTU value is not exactly correct: instead of
>>> undergoing IP fragmentation as one would expect, the datagrams get split
>>> into multiple UDP datagrams. This is not handled correctly by apps that use
>>> UDP, most of the time, and did cause me some hours spent debugging strange
>>> app behavior. Wouldn't failing to send such datagrams be more correct?
>>>
>>>
>>> The thing to be aware of here is that session layer offers as service
>>> the sending of datagrams written by apps to the network. If the datagrams
>>> are larger than the mtu (mss of sorts actually) configured for udp, session
>>> layer chops the datagrams to mtu size. The network layer can then fragment
>>> the resulting udp datagrams in accordance to the output interface’s mtu.
>>>
>>
>>> It’s not recommended to use this feature because original datagram
>>> borders are lost. Moreover, losses will lead to even more problems. The
>>> option is there for historical reasons so we could consider removing it at
>>> one point, once we’re fully convinced it’s not useful.
>>>
>>
>> Problem is, the UDP fragmentation feature is on by default, with default
>> size limit being 1500 [1]. So, if one ever needs to send UDP datagrams
>> bigger than that they will be split, which may be rather surprising if one
>> is unaware of this UDP splitting feature. Maybe it's worth changing the
>> default value to some large number so that the splitting does not happen
>> unexpectedly?
>>
>>
>> The goal there is to avoid exercising the ip fragmentation code for the
>> reasons you discovered lower, i.e., default max of 3 fragments. Splitting
>> udp datagrams will always lead to some sort of problems because of loss so
>> for now the value used is 1.5k just because it’s the common interface mtu.
>> Those interested in using larger datagrams (8-9kB) will hopefully find this
>> or older threads instead of going through the pain you had to go through!
>>
>>
>>
>>>   Another thing I've noticed is that if UDP MTU is high enough sometimes
>>> the datagrams being sent still somehow are split into smaller IP fragments
>>> than necessary. E.g. I have MTU 9000 on my interfaces and UDP MTU also is
>>> 9000, and 8000-byte UDP datagram is sent in one piece, while the IP packets
>>> carrying 11000-byte UDP datagram are split into ~2kb IP fragments. Any
>>> ideas why this could happen?
>>>
>>>
>>> Hm, have you tried running an iperf3 test like the one part of make test
>>> [1] to see if the issue is still reproducible? Also, I’d avoid sending from
>>> the app dgrams larger than udp’s mtu.
>>>
>>
>> In this case the datagrams were smaller than the "udp mtu" value, but
>> larger than the interface's MTU.
>> Looking at ip_frag code [2], it seems like it just can't create fragments
>> bigger than the buffer size (2048 bytes by default), that is, it can't
>> chain buffers for the fragments it creates, unless I'm somehow mistaken.
>> With IP reassembly limit e.g. in VPP itself being just 3 this may be indeed
>> an issue. E.g. one has MTU of 9000 and expects a packet of length 10000 to
>> be split in 2 and not 5 fragments; these 5 fragments will not be
>> reassembled if received by another VPP's session layer further along the
>> line. But perhaps one is expected to increase buffer size when using higher
>> interface MTU values?
>>
>>
>> I’ve never tried that but fair enough. Klement might be able to shed some
>> light here.
>>
>> Out of curiosity, are you trying to send that large datagrams to improve
>> efficiency/throughput? Going above 8kB, in a sustained fashion and without
>> some form of pacing will probably lead to udp/session layer overwhelming
>> the nic, unless you’re using 100Gbps nics.
>>
>> Regards,
>> Florin
>>
>>
>> [1] https://github.com/FDio/vpp/blob/0ac5782/src/vnet/udp/udp.c#L522
>> [2]
>> https://github.com/FDio/vpp/blob/0ac5782e600097b66e6b06e0b9edc79651f3a4bd/src/vnet/ip/ip_frag.c#L107-L109
>>
>> --
>> Ivan Shvedunov <[email protected]>
>> ;; My GPG fingerprint is: 2E61 0748 8E12 BB1A 5AB9  F7D0 613E C0F8 0BC5
>> 2807
>>
>>
>>
>
> --
> Ivan Shvedunov <[email protected]>
> ;; My GPG fingerprint is: 2E61 0748 8E12 BB1A 5AB9  F7D0 613E C0F8 0BC5
> 2807
>
>
>

-- 
Ivan Shvedunov <[email protected]>
;; My GPG fingerprint is: 2E61 0748 8E12 BB1A 5AB9  F7D0 613E C0F8 0BC5 2807

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#18953): https://lists.fd.io/g/vpp-dev/message/18953
Mute This Topic: https://lists.fd.io/mt/81353121/21656
Group Owner: [email protected]
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [[email protected]]
-=-=-=-=-=-=-=-=-=-=-=-

Re: [vpp-dev] UDP data corruption and MTU issues

Reply via email to