Infra woes solved, so finally merged!

Florin

> On Mar 17, 2021, at 6:13 AM, Ivan Shvedunov <ivan...@gmail.com> wrote:
> 
> I see CI keeps failing for my patch despite the rebase... I have run the 
> failing tests (test_map, test_flowprobe, test_memif) locally with my patch 
> applied, and they do pass there. So it must indeed be something related to 
> the test environment.
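> 
> For the record, in case someone wants to reproduce, I ran them with the 
> usual per-test invocation (the exact form may differ per setup), e.g.:
> 
>   make test TEST=test_map
> 
> and likewise for test_flowprobe and test_memif.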
> 
> On Tue, Mar 16, 2021 at 7:02 PM Florin Coras <fcoras.li...@gmail.com> wrote:
> Hi Ivan, 
> 
> 
>> On Mar 16, 2021, at 8:51 AM, Ivan Shvedunov <ivan...@gmail.com> wrote:
>> 
>> Thanks for clearing things up :)
> 
> Thanks for bringing this up ;-)
> 
>> Concerning our use case: it's about the PFCP protocol [1], which is used in 
>> the 5G / LTE CUPS setting. High throughput is usually not important there, 
>> as it's a control protocol, but it sometimes happens to produce rather 
>> large UDP packets. We use PFCP in our VPP-based project [2]; it's 
>> implemented on top of the VPP session layer there.
> 
> Got it and cool!
> 
>> BTW we've been hitting the hard-coded IP reassembly limit (3 fragments) 
>> with it too; mind if I post a patch that makes it configurable?
> 
> CC’ed Klement for his opinion on this. 
> 
> Cheers, 
> Florin
> 
>> 
>> [1] https://en.wikipedia.org/wiki/PFCP
>> [2] https://github.com/travelping/upg-vpp
>> 
>> 
>> On Tue, Mar 16, 2021 at 6:19 PM Florin Coras <fcoras.li...@gmail.com> wrote:
>> Hi Ivan,
>> 
>> 
>>> On Mar 16, 2021, at 6:40 AM, Ivan Shvedunov <ivan...@gmail.com> wrote:
>>> 
>>> Hi Florin,
>>> 
>>> (inline)
>>> 
>>> On Mon, Mar 15, 2021 at 10:27 PM Florin Coras <fcoras.li...@gmail.com> wrote:
>>>> On Mar 15, 2021, at 9:08 AM, Ivan Shvedunov <ivan...@gmail.com> wrote:
>>>> 
>>>>   Hello,
>>>> 
>>>>   I've noticed a problem in the VPP session layer and posted a patch to 
>>>> fix it [1].
>>>> Namely, UDP datagrams with length > 1908 that require buffer chaining get 
>>>> corrupted, because there's a bug in session_tx_fifo_chain_tail() that's 
>>>> responsible for the buffer chaining (it doesn't account for 
>>>> SESSION_CONN_HDR_LEN). The bug only affects datagram transports and 
>>>> doesn't break TCP.
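>>>> 
>>>> To illustrate the failure mode, here's a toy sketch; it's not the actual 
>>>> VPP code (HDR_LEN stands in for SESSION_CONN_HDR_LEN, and the buffer size 
>>>> is shrunk so that chaining kicks in). Starting the copy at offset 0 
>>>> instead of skipping the per-datagram header shifts every chained buffer's 
>>>> payload:
>>>> 
>>>>   #include <stdio.h>
>>>>   #include <string.h>
>>>> 
>>>>   #define HDR_LEN 16 /* stand-in for SESSION_CONN_HDR_LEN */
>>>>   #define BUF_SZ 8   /* tiny buffers to force chaining */
>>>> 
>>>>   int
>>>>   main (void)
>>>>   {
>>>>     /* a "fifo" holding one datagram as [header | payload] */
>>>>     char fifo[HDR_LEN + 26];
>>>>     memset (fifo, 'H', HDR_LEN);
>>>>     memcpy (fifo + HDR_LEN, "abcdefghijklmnopqrstuvwxyz", 26);
>>>> 
>>>>     /* correct: skip the header, then chain the payload */
>>>>     for (int off = HDR_LEN; off < (int) sizeof (fifo); off += BUF_SZ)
>>>>       {
>>>>         int n = (int) sizeof (fifo) - off;
>>>>         printf ("buf:     %.*s\n", n < BUF_SZ ? n : BUF_SZ, fifo + off);
>>>>       }
>>>> 
>>>>     /* buggy: no header accounting; every buffer's payload is shifted */
>>>>     for (int off = 0; off < (int) sizeof (fifo); off += BUF_SZ)
>>>>       {
>>>>         int n = (int) sizeof (fifo) - off;
>>>>         printf ("bad buf: %.*s\n", n < BUF_SZ ? n : BUF_SZ, fifo + off);
>>>>       }
>>>>     return 0;
>>>>   }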
>>> 
>>> Looks good. Thanks! More below.
>>> 
>>> There seems to be an issue with CI, but I'm not sure if it's related to the 
>>> patch in any way.
>> 
>> Most probably not. Rebased. 
>> 
>>>  
>>> 
>>>> 
>>>>   There's another problem with the UDP code: "udp mtu". As far as I 
>>>> understand, the plan there [2] is to use the MTU of the output interface 
>>>> for each datagram being sent, with the "udp { mtu ... }" setting only 
>>>> used as a temporary measure. Yet it seems to me that what happens when a 
>>>> datagram exceeds that MTU value is not exactly correct: instead of 
>>>> undergoing IP fragmentation as one would expect, the datagram gets split 
>>>> into multiple UDP datagrams. Most apps that use UDP do not handle this 
>>>> correctly, and it cost me some hours of debugging strange app behavior. 
>>>> Wouldn't failing to send such datagrams be more correct?
>>> 
>>> The thing to be aware of here is that the session layer offers, as a 
>>> service, the sending of app-written datagrams to the network. If the 
>>> datagrams are larger than the mtu (an mss of sorts, actually) configured 
>>> for udp, the session layer chops them to mtu size. The network layer can 
>>> then fragment the resulting udp datagrams in accordance with the output 
>>> interface's mtu. 
>>> 
>>> Using this feature is not recommended, because the original datagram 
>>> boundaries are lost. Moreover, losses will lead to even more problems. The 
>>> option is there for historical reasons, so we could consider removing it 
>>> at some point, once we're fully convinced it's not useful.
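>>> 
>>> Roughly, the chopping amounts to something like this; a toy sketch with 
>>> made-up names, not the actual session layer code:
>>> 
>>>   #include <stdio.h>
>>>   #include <string.h>
>>> 
>>>   /* hypothetical stand-in for handing one udp datagram to the network */
>>>   static void
>>>   send_dgram (const char *data, int len)
>>>   {
>>>     printf ("udp dgram, %d bytes: %.*s\n", len, len, data);
>>>   }
>>> 
>>>   /* an app datagram larger than udp_mtu goes out as several independent
>>>      udp datagrams; the receiver can no longer tell where the original
>>>      datagram started and ended */
>>>   static void
>>>   udp_chop_and_send (const char *data, int len, int udp_mtu)
>>>   {
>>>     for (int off = 0; off < len; off += udp_mtu)
>>>       {
>>>         int n = len - off < udp_mtu ? len - off : udp_mtu;
>>>         send_dgram (data + off, n);
>>>       }
>>>   }
>>> 
>>>   int
>>>   main (void)
>>>   {
>>>     const char *dgram = "one-app-datagram-that-is-too-big";
>>>     udp_chop_and_send (dgram, (int) strlen (dgram), 10);
>>>     return 0;
>>>   }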
>>> 
>>> The problem is that this UDP splitting feature is on by default, with the 
>>> default size limit being 1500 [1]. So if one ever needs to send UDP 
>>> datagrams bigger than that, they will be split, which may be rather 
>>> surprising to anyone unaware of the feature. Maybe it's worth changing the 
>>> default value to some large number so that the splitting does not happen 
>>> unexpectedly?
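>>> 
>>> In the meantime the limit can at least be raised per deployment; a sketch 
>>> of the startup.conf stanza, assuming the value is in bytes:
>>> 
>>>   udp {
>>>     mtu 9000
>>>   }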
>> 
>> The goal there is to avoid exercising the ip fragmentation code, for the 
>> reasons you discovered below, i.e., the default max of 3 fragments. 
>> Splitting udp datagrams will always lead to some sort of problems because 
>> of loss, so for now the value used is 1.5k simply because it's the common 
>> interface mtu. Those interested in using larger datagrams (8-9kB) will 
>> hopefully find this thread or older ones instead of going through the pain 
>> you had to go through!
>> 
>>> 
>>> 
>>>>   Another thing I've noticed: if the UDP MTU is high enough, the 
>>>> datagrams being sent are sometimes still split into smaller IP fragments 
>>>> than necessary. E.g. I have an MTU of 9000 on my interfaces and a UDP MTU 
>>>> of 9000 as well; an 8000-byte UDP datagram is sent in one piece, while 
>>>> the IP packets carrying an 11000-byte UDP datagram are split into ~2 KB 
>>>> IP fragments. Any ideas why this could happen?
>>> 
>>> Hm, have you tried running an iperf3 test like the one that's part of 
>>> make test [1] to see if the issue is still reproducible? Also, I'd avoid 
>>> having the app send dgrams larger than udp's mtu. 
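>>> 
>>> I believe the iperf3 one is in the vcl suite, so from the vpp tree 
>>> something like this should exercise it (the exact test name may have 
>>> changed since):
>>> 
>>>   make test TEST=vcl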
>>> 
>>> In this case the datagrams were smaller than the "udp mtu" value, but 
>>> larger than the interface's MTU.
>>> Looking at the ip_frag code [2], it seems it just can't create fragments 
>>> bigger than the buffer size (2048 bytes by default); that is, it can't 
>>> chain buffers for the fragments it creates, unless I'm somehow mistaken. 
>>> With the IP reassembly limit in VPP itself being just 3 fragments, this 
>>> may indeed be an issue: one has an MTU of 9000 and expects a packet of 
>>> length 10000 to be split into 2 fragments, not 5; those 5 fragments will 
>>> not be reassembled if received by another VPP's session layer further 
>>> down the line. But perhaps one is expected to increase the buffer size 
>>> when using higher interface MTU values?
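>>> 
>>> Back-of-envelope, assuming IPv4 with 20-byte headers and fragment payloads 
>>> rounded down to a multiple of 8: with a 9000 MTU a fragment can carry 8976 
>>> payload bytes, so a 10000-byte packet needs ceil(10000 / 8976) = 2 
>>> fragments, while fragments capped near the 2048-byte buffer size carry at 
>>> most 2024 payload bytes each, i.e. ceil(10000 / 2024) = 5 fragments, well 
>>> past the default 3-fragment reassembly limit.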
>> 
>> I’ve never tried that but fair enough. Klement might be able to shed some 
>> light here. 
>> 
>> Out of curiosity, are you trying to send such large datagrams to improve 
>> efficiency/throughput? Going above 8kB in a sustained fashion and without 
>> some form of pacing will probably lead to the udp/session layer 
>> overwhelming the nic, unless you're using 100Gbps nics. 
>> 
>> Regards, 
>> Florin
>> 
>>> 
>>> [1] https://github.com/FDio/vpp/blob/0ac5782/src/vnet/udp/udp.c#L522
>>> [2] https://github.com/FDio/vpp/blob/0ac5782e600097b66e6b06e0b9edc79651f3a4bd/src/vnet/ip/ip_frag.c#L107-L109
>>> 
>>> -- 
>>> Ivan Shvedunov <ivan...@gmail.com>
>>> ;; My GPG fingerprint is: 2E61 0748 8E12 BB1A 5AB9  F7D0 613E C0F8 0BC5 2807
>> 
>> 
>> 
>> -- 
>> Ivan Shvedunov <ivan...@gmail.com>
>> ;; My GPG fingerprint is: 2E61 0748 8E12 BB1A 5AB9  F7D0 613E C0F8 0BC5 2807
> 
> 
> 
> -- 
> Ivan Shvedunov <ivan...@gmail.com>
> ;; My GPG fingerprint is: 2E61 0748 8E12 BB1A 5AB9  F7D0 613E C0F8 0BC5 2807
