Hi,

> On 18 Mar 2022, at 12:18, otr...@employees.org wrote:
>
> Klement,
>
>>>>>>> Following up on this thread.
>>>>>>> The changes in 34877 led to some undesired behaviour in the "real 
>>>>>>> world(tm)".
>>>>>>> In the close pattern below it left sessions in established state, and 
>>>>>>> with a relatively low cps
>>>>>>> would consume the whole session table.
>>>>>>>
>>>>>>> The change here 
>>>>>>> https://gerrit.fd.io/r/c/vpp/+/35692
>>>>>>>  nat: tweak rfc7857 tcp connection tracking
>>>>>>> proposes to move the needle somewhat more towards protecting the 
>>>>>>> session table.
>>>>>>> Views? Miklos, Klement?
>>>>>>>
>>>>>>> The RFC7857 state machine introduced in 56c492a is a trade-off.
>>>>>>> It tries to retain sessions as much as possible and also offers
>>>>>>> some protection against spurious RST by re-establishing sessions if data
>>>>>>> is received after the RST. From experience in the wild, this algorithm is
>>>>>>> a little too liberal, as it leaves too many spurious established sessions
>>>>>>> in the session table.
>>>>>>>
>>>>>>> E.g. an observed pattern is:
>>>>>>> client      server
>>>>>>>          <- FIN, ACK
>>>>>>> ACK      ->
>>>>>>> ACK      ->
>>>>>>> RST, ACK ->
>>>>>>
>>>>>> So why not just add a new state change where RST+half-closed moves to 
>>>>>> TRANS instead of throwing everything away?
>>>>>
>>>>> What do you mean by "throwing everything away"?
>>>>> Reset the state flags? Now it goes to transitory, and it will stay in 
>>>>> transitory as long as packets are flowing.
>>>>
>>>> Assuming the guys writing RFC gave it a thorough thought and that current 
>>>> state tracking is mostly done with RFC in mind, then changing it 
>>>> dramatically feels like it might not cover corner cases which we are 
>>>> currently not aware of. Feels like an "instead of doing one tweak, let’s 
>>>> rewrite the whole thing" approach.
>>>
>>> I wouldn't make too many assumptions about guys writing RFCs, given that 
>>> I'm one of them. ;-)
>>
>> Oh! When you put it like this … ;-)
>>
>>> There is a history here, and initially NATs were viewed as breaking the 
>>> Internet architecture, and if NATs should be specified at all, the 
>>> overriding concern was to make them as transparent to applications as 
>>> possible.
>>> Given the centralisation of the Internet and the level of packet 
>>> mangling/middleboxes we now have, combined with the run-out of IPv4 
>>> addresses, applications have been forced to adapt. I don't think you can 
>>> expect long-lived TCP sessions to survive at all anymore.
>>
>> Wouldn’t it then be easier to just have the transitory timeout on for all 
>> sessions all the time? Yes, you would have to turn on (tcp) keepalives for 
>> your (ssh) sessions … And also Miklos might be a bit unhappy, but you would 
>> get a very, very simple solution ….

I think there might be other relatively simple ways to solve the problem :-)
And yes, many devices in our case rarely send data and keep their TCP connections 
established for hours, or even longer with keepalives (at a very low rate). 
In the IoT field, the long-lived TCP connection is a "feature". Power 
consumption and data cost matter, hence many try to minimize 
communication. A short timer would break this functionality.

We have also observed the same problem you mention above, with a slightly 
different packet flow.
When the device (NAT inside) loses connectivity and the other side 
terminates the connection after a while:
   <- FIN, ACK
   <- FIN, ACK
   <- FIN, ACK
...
the half-closed session stays in established state. This can be a relatively 
frequent case in our deployment.

I think moving the session state to transitory when the first FIN is seen could 
be an easy solution.
I would also propose to make sure that the last_heard value is updated on 
retransmissions, so that the session is not deleted before the last 
retransmitted FIN. (This would probably also solve the problem of routing any 
outstanding data, should anybody have a concern with that. That is a different 
scenario of course, where one side terminates while the other still has data to 
send. We have not seen any such behavior so far.)
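
As a rough illustration of the two points above (the first FIN moves the 
session to transitory, and every packet, including a retransmitted FIN, 
refreshes last_heard), here is a minimal Python sketch. It is not VPP code: 
the Session class, the State enum, on_packet(), expired() and both timeout 
constants are hypothetical names and values chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum

# Assumed timeout values in seconds, for illustration only.
ESTABLISHED_TIMEOUT = 7440.0
TRANSITORY_TIMEOUT = 240.0


class State(Enum):
    ESTABLISHED = "established"
    TRANSITORY = "transitory"


@dataclass
class Session:
    """Hypothetical NAT session record, not the VPP data structure."""
    state: State = State.ESTABLISHED
    last_heard: float = 0.0

    def on_packet(self, fin: bool, now: float) -> None:
        # Every packet, including a retransmitted FIN, refreshes last_heard,
        # so the session outlives the last retransmission by a full timeout.
        self.last_heard = now
        # The first FIN seen in either direction moves the session to
        # transitory; later FINs leave it there.
        if fin:
            self.state = State.TRANSITORY

    def expired(self, now: float) -> bool:
        timeout = (TRANSITORY_TIMEOUT if self.state is State.TRANSITORY
                   else ESTABLISHED_TIMEOUT)
        return now - self.last_heard > timeout
```

With this shape, the repeated "<- FIN, ACK" flow above would keep the session 
alive only for the short timeout after the last retransmitted FIN, instead of 
leaving it in established state.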

Later on, the connectivity may be restored and the device could try to send 
data again. In this case, I think the server would respond with an RST if the 
session still existed. This might have been the intention of the RFC, to keep 
the session established to be able to route the RST:
PUSH, ACK ->
  <- RST

This could also be solved by VPP responding with an RST directly, at least on 
NAT inside. The result would be a much quicker session re-establishment compared to 
just dropping the packets and letting the client time out.
What do you think? We could contribute the RST response.
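
To make the RST idea concrete, here is a small Python sketch of how the reply 
could be derived from a packet that matches no session, following the reset 
generation rule of the TCP specification (sequence number taken from the 
incoming ACK field if present, otherwise sequence number zero and ACK set to 
the incoming sequence number plus segment length). The Segment type and 
build_rst() are hypothetical names for illustration; this says nothing about 
how VPP would actually implement it.

```python
from dataclasses import dataclass
from typing import Optional

# TCP flag bits.
FIN, SYN, RST, ACK = 0x01, 0x02, 0x04, 0x10


@dataclass
class Segment:
    """Hypothetical minimal view of a TCP segment."""
    seq: int
    ack: int
    flags: int
    payload_len: int = 0


def build_rst(seg: Segment) -> Optional[Segment]:
    """Return the RST answering a segment that matches no session."""
    # Never answer a reset with a reset.
    if seg.flags & RST:
        return None
    # Incoming segment carries an ACK: the reset takes its sequence
    # number from that ACK field.
    if seg.flags & ACK:
        return Segment(seq=seg.ack, ack=0, flags=RST)
    # No ACK: sequence number zero, and acknowledge everything received.
    # SYN and FIN each count as one unit of sequence space.
    seg_len = seg.payload_len + bool(seg.flags & SYN) + bool(seg.flags & FIN)
    return Segment(seq=0, ack=(seg.seq + seg_len) % 2**32, flags=RST | ACK)
```

For the "PUSH, ACK ->" case above, the client would get a reset whose sequence 
number equals the acknowledgement number it sent, which its TCP stack should 
accept as valid for the stale connection.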

Thanks,
Miklos

>
> I suppose by doing the 3-way handshake you have proven to me (the NAT) that 
> you are intending to communicate.
> And by doing that, I promise to be a little kinder to you than I am to a UDP 
> session.
> Would the world break if all sessions got a 2 minute timeout? Probably not. 
> Most sessions are very short.
> Addresses as we have learnt with IPv6 are ephemeral. You need a session 
> layer, and run something like mosh if you want long lived ssh-like sessions.
>
> The current proposal was trying to find a compromise here.

Right. Btw. the RFC says you SHOULD honour the timers, but it doesn’t say you 
MUST honour them. Based on the talk above about not expecting long-lived 
sessions to be supported anyway, maybe even a very simple one-LRU-rules-them-all 
approach would do: whenever you need a new session, you simply reuse the session 
which saw traffic least recently. This way, under pressure, new sessions would 
terminate possibly lively old-ish sessions, but you never have to track anything, 
and if not under pressure, even broken scenarios could work if the clients are 
able to cope with them.
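
A minimal Python sketch of that single-LRU idea, under the assumption of a 
fixed-capacity table keyed by some flow identifier. LruSessionTable and touch() 
are hypothetical names for illustration and do not correspond to any existing 
VPP structure.

```python
from collections import OrderedDict


class LruSessionTable:
    """Fixed-size table that reuses the least recently used session."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        # Oldest entry first, most recently seen entry last.
        self._sessions = OrderedDict()

    def touch(self, key):
        """Record traffic for key; return the evicted key, if any."""
        if key in self._sessions:
            # Existing session saw traffic: make it the most recent one.
            self._sessions.move_to_end(key)
            return None
        evicted = None
        if len(self._sessions) >= self.capacity:
            # Table is full: reuse the slot of the session that saw
            # traffic least recently, whatever its TCP state was.
            evicted, _ = self._sessions.popitem(last=False)
        self._sessions[key] = True
        return evicted

    def __contains__(self, key) -> bool:
        return key in self._sessions
```

No per-session timers or TCP state are tracked here; eviction only happens 
when a new session needs a slot and the table is full.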

Anyhow, if a deployment is actively running out of space, I’d say something is 
wrong with the config or setup, or it is simply incorrectly scaled ...

>
>>> The main concern about RST was to recover from a 3rd party sending RSTs 
>>> into the session.
>>>
>>>
>>>>>>> With the current state machine this would leave the session in 
>>>>>>> established state.
>>>>>>>
>>>>>>> These proposed changes do:
>>>>>>>  - require 3-way handshake to establish session.
>>>>>>
>>>>>> How does this help? Would you also need to track sequence numbers as was 
>>>>>> done before?
>>>>>
>>>>> It helps in the case where someone would spoof a SYN; the RFC would then 
>>>>> leave the spurious session in established.
>>>>> The proposed state machine will leave it in transitory (the client to 
>>>>> server ACK would never be seen).
>>>>
>>>> Ah, so you are assuming a legitimate client is connecting to a nefarious 
>>>> server, which cannot produce its own SYN (or ACK) packet, but has the 
>>>> capability to spoof a SYN packet, yes?
>>>> Or is it a nefarious client which is unable to produce a SYN packet, but 
>>>> capable of spoofing a SYN packet?
>>>
>>> Neither I think, I'm concerned about a nefarious 3rd party trying to attack 
>>> the session table. Yes, somewhat depending on how the NAT is configured the 
>>> attacker has to be on the inside. Depending on the 3-way handshake also 
>>> ensures the NAT state is better synchronised with the client and server 
>>> state, than just using the 2-way. Do you see this causing issues?
>>
>> I don’t see any value added besides code being more complex.
>> If I have inside access I can drain the session table with scapy (which is a 
>> very slow way of doing things) easily even without keeping any local state 
>> and it doesn’t matter if you track 2way or 3way ….
>> (haven’t we had this discussion a couple of times already? feels a bit like 
>> beating a dead horse. NAT just sucks - malicious actor on the inside can 
>> simply make life miserable for all others UNLESS you implement a limit per 
>> inside host).
>
> You could do a limit per inside host. Now IPv4 is somewhat more beneficial 
> here, but the inside host might have 10/8 to play with still...
>
> I do have a proposal written up for an IPv4 plan B that could have been done 
> instead of IPv6 and that offers stateless NATs... I was intending to wait until 
> April 1st to publish. ;-)
>
> Best regards,
> Ole
>
