afedulov commented on PR #25866: URL: https://github.com/apache/flink/pull/25866#issuecomment-2587777224
I believe that since we are not dealing with a memory leak but rather with different memory allocation (thanks @ferenc-csaky for confirming this!), we should aim to include the upgrade in both the 1.19 and 1.20 releases. While there is some risk of exposing users to OOM kills by pushing some workloads over hard memory limits, we have to weight themagainst the risks posed by leaving numerous critical CVEs exposed on the network stack. <img width="398" alt="image" src="https://github.com/user-attachments/assets/be65d248-f7ce-4747-bb93-9f096e77bc8c" /> Netty 3.10.6 is the last 3.x release and therefore officially reached EOL more than 8 years ago. I also read reports of it suffering from multiple GC and memory management issues, so it is not like we are transitioning away from something that is rock solid and works perfectly to an experimental release. The approach proposed by @He-Pin in the above [comment](https://github.com/apache/flink/pull/25866#issuecomment-2580904462) sounds reasonable to me. If we adopt this approach, I think we should actually add `-Dio.netty.tryReflectionSetAccessible=true` parameter as default to the startup scripts, not merely to the documentation. An open question remains: should we retain the 3.x-style memory management by using `unpooled` allocations? My current understanding is that using both parameters should approximate the existing behavior, but it does not seem like `unpooled` allocation is strictly required for our case. @He-Pin, what’s your perspective on this? As for fixing it in 1.20.2 or 1.20.1 - I am not convinced that having CI running more times will provide us required confidence. especially given that the primary concern is overspepping the memory limits in existing deployments, not stability. The primary concern lies in breaching memory limits in existing deployments rather than stability. If we agree this change is necessary for a patch release eventually, it would be logical to apply it now for both 1.19.2 and 1.20.1. I will make sure this potential concern is explicitly mentioned in the release notes. @ferenc-csaky Do we have a rough understanding of how much more memory consumption does this new version induce? Does it look like some fixed amount or something that scales with the number of connections? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org