afedulov commented on PR #25866:
URL: https://github.com/apache/flink/pull/25866#issuecomment-2587777224

   I believe that since we are not dealing with a memory leak but rather with 
different memory allocation (thanks @ferenc-csaky for confirming this!), we 
should aim to include the upgrade in both the 1.19 and 1.20 releases. While 
there is some risk of exposing users to OOM kills by pushing some workloads 
over hard memory limits, we have to weight themagainst the risks posed by 
leaving numerous critical CVEs exposed on the network stack.
   <img width="398" alt="image" 
src="https://github.com/user-attachments/assets/be65d248-f7ce-4747-bb93-9f096e77bc8c";
 />
   
   Netty 3.10.6 is the last 3.x release and therefore officially reached EOL 
more than 8 years ago. I also read reports of it suffering from multiple GC and 
memory management issues, so it is not like we are transitioning away from 
something that is rock solid and works perfectly to an experimental release. 
   
   The approach proposed by @He-Pin in the above 
[comment](https://github.com/apache/flink/pull/25866#issuecomment-2580904462) 
sounds reasonable to me. If we adopt this approach, I think we should actually 
add `-Dio.netty.tryReflectionSetAccessible=true` parameter as default to the 
startup scripts, not merely to the documentation. An open question remains: 
should we retain the 3.x-style memory management by using `unpooled` 
allocations? My current understanding is that using both parameters should 
approximate the existing behavior, but it does not seem like `unpooled` 
allocation is strictly required for our case. @He-Pin, what’s your perspective 
on this?
   
   As for fixing it in 1.20.2 or 1.20.1 - I am not convinced that having CI 
running more times will provide us required confidence. especially given that 
the primary concern is overspepping the memory limits in existing deployments, 
not stability. The primary concern lies in breaching memory limits in existing 
deployments rather than stability. If we agree this change is necessary for a 
patch release eventually, it would be logical to apply it now for both 1.19.2 
and 1.20.1. I will make sure this potential concern is explicitly mentioned in 
the release notes.
   
   @ferenc-csaky Do we have a rough understanding of how much more memory 
consumption does this new version induce? Does it look like some fixed amount 
or something that scales with the number of connections? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to