azagrebin opened a new pull request #10940: [FLINK-14894][core][mem] Do not 
explicitly release unsafe memory when managed segment is freed
URL: https://github.com/apache/flink/pull/10940
 
 
   ## What is the purpose of the change
   
   The conclusion at the moment is that release unsafe memory, while 
potentially having link on it in Java code, is dangerous. We revert this to 
rely only on GC when there are no links in Java code. The problem can happen 
e.g. if task thread exits w/o joining with IO threads (e.g. spilling in batch 
job) then the unsafe memory is released but it can be written w/o segfault by 
IO thread. At the same time, other task can allocate interleaving memory which 
can be spoiled by that IO thread. We still keep it unsafe to allocate it 
outside of JVM direct memory limit to not interfere with direct allocations, 
also it does not make sense for RocksDB native memory (also accounted in 
MemoryManager) to be part of direct memory limit.
   
   The potential downside can be that over-allocating of unsafe memory will not 
hit the direct limit and will not cause GC immediately which will be the only 
way to release it. In this case, it can cause out-of-memory failures w/o 
triggering GC to release a lot of potentially already unused memory.
   
   If we see the delayed release as a problem then we can investigate further 
optimisations, like:
   - directly monitoring phantom reference queue of the cleaner (if JVM detects 
quickly that there are no more reference to the memory) and explicitly release 
memory ready for GC asap, e.g. after Task exit
   - monitor allocated memory amount and block allocation until GC releases 
occupied memory instead of failing with out-of-memory immediately
   
   ## Brief change log
   
   remove `cleaner` from `HybridMemorySegment` and `cleaner#run` from 
`HybridMemorySegment#free`
   
   ## Verifying this change
   
   existing tests
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (no)
     - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
     - The serializers: (no)
     - The runtime per-record code paths (performance sensitive): (can be)
     - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
     - The S3 file system connector: (no)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (no)
     - If yes, how is the feature documented? (not applicable)
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

Reply via email to