azagrebin opened a new pull request #10940: [FLINK-14894][core][mem] Do not explicitly release unsafe memory when managed segment is freed URL: https://github.com/apache/flink/pull/10940 ## What is the purpose of the change The conclusion at the moment is that release unsafe memory, while potentially having link on it in Java code, is dangerous. We revert this to rely only on GC when there are no links in Java code. The problem can happen e.g. if task thread exits w/o joining with IO threads (e.g. spilling in batch job) then the unsafe memory is released but it can be written w/o segfault by IO thread. At the same time, other task can allocate interleaving memory which can be spoiled by that IO thread. We still keep it unsafe to allocate it outside of JVM direct memory limit to not interfere with direct allocations, also it does not make sense for RocksDB native memory (also accounted in MemoryManager) to be part of direct memory limit. The potential downside can be that over-allocating of unsafe memory will not hit the direct limit and will not cause GC immediately which will be the only way to release it. In this case, it can cause out-of-memory failures w/o triggering GC to release a lot of potentially already unused memory. If we see the delayed release as a problem then we can investigate further optimisations, like: - directly monitoring phantom reference queue of the cleaner (if JVM detects quickly that there are no more reference to the memory) and explicitly release memory ready for GC asap, e.g. after Task exit - monitor allocated memory amount and block allocation until GC releases occupied memory instead of failing with out-of-memory immediately ## Brief change log remove `cleaner` from `HybridMemorySegment` and `cleaner#run` from `HybridMemorySegment#free` ## Verifying this change existing tests ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (can be) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (no) - If yes, how is the feature documented? (not applicable)
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services