Hi, I'd like to discuss the open question[1] in the pending PR[2] for the Java Dataset (it's no longer "Datasets", I guess?) API.
Background: We are transferring C++ Arrow buffers to Java-side BufferAllocators. We should decide whether to use -XX:MaxDirectMemorySize as a limit on these buffers, and if so, what the desired solution would be.

Possible alternative solutions so far:

1. Reserve via Bits.java from the Java side
   Pros: shares the memory counter with JVM direct byte buffers; no JNI overhead; less code
   Cons: more invocations (one call to Bits#reserveMemory per buffer)

2. Reserve via Bits.java from the C++ side
   Pros: shares the memory counter with JVM direct byte buffers; fewer invocations (e.g. if using jemalloc, we could somehow perform one call per underlying chunk)
   Cons: JNI overhead; more code

3. Reserve via Netty's PlatformDependent.java from the Java side
   Pros: shares the memory counter with Netty-based buffers; no JNI overhead; less code
   Cons: more invocations

4. Reserve via Netty's PlatformDependent.java from the C++ side
   Pros: shares the memory counter with Netty-based buffers; fewer invocations
   Cons: JNI overhead; more code

5. Implement none of the above and respect the BufferAllocator's limit only.

So far I prefer 5, not using any of the other solutions. I am not sure "direct memory" is a good indicator for these off-heap buffers, because we would eventually have to decide whether to share the counter with JVM direct byte buffers or with Netty-based buffers. As far as I can tell, a complete solution would ideally either keep a global counter for all types of off-heap buffers, or give each type an individual counter.

Do you have any thoughts or suggestions on this topic? It would be great if we could reach a conclusion soon, as the PR has been blocked for some time. Thanks in advance :)

Best,
Hongze

[1] https://github.com/apache/arrow/pull/7030#issuecomment-657096664
[2] https://github.com/apache/arrow/pull/7030
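P.S. To make the "individual counter per buffer type" idea concrete, here is a minimal hypothetical sketch (not Arrow's actual API; the class and method names are made up for illustration). Each off-heap buffer type would get its own counter with a hard limit, checked on reservation and decremented on release:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical per-type off-heap memory counter. Each buffer
// category (JNI-imported, Netty-based, ...) would hold one instance
// instead of sharing a single "direct memory" limit.
final class OffHeapCounter {
    private final String name;
    private final long limit;
    private final AtomicLong reserved = new AtomicLong();

    OffHeapCounter(String name, long limit) {
        this.name = name;
        this.limit = limit;
    }

    // Reserve size bytes; roll back and fail if the per-type
    // limit would be exceeded.
    void reserve(long size) {
        long newTotal = reserved.addAndGet(size);
        if (newTotal > limit) {
            reserved.addAndGet(-size);
            throw new OutOfMemoryError("Cannot reserve " + size
                + " bytes for " + name + ": " + newTotal
                + " would exceed limit " + limit);
        }
    }

    // Release previously reserved bytes.
    void release(long size) {
        reserved.addAndGet(-size);
    }

    long reserved() {
        return reserved.get();
    }
}
```

A global counter would just be a single shared instance of the same thing; the trade-off is one knob to tune versus per-type isolation.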