Hi,

I'd like to follow up on the discussion[1] in the pending PR[2] for the Java
Dataset API (it's no longer "Datasets", I guess?).


- Background:

We are transferring C++ Arrow buffers to Java-side BufferAllocators. We need to
decide whether -XX:MaxDirectMemorySize should act as a limit on these buffers,
and if so, what the desired solution is.
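For context, a quick stdlib-only sketch of what -XX:MaxDirectMemorySize currently governs: buffers allocated through ByteBuffer.allocateDirect are reserved via java.nio.Bits and show up in the JVM's "direct" buffer pool, while JNI-transferred buffers bypass this accounting unless we reserve explicitly (options 1-4 below). The class name here is illustrative.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

public class DirectMemoryDemo {
    public static void main(String[] args) {
        // Reserved through java.nio.Bits, counted against -XX:MaxDirectMemorySize.
        ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20); // 1 MiB

        // The "direct" BufferPoolMXBean exposes the JVM's view of that counter.
        for (BufferPoolMXBean pool :
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                System.out.println("direct memory used: " + pool.getMemoryUsed());
            }
        }
        // Memory handed over from C++ via JNI is invisible to this counter,
        // which is exactly the question this thread is about.
        assert buf.isDirect();
    }
}
```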

- Possible alternative solutions so far:

1. Reserve via Bits.java on the Java side

Pros: Shares the memory counter with JVM direct byte buffers, no JNI overhead,
less code
Cons: More invocations (one call to Bits#reserveMemory per buffer)

2. Reserve via Bits.java on the C++ side

Pros: Shares the memory counter with JVM direct byte buffers, fewer invocations
(e.g. with jemalloc we could make one call per underlying chunk)
Cons: JNI overhead, more code

3. Reserve via Netty's PlatformDependent.java on the Java side

Pros: Shares the memory counter with Netty-based buffers, no JNI overhead,
less code
Cons: More invocations

4. Reserve via Netty's PlatformDependent.java on the C++ side

Pros: Shares the memory counter with Netty-based buffers, fewer invocations
Cons: JNI overhead, more code

5. Implement none of the above; respect the BufferAllocator's limit only.


So far I prefer option 5, i.e. not using any of the reservation solutions. I am
not sure "direct memory" is a good indicator for these off-heap buffers,
because we would eventually have to share the counter with either JVM direct
byte buffers or Netty-based buffers. As far as I can tell, a complete solution
would ideally either maintain a global counter for all types of off-heap
buffers, or give each type an individual counter.
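To make the "individual counter per buffer type" idea concrete, here is a minimal stdlib-only sketch, independent of both java.nio.Bits and Netty's PlatformDependent. The class and method names are hypothetical, not anything in the PR.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical per-buffer-type reservation counter: each off-heap buffer
// category (JNI-transferred, Netty, NIO) could own one of these.
final class OffHeapReservation {
    private final long limit;
    private final AtomicLong reserved = new AtomicLong();

    OffHeapReservation(long limit) {
        this.limit = limit;
    }

    /** Try to reserve {@code size} bytes; returns false instead of throwing OOM. */
    boolean tryReserve(long size) {
        while (true) {
            long current = reserved.get();
            long next = current + size;
            if (next > limit) {
                return false; // caller may release buffers and retry
            }
            if (reserved.compareAndSet(current, next)) {
                return true;
            }
        }
    }

    /** Release a reservation, e.g. from a JNI buffer's release callback. */
    void release(long size) {
        reserved.addAndGet(-size);
    }

    long reservedBytes() {
        return reserved.get();
    }
}
```

This is essentially what a BufferAllocator's own limit already gives us, which is part of why option 5 looks sufficient to me.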

Do you have any thoughts or suggestions on this topic? It would be great if we
could reach a conclusion soon, as the PR has been blocked for some time. Thanks
in advance :)


Best,
Hongze

[1] https://github.com/apache/arrow/pull/7030#issuecomment-657096664
[2] https://github.com/apache/arrow/pull/7030
