I think ideally we would have a Java interface that would support all of:

- Memory-mapped files
- Anonymous shared memory segments (e.g. POSIX shm)
- NVM / Mnemonic

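To make the first item concrete, here is a minimal sketch of a zero-copy read from a memory-mapped file using only JDK classes (`FileChannel.map`). It is not an Arrow API: the class and method names (`MappedInt32Reader`, `writeInts`, `mapReadOnly`, `sum`) are hypothetical, and the file layout (a bare run of little-endian 32-bit integers) is an assumption chosen to resemble a single Arrow value buffer, not the Arrow IPC format.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; none of these names come from Arrow.
public class MappedInt32Reader {

    // Write the values as little-endian int32s (assumed layout for this sketch).
    public static void writeInts(Path path, int[] values) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Integer.BYTES)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        for (int v : values) {
            buf.putInt(v);
        }
        Files.write(path, buf.array());
    }

    // Map the whole file read-only. The mapping stays valid after the
    // channel is closed, and no bytes are copied onto the Java heap.
    public static ByteBuffer mapReadOnly(Path path) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return mapped.order(ByteOrder.LITTLE_ENDIAN);
        }
    }

    // Read the values in place through absolute gets on the mapped region.
    public static long sum(ByteBuffer ints) {
        long total = 0;
        for (int i = 0; i + Integer.BYTES <= ints.limit(); i += Integer.BYTES) {
            total += ints.getInt(i);
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("ints", ".bin");
        path.toFile().deleteOnExit();
        writeInts(path, new int[] {1, 2, 3, 4});
        ByteBuffer view = mapReadOnly(path);
        System.out.println(sum(view)); // prints 10
    }
}
```

The other two sources in the list would need a different way to obtain the memory region, but a reader written against a `ByteBuffer`-like view could in principle stay the same.
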
We already have the ability to do zero-copy reads from buffer-like objects in C++, along with IO interfaces that support zero copy (like memory-mapped files). We can do zero-copy reads from ArrowBuf in Java, but we are missing the interfaces to shared memory sources.

- Wes

On Thu, Aug 31, 2017 at 5:09 PM, Gang(Gary) Wang <ga...@apache.org> wrote:
> Hi Wes,
>
> Thank you for the explanation. The usage described in
> https://issues.apache.org/jira/browse/ARROW-721 could be directly supported
> by Mnemonic through DurableBuffer and DurableChunk. DurableChunk makes
> use of unsafe to expose a plain memory space for Arrow to use without
> performance penalties, which is why most big data frameworks take
> advantage of unsafe. Please refer to
> https://mnemonic.apache.org/docs/domusecases.html for the use cases. We
> could work on this ticket if you think that's exactly what you want.
>
> Regarding the NVM tech, that is what Mnemonic was created for. It can be
> used to directly persist generic Java objects and collections on NVM with no
> SerDe. Which basic tools did you have in mind? We can probably also
> help identify the gaps for Mnemonic. Thanks!
>
> Very truly yours,
> Gary
>
> On Thu, Aug 31, 2017 at 12:32 PM, Wes McKinney <wesmck...@gmail.com> wrote:
>
>> hi Gary,
>>
>> The Java libraries are not yet capable of writing or zero-copy reads
>> of Arrow datasets to/from shared memory or memory-mapped files:
>> https://issues.apache.org/jira/browse/ARROW-721. We've developed quite
>> a bit of technology on the C++ side for dealing with shared memory IPC
>> but we need someone to help with that on the Java side.
>>
>> In the context of NVM technologies, it would be nice to be able to
>> persist a dataset to NVM and continue to do analytics on it, while
>> retaining a "handle" so that the dataset can be easily recovered in
>> the event of process failure. We may arrive at new use cases once some
>> of the basic tools exist.
>>
>> - Wes
>>
>> On Wed, Aug 30, 2017 at 6:19 PM, Gang(Gary) Wang <ga...@apache.org> wrote:
>> > Thank you for sharing the videos. We are very interested in how to
>> > support the Arrow data format and collections very closely. Could you
>> > please point out which interfaces would allow Mnemonic to act as a
>> > memory provider for the user to store and access Arrow-managed
>> > datasets? Thanks!
>> >
>> > Very truly yours,
>> > Gary.
>> >
>> > On Wed, Aug 30, 2017 at 2:11 PM, Ivan Sadikov <ivan.sadi...@gmail.com> wrote:
>> >
>> >> Great presentation! Thank you for sharing.
>> >>
>> >> On Thu, 31 Aug 2017 at 8:02 AM, Wes McKinney <wesmck...@gmail.com> wrote:
>> >>
>> >> > Absolutely. I will do that now
>> >> >
>> >> > On Wed, Aug 30, 2017 at 3:33 PM, Julian Hyde <jh...@apache.org> wrote:
>> >> > > Thanks for sharing. Can we tweet those videos as well? I see that
>> >> > > https://twitter.com/apachearrow only tweeted your slides.
>> >> > >
>> >> > >> On Aug 26, 2017, at 1:11 PM, Wes McKinney <wesmck...@gmail.com> wrote:
>> >> > >>
>> >> > >> hi all,
>> >> > >>
>> >> > >> In case folks here are interested, I gave a keynote this week at
>> >> > >> JupyterCon explaining my motivations for being involved in Apache
>> >> > >> Arrow and how I see it fitting in with the data science ecosystem
>> >> > >> long term:
>> >> > >>
>> >> > >> https://www.youtube.com/watch?v=wdmf1msbtVs
>> >> > >>
>> >> > >> I also gave an interview going a little deeper into some of the
>> >> > >> topics from the talk:
>> >> > >>
>> >> > >> https://www.youtube.com/watch?v=Q7y9l-L8yiU
>> >> > >>
>> >> > >> I believe we have an exciting journey ahead of us, but it's
>> >> > >> certainly going to take a lot of collaboration and community
>> >> > >> development.
>> >> > >>
>> >> > >> - Wes