The communities for both podlings are bigger than the group that shows up at Strata =)
I would love to see a summary of the discussions on the dev@ list if discussions are indeed happening at Strata.

- Henry

On Wed, Mar 30, 2016 at 5:03 PM, Wang, Yanping <yanping.w...@intel.com> wrote:

> Hi, All
>
> I met with Jacques today at Strata. We think it would be great if the Arrow
> and Mnemonic communities could have an F2F meeting to talk about our
> integration.
> I have the following two days available: Monday 4/11 in the afternoon, or
> Friday 4/15.
> We can meet at the Intel SC campus.
>
> Would you let me know if you are able to join us and which day you'd
> prefer?
>
> Thanks
> Yanping
>
>
> On Mar 29, 2016, at 4:38 PM, Gary <ga...@apache.org> wrote:
>
> Yes, I agree with you, and it would be great if we could brainstorm here to
> collect more ideas about enabling non-volatile memory usage for Apache
> Arrow through Mnemonic.
>
> For the questions, my ideas are:
>
> - Right now you are using unpooled persistent memory. Does that make sense
> or does chunking make more sense?
>
> Gary: I think it could make sense if developers know that their
> datasets are very big and they want Apache Arrow to keep most of them in
> memory for intensive computing, e.g. sort.
> Developers can certainly spill their Mnemonic-managed datasets to disk,
> but this seems a bit inefficient in some scenarios, depending on the
> concrete application logic.
>
> - What do you think is the right way to transition back and forth between
> persistent and ephemeral memory? What do you think will be the first
> pattern to be adopted? For example, do you think we should try to use it as
> a tiered storage for sort spilling (before hitting the disk), or should we
> use it for caching?
>
> Gary: my 2 cents: the netty library does not yet seem to provide an elegant
> switch mechanism for Arrow to use. We could probably change the logic around
> "initialCapacity > directArena.chunkSize" to control which buffers are put
> off-heap and which are managed by Mnemonic. Another approach is to let the
> memory clustering mechanism of Mnemonic manage hybrid memory-like spaces in
> place of part of the logic in class PooledByteBufAllocatorL.
> Regarding sorting, I think it is a typical case of random access to
> the data, so we should avoid spilling as much as possible.
> My 2 cents, the performance ranking could be:
> all in off-heap if possible > mnemonic used as cache > all in mnemonic using NVMe/disk > off-heap + spilling
> and the code simplicity ranking would be:
> all in off-heap if possible > all in mnemonic using NVMe/disk > mnemonic used as cache > off-heap + spilling
>
> The reason why the mode "mnemonic used as cache + spilling" is probably
> unnecessary is that mnemonic could provide nearly the equivalent capacity
> of disk.
>
> Thanks.
> Gary.
>
> > -----Original Message-----
> > From: Jacques Nadeau [mailto:jacq...@apache.org]
> > Sent: Tuesday, March 29, 2016 8:05 AM
> > To: dev@arrow.apache.org
> > Subject: Re: A Proposal Apache Incubator Mnemonic as an alternative infra. for Apache Arrow
> >
> > This is super cool. A couple of questions:
> >
> > - Right now you are using unpooled persistent memory. Does that make sense
> > or does chunking make more sense?
> > - What do you think is the right way to transition back and forth between
> > persistent and ephemeral memory? What do you think will be the first
> > pattern to be adopted? For example, do you think we should try to use it as
> > a tiered storage for sort spilling (before hitting the disk), or should we
> > use it for caching?
> >
> > I think it will be much easier to think about this in the context of a
> > primary or first use case. Do you have something in mind or should we
> > brainstorm here?
> >
> > On Wed, Mar 23, 2016 at 7:16 PM, Gary <ga...@apache.org> wrote:
> >
> > > Hello,
> > >
> > > We have created a patch for Apache Arrow that leverages Apache
> > > incubator Mnemonic as an alternative infra. for underlying memory
> > > resource allocation. You can find it in the forked repo below.
> > >
> > > https://github.com/NonVolatileComputing/arrow
> > >
> > > This way, Apache Arrow could gain some structural benefits from
> > > the Mnemonic project:
> > >
> > > - Arrow is able to leverage the larger capacity of high-performance
> > > hybrid storage devices, e.g. high-end SSD, NVMe
> > >
> > > - Mnemonic provides a potential opportunity for Arrow to
> > > optimize/tune its allocation algorithms as a native Arrow-oriented
> > > allocation service
> > >
> > > - The non-volatile features of Mnemonic make it possible for
> > > Arrow to share its columnar in-memory data between different
> > > applications or across the life-cycle of a single application
> > >
> > > - Arrow could take advantage of upcoming Mnemonic features such as
> > > memory clustering/DOG (distributed object graph) and massive native
> > > computing
> > >
> > > - Mnemonic helps to reduce the pressure on main memory utilization
> > > and its related system-wide overheads.
> > >
> > > Our patch is designed to minimize the changes users need to make to use
> > > Arrow. Please check out the test cases provided by this patch for your
> > > reference.
> > >
> > > Note that we need to put the allocator services in a specified
> > > location (indicated by pom.xml) for the Mnemonic-backed Arrow test
> > > cases to run, because those services are required for external
> > > memory-like device management.
> > >
> > > Please give your comments and review feedback for better
> > > collaboration between Apache Arrow and Mnemonic. Thanks.
> > >
> > > Best Regards.
> > > Gary.
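
Editor's sketch of the switch Gary describes in the thread above, where the check "initialCapacity > directArena.chunkSize" decides whether a buffer goes off-heap or is managed by Mnemonic. This is only an illustration under stated assumptions: the class `TieredAllocatorSketch`, the `Tier` enum, and the `route` and `allocate` methods are hypothetical names, not the API of Arrow, netty, or Mnemonic. The thread does not say which side of the threshold goes to which tier, so the sketch assumes that requests larger than the chunk size go to the Mnemonic-managed tier. Both tiers are backed by a plain direct `ByteBuffer` as a stand-in.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch only; not the Arrow, netty, or Mnemonic API. */
final class TieredAllocatorSketch {

    /** The tier a request is routed to. */
    enum Tier { OFF_HEAP, MNEMONIC_MANAGED }

    /** A buffer together with the tier that is assumed to serve it. */
    static final class TieredBuffer {
        final Tier tier;
        final ByteBuffer data;

        TieredBuffer(Tier tier, ByteBuffer data) {
            this.tier = tier;
            this.data = data;
        }
    }

    // Plays the role of directArena.chunkSize from the thread.
    private final int chunkSize;

    TieredAllocatorSketch(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
    }

    /**
     * Assumption: requests above the chunk size go to the Mnemonic-managed
     * tier; everything else stays in ordinary off-heap memory.
     */
    Tier route(int initialCapacity) {
        return initialCapacity > chunkSize ? Tier.MNEMONIC_MANAGED : Tier.OFF_HEAP;
    }

    TieredBuffer allocate(int initialCapacity) {
        if (initialCapacity < 0) {
            throw new IllegalArgumentException("initialCapacity must be non-negative");
        }
        Tier tier = route(initialCapacity);
        // Stand-in: both tiers use a direct ByteBuffer here. A real integration
        // would obtain MNEMONIC_MANAGED buffers from a Mnemonic allocator service.
        return new TieredBuffer(tier, ByteBuffer.allocateDirect(initialCapacity));
    }

    public static void main(String[] args) {
        TieredAllocatorSketch allocator = new TieredAllocatorSketch(16 * 1024);
        System.out.println(allocator.allocate(4 * 1024).tier);   // below the threshold
        System.out.println(allocator.allocate(64 * 1024).tier);  // above the threshold
    }
}
```

With a 16 KiB threshold, the 4 KiB request is routed to `OFF_HEAP` and the 64 KiB request to `MNEMONIC_MANAGED`. Keeping the decision in a single `route` method means the threshold rule can be swapped for another policy without touching the callers.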