+1 Discussions should be summarized and brought back to the mailing list(s). Recommendations are fine, but any decisions should be made on-list.
-Taylor

> On Mar 30, 2016, at 8:31 PM, Patrick Hunt <ph...@apache.org> wrote:
>
> Remember that no decisions should be made at the meeting. It's fine to
> have discussions, but those need to be brought back to the community
> before decisions are made. Summarizing for the dev@ mailing list, also
> jiras, etc... are good ways to socialize the issues.
>
> Patrick
>
>> On Wed, Mar 30, 2016 at 5:17 PM, Henry Saputra <henry.sapu...@gmail.com>
>> wrote:
>> The community for both podlings are bigger than the ones show up at Strata
>> =)
>>
>> Would love to have the summary of the discussions in the dev@ list if
>> indeed some discussions happening at Strata.
>>
>> - Henry
>>
>> On Wed, Mar 30, 2016 at 5:03 PM, Wang, Yanping <yanping.w...@intel.com>
>> wrote:
>>
>>> Hi, All
>>>
>>> I met with Jacques today at Strata, we think it would be great that Arrow
>>> and Mnemonic communities can have a F2F meeting together to talk about our
>>> integration.
>>> I have following two days, 4/11 Monday afternoon, or 4/15 Friday.
>>> We can meet at intel SC campus.
>>>
>>> Would you let me know if you are able to join us and which day you'd
>>> prefer?
>>>
>>> Thanks
>>> Yanping
>>>
>>> On Mar 29, 2016, at 4:38 PM, Gary <ga...@apache.org> wrote:
>>>
>>> Yes, I agree with you and that's great if we could brainstorm here to
>>> collect more ideas about enabling non-volatile memory usage for Apache
>>> Arrow through Mnemonic.
>>>
>>> for the questions, my ideas are:
>>>
>>> - Right now you are using unpooled persistent memory. Does that make sense
>>> or does chunking make more sense?
>>>
>>> Gary: I think it could make some sense if developer knows that their
>>> datasets are very big and they want Apache Arrow to keep most of them in
>>> memory for intensive computing e.g. sort.
>>> the developer certainly can spill their Mnemonic managed
>>> datasets into disk but this way seems a bit inefficient in some scenarios
>>> that might depend on concrete application logic.
>>>
>>> - What do you think is the right way to transition back and forth between
>>> persistent and ephemeral memory? What do you think will be the first
>>> pattern to be adopted. For example, do you think we should try to use it as
>>> a tiered storage for sort spilling (before hitting the disk), or should we
>>> use it for caching?
>>>
>>> Gary: my 2 cents, the netty library looks not yet provide a elegant switch
>>> mechanism for Arrow to use, probably we can change the logic around
>>> "initialCapacity > directArena.chunkSize" to control which buffer put on
>>> off-heap or managed by Mnemonic, another approach is to let memory
>>> clustering mechanism of Mnemonic managing hybrid memory-like spaces instead
>>> of part logics of class PooledByteBufAllocatorL.
>>>
>>> Regarding the sorting, I think it is a typical case of random access to
>>> the data, we should avoid spilling as much as possible.
>>>
>>> my 2 cents, the performance could be
>>>   all in off-heap if possible > mnemonic used as cache > all in mnemonic using NVMe/disk > off-heap + spilling
>>> the code simplicity would be
>>>   all in off-heap if possible > all in mnemonic using NVMe/disk > mnemonic used as cache > off-heap + spilling
>>>
>>> the reason why the mode "mnemonic used as cache + spilling" probably
>>> unnecessary is mnemonic could provide nearly equivalent capacity of disk.
>>>
>>> Thanks.
>>> Gary.
>>>
>>> -----Original Message-----
>>> From: Jacques Nadeau [mailto:jacq...@apache.org]
>>> Sent: Tuesday, March 29, 2016 8:05 AM
>>> To: dev@arrow.apache.org
>>> Subject: Re: A Proposal Apache Incubator Mnemonic as an alternative infra.
>>> for Apache Arrow
>>>
>>> This is super cool.
>>> A couple of questions:
>>>
>>> - Right now you are using unpooled persistent memory. Does that make sense
>>> or does chunking make more sense?
>>>
>>> - What do you think is the right way to transition back and forth between
>>> persistent and ephemeral memory? What do you think will be the first
>>> pattern to be adopted. For example, do you think we should try to use it as
>>> a tiered storage for sort spilling (before hitting the disk), or should we
>>> use it for caching?
>>>
>>> I think it will be much easier to think about this in the context of a
>>> primary or first use case. Do you have something in mind or should we
>>> brainstorm here?
>>>
>>> On Wed, Mar 23, 2016 at 7:16 PM, Gary <ga...@apache.org> wrote:
>>>
>>>> Hello,
>>>>
>>>> We have created a patch for Apache Arrow to leverage Apache
>>>> incubator Mnemonic as an alternative infra. for underlying memory
>>>> resources allocation, you can find it as below forked repo.
>>>>
>>>> https://github.com/NonVolatileComputing/arrow
>>>>
>>>> By this way, Apache Arrow could take some structural benefits from
>>>> Mnemonic project they are
>>>>
>>>> - Arrow is able to leverage larger capacity of high performance
>>>> hybrid storage devices. e.g.
>>>> high-end SSD, NVMe
>>>>
>>>> - Mnemonic provide a potential opportunity for Arrow to
>>>> optimize/tuning its allocation algorithms as a native Arrow-oriented
>>>> allocation services
>>>>
>>>> - The non-volatile features of Mnemonic make it possible that
>>>> Arrow could make its columnar in-memory data shared between different
>>>> applications or across life-cycle of single application
>>>>
>>>> - Arrow could take advantages of coming Mnemonic features of
>>>> memory clustering/DOG (distributed object graph) and massive native
>>>> computing
>>>>
>>>> - Mnemonic helps to reduce the pressure of main memory utilization
>>>> and its related system wide overheads.
>>>>
>>>> Our this patch is designed to minimize the changes for user to use
>>>> Arrow, please check out the test cases provided by this patch for your
>>>> reference.
>>>>
>>>> Note that, we need to put allocator services to a specified
>>>> position (indicated by pom.xml) for Mnemonic backed Arrow related test
>>>> cases to run because those services are required for external
>>>> memory-like device management.
>>>>
>>>> Please give your comments and review feedback for better
>>>> collaboration of Apache Arrow and Mnemonic, Thanks.
>>>>
>>>> Best Regards.
>>>> Gary.
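
For anyone following the allocation discussion quoted above, here is a rough sketch of the size-threshold idea Gary mentions around "initialCapacity > directArena.chunkSize". It is only an illustration under stated assumptions: the class name, the Tier enum, and the two pluggable backends are hypothetical, and it calls neither the Netty allocator nor any Mnemonic API. A real patch would plug the Mnemonic allocator service in where the second backend is passed.

```java
import java.nio.ByteBuffer;
import java.util.function.IntFunction;

/** Hypothetical sketch: route an allocation to a memory tier by requested size. */
final class TieredAllocationSketch {

    /** Illustrative tier names; not taken from Arrow, Netty, or Mnemonic. */
    enum Tier { POOLED_OFF_HEAP, MNEMONIC_BACKED }

    private final int chunkSize;                          // stand-in for directArena.chunkSize
    private final IntFunction<ByteBuffer> offHeap;        // backend for small requests
    private final IntFunction<ByteBuffer> mnemonicBacked; // assumed placeholder for a Mnemonic-managed backend

    TieredAllocationSketch(int chunkSize,
                           IntFunction<ByteBuffer> offHeap,
                           IntFunction<ByteBuffer> mnemonicBacked) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
        this.offHeap = offHeap;
        this.mnemonicBacked = mnemonicBacked;
    }

    /** Requests larger than one chunk go to the Mnemonic-managed tier. */
    Tier chooseTier(int initialCapacity) {
        if (initialCapacity < 0) {
            throw new IllegalArgumentException("initialCapacity must be non-negative");
        }
        return initialCapacity > chunkSize ? Tier.MNEMONIC_BACKED : Tier.POOLED_OFF_HEAP;
    }

    /** Allocates from whichever backend the size threshold selects. */
    ByteBuffer allocate(int initialCapacity) {
        return chooseTier(initialCapacity) == Tier.MNEMONIC_BACKED
                ? mnemonicBacked.apply(initialCapacity)
                : offHeap.apply(initialCapacity);
    }
}
```

To try it out, something like `new TieredAllocationSketch(16 * 1024 * 1024, ByteBuffer::allocateDirect, ByteBuffer::allocateDirect)` works, with the second argument swapped for whatever the Mnemonic-backed allocation ends up being. Keeping the decision in one small method should make it easy to experiment with other policies from the thread, such as using Mnemonic as a cache.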