Re: A Proposal Apache Incubator Mnemonic as an alternative infra. for Apache Arrow

P. Taylor Goetz Wed, 30 Mar 2016 19:14:02 -0700

+1

Discussions should be summarized and brought back to the mailing list(s).  
Recommendations are fine, but any decisions should be made on-list.


-Taylor

> On Mar 30, 2016, at 8:31 PM, Patrick Hunt <[email protected]> wrote:
> 
> Remember that no decisions should be made at the meeting. It's fine to
> have discussions, but those need to be brought back to the community
> before decisions are made. Summarizing for the dev@ mailing list, also
> jiras, etc... are good ways to socialize the issues.
> 
> Patrick
> 
>> On Wed, Mar 30, 2016 at 5:17 PM, Henry Saputra <[email protected]> 
>> wrote:
>> The communities of both podlings are bigger than the group that shows up at
>> Strata =)
>> 
>> Would love to have a summary of the discussions on the dev@ list if
>> discussions do indeed happen at Strata.
>> 
>> - Henry
>> 
>> On Wed, Mar 30, 2016 at 5:03 PM, Wang, Yanping <[email protected]>
>> wrote:
>> 
>>> Hi, All
>>> 
>>> I met with Jacques today at Strata. We think it would be great if the Arrow
>>> and Mnemonic communities could have an F2F meeting to talk about our
>>> integration.
>>> I have the following two days available: Monday 4/11 in the afternoon, or
>>> Friday 4/15.
>>> We can meet at the Intel SC campus.
>>> 
>>> Would you let me know if you are able to join us and which day you'd
>>> prefer?
>>> 
>>> Thanks
>>> Yanping
>>> 
>>> 
>>> On Mar 29, 2016, at 4:38 PM, Gary <[email protected]> wrote:
>>> 
>>> Yes, I agree with you, and it would be great if we could brainstorm here to
>>> collect more ideas about enabling non-volatile memory usage for Apache
>>> Arrow through Mnemonic.
>>> 
>>> As for the questions, my ideas are:
>>> 
>>> 
>>> - Right now you are using unpooled persistent memory. Does that make sense
>>> or does chunking make more sense?
>>> 
>>> Gary: I think it could make some sense if developers know that their
>>> datasets are very big and they want Apache Arrow to keep most of them in
>>> memory for intensive computing, e.g. sorting.
>>> Developers can certainly spill their Mnemonic-managed datasets to disk,
>>> but this seems somewhat inefficient in some scenarios, depending on the
>>> concrete application logic.
>>> 
>>> 
>>> - What do you think is the right way to transition back and forth between
>>> persistent and ephemeral memory? What do you think will be the first
>>> pattern to be adopted? For example, do you think we should try to use it as
>>> a tiered storage for sort spilling (before hitting the disk), or should we
>>> use it for caching?
>>> Gary: My 2 cents: the Netty library does not yet seem to provide an elegant
>>> switching mechanism for Arrow to use. We could probably change the logic
>>> around "initialCapacity > directArena.chunkSize" to control which buffers
>>> are placed off-heap and which are managed by Mnemonic. Another approach is
>>> to let Mnemonic's memory clustering mechanism manage hybrid memory-like
>>> spaces in place of part of the logic in class PooledByteBufAllocatorL.
>>> Regarding sorting, I think it is a typical case of random access to
>>> the data, so we should avoid spilling as much as possible.
>>> In my view, the performance ranking would be:
>>> all in off-heap if possible > mnemonic used as cache > all in mnemonic
>>> using NVMe/disk > off-heap + spilling
>>> and the code simplicity ranking would be:
>>> all in off-heap if possible > all in mnemonic using NVMe/disk > mnemonic
>>> used as cache > off-heap + spilling
>>> 
>>> The reason the mode "mnemonic used as cache + spilling" is probably
>>> unnecessary is that Mnemonic could provide capacity nearly equivalent to
>>> that of disk.
>>> 
>>> Thanks.
>>> Gary.
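
A minimal sketch of the size-threshold routing Gary describes around "initialCapacity > directArena.chunkSize", written against the plain JDK rather than Netty, Arrow, or Mnemonic. The names TieredAllocator, Tier, and Allocation are hypothetical and exist only for illustration. Sending requests larger than the chunk size to the Mnemonic-managed tier is an assumption about the intended direction of the switch, and the persistent tier is whatever allocation function the caller supplies, not Mnemonic's actual API.

```java
import java.nio.ByteBuffer;
import java.util.Objects;
import java.util.function.IntFunction;

/** Hypothetical sketch: routes each allocation to one of two tiers by size. */
final class TieredAllocator {
    /** Which tier served a request; the names are illustrative only. */
    enum Tier { OFF_HEAP, PERSISTENT }

    /** A buffer together with the tier that produced it. */
    static final class Allocation {
        final Tier tier;
        final ByteBuffer buffer;

        Allocation(Tier tier, ByteBuffer buffer) {
            this.tier = tier;
            this.buffer = buffer;
        }
    }

    private final int chunkSize;
    private final IntFunction<ByteBuffer> offHeap;
    private final IntFunction<ByteBuffer> persistent;

    /**
     * @param chunkSize  threshold in bytes, playing the role of directArena.chunkSize
     * @param offHeap    allocator for ordinary off-heap buffers
     * @param persistent stand-in for a Mnemonic-managed allocator (assumption)
     */
    TieredAllocator(int chunkSize,
                    IntFunction<ByteBuffer> offHeap,
                    IntFunction<ByteBuffer> persistent) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
        this.offHeap = Objects.requireNonNull(offHeap);
        this.persistent = Objects.requireNonNull(persistent);
    }

    /** Requests larger than chunkSize go to the persistent tier, the rest stay off-heap. */
    Allocation allocate(int initialCapacity) {
        if (initialCapacity < 0) {
            throw new IllegalArgumentException("initialCapacity must be non-negative");
        }
        if (initialCapacity > chunkSize) {
            return new Allocation(Tier.PERSISTENT, persistent.apply(initialCapacity));
        }
        return new Allocation(Tier.OFF_HEAP, offHeap.apply(initialCapacity));
    }
}
```

With a 1024-byte threshold and ByteBuffer::allocateDirect passed for both tiers, a 4096-byte request is served by the PERSISTENT tier while a 512-byte request stays OFF_HEAP. Keeping the decision in one method makes it easy to swap the comparison for a different policy, such as routing by remaining off-heap capacity.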
>>> 
>>> 
>>> -----Original Message-----
>>> From: Jacques Nadeau [mailto:[email protected]]
>>> Sent: Tuesday, March 29, 2016 8:05 AM
>>> To: [email protected]
>>> Subject: Re: A Proposal Apache Incubator Mnemonic as an alternative infra.
>>> for Apache Arrow
>>> 
>>> 
>>> 
>>> This is super cool. A couple of questions:
>>> 
>>> 
>>> 
>>> - Right now you are using unpooled persistent memory. Does that make sense
>>> or does chunking make more sense?
>>> 
>>> - What do you think is the right way to transition back and forth between
>>> persistent and ephemeral memory? What do you think will be the first
>>> pattern to be adopted? For example, do you think we should try to use it as
>>> a tiered storage for sort spilling (before hitting the disk), or should we
>>> use it for caching?
>>> 
>>> 
>>> 
>>> I think it will be much easier to think about this in the context of a
>>> primary or first use case. Do you have something in mind or should we
>>> brainstorm here?
>>> 
>>> 
>>> 
>>> On Wed, Mar 23, 2016 at 7:16 PM, Gary <[email protected]> wrote:
>>> 
>>> 
>>> 
>>>> Hello,
>>>>
>>>>   We have created a patch for Apache Arrow that leverages Apache
>>>> Incubator Mnemonic as an alternative infrastructure for underlying memory
>>>> resource allocation. You can find it in the forked repo below.
>>>>
>>>> https://github.com/NonVolatileComputing/arrow
>>>>
>>>>   In this way, Apache Arrow could gain some structural benefits from
>>>> the Mnemonic project:
>>>>
>>>>    - Arrow is able to leverage the larger capacity of high-performance
>>>> hybrid storage devices, e.g. high-end SSD, NVMe.
>>>>
>>>>    - Mnemonic provides a potential opportunity for Arrow to
>>>> optimize/tune its allocation algorithms as a native Arrow-oriented
>>>> allocation service.
>>>>
>>>>    - The non-volatile features of Mnemonic make it possible for
>>>> Arrow to share its columnar in-memory data between different
>>>> applications or across the life cycle of a single application.
>>>>
>>>>    - Arrow could take advantage of upcoming Mnemonic features such as
>>>> memory clustering/DOG (distributed object graph) and massive native
>>>> computing.
>>>>
>>>>    - Mnemonic helps to reduce the pressure on main memory utilization
>>>> and its related system-wide overheads.
>>>>
>>>>   This patch is designed to minimize the changes users need to make to
>>>> use Arrow; please check out the test cases provided by the patch for
>>>> reference.
>>>>
>>>>   Note that we need to put the allocator services in a specified
>>>> location (indicated by pom.xml) for the Mnemonic-backed Arrow test
>>>> cases to run, because those services are required for external
>>>> memory-like device management.
>>>>
>>>>   Please give your comments and review feedback for better
>>>> collaboration between Apache Arrow and Mnemonic. Thanks.
>>>>
>>>> Best Regards.
>>>> Gary.
>>> 
>>> 
>>> 
>>> 
>>>
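
The point in the original proposal about sharing columnar data across the life cycle of an application can be illustrated with a plain JDK memory-mapped file. This is only a stand-in: it does not use Mnemonic or Arrow, the class PersistentColumnDemo and its methods are hypothetical, and the on-disk layout (a 4-byte count followed by 4-byte integers) is an assumption made up for the example.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch: an int column that outlives the code that wrote it. */
final class PersistentColumnDemo {

    /** Writes a length-prefixed int column into a memory-mapped file. */
    static void writeColumn(Path file, int[] values) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            long size = 4L + 4L * values.length;
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
            buf.putInt(values.length);
            for (int v : values) {
                buf.putInt(v);
            }
            buf.force(); // ask the OS to flush the mapped pages to storage
        }
    }

    /** Reads the column back, as a later run or another process could. */
    static int[] readColumn(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            int[] values = new int[buf.getInt()];
            for (int i = 0; i < values.length; i++) {
                values[i] = buf.getInt();
            }
            return values;
        }
    }
}
```

Calling writeColumn in one run and readColumn on the same path in a later run returns the same values without re-serializing them through the heap, which is the kind of reuse the proposal attributes to non-volatile memory.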
