Re: A Proposal Apache Incubator Mnemonic as an alternative infra. for Apache Arrow

Patrick Hunt Wed, 30 Mar 2016 17:32:48 -0700

Remember that no decisions should be made at the meeting. It's fine to
have discussions, but those need to be brought back to the community
before decisions are made. Summaries on the dev@ mailing list, as well
as JIRAs, etc., are good ways to socialize the issues.


Patrick

On Wed, Mar 30, 2016 at 5:17 PM, Henry Saputra <henry.sapu...@gmail.com> wrote:
> The communities of both podlings are bigger than the people who show up
> at Strata =)
>
> Would love to see a summary of the discussions on the dev@ list if
> discussions do indeed happen at Strata.
>
> - Henry
>
> On Wed, Mar 30, 2016 at 5:03 PM, Wang, Yanping <yanping.w...@intel.com>
> wrote:
>
>> Hi, All
>>
>> I met with Jacques at Strata today. We think it would be great for the
>> Arrow and Mnemonic communities to have a F2F meeting to talk about our
>> integration.
>> I am available on the following two days: Monday 4/11 (afternoon) or
>> Friday 4/15. We can meet at the Intel SC campus.
>>
>> Would you let me know if you are able to join us and which day you'd
>> prefer?
>>
>> Thanks
>> Yanping
>>
>>
>> On Mar 29, 2016, at 4:38 PM, Gary <ga...@apache.org> wrote:
>>
>> Yes, I agree with you, and it would be great if we could brainstorm here
>> to collect more ideas about enabling non-volatile memory usage in Apache
>> Arrow through Mnemonic.
>>
>> Regarding your questions, my thoughts are:
>>
>>
>> - Right now you are using unpooled persistent memory. Does that make sense
>> or does chunking make more sense?
>>
>> Gary: I think it could make sense if developers know that their datasets
>> are very large and they want Apache Arrow to keep most of the data in
>> memory for intensive computation, e.g. sorting.
>> Developers can certainly spill their Mnemonic-managed datasets to disk,
>> but that seems somewhat inefficient in some scenarios, depending on the
>> concrete application logic.
>>
>>
>> - What do you think is the right way to transition back and forth between
>> persistent and ephemeral memory? What do you think will be the first
>> pattern to be adopted. For example, do you think we should try to use it as
>> a tiered storage for sort spilling (before hitting the disk), or should we
>> use it for caching?
>>
>> Gary: My 2 cents: the Netty library does not yet seem to provide an
>> elegant switching mechanism for Arrow to use. We could probably change the
>> logic around "initialCapacity > directArena.chunkSize" to control which
>> buffers are placed off-heap and which are managed by Mnemonic. Another
>> approach is to let Mnemonic's memory clustering mechanism manage the
>> hybrid memory-like spaces in place of part of the logic in class
>> PooledByteBufAllocatorL.
>> Regarding sorting, I think it is a typical case of random access to the
>> data, so we should avoid spilling as much as possible.
>> My 2 cents on how the options rank (left is better):
>> Performance:
>>   all off-heap (if possible) > Mnemonic used as cache >
>>   all in Mnemonic using NVMe/disk > off-heap + spilling
>> Code simplicity:
>>   all off-heap (if possible) > all in Mnemonic using NVMe/disk >
>>   Mnemonic used as cache > off-heap + spilling
>>
>> The mode "Mnemonic used as cache + spilling" is probably unnecessary
>> because Mnemonic can provide nearly the same capacity as disk.
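A minimal sketch of the size-threshold routing Gary describes, assuming that requests larger than one chunk (the "initialCapacity > directArena.chunkSize" case) are the ones handed to a Mnemonic-managed space while smaller ones stay in ordinary off-heap memory. This is not Arrow's, Netty's, or Mnemonic's actual code: `ThresholdRoutingAllocator` and `BackingStore` are hypothetical names, and a plain heap `ByteBuffer` stands in for the Mnemonic store only so the routing decision is observable.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: route each allocation by size, in the spirit of the
// "initialCapacity > directArena.chunkSize" check. Names are illustrative only.
class ThresholdRoutingAllocator {

    // Hypothetical stand-in for a Mnemonic-managed (e.g. NVMe-backed) space.
    interface BackingStore {
        ByteBuffer allocate(int capacity);
    }

    private final int chunkSize;
    private final BackingStore largeStore;

    ThresholdRoutingAllocator(int chunkSize, BackingStore largeStore) {
        this.chunkSize = chunkSize;
        this.largeStore = largeStore;
    }

    // Requests larger than one chunk go to the backing store;
    // everything else gets an ordinary off-heap (direct) buffer.
    ByteBuffer buffer(int initialCapacity) {
        if (initialCapacity > chunkSize) {
            return largeStore.allocate(initialCapacity);
        }
        return ByteBuffer.allocateDirect(initialCapacity);
    }

    public static void main(String[] args) {
        // Heap buffers stand in for the Mnemonic store so the routing is visible.
        ThresholdRoutingAllocator alloc =
                new ThresholdRoutingAllocator(4096, ByteBuffer::allocate);
        System.out.println(alloc.buffer(1024).isDirect()); // true  (off-heap)
        System.out.println(alloc.buffer(8192).isDirect()); // false (backing store)
    }
}
```

Because the policy lives in a single comparison, the "all off-heap" and "all in Mnemonic" modes ranked above correspond roughly to the two extremes of the threshold.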
>>
>> Thanks.
>> Gary.
>>
>>
>> -----Original Message-----
>> From: Jacques Nadeau [mailto:jacq...@apache.org]
>> Sent: Tuesday, March 29, 2016 8:05 AM
>> To: dev@arrow.apache.org
>> Subject: Re: A Proposal Apache Incubator Mnemonic as an alternative infra.
>> for Apache Arrow
>>
>> This is super cool. A couple of questions:
>>
>> - Right now you are using unpooled persistent memory. Does that make sense
>> or does chunking make more sense?
>>
>> - What do you think is the right way to transition back and forth between
>> persistent and ephemeral memory? What do you think will be the first
>> pattern to be adopted. For example, do you think we should try to use it as
>> a tiered storage for sort spilling (before hitting the disk), or should we
>> use it for caching?
>>
>> I think it will be much easier to think about this in the context of a
>> primary or first use case. Do you have something in mind or should we
>> brainstorm here?
>>
>> On Wed, Mar 23, 2016 at 7:16 PM, Gary <ga...@apache.org> wrote:
>>
>> > Hello,
>> >
>> >    We have created a patch for Apache Arrow that leverages Apache
>> > Incubator Mnemonic as an alternative infrastructure for underlying
>> > memory resource allocation. You can find it in the forked repo below:
>> >
>> > https://github.com/NonVolatileComputing/arrow
>> >
>> >    This way, Apache Arrow could gain some structural benefits from the
>> > Mnemonic project:
>> >
>> >     - Arrow can leverage the larger capacity of high-performance
>> > hybrid storage devices, e.g. high-end SSDs and NVMe.
>> >
>> >     - Mnemonic provides an opportunity for Arrow to optimize/tune its
>> > allocation algorithms as native, Arrow-oriented allocation services.
>> >
>> >     - The non-volatile features of Mnemonic make it possible for Arrow
>> > to share its columnar in-memory data between different applications or
>> > across the life cycle of a single application.
>> >
>> >     - Arrow could take advantage of upcoming Mnemonic features such as
>> > memory clustering/DOG (distributed object graph) and massive native
>> > computing.
>> >
>> >     - Mnemonic helps reduce main-memory pressure and the related
>> > system-wide overheads.
>> >
>> >    This patch is designed to minimize the changes users need to make
>> > to use Arrow; please refer to the test cases included in the patch.
>> >
>> >    Note that the allocator services need to be placed in a specified
>> > location (indicated by pom.xml) for the Mnemonic-backed Arrow test
>> > cases to run, because those services are required to manage external
>> > memory-like devices.
>> >
>> >    Please share your comments and review feedback to help Apache Arrow
>> > and Mnemonic collaborate better. Thanks.
>> >
>> > Best regards,
>> > Gary