RE: A Proposal Apache Incubator Mnemonic as an alternative infra. for Apache Arrow

Wang, Yanping Wed, 30 Mar 2016 23:16:01 -0700

Yeah, I was so busy and in a hurry to catch other sessions that we only talked 
for about 2 minutes :-) 
After Jacques and Wes's Arrow presentation, someone in the audience asked if 
Arrow is going to use RDMA. I answered that RDMA is going to be used in the 
Mnemonic project to support data transfer among nodes and clusters. 
It makes perfect sense to position Mnemonic under Arrow to support its use of 
persistent storage media.


Thanks Patrick, Henry, and Taylor G. for the guidelines. We can brainstorm 
ideas on both dev lists and post those ideas in JIRA so developers can see 
where our projects are heading. 
Gary and I are located in Portland, Oregon; we usually plan our SC visits 2 
weeks ahead. 

Thanks,
Yanping


-----Original Message-----
From: Jacques Nadeau [mailto:[email protected]] 
Sent: Wednesday, March 30, 2016 7:34 PM
To: [email protected]
Cc: [email protected]; [email protected]
Subject: Re: A Proposal Apache Incubator Mnemonic as an alternative infra. for 
Apache Arrow

Yup. Will do.

The discussion today was limited to "let's meet".



On Wed, Mar 30, 2016 at 7:13 PM, P. Taylor Goetz <[email protected]> wrote:

> +1
>
> Discussions should be summarized and brought back to the mailing list(s).
> Recommendations are fine, but any decisions should be made on-list.
>
> -Taylor
>
> > On Mar 30, 2016, at 8:31 PM, Patrick Hunt <[email protected]> wrote:
> >
> > Remember that no decisions should be made at the meeting. It's fine to
> > have discussions, but those need to be brought back to the community
> > before decisions are made. Summarizing for the dev@ mailing list, also
> > jiras, etc... are good ways to socialize the issues.
> >
> > Patrick
> >
> >> On Wed, Mar 30, 2016 at 5:17 PM, Henry Saputra <[email protected]> wrote:
> >> The communities for both podlings are bigger than the ones who show up at
> >> Strata =)
> >>
> >> Would love to have the summary of the discussions in the dev@ list if
> >> indeed some discussions are happening at Strata.
> >>
> >> - Henry
> >>
> >> On Wed, Mar 30, 2016 at 5:03 PM, Wang, Yanping <[email protected]>
> >> wrote:
> >>
> >>> Hi, All
> >>>
> >>> I met with Jacques today at Strata. We think it would be great if the
> >>> Arrow and Mnemonic communities could have a F2F meeting to talk about
> >>> our integration.
> >>> I have the following two days available: 4/11 Monday afternoon, or 4/15
> >>> Friday. We can meet at the Intel SC campus.
> >>>
> >>> Would you let me know if you are able to join us and which day you'd
> >>> prefer?
> >>>
> >>> Thanks
> >>> Yanping
> >>>
> >>>
> >>> On Mar 29, 2016, at 4:38 PM, Gary <[email protected]> wrote:
> >>>
> >>> Yes, I agree with you and that's great if we could brainstorm here to
> >>> collect more ideas about enabling non-volatile memory usage for Apache
> >>> Arrow through Mnemonic.
> >>>
> >>> For the questions, my ideas are:
> >>>
> >>>
> >>> - Right now you are using unpooled persistent memory. Does that make
> >>> sense or does chunking make more sense?
> >>>
> >>> Gary: I think it could make some sense if developers know that their
> >>> datasets are very big and they want Apache Arrow to keep most of them
> >>> in memory for intensive computing, e.g. sort.
> >>> Developers can certainly spill their Mnemonic-managed datasets to disk,
> >>> but that seems a bit inefficient in some scenarios, depending on the
> >>> concrete application logic.
> >>>
> >>>
> >>> - What do you think is the right way to transition back and forth
> >>> between persistent and ephemeral memory? What do you think will be the
> >>> first pattern to be adopted? For example, do you think we should try to
> >>> use it as a tiered storage for sort spilling (before hitting the disk),
> >>> or should we use it for caching?
> >>> Gary: my 2 cents, the netty library does not yet seem to provide an
> >>> elegant switch mechanism for Arrow to use. We could probably change the
> >>> logic around "initialCapacity > directArena.chunkSize" to control which
> >>> buffers are put off-heap and which are managed by Mnemonic. Another
> >>> approach is to let the memory clustering mechanism of Mnemonic manage
> >>> hybrid memory-like spaces in place of part of the logic of class
> >>> PooledByteBufAllocatorL.
> >>> Regarding sorting, I think it is a typical case of random access to
> >>> the data, so we should avoid spilling as much as possible.
> >>> My 2 cents, the performance ordering could be:
> >>> all in off-heap if possible > mnemonic used as cache > all in mnemonic
> >>> using NVMe/disk > off-heap + spilling
> >>> The code simplicity ordering would be:
> >>> all in off-heap if possible > all in mnemonic using NVMe/disk >
> >>> mnemonic used as cache > off-heap + spilling
> >>>
> >>> The reason why the mode "mnemonic used as cache + spilling" is probably
> >>> unnecessary is that mnemonic could provide capacity nearly equivalent
> >>> to disk.
> >>>
> >>> Thanks.
> >>> Gary.
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: Jacques Nadeau [mailto:[email protected]]
> >>> Sent: Tuesday, March 29, 2016 8:05 AM
> >>> To: [email protected]
> >>> Subject: Re: A Proposal Apache Incubator Mnemonic as an alternative
> >>> infra. for Apache Arrow
> >>>
> >>>
> >>>
> >>> This is super cool. A couple of questions:
> >>>
> >>>
> >>>
> >>> - Right now you are using unpooled persistent memory. Does that make
> >>> sense or does chunking make more sense?
> >>>
> >>> - What do you think is the right way to transition back and forth
> >>> between persistent and ephemeral memory? What do you think will be the
> >>> first pattern to be adopted. For example, do you think we should try to
> >>> use it as a tiered storage for sort spilling (before hitting the disk),
> >>> or should we use it for caching?
> >>>
> >>>
> >>>
> >>> I think it will be much easier to think about this in the context of a
> >>> primary or first use case. Do you have something in mind or should we
> >>> brainstorm here?
> >>>
> >>>
> >>>
> >>> On Wed, Mar 23, 2016 at 7:16 PM, Gary <[email protected]> wrote:
> >>>
> >>>
> >>>
> >>>> Hello,
> >>>>
> >>>>   We have created a patch for Apache Arrow to leverage Apache
> >>>> incubator Mnemonic as an alternative infra. for underlying memory
> >>>> resource allocation. You can find it in the forked repo below.
> >>>>
> >>>> https://github.com/NonVolatileComputing/arrow
> >>>>
> >>>>   This way, Apache Arrow could gain some structural benefits from the
> >>>> Mnemonic project:
> >>>>
> >>>>    - Arrow is able to leverage the larger capacity of high performance
> >>>> hybrid storage devices, e.g. high-end SSD, NVMe
> >>>>
> >>>>    - Mnemonic provides a potential opportunity for Arrow to
> >>>> optimize/tune its allocation algorithms as a native Arrow-oriented
> >>>> allocation service
> >>>>
> >>>>    - The non-volatile features of Mnemonic make it possible for
> >>>> Arrow to share its columnar in-memory data between different
> >>>> applications or across the life-cycle of a single application
> >>>>
> >>>>    - Arrow could take advantage of upcoming Mnemonic features of
> >>>> memory clustering/DOG (distributed object graph) and massive native
> >>>> computing
> >>>>
> >>>>    - Mnemonic helps to reduce the pressure on main memory utilization
> >>>> and its related system-wide overheads.
> >>>>
> >>>>   This patch is designed to minimize the changes users need to make to
> >>>> use Arrow. Please check out the test cases provided by this patch for
> >>>> your reference.
> >>>>
> >>>>   Note that we need to put the allocator services in a specified
> >>>> position (indicated by pom.xml) for the Mnemonic-backed Arrow test
> >>>> cases to run, because those services are required for external
> >>>> memory-like device management.
> >>>>
> >>>>   Please give your comments and review feedback for better
> >>>> collaboration between Apache Arrow and Mnemonic. Thanks.
> >>>>
> >>>> Best Regards.
> >>>> Gary.
> >>>
> >>>
> >>>
> >>>
> >>>
>
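
For readers following the thread, here is a rough sketch of the kind of switch Gary describes around "initialCapacity > directArena.chunkSize": a size threshold that decides whether a buffer goes off-heap or to a Mnemonic-managed tier. This is only one possible reading of that suggestion, not code from the patch. Every name in it (TieredAllocationSketch, PersistentAllocator, Tier, Allocation) is hypothetical and is not part of the Arrow, Netty, or Mnemonic APIs, and a plain heap buffer stands in for persistent memory so the example runs with the JDK alone.

```java
import java.nio.ByteBuffer;

/**
 * Illustrative sketch only: routes an allocation to one of two tiers based on
 * a size threshold. All names are hypothetical; none come from Arrow, Netty,
 * or Mnemonic.
 */
final class TieredAllocationSketch {

    /** Which tier served a request. */
    enum Tier { OFF_HEAP, PERSISTENT }

    /** A buffer together with the tier that produced it. */
    static final class Allocation {
        final Tier tier;
        final ByteBuffer buffer;

        Allocation(Tier tier, ByteBuffer buffer) {
            this.tier = tier;
            this.buffer = buffer;
        }
    }

    /** Hypothetical hook standing in for a Mnemonic-backed allocator. */
    interface PersistentAllocator {
        ByteBuffer allocate(int capacity);
    }

    private final int chunkSize;
    private final PersistentAllocator persistent;

    TieredAllocationSketch(int chunkSize, PersistentAllocator persistent) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
        this.persistent = persistent;
    }

    /** Requests larger than chunkSize go to the persistent tier, the rest off-heap. */
    Allocation allocate(int initialCapacity) {
        if (initialCapacity < 0) {
            throw new IllegalArgumentException("initialCapacity must be non-negative");
        }
        if (initialCapacity > chunkSize) {
            return new Allocation(Tier.PERSISTENT, persistent.allocate(initialCapacity));
        }
        return new Allocation(Tier.OFF_HEAP, ByteBuffer.allocateDirect(initialCapacity));
    }

    public static void main(String[] args) {
        // Assumption: a heap buffer plays the role of persistent memory here.
        TieredAllocationSketch allocator =
                new TieredAllocationSketch(16, ByteBuffer::allocate);
        System.out.println(allocator.allocate(8).tier);
        System.out.println(allocator.allocate(64).tier);
    }
}
```

Which side of the threshold should go to which tier, and whether the threshold should be the chunk size at all, would depend on the first use case Jacques asks about (caching versus sort spilling), so treat the comparison direction here as a placeholder.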
