Re: A Proposal Apache Incubator Mnemonic as an alternative infra. for Apache Arrow

Henry Saputra Wed, 30 Mar 2016 17:18:06 -0700

The communities of both podlings are bigger than the group that showed up at
Strata =)


Would love to see a summary of the discussions on the dev@ list if
discussions are indeed happening at Strata.

- Henry

On Wed, Mar 30, 2016 at 5:03 PM, Wang, Yanping <yanping.w...@intel.com>
wrote:

> Hi, All
>
> I met with Jacques today at Strata. We think it would be great if the Arrow
> and Mnemonic communities could have a F2F meeting to talk about our
> integration.
> I have the following two days available: Monday afternoon, 4/11, or Friday,
> 4/15.
> We can meet at the Intel SC campus.
>
> Would you let me know if you are able to join us and which day you'd
> prefer?
>
> Thanks
> Yanping
>
>
> On Mar 29, 2016, at 4:38 PM, Gary <ga...@apache.org> wrote:
>
> Yes, I agree with you, and it would be great if we could brainstorm here to
> collect more ideas about enabling non-volatile memory usage for Apache
> Arrow through Mnemonic.
>
> As for the questions, my ideas are:
>
>
> - Right now you are using unpooled persistent memory. Does that make sense
> or does chunking make more sense?
>
> Gary: I think it could make some sense if developers know that their
> datasets are very big and they want Apache Arrow to keep most of them in
> memory for intensive computing, e.g. sorting.
> Developers can certainly spill their Mnemonic-managed datasets to disk,
> but that seems somewhat inefficient in some scenarios, depending on the
> concrete application logic.
>
>
> - What do you think is the right way to transition back and forth between
> persistent and ephemeral memory? What do you think will be the first
> pattern to be adopted? For example, do you think we should try to use it as
> a tiered storage for sort spilling (before hitting the disk), or should we
> use it for caching?
> Gary: My 2 cents: the netty library does not yet seem to provide an elegant
> switch mechanism for Arrow to use. We could probably change the logic around
> "initialCapacity > directArena.chunkSize" to control which buffers are placed
> off-heap and which are managed by Mnemonic. Another approach is to let the
> memory clustering mechanism of Mnemonic manage hybrid memory-like spaces
> instead of part of the logic in class PooledByteBufAllocatorL.
> Regarding sorting, I think it is a typical case of random access to
> the data, so we should avoid spilling as much as possible.
> My 2 cents on performance, from best to worst:
> all in off-heap if possible > mnemonic used as cache > all in mnemonic
> using NVMe/disk > off-heap + spilling
> Code simplicity, from simplest to most complex:
> all in off-heap if possible > all in mnemonic using NVMe/disk > mnemonic
> used as cache > off-heap + spilling
>
> The reason the mode "mnemonic used as cache + spilling" is probably
> unnecessary is that mnemonic could provide capacity nearly equivalent to disk.
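
To make the size-threshold idea above concrete, here is a minimal sketch of
routing allocations by comparing the requested capacity with a chunk size. It
is not the Arrow, netty, or Mnemonic API: the class TieredAllocatorSketch and
its methods are hypothetical, and a memory-mapped temporary file stands in for
a Mnemonic-managed memory-like device.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch: route allocations by size between off-heap memory and a mapped file. */
public class TieredAllocatorSketch {
    // Plays the role of directArena.chunkSize from the thread (assumed value supplied by caller).
    private final int chunkSize;
    // Directory holding the mapped files; a stand-in for a Mnemonic-managed device.
    private final Path backingDir;

    public TieredAllocatorSketch(int chunkSize, Path backingDir) {
        this.chunkSize = chunkSize;
        this.backingDir = backingDir;
    }

    /** True when a request of this size would be sent to the file-backed tier. */
    public boolean usesPersistentTier(int initialCapacity) {
        return initialCapacity > chunkSize;
    }

    /** Small requests get a direct (off-heap) buffer; large ones get a mapped file region. */
    public ByteBuffer allocate(int initialCapacity) throws IOException {
        if (!usesPersistentTier(initialCapacity)) {
            return ByteBuffer.allocateDirect(initialCapacity);
        }
        Path file = Files.createTempFile(backingDir, "tier-", ".bin");
        file.toFile().deleteOnExit();
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_WRITE, 0, initialCapacity);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("tiered-sketch");
        TieredAllocatorSketch allocator = new TieredAllocatorSketch(16, dir);
        ByteBuffer small = allocator.allocate(8);   // off-heap tier
        ByteBuffer large = allocator.allocate(64);  // file-backed tier
        System.out.println(small.capacity() + " " + large.capacity());
    }
}
```

Both tiers return a plain ByteBuffer, so calling code does not need to know
which tier served the request; that is the "minimal change for the user"
property the patch description aims for.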
>
> Thanks.
> Gary.
>
>
> -----Original Message-----
> From: Jacques Nadeau [mailto:jacq...@apache.org]
> Sent: Tuesday, March 29, 2016 8:05 AM
> To: dev@arrow.apache.org
> Subject: Re: A Proposal Apache Incubator Mnemonic as an alternative infra.
> for Apache Arrow
>
>
>
> This is super cool. A couple of questions:
>
>
>
> - Right now you are using unpooled persistent memory. Does that make sense
> or does chunking make more sense?
>
> - What do you think is the right way to transition back and forth between
> persistent and ephemeral memory? What do you think will be the first
> pattern to be adopted? For example, do you think we should try to use it as
> a tiered storage for sort spilling (before hitting the disk), or should we
> use it for caching?
>
>
>
> I think it will be much easier to think about this in the context of a
> primary or first use case. Do you have something in mind or should we
> brainstorm here?
>
>
>
> On Wed, Mar 23, 2016 at 7:16 PM, Gary <ga...@apache.org> wrote:
>
>
>
> > Hello,
> >
> >    We have created a patch for Apache Arrow that leverages Apache
> > Incubator Mnemonic as an alternative infrastructure for underlying memory
> > resource allocation. You can find it in the forked repo below.
> >
> > https://github.com/NonVolatileComputing/arrow
> >
> >     This way, Apache Arrow could gain some structural benefits from the
> > Mnemonic project:
> >
> >     - Arrow is able to leverage the larger capacity of high-performance
> > hybrid storage devices, e.g. high-end SSD, NVMe
> >
> >     - Mnemonic provides a potential opportunity for Arrow to
> > optimize/tune its allocation algorithms as native Arrow-oriented
> > allocation services
> >
> >     - The non-volatile features of Mnemonic make it possible for
> > Arrow to share its columnar in-memory data between different
> > applications or across the life cycle of a single application
> >
> >     - Arrow could take advantage of the upcoming Mnemonic features of
> > memory clustering/DOG (distributed object graph) and massive native
> > computing
> >
> >     - Mnemonic helps to reduce the pressure on main memory utilization
> > and its related system-wide overheads.
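
As a rough illustration of the non-volatile benefit listed above (columnar
data surviving across the life cycle of an application), the sketch below
writes an int column to a memory-mapped file and reads it back through a
fresh mapping. It uses only java.nio and is not the Mnemonic or Arrow API;
the class PersistentColumnSketch and its methods are hypothetical names.

```java
import java.io.IOException;
import java.nio.IntBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch: an int column that outlives the mapping that wrote it. */
public class PersistentColumnSketch {

    /** Maps the file read-write, copies the column in, and flushes it to storage. */
    public static void writeColumn(Path file, int[] values) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            long bytes = (long) values.length * Integer.BYTES;
            MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_WRITE, 0, bytes);
            mapped.asIntBuffer().put(values);
            mapped.force(); // push the changes to the storage device
        }
    }

    /** Opens a new read-only mapping, as a later run or another process would. */
    public static int[] readColumn(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            IntBuffer col = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size()).asIntBuffer();
            int[] out = new int[col.remaining()];
            col.get(out);
            return out;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("column-", ".bin");
        file.toFile().deleteOnExit();
        writeColumn(file, new int[] {3, 1, 2});
        int[] restored = readColumn(file);
        System.out.println(restored.length);
    }
}
```

The reader assumes the file holds nothing but the column written by
writeColumn; a real shared format would also need a header describing the
type and length.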
>
> >    This patch is designed to minimize the changes users need to make to use
> > Arrow. Please check out the test cases provided by this patch for your
> > reference.
> >
> >    Note that the allocator services need to be placed in a specified
> > location (indicated by pom.xml) for the Mnemonic-backed Arrow test
> > cases to run, because those services are required for external
> > memory-like device management.
> >
> >    Please share your comments and review feedback for better
> > collaboration between Apache Arrow and Mnemonic. Thanks.
> >
> > Best Regards.
> > Gary.
>
>
