Re: A Proposal Apache Incubator Mnemonic as an alternative infra. for Apache Arrow

Wang, Yanping Wed, 30 Mar 2016 17:04:34 -0700

Hi all,

I met with Jacques today at Strata. We think it would be great for the Arrow 
and Mnemonic communities to have a F2F meeting to talk about our integration.
I am available on two days: Monday 4/11 in the afternoon, or Friday 4/15.
We can meet at the Intel SC campus.


Would you let me know if you are able to join us and which day you'd prefer?

Thanks
Yanping


On Mar 29, 2016, at 4:38 PM, Gary <[email protected]> wrote:

Yes, I agree with you, and it would be great if we could brainstorm here to 
collect more ideas about enabling non-volatile memory usage for Apache Arrow 
through Mnemonic.

As for the questions, my thoughts are:


- Right now you are using unpooled persistent memory. Does that make sense or 
does chunking make more sense?

Gary: I think it could make sense if developers know that their datasets are 
very large and they want Apache Arrow to keep most of the data in memory for 
intensive computing, e.g. sorting. Developers can certainly spill their 
Mnemonic-managed datasets to disk, but that seems somewhat inefficient in some 
scenarios, depending on the concrete application logic.


- What do you think is the right way to transition back and forth between 
persistent and ephemeral memory? What do you think will be the first pattern to 
be adopted? For example, do you think we should try to use it as a tiered 
storage for sort spilling (before hitting the disk), or should we use it for 
caching?

Gary: My 2 cents: the Netty library does not yet appear to provide an elegant 
switch mechanism for Arrow to use. We could probably change the logic around 
"initialCapacity > directArena.chunkSize" to control which buffers are placed 
off-heap and which are managed by Mnemonic. Another approach is to let the 
memory clustering mechanism of Mnemonic manage hybrid memory-like spaces in 
place of part of the logic of class PooledByteBufAllocatorL.
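
The size-based switch described above can be illustrated with a minimal 
sketch. It is written in Python rather than against Netty or Mnemonic, and 
every name in it is hypothetical: allocate, CHUNK_SIZE (standing in for 
directArena.chunkSize), and the temporary-file mapping (standing in for a 
Mnemonic-managed region). It only shows the routing decision and makes no 
claim about how PooledByteBufAllocatorL or the patch actually works.

```python
import mmap
import tempfile

# Hypothetical threshold standing in for directArena.chunkSize.
CHUNK_SIZE = 16 * 1024


def allocate(initial_capacity, chunk_size=CHUNK_SIZE):
    """Return (tier, buffer) for a request of initial_capacity bytes."""
    if initial_capacity <= 0:
        raise ValueError("initial_capacity must be positive")
    if initial_capacity > chunk_size:
        # Large request: back it with a temporary file mapping, used here
        # only as a stand-in for a Mnemonic-managed memory-like region.
        backing = tempfile.TemporaryFile()
        backing.truncate(initial_capacity)
        return "file-backed", mmap.mmap(backing.fileno(), initial_capacity)
    # Small request: a plain in-memory buffer, standing in for off-heap.
    return "in-memory", bytearray(initial_capacity)


if __name__ == "__main__":
    for size in (1024, CHUNK_SIZE + 1):
        tier, buf = allocate(size)
        print(size, tier, len(buf))
```

Both tiers return a buffer that supports slicing and len(), so calling code 
does not need to know which tier served the request.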
Regarding sorting, I think it is a typical case of random access to the data, 
so we should avoid spilling as much as possible.

My 2 cents, the performance ranking could be:
all in off-heap if possible > Mnemonic used as cache > all in Mnemonic using 
NVMe/disk > off-heap + spilling

The code simplicity ranking would be:
all in off-heap if possible > all in Mnemonic using NVMe/disk > Mnemonic used 
as cache > off-heap + spilling

The reason the mode "Mnemonic used as cache + spilling" is probably 
unnecessary is that Mnemonic could provide capacity nearly equivalent to disk.

Thanks.
Gary.


-----Original Message-----
From: Jacques Nadeau [mailto:[email protected]]
Sent: Tuesday, March 29, 2016 8:05 AM
To: [email protected]
Subject: Re: A Proposal Apache Incubator Mnemonic as an alternative infra. for 
Apache Arrow



This is super cool. A couple of questions:



- Right now you are using unpooled persistent memory. Does that make sense or 
does chunking make more sense?

- What do you think is the right way to transition back and forth between 
persistent and ephemeral memory? What do you think will be the first pattern to 
be adopted? For example, do you think we should try to use it as a tiered 
storage for sort spilling (before hitting the disk), or should we use it for 
caching?



I think it will be much easier to think about this in the context of a primary 
or first use case. Do you have something in mind or should we brainstorm here?



On Wed, Mar 23, 2016 at 7:16 PM, Gary <[email protected]> wrote:



> Hello,
>
>    We have created a patch for Apache Arrow that leverages Apache
> Incubator Mnemonic as an alternative infrastructure for underlying
> memory resource allocation. You can find it in the forked repo below.
>
> https://github.com/NonVolatileComputing/arrow
>
>    In this way, Apache Arrow could gain some structural benefits from
> the Mnemonic project:
>
>     - Arrow is able to leverage the larger capacity of high-performance
> hybrid storage devices, e.g. high-end SSD, NVMe.
>
>     - Mnemonic provides a potential opportunity for Arrow to
> optimize/tune its allocation algorithms as a native Arrow-oriented
> allocation service.
>
>     - The non-volatile features of Mnemonic make it possible for Arrow
> to share its columnar in-memory data between different applications or
> across the life cycle of a single application.
>
>     - Arrow could take advantage of upcoming Mnemonic features such as
> memory clustering/DOG (distributed object graph) and massive native
> computing.
>
>     - Mnemonic helps to reduce the pressure on main memory utilization
> and its related system-wide overheads.
>
>    This patch is designed to minimize the changes users need to make
> to use Arrow. Please check out the test cases provided by this patch
> for your reference.
>
>    Note that we need to put the allocator services in a specified
> location (indicated by pom.xml) for the Mnemonic-backed Arrow test
> cases to run, because those services are required for external
> memory-like device management.
>
>    Please send your comments and review feedback for better
> collaboration between Apache Arrow and Mnemonic. Thanks.
>
> Best Regards.
> Gary.
