Re: Apache Arrow at JupyterCon

Wes McKinney Wed, 06 Sep 2017 16:47:20 -0700

Thanks Gary, that is helpful context. In light of this, it might be
worth writing some kind of a proposal for how to enable the Java
vector classes to be backed by some other kind of byte buffers. An
alternative version of portions of the Arrow Java library (i.e.
decoupled from Netty) might need to be created.


If such a change cannot be reconciled with the Netty AbstractByteBuf
class, then this would be useful to know so that Arrow developers can
plan accordingly for the future.
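
For reference, here is a minimal sketch of the plain java.nio behavior
being discussed: a file is memory mapped and a sub-range is sliced out
without copying. It does not touch ArrowBuf or Netty, and the class and
method names (MappedSliceSketch, sliceOfMappedFile) are hypothetical,
chosen only for illustration. It assumes the file is smaller than 2 GB,
since a single MappedByteBuffer cannot be larger than that.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper for illustration; not part of Arrow, Netty, or Mnemonic. */
final class MappedSliceSketch {
    private MappedSliceSketch() {}

    /**
     * Maps the whole file read-only and returns a view of the bytes in
     * [offset, offset + length). The view shares the mapped memory, so no
     * bytes are copied onto the Java heap.
     */
    static ByteBuffer sliceOfMappedFile(Path file, int offset, int length) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());

            // Work on a duplicate so the position/limit of the original mapping are untouched.
            ByteBuffer view = mapped.duplicate();
            view.position(offset);
            view.limit(offset + length);

            // slice() yields a buffer whose index 0 is the byte at `offset` in the file.
            return view.slice();
        }
    }
}
```

The open question in this thread is not the mapping itself but how a
buffer like this would fit with the reference counting and allocator
accounting that ArrowBuf inherits from Netty.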

On Wed, Sep 6, 2017 at 2:15 PM, Gary Wong <[email protected]> wrote:
> ArrowBuf inherits from AbstractByteBuf, which is defined in the Netty
> library. It works more like a memory pool than a pure buffer, which is
> why ArrowBuf is not backed by ByteBuffer for now.
>
> I once tried to build ArrowBuf on top of Mnemonic's DurableBuffer, but it
> does not look easy to decouple the refcount from the other parts, because
> the lifecycle of a DurableBuffer can also be managed automatically by the
> JVM instead of by refcount.
>
> I still want to figure out how to gracefully migrate the backend of
> ArrowBuf from Netty to Mnemonic. In addition, DurableBuffer could bring
> other benefits to Arrow, e.g. persistence on any kind of memory service
> that can make use of SSD, NVMe, memory, NAS and more. In this way, Arrow
> would be able to break through the capacity limit of system memory, avoid
> SerDe for storage, and link to other durable objects with ease.
>
> On Wed, Sep 6, 2017 at 10:40 AM, Wes McKinney <[email protected]> wrote:
>
>> It should be possible to have an ArrowBuf backed by a
>> MappedByteBuffer. Anyone reading is welcome to dig in and write a
>> patch for this.
>>
>> Semantically this is what we have done in C++ -- a memory map inherits
>> from arrow::Buffer, so we can slice and dice a memory map as we would
>> any other Buffer object
>>
>> https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/file.cc#L501
>>
>> On Mon, Sep 4, 2017 at 4:05 AM, Gonzalo Ortiz Jaureguizar
>> <[email protected]> wrote:
>> > This is a very interesting feature. It's very surprising that there is no
>> > ByteBuf implementation backed by a MappedByteBuffer. As far as I
>> > understand, it should be trivial to implement (though maybe not to pool),
>> > since a ByteBuf is usually backed by a ByteBuffer and MappedByteBuffer
>> > extends that. But I didn't find any implementations when I googled for it.
>> >
>> > 2017-09-03 16:12 GMT+02:00 Wes McKinney <[email protected]>:
>> >
>> >> I think ideally we would have a Java interface that would support all of:
>> >>
>> >> - Memory mapped files
>> >> - Anonymous shared memory segments (e.g. POSIX shm)
>> >> - NVM / Mnemonic
>> >>
>> >> We already have the ability to do zero-copy reads from buffer-like
>> >> objects in C++ and IO interfaces that support zero copy (like memory
>> >> mapped files). We can do zero-copy reads from ArrowBuf in Java but we
>> >> are missing the interfaces to shared memory sources
>> >>
>> >> - Wes
>> >>
>> >> On Thu, Aug 31, 2017 at 5:09 PM, Gang(Gary) Wang <[email protected]> wrote:
>> >> > Hi Wes,
>> >> >
>> >> > Thank you for the explanation. The use case in
>> >> > https://issues.apache.org/jira/browse/ARROW-721 could be supported
>> >> > directly by Mnemonic through DurableBuffer and DurableChunk. DurableChunk
>> >> > makes use of unsafe to expose a plain memory space for Arrow to use
>> >> > without performance penalties. That is why most of the big data
>> >> > frameworks take advantage of unsafe; please refer to
>> >> > https://mnemonic.apache.org/docs/domusecases.html for the use cases. We
>> >> > could work on this ticket if you think that is exactly what you want.
>> >> >
>> >> > Regarding NVM technology, that is what Mnemonic was created for. It can
>> >> > be used to directly persist Java generic objects and collections on NVM
>> >> > with no SerDe. So which basic tools did you have in mind? We can
>> >> > probably also help identify the gaps in Mnemonic. Thanks!
>> >> >
>> >> > Very truly yours,
>> >> > Gary
>> >> >
>> >> > On Thu, Aug 31, 2017 at 12:32 PM, Wes McKinney <[email protected]> wrote:
>> >> >
>> >> >> hi Gary,
>> >> >>
>> >> >> The Java libraries are not yet capable of writing or zero-copy reads
>> >> >> of Arrow datasets to/from shared memory or memory-mapped files:
>> >> >> https://issues.apache.org/jira/browse/ARROW-721. We've developed quite
>> >> >> a bit of technology on the C++ side for dealing with shared memory IPC
>> >> >> but we need someone to help with that on the Java side.
>> >> >>
>> >> >> In the context of NVM technologies, it would be nice to be able to
>> >> >> persist a dataset to NVM and continue to do analytics on it, while
>> >> >> retaining a "handle" so that the dataset can be easily recovered in
>> >> >> the event of process failure. We may arrive at new use cases once
>> >> >> some of the basic tools exist.
>> >> >>
>> >> >> - Wes
>> >> >>
>> >> >> On Wed, Aug 30, 2017 at 6:19 PM, Gang(Gary) Wang <[email protected]> wrote:
>> >> >> > Thank you for sharing the videos. We are very interested in
>> >> >> > supporting the Arrow data format and collections closely. Could you
>> >> >> > please point out which interfaces would allow Mnemonic to act as a
>> >> >> > memory provider for users to store and access Arrow-managed datasets?
>> >> >> > Thanks!
>> >> >> >
>> >> >> > Very truly yours,
>> >> >> > Gary.
>> >> >> >
>> >> >> >
>> >> >> > On Wed, Aug 30, 2017 at 2:11 PM, Ivan Sadikov <[email protected]> wrote:
>> >> >> >
>> >> >> >> Great presentation! Thank you for sharing.
>> >> >> >>
>> >> >> >>
>> >> >> >> On Thu, 31 Aug 2017 at 8:02 AM, Wes McKinney <[email protected]> wrote:
>> >> >> >>
>> >> >> >> > Absolutely. I will do that now
>> >> >> >> >
>> >> >> >> > On Wed, Aug 30, 2017 at 3:33 PM, Julian Hyde <[email protected]> wrote:
>> >> >> >> > > Thanks for sharing. Can we tweet those videos as well? I see that
>> >> >> >> > > https://twitter.com/apachearrow only tweeted your slides.
>> >> >> >> > >
>> >> >> >> > >> On Aug 26, 2017, at 1:11 PM, Wes McKinney <[email protected]> wrote:
>> >> >> >> > >>
>> >> >> >> > >> hi all,
>> >> >> >> > >>
>> >> >> >> > >> In case folks here are interested, I gave a keynote this week at
>> >> >> >> > >> JupyterCon explaining my motivations for being involved in Apache
>> >> >> >> > >> Arrow and how I see it fitting in with the data science ecosystem
>> >> >> >> > >> long term:
>> >> >> >> > >>
>> >> >> >> > >> https://www.youtube.com/watch?v=wdmf1msbtVs
>> >> >> >> > >>
>> >> >> >> > >> I also gave an interview going a little deeper into some of the
>> >> >> >> > >> topics from the talk:
>> >> >> >> > >>
>> >> >> >> > >> https://www.youtube.com/watch?v=Q7y9l-L8yiU
>> >> >> >> > >>
>> >> >> >> > >> I believe we have an exciting journey ahead of us, but it's certainly
>> >> >> >> > >> going to take a lot of collaboration and community development.
>> >> >> >> > >>
>> >> >> >> > >> - Wes
>> >> >> >> > >
>> >> >> >> >
>> >> >> >>
>> >> >>
>> >>
>>
