Re: Apache Arrow at JupyterCon

Wes McKinney Wed, 06 Sep 2017 10:42:07 -0700

It should be possible to have an ArrowBuf backed by a
MappedByteBuffer. Anyone reading is welcome to dig in and write a
patch for this.


Semantically this is what we have done in C++ -- a memory map inherits
from arrow::Buffer, so we can slice and dice a memory map as we would
any other Buffer object:

https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/file.cc#L501
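
For anyone who wants to experiment on the Java side, here is a minimal
sketch of the same "slice a memory map like any other buffer" idea using
only java.nio. It does not touch ArrowBuf or any Arrow API; the class name
MappedSliceSketch and the helpers mapReadWrite and slice are hypothetical
names chosen for illustration, and wiring such a buffer into ArrowBuf is
the part that would still need a patch.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedSliceSketch {

    // Hypothetical helper: map the first `size` bytes of a file read/write.
    // The mapping stays valid after the channel is closed.
    static MappedByteBuffer mapReadWrite(Path path, int size) throws IOException {
        try (FileChannel channel = FileChannel.open(
                path,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            return channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
        }
    }

    // Hypothetical helper: a zero-copy view of [offset, offset + length).
    // duplicate() shares the memory but has its own position and limit,
    // so the caller's buffer state is left untouched.
    static ByteBuffer slice(ByteBuffer buffer, int offset, int length) {
        ByteBuffer view = buffer.duplicate();
        view.position(offset);
        view.limit(offset + length);
        return view.slice();
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("mmap-sketch", ".bin");
        MappedByteBuffer mapped = mapReadWrite(path, 64);

        // Write through the slice; the bytes land in the mapped region.
        ByteBuffer view = slice(mapped, 16, 8);
        view.putLong(0, 42L);

        // Read back through the parent mapping at the slice's offset.
        System.out.println(mapped.getLong(16));
    }
}
```

The slice shares memory with the mapping, so no bytes are copied. Pooling
and reference counting, which ArrowBuf provides, are out of scope for this
sketch.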

On Mon, Sep 4, 2017 at 4:05 AM, Gonzalo Ortiz Jaureguizar
<golthir...@gmail.com> wrote:
> This is a very interesting feature. It's very surprising that there is no
> ByteBuffer implementation backed by a MappedByteBuffer. As far as I
> understand, it should be trivial to implement (maybe not to pool), as
> ByteBuf is usually backed by a ByteBuffer and MappedByteBuffer extends
> that. But I didn't find any implementations when I googled for it.
>
> 2017-09-03 16:12 GMT+02:00 Wes McKinney <wesmck...@gmail.com>:
>
>> I think ideally we would have a Java interface that would support all of:
>>
>> - Memory mapped files
>> - Anonymous shared memory segments (e.g. POSIX shm)
>> - NVM / Mnemonic
>>
>> We already have the ability to do zero-copy reads from buffer-like
>> objects in C++ and IO interfaces that support zero copy (like memory
>> mapped files). We can do zero-copy reads from ArrowBuf in Java but we
>> are missing the interfaces to shared memory sources
>>
>> - Wes
>>
>> On Thu, Aug 31, 2017 at 5:09 PM, Gang(Gary) Wang <ga...@apache.org> wrote:
>> > Hi Wes,
>> >
>> > Thank you for the explanation. The usage in
>> > https://issues.apache.org/jira/browse/ARROW-721 could be directly
>> > supported by Mnemonic through DurableBuffer and DurableChunk. The
>> > DurableChunk makes use of unsafe to expose a plain memory space for
>> > Arrow to use without performance penalties, which is why most of the
>> > big data frameworks take advantage of unsafe. Please refer to
>> > https://mnemonic.apache.org/docs/domusecases.html for the use cases. We
>> > could work on this ticket if you think that's exactly what you want.
>> >
>> > Regarding NVM technology, that is what Mnemonic was created for. It can
>> > be used to directly persist generic Java objects and collections on NVM
>> > with no SerDe. Which basic tools did you have in mind? We can probably
>> > also help identify the gaps in Mnemonic. Thanks!
>> >
>> > Very truly yours,
>> > Gary
>> >
>> > On Thu, Aug 31, 2017 at 12:32 PM, Wes McKinney <wesmck...@gmail.com> wrote:
>> >
>> >> hi Gary,
>> >>
>> >> The Java libraries are not yet capable of writing or zero-copy reads
>> >> of Arrow datasets to/from shared memory or memory-mapped files:
>> >> https://issues.apache.org/jira/browse/ARROW-721. We've developed quite
>> >> a bit of technology on the C++ side for dealing with shared memory IPC
>> >> but we need someone to help with that on the Java side.
>> >>
>> >> In the context of NVM technologies, it would be nice to be able to
>> >> persist a dataset to NVM and continue to do analytics on it, while
>> >> retaining a "handle" so that the dataset can be easily recovered in
>> >> the event of process failure. We may arrive at new use cases once some
>> >> of the basic tools exist.
>> >>
>> >> - Wes
>> >>
>> >> On Wed, Aug 30, 2017 at 6:19 PM, Gang(Gary) Wang <ga...@apache.org> wrote:
>> >> > Thank you for sharing the videos. We are very interested in closely
>> >> > supporting the Arrow data format and collections. Could you please
>> >> > point out which interfaces would allow Mnemonic to act as a memory
>> >> > provider for users to store and access Arrow-managed datasets? Thanks!
>> >> >
>> >> > Very truly yours,
>> >> > Gary.
>> >> >
>> >> >
>> >> > On Wed, Aug 30, 2017 at 2:11 PM, Ivan Sadikov <ivan.sadi...@gmail.com> wrote:
>> >> >
>> >> >> Great presentation! Thank you for sharing.
>> >> >>
>> >> >>
>> >> >> On Thu, 31 Aug 2017 at 8:02 AM, Wes McKinney <wesmck...@gmail.com> wrote:
>> >> >>
>> >> >> > Absolutely. I will do that now
>> >> >> >
>> >> >> > On Wed, Aug 30, 2017 at 3:33 PM, Julian Hyde <jh...@apache.org> wrote:
>> >> >> > > Thanks for sharing. Can we tweet those videos as well? I see that
>> >> >> > > https://twitter.com/apachearrow only tweeted your slides.
>> >> >> > >
>> >> >> > >> On Aug 26, 2017, at 1:11 PM, Wes McKinney <wesmck...@gmail.com> wrote:
>> >> >> > >>
>> >> >> > >> hi all,
>> >> >> > >>
>> >> >> > >> In case folks here are interested, I gave a keynote this week at
>> >> >> > >> JupyterCon explaining my motivations for being involved in Apache
>> >> >> > >> Arrow and how I see it fitting in with the data science ecosystem
>> >> >> > >> long term:
>> >> >> > >>
>> >> >> > >> https://www.youtube.com/watch?v=wdmf1msbtVs
>> >> >> > >>
>> >> >> > >> I also gave an interview going a little deeper into some of the
>> >> >> > >> topics from the talk:
>> >> >> > >>
>> >> >> > >> https://www.youtube.com/watch?v=Q7y9l-L8yiU
>> >> >> > >>
>> >> >> > >> I believe we have an exciting journey ahead of us, but it's
>> >> >> > >> certainly going to take a lot of collaboration and community
>> >> >> > >> development.
>> >> >> > >>
>> >> >> > >> - Wes
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>>
