For a system to use Arrow, it needs only to understand the columnar memory layout (and typically the metadata -- still to be defined). Separately, many of us are interested in developing tools to share Arrow data through various shared memory mechanisms (for example, memory-mapped files and other OS-level shared memory tools). You wouldn't be required to use a particular memory-sharing protocol, though; choosing one is the responsibility of the application developer.
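To make the shared-memory idea concrete, here is a minimal sketch in C of how a producer process could expose a fixed-width column to a consumer via POSIX shared memory (shm_open/mmap). The segment name and column layout here are illustrative only, not part of any Arrow specification:

    /* Sketch: sharing a fixed-width columnar buffer between processes
     * via POSIX shared memory. Names and layout are illustrative, not
     * part of the Arrow spec. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define SHM_NAME "/arrow_demo_column"   /* hypothetical segment name */
    #define NUM_VALUES 1024

    int main(void) {
        size_t size = NUM_VALUES * sizeof(int64_t);

        /* Producer: create the shared segment and size it. */
        int fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0600);
        if (fd < 0) { perror("shm_open"); return 1; }
        if (ftruncate(fd, (off_t)size) != 0) { perror("ftruncate"); return 1; }

        /* Map it and write the column: value i lives at byte offset i * 8,
         * contiguous and typed, exactly the property a columnar layout gives. */
        int64_t *values = mmap(NULL, size, PROT_READ | PROT_WRITE,
                               MAP_SHARED, fd, 0);
        if (values == MAP_FAILED) { perror("mmap"); return 1; }

        for (int64_t i = 0; i < NUM_VALUES; i++) {
            values[i] = i * i;
        }

        /* A consumer in another process would map the same name read-only:
         *   int fd = shm_open(SHM_NAME, O_RDONLY, 0);
         *   const int64_t *col = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
         * and could then scan the column with zero copies and no
         * deserialization step. */
        printf("values[10] = %lld\n", (long long)values[10]);

        munmap(values, size);
        close(fd);
        shm_unlink(SHM_NAME);
        return 0;
    }

(On older Linux systems you may need to link with -lrt.) The point of the sketch is that once two processes agree on the memory layout, no serialization protocol is needed; which sharing mechanism to use stays with the application.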
- Wes

On Wed, Mar 9, 2016 at 4:02 AM, Yiannis Gkoufas <johngou...@gmail.com> wrote:
> Hi Venkat,
>
> as far as I understand, Arrow works on infrastructures that already
> support RDMA; RDMA support is not included in the core Arrow library.
> Can the Arrow developers confirm that assumption?
>
> Thanks
>
> On 5 March 2016 at 01:54, Venkat Krishnamurthy <nivik...@gmail.com> wrote:
>
>> All
>>
>> I've been following along with great interest, and have a n00b question.
>>
>> What happens when any of the Arrow-capable processing tools needs to
>> work with a data set or structure that is bigger than a single node's
>> memory capacity? Does Arrow itself handle the distribution of the
>> resulting columnar representation over multiple nodes?
>>
>> Or is this the responsibility of the framework/tool/data service that
>> uses Arrow?
>>
>> (I know Arrow is defined to be a library, but I did see mentions of
>> scatter/gather reads and RDMA, so I'm somewhat confused.)
>>
>> On the same topic, is there any intent of using a distributed shared
>> memory like http://grappa.io? The PGAS (partitioned global address
>> space) model seems like a natural fit for Arrow's aspirations.
>>
>> Thanks
>> Venkat