All, I've been following along with great interest, and have a n00b question.
What happens when any of the Arrow-capable processing tools needs to work with a data set or structure that is bigger than a single node's memory capacity? Does Arrow itself handle the distribution of the resulting columnar representation over multiple nodes, or is this the responsibility of the framework/tool/data service that uses Arrow? (I know Arrow is defined to be a library, but I did see mentions of scatter/gather reads and RDMA, so I'm somewhat confused.)

On the same topic, is there any intent to use a distributed shared memory system like http://grappa.io? The PGAS (partitioned global address space) model seems like a natural fit for Arrow's aspirations.

Thanks,
Venkat