Two proposals for expanding arrow Table API (virtual arrays and random access)

Radu Teodorescu Wed, 17 Jun 2020 12:49:16 -0700

Hi folks,
Although I’ve been in touch with some members of this group before, this is my 
first official post, so please excuse, correct, or guide me as needed.


Logistics first:
I put most of the content of my proposals in Google Docs, but if it is more 
appropriate, we can keep the conversation going by email.
The two proposals are largely independent, so we can split them into two 
separate email threads if needed; for now I wanted to keep the spam low.

Actual proposals:
Virtual Array - The idea is to handle Arrow Tables where some of the column 
data is not (yet) available in memory. For example, a Table can map to a 
Parquet file, create a VirtualArray for each column chunk, and read the actual 
content only if and when the Array is touched.
Virtualize arrow Table 
<https://docs.google.com/document/d/1qXSHSgMZtjNGzWrqDxoBisSoR6gbnRiEztnYihNGLsI/edit?usp=sharing>
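To make the Virtual Array idea concrete, here is a minimal sketch in plain Python. It does not use the Arrow library and is not the API from the linked doc: the class name VirtualColumn, the loader callback, and the read_chunk function are all hypothetical, and a small list stands in for a Parquet column chunk. The only point is the behavior described above: the length is known from metadata, and the data is read once, on first touch.

```python
from typing import Callable, List, Optional


class VirtualColumn:
    """Hypothetical stand-in for a column chunk that is loaded on first access."""

    def __init__(self, length: int, loader: Callable[[], List[int]]):
        self._length = length          # known from file metadata, no data read
        self._loader = loader          # deferred read of the actual values
        self._values: Optional[List[int]] = None

    @property
    def is_materialized(self) -> bool:
        return self._values is not None

    def __len__(self) -> int:
        # Answering len() must not trigger a load.
        return self._length

    def _materialize(self) -> List[int]:
        # Run the loader at most once and cache the result.
        if self._values is None:
            values = self._loader()
            if len(values) != self._length:
                raise ValueError("loader returned an unexpected number of values")
            self._values = values
        return self._values

    def __getitem__(self, index: int) -> int:
        # Touching an element is what forces the read.
        return self._materialize()[index]


load_calls: List[str] = []


def read_chunk() -> List[int]:
    # Assumption: in a real system this would decode a Parquet column chunk.
    load_calls.append("read")
    return [10, 20, 30]


column = VirtualColumn(length=3, loader=read_chunk)
```

In this sketch, len(column) answers from metadata alone, while column[1] triggers the single call to read_chunk; later accesses reuse the cached values.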
Random Access - I find that the “application state” of most large-scale systems 
is compatible with Arrow's low-level vectorized representation, so I propose a 
number of API extensions that would enable thread-safe data mutation and 
efficient random access.
Arrow arrays random access 
<https://docs.google.com/document/d/1tIsOhN6mfIAy6F8XRxeKRIqPBN0gKbcmrp2QJ4L3hJ8/edit?usp=sharing>
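As a rough illustration of what thread-safe mutation with constant-time random access can look like, here is a plain-Python sketch. It is not the API from the linked doc: the class MutableInt64Column and its get/set/add methods are hypothetical, the standard-library array type stands in for a fixed-width Arrow buffer, and a single lock per column is just one assumed synchronization strategy among several possible ones.

```python
import threading
from array import array


class MutableInt64Column:
    """Hypothetical fixed-width column with O(1) indexed reads and writes."""

    def __init__(self, values):
        # "q" is a signed 64-bit integer stored in one contiguous buffer.
        self._data = array("q", values)
        # Assumption: one coarse lock guards the whole column.
        self._lock = threading.Lock()

    def get(self, index: int) -> int:
        with self._lock:
            return self._data[index]

    def set(self, index: int, value: int) -> None:
        with self._lock:
            self._data[index] = value

    def add(self, index: int, delta: int) -> int:
        # The read-modify-write happens under the lock, so concurrent
        # increments of the same slot are not lost.
        with self._lock:
            self._data[index] += delta
            return self._data[index]


def bump(column: MutableInt64Column, index: int, times: int) -> None:
    for _ in range(times):
        column.add(index, 1)


counters = MutableInt64Column([0, 0, 0, 0])
threads = [
    threading.Thread(target=bump, args=(counters, 2, 1000)) for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Four threads each increment slot 2 a thousand times; because every update holds the lock, the final value of that slot is 4000 and the other slots are untouched. A finer-grained scheme (per-chunk locks or atomics) would trade simplicity for less contention.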
Please let me know what you think and what the best course of action is going 
forward.
Thank you,
Radu
