Hi folks,

While I’ve been communicating with some members of this group in the past, this is my first official post, so please excuse/correct/guide me as needed.
Logistics first: I put most of the content of my proposals in Google Docs, but if it is more appropriate, we can continue the conversation by email. The two proposals are also fairly independent, so we can split them into two separate email threads if needed; for now I wanted to keep the spam low.

Actual proposals:

Virtual Array - The idea is to be able to handle Arrow Tables where some of the column data is not (yet) available in memory. For example, a Table can map to a Parquet file, create a VirtualArray for each column chunk, and read the actual content only if and when the Array is touched.
Virtualize arrow Table <https://docs.google.com/document/d/1qXSHSgMZtjNGzWrqDxoBisSoR6gbnRiEztnYihNGLsI/edit?usp=sharing>

Random Access - I find that the “application state” of most large-scale systems is compatible with the low-level vectorized Arrow representation, and I propose a number of API expansions that would enable thread-safe data mutation and efficient random access.
Arrow arrays random access <https://docs.google.com/document/d/1tIsOhN6mfIAy6F8XRxeKRIqPBN0gKbcmrp2QJ4L3hJ8/edit?usp=sharing>

Please let me know what you think and what the best course of action is moving forward.

Thank you,
Radu