Re: How to model massive nested data

2018-05-10 Thread Wes McKinney
Hi Tyler, I am not sure the Arrow Java libraries have yet been used for interacting with larger-than-memory datasets, but this would be a good opportunity to try to get this working. In the C++ libraries, any Arrow data structures can easily reference memory-mapped data on disk; none of the data
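The point about memory-mapped data can be sketched in Python. This sketch makes several assumptions. It uses the standard-library `mmap` module rather than Arrow's own memory-mapped file classes. It also uses a hypothetical on-disk layout that is not the Arrow IPC file format. In that layout, a list<int32> column is stored as an offsets buffer plus a flat child-values buffer. `write_list_column`, `MappedListColumn` and `HEADER` are illustrative names. The reader holds zero-copy views into the mapping, so the OS typically reads only the pages behind the rows that are actually accessed.

```python
import array
import mmap
import os
import struct
import tempfile

# Hypothetical header (NOT the Arrow IPC format): number of offsets, number of values.
HEADER = struct.Struct("<qq")


def write_list_column(path, rows):
    """Write a list<int32> column as an int32 offsets buffer plus flat values."""
    offsets = array.array("i", [0])
    values = array.array("i")
    for row in rows:
        values.extend(row)
        offsets.append(len(values))  # offsets[i+1] marks the end of row i
    with open(path, "wb") as f:
        f.write(HEADER.pack(len(offsets), len(values)))
        f.write(offsets.tobytes())
        f.write(values.tobytes())


class MappedListColumn:
    """Read-only list<int32> column backed by zero-copy views into an mmap."""

    def __init__(self, path):
        self._file = open(path, "rb")
        self._mm = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        n_offsets, n_values = HEADER.unpack_from(self._mm, 0)
        item = array.array("i").itemsize
        off_start = HEADER.size
        val_start = off_start + item * n_offsets
        base = memoryview(self._mm)
        # cast("i") reinterprets the mapped bytes in place; nothing is copied.
        self._offsets = base[off_start:val_start].cast("i")
        self._values = base[val_start:val_start + item * n_values].cast("i")

    def __len__(self):
        return len(self._offsets) - 1

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        # Only this row's slice of the child buffer is touched.
        return self._values[self._offsets[i]:self._offsets[i + 1]].tolist()

    def close(self):
        # All views must be released before the mmap can be closed.
        self._offsets.release()
        self._values.release()
        self._mm.close()
        self._file.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "col.bin")
        write_list_column(path, [[1, 2, 3], [], [4, 5]])
        col = MappedListColumn(path)
        print(len(col), col[0], col[1], col[2])  # 3 [1, 2, 3] [] [4, 5]
        col.close()
```

Byte order is the machine's native order. The file is therefore only meant to be read back on the machine that wrote it. Arrow's real formats specify this more carefully.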

Re: How to model massive nested data

2018-05-10 Thread Martin Durant
This is not directly relevant here, but has anyone looked into oamap (https://github.com/diana-hep/oamap), which is capable of using numba to compile Python functions that traverse nested data structures down to the basic leaf nodes, without creating intermediate Python objects? Then the person
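Here is a rough sketch of the kind of traversal described, in plain Python. It uses neither numba nor any oamap API, and `from_nested` and `row_sums` are hypothetical helpers. A list<list<int>> column is stored as an outer offsets buffer, an inner offsets buffer and a flat leaf buffer. The function reaches the leaves by walking the offsets instead of materializing nested lists. In pure Python the loop still creates int objects. Compiling a loop like this to native code is the job a JIT such as numba would do.

```python
from array import array


def from_nested(rows):
    """Flatten list<list<int>> rows into outer offsets, inner offsets, leaves."""
    outer, inner, leaves = array("i", [0]), array("i", [0]), array("q")
    for row in rows:
        for sub in row:
            leaves.extend(sub)
            inner.append(len(leaves))  # end of this sublist in `leaves`
        outer.append(len(inner) - 1)  # end of this row in the sublist index space
    return outer, inner, leaves


def row_sums(outer, inner, leaves):
    """Sum the leaves under each outer row by walking the offsets buffers.

    No per-row or per-sublist Python lists are created during the traversal.
    """
    sums = array("q")
    for r in range(len(outer) - 1):
        total = 0
        # Sublists of row r are j = outer[r] .. outer[r + 1] - 1.
        for j in range(outer[r], outer[r + 1]):
            # Leaves of sublist j are k = inner[j] .. inner[j + 1] - 1.
            for k in range(inner[j], inner[j + 1]):
                total += leaves[k]
        sums.append(total)
    return sums


if __name__ == "__main__":
    buffers = from_nested([[[1, 2], [3]], [], [[4], [], [5, 6]]])
    print(list(row_sums(*buffers)))  # [6, 0, 15]
```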

Re: How to model massive nested data

2018-05-10 Thread Lukasz Cwik
Is it also possible to iterate over the iterator more than once? Can I have multiple iterators, at different positions, all working independently? On Thu, May 10, 2018 at 12:22 PM Tyler Akidau wrote: > Hello Arrow folks, > > I've been skimming through the Arrow docs and code trying to
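In Python terms, the question is about the difference between an iterable and a one-shot iterator. An iterable's `__iter__` hands out a fresh cursor on every call. Here is a minimal sketch of that behavior over an in-memory list column stored as offsets plus values. `ListColumnIterable` is a hypothetical name. The sketch illustrates the requested semantics and is not an existing Arrow or Beam API.

```python
from array import array
from typing import Iterator, List


class ListColumnIterable:
    """Re-iterable view of a list<int> column stored as offsets + values.

    Every call to iter() returns a new, independent cursor; the underlying
    buffers are shared, not copied per cursor.
    """

    def __init__(self, offsets, values):
        self._offsets = offsets
        self._values = values

    def __iter__(self) -> Iterator[List[int]]:
        # Each generator keeps its own `row` position, so cursors do not interfere.
        for row in range(len(self._offsets) - 1):
            start, end = self._offsets[row], self._offsets[row + 1]
            yield self._values[start:end].tolist()


if __name__ == "__main__":
    col = ListColumnIterable(array("i", [0, 3, 3, 5]), array("i", [1, 2, 3, 4, 5]))
    a, b = iter(col), iter(col)
    print(next(a), next(a))  # [1, 2, 3] []
    print(next(b))           # [1, 2, 3]  (b is unaffected by a)
    print(list(col))         # [[1, 2, 3], [], [4, 5]]  (a fresh full pass)
```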

How to model massive nested data

2018-05-10 Thread Tyler Akidau
Hello Arrow folks, I've been skimming through the Arrow docs and code trying to figure out how one might model nested data structures where the nested portions themselves might be massive (i.e., larger than available memory). AFAICT, the nesting constructs in Arrow appear to assume that you can al
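The question turns on how Arrow represents nesting. In the Arrow columnar format, a variable-size list column is an offsets buffer with n+1 entries plus a single child array. Row i's children are child[offsets[i]:offsets[i+1]]. Below is a minimal Python sketch of one consequence of that layout, using plain `array`/`memoryview` rather than any Arrow API. `iter_row_windows` is a hypothetical helper. One row's nested values form a contiguous slice, so a consumer can walk even a large one in bounded windows without copying it. This says nothing about whether Arrow's libraries currently support that access pattern for larger-than-memory data.

```python
from array import array
from typing import Iterator


def iter_row_windows(offsets, values, row: int, window: int) -> Iterator[memoryview]:
    """Yield row `row` of a list<int32> column in windows of <= `window` elements.

    `offsets` has n+1 entries; row i's children are values[offsets[i]:offsets[i+1]].
    Slicing a memoryview copies nothing, so the consumer sees one bounded
    window of the child buffer at a time.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    view = memoryview(values)
    start, end = offsets[row], offsets[row + 1]
    for lo in range(start, end, window):
        yield view[lo:min(lo + window, end)]


if __name__ == "__main__":
    offsets = array("i", [0, 2, 9])  # row 0 has 2 children, row 1 has 7
    values = array("i", range(9))
    for w in iter_row_windows(offsets, values, 1, 3):
        print(w.tolist())  # [2, 3, 4], then [5, 6, 7], then [8]
```

While a window or the generator is alive, the `array` cannot be resized, because a memoryview holds an export on it. If `values` were instead a view over a memory-mapped file, the OS would typically page in only the windows that are touched.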