Is it also possible to iterate over the Iterator<T2> more than once? Can I have multiple iterators at different positions over the same Iterator<T2>, all working independently?
On Thu, May 10, 2018 at 12:22 PM Tyler Akidau <taki...@apache.org> wrote:

> Hello Arrow folks,
>
> I've been skimming through the Arrow docs and code trying to figure out
> how one might model nested data structures where the nested portions
> themselves might be massive (i.e., larger than available memory). AFAICT,
> the nesting constructs in Arrow appear to assume that you can always fit an
> entire single record in memory. Am I right?
>
> Regardless, is there a recommended way of handling this use case? The
> thing we want to be able to model is essentially (in Java):
> Iterator<Pair<T1, Iterator<T2>>. So each row is a single T1 value
> associated with an arbitrarily large list of T2 values.
>
> I could imagine perhaps flattening the hierarchy down into a schema that's
> essentially Pair<T1, T2>, especially if the T1 and T2 values can be
> optional. So say I had two rows with T1 values of A and B and T2 lists of
> [1, 2] and [3] respectively (i.e., [<A, [1, 2]>, <B, [3]>]), then you could
> just have rows and columns like this:
>
> T1     | T2
> ---------------
> A      | <null>
> <null> | 1
> <null> | 2
> B      | <null>
> <null> | 3
>
> And then you'd presumably need to write wrapper code on top of Arrow to
> marshal all of this under an appropriate set of Interfaces.
>
> Is there a good way to handle this use case in Arrow as it exists today?
> If not, do you have a sense for how hard would it be to add support for
> something like this more natively?
>
> -Tyler
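
For concreteness, the flattened layout described in the quoted message could be sketched in plain Java roughly as follows. This uses no Arrow APIs at all. The names FlattenSketch, Row, Group, flatten and regroup are hypothetical and exist only for illustration, and the sketch assumes that real T1 and T2 values are never null, so that null can serve as the row marker.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical illustration only; none of these names come from Arrow.
final class FlattenSketch {

    /** One flattened row. Assumption: exactly one of key/value is non-null. */
    static final class Row<K, V> {
        final K key;
        final V value;
        Row(K key, V value) { this.key = key; this.value = value; }
        @Override public String toString() {
            String k = (key == null) ? "<null>" : String.valueOf(key);
            String v = (value == null) ? "<null>" : String.valueOf(value);
            return k + " | " + v;
        }
    }

    /** Stand-in for Pair<T1, Iterator<T2>> from the quoted message. */
    static final class Group<K, V> {
        final K key;
        final Iterator<V> values;
        Group(K key, Iterator<V> values) { this.key = key; this.values = values; }
    }

    /** Streams groups out as rows: one key row, then one row per value. */
    static <K, V> Iterator<Row<K, V>> flatten(final Iterator<Group<K, V>> groups) {
        return new Iterator<Row<K, V>>() {
            private Iterator<V> current = null; // values of the group being emitted

            @Override public boolean hasNext() {
                return (current != null && current.hasNext()) || groups.hasNext();
            }
            @Override public Row<K, V> next() {
                if (current != null && current.hasNext()) {
                    return new Row<K, V>(null, current.next());
                }
                Group<K, V> g = groups.next(); // throws if the input is exhausted
                current = g.values;
                return new Row<K, V>(g.key, null);
            }
        };
    }

    /** Rebuilds groups from rows using a single row of lookahead. */
    static <K, V> Iterator<Group<K, V>> regroup(final Iterator<Row<K, V>> rows) {
        return new Iterator<Group<K, V>>() {
            private Row<K, V> peeked = null; // one-row lookahead buffer

            private Row<K, V> peek() {
                if (peeked == null && rows.hasNext()) peeked = rows.next();
                return peeked;
            }
            // Drops value rows the caller left unread in the previous group.
            private void skipValues() {
                while (peek() != null && peeked.key == null) peeked = null;
            }
            @Override public boolean hasNext() {
                skipValues();
                return peek() != null;
            }
            @Override public Group<K, V> next() {
                skipValues();
                Row<K, V> head = peek();
                if (head == null) throw new NoSuchElementException();
                peeked = null; // consume the key row
                Iterator<V> values = new Iterator<V>() {
                    @Override public boolean hasNext() {
                        Row<K, V> r = peek();
                        return r != null && r.key == null;
                    }
                    @Override public V next() {
                        if (!hasNext()) throw new NoSuchElementException();
                        V v = peeked.value;
                        peeked = null; // consume the value row
                        return v;
                    }
                };
                return new Group<K, V>(head.key, values);
            }
        };
    }

    public static void main(String[] args) {
        // The example from the quoted message: [<A, [1, 2]>, <B, [3]>]
        List<Group<String, Integer>> input = Arrays.asList(
            new Group<String, Integer>("A", Arrays.asList(1, 2).iterator()),
            new Group<String, Integer>("B", Arrays.asList(3).iterator()));
        Iterator<Row<String, Integer>> rows = flatten(input.iterator());
        while (rows.hasNext()) System.out.println(rows.next());
    }
}
```

Note that in this sketch each inner iterator is strictly single-pass and shares its position with the underlying row stream. Calling hasNext() or next() on the outer iterator discards whatever is left of the current group, so the sketch does not by itself give you several independent iterators over the same T2 values.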