Re: Serializing nested pandas dataframes

2020-10-31 Thread Adam Lippai
Hi, this sounds really promising. I'm curious how JS handles struct arrays, but in theory it should work. Best regards, Adam Lippai On Fri, Oct 30, 2020 at 3:07 PM Benjamin Kietzman wrote: > Hi Adam, > > Arrow does not support nesting tables inside other tables. However, a > record batch > is i

Re: Serializing nested pandas dataframes

2020-10-30 Thread Benjamin Kietzman
Hi Adam, Arrow does not support nesting tables inside other tables. However, a record batch is interchangeable with a struct array, so you could achieve something similar by converting from a RecordBatch with columns `...c` to a StructArray with child arrays `...c`. In C++ we have /RecordBatch::{To

Re: Serializing nested pandas dataframes

2020-10-29 Thread Adam Lippai
This is what I want to extend for multiple tables: https://issues.apache.org/jira/browse/ARROW-10045?focusedCommentId=17207790&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17207790 I would need to come up with a custom binary wrapper for multiple serialized pyarrow

Re: Serializing nested pandas dataframes

2020-10-29 Thread Adam Lippai
If I have a DataFrame with columns Date, Category, Value and group by Category, I'll have multiple DataFrames with Date, Value columns. The result of the groupby is a DataFrameGroupBy, which can't be serialized. This is why I tried to assemble a nested DataFrame instead (like the one in the SO link pr

Re: Serializing nested pandas dataframes

2020-10-29 Thread Joris Van den Bossche
Can you give a more specific example of what kind of hierarchical data you want to serialize? (E.g. the output of a groupby operation in pandas is typically still a dataframe that can be converted to pyarrow and serialized.) In general, for hierarchical data we have the nested data types (e.g. struct