Re: Writing null structs to parquet

2020-07-30 Thread Micah Kornfield
Hi Radu, We are slowly working on read support, but there are no concrete dates yet. I think some people from Ursa Labs might get involved in the effort, which could speed up delivery. There are still some potential bugs in the write path for nested data. If you want to help contr

Re: Writing null structs to parquet

2020-07-30 Thread Radu Teodorescu
You’re a rock star: your PR works for my real-life use case as well. Unfortunately, this squashes my hopes of making my first Arrow contribution today :) Now it breaks on a combination of struct and list at read time, but that is clearly documented as not yet supported. Is there any

Re: Writing null structs to parquet

2020-07-30 Thread Radu Teodorescu
Thank you, Micah! I spent a bit of time trying to get to the bottom of it (I know Parquet pretty well, but I'm not that familiar with the inner workings of Arrow's Parquet implementation), so if I manage to track down the issue I’ll circle back (I give myself a 30% chance of success given the allotted time and expertise leve

Re: Writing null structs to parquet

2020-07-29 Thread Micah Kornfield
I created https://issues.apache.org/jira/browse/ARROW-9598 to track this.

Re: Writing null structs to parquet

2020-07-29 Thread Micah Kornfield
So I think the problem is within WriteLevelSpaced [1], specifically in how we calculate "min_spaced_def_level", which seems incorrect (I think this only worked for singly nested lists). This value probably needs to be calculated by walking up the tree to find the def level of the first repeated value. [1]
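A minimal sketch of the calculation described above, assuming the standard Dremel rules for Parquet (every optional or repeated node adds one definition level; every repeated node also adds one repetition level). The names SchemaNode, level_info and has_slot are hypothetical and are not Arrow's C++ code. The loop walks from the root down to the leaf, which yields the same threshold as walking up from the leaf to the nearest repeated ancestor.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SchemaNode:
    # Hypothetical schema node; repetition is "required", "optional" or "repeated".
    name: str
    repetition: str


def level_info(path: List[SchemaNode]) -> Tuple[int, int, int]:
    """Return (max_def_level, max_rep_level, repeated_ancestor_def_level)
    for the leaf reached through `path` (root's child down to the leaf)."""
    def_level = 0
    rep_level = 0
    repeated_ancestor_def_level = 0  # 0 means: no repeated ancestor at all
    for node in path:
        if node.repetition == "optional":
            def_level += 1
        elif node.repetition == "repeated":
            def_level += 1
            rep_level += 1
            # A def level at or above this value means the closest repeated
            # ancestor seen so far is non-empty.
            repeated_ancestor_def_level = def_level
    return def_level, rep_level, repeated_ancestor_def_level


def has_slot(def_level: int, repeated_ancestor_def_level: int) -> bool:
    # Entries below the threshold are null or empty lists and have no slot
    # in the leaf array; entries at or above it do (the value may still be null).
    return def_level >= repeated_ancestor_def_level


# optional group s { optional int32 a }: no repeated ancestor, threshold 0.
print(level_info([SchemaNode("s", "optional"), SchemaNode("a", "optional")]))
```

With no repeated ancestor the threshold is 0, so every row, including a null struct, keeps a slot. For a list nested inside a struct, the threshold comes from the list's repeated node rather than from the count of optional ancestors.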

Re: Writing null structs to parquet

2020-07-29 Thread Micah Kornfield
Hi Radu, This appears to be a bug; would you mind filing it in JIRA? I'm looking into it to see if I can figure out what is going on. Thanks, Micah

Writing null structs to parquet

2020-07-29 Thread Radu Teodorescu
Is the current version supposed to allow struct columns with null values to be written to Parquet? I narrowed it down to a table with one column and two rows, and the resulting Parquet file is broken according to both parquet-tools and our own reader (it looks like a buffer is no
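To make the two-row case concrete, here is a small sketch assuming a schema of "optional group s { optional int32 a }" (a hypothetical schema; the report above does not give the exact one) with one null row. The helper shred_optional_struct is hypothetical. Parquet stores one definition level per row but only the non-null leaf values, whereas the child array of an Arrow struct keeps one slot per row, so a writer has to map between the two layouts.

```python
from typing import Dict, List, Optional, Tuple

Row = Optional[Dict[str, Optional[int]]]


def shred_optional_struct(rows: List[Row], field: str) -> Tuple[List[int], List[int]]:
    # Hypothetical helper for: optional group s { optional int32 a }.
    # Max definition level is 2: one for the optional struct, one for the leaf.
    def_levels: List[int] = []
    values: List[int] = []
    for row in rows:
        if row is None:
            def_levels.append(0)  # the struct itself is null
        elif row.get(field) is None:
            def_levels.append(1)  # struct present, leaf is null
        else:
            def_levels.append(2)  # leaf value present
            values.append(row[field])  # Parquet stores only non-null values
    return def_levels, values


# Two rows, the second one a null struct.
print(shred_optional_struct([{"a": 1}, None], "a"))  # ([2, 0], [1])
```

Repetition levels are all 0 here because the schema has no repeated fields.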