Hi Radu,
We are slowly working on read support, but there are no concrete dates. I think some people from Ursa Labs might get involved in the effort, which could speed up delivery.
There are still some potential bugs on the write path for nested data. If you want to contribute and help fix them, it would be appreciated. I filed https://issues.apache.org/jira/browse/ARROW-9603; if it's not clear, feel free to ask for clarification.

Thanks,
Micah

On Thu, Jul 30, 2020 at 9:35 AM Radu Teodorescu <radukay...@yahoo.com> wrote:

> You’re a rock star - your PR works for my real-life use case as well -
> unfortunately this squashes my hopes of making my first Arrow contribution
> today :)
>
> Now it breaks in supporting a combination of struct and list at read time,
> but that is clearly documented as not yet supported - is there any timeline
> for that? (I can work around it for now, but it would be nice to have at
> some point) … maybe that can be my first contribution given enough time :).
>
> > On Jul 30, 2020, at 9:26 AM, Radu Teodorescu <radukay...@yahoo.com.INVALID> wrote:
> >
> > Thank you Micah!
> > I spent a bit of time trying to get to the bottom of it (I know Parquet pretty well, but am not that familiar with the inner workings of Arrow's Parquet code), so if I manage to track down the issue I’ll circle back (I give myself a 30% chance of success given the allotted time and expertise level).
> >
> >> On Jul 30, 2020, at 12:31 AM, Micah Kornfield <emkornfi...@gmail.com> wrote:
> >>
> >> I created https://issues.apache.org/jira/browse/ARROW-9598 to track.
> >>
> >> On Wed, Jul 29, 2020 at 9:13 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
> >>
> >>> So I think the problem is within WriteLevelSpaced [1]; specifically, how we
> >>> calculate "min_spaced_def_level" seems incorrect (I think this only worked
> >>> for single nested lists). This value probably needs to be calculated by
> >>> walking up the tree to find the def level of the first repeated value.
> >>>
> >>> [1] https://github.com/apache/arrow/blob/3586292d62c8c348e9fb85676eb524cde53179cf/cpp/src/parquet/column_writer.cc#L1141
> >>>
> >>> On Wed, Jul 29, 2020 at 8:01 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
> >>>
> >>>> Hi Radu,
> >>>> This appears to be a bug; would you mind filing a bug in JIRA?
> >>>>
> >>>> I'm looking into it to see if I can figure out what is going on.
> >>>>
> >>>> Thanks,
> >>>> Micah
> >>>>
> >>>> On Wed, Jul 29, 2020 at 1:07 PM Radu Teodorescu <radukay...@yahoo.com.invalid> wrote:
> >>>>
> >>>>> Is the current version supposed to allow struct columns with null values
> >>>>> to be written to Parquet?
> >>>>>
> >>>>> I narrowed it down to a two-row table with one column, and
> >>>>> the resulting Parquet file is broken according to both parquet-tools and
> >>>>> our own reader (it looks like a buffer is not written in full, but
> >>>>> I haven’t dug much deeper).
> >>>>>
> >>>>> This is the table:
> >>>>>
> >>>>> struct: struct<int: int64>
> >>>>>   child 0, int: int64
> >>>>> ----
> >>>>> struct:
> >>>>>   [
> >>>>>     -- is_valid:
> >>>>>       [
> >>>>>         false,
> >>>>>         true
> >>>>>       ]
> >>>>>     -- child 0 type: int64
> >>>>>       [
> >>>>>         null,
> >>>>>         2
> >>>>>       ]
> >>>>>   ]
> >>>>>
> >>>>> and this is my repro table generation:
> >>>>>
> >>>>> std::shared_ptr<arrow::Table> generate_table2() {
> >>>>>   auto i64builder = std::make_shared<arrow::Int64Builder>();
> >>>>>   const std::shared_ptr<arrow::DataType> structType =
> >>>>>       arrow::struct_({arrow::field("int", arrow::int64())});
> >>>>>   arrow::StructBuilder structBuilder(
> >>>>>       structType, arrow::default_memory_pool(),
> >>>>>       {std::static_pointer_cast<arrow::ArrayBuilder>(i64builder)});
> >>>>>   PARQUET_THROW_NOT_OK(structBuilder.AppendNull());
> >>>>>   PARQUET_THROW_NOT_OK(structBuilder.Append());
> >>>>>   PARQUET_THROW_NOT_OK(i64builder->Append(2));
> >>>>>   std::shared_ptr<arrow::Array> structArray;
> >>>>>   PARQUET_THROW_NOT_OK(structBuilder.Finish(&structArray));
> >>>>>   std::shared_ptr<arrow::Schema> schema =
> >>>>>       arrow::schema({arrow::field("struct", structType)});
> >>>>>   return arrow::Table::Make(schema, {structArray});
> >>>>> }
> >>>>>
> >>>>> Is this a bug, a known limitation, or am I doing something dumb?
> >>>>>
> >>>>> Thank you
> >>>>> Radu
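
For readers following the definition-level discussion quoted in this thread, here is a minimal sketch in plain C++ (no Arrow dependency) of how definition levels fall out of a nested schema. It is not the code in column_writer.cc and not the fix tracked in ARROW-9598. The names `Repetition`, `Node`, `DefLevel`, `DefLevelOfNearestRepeatedAncestor`, `Row`, and `StructChildDefLevels` are hypothetical, made up for illustration. The sketch assumes both the struct and its int child are nullable (the default for arrow::field), and it takes "walking up the tree to find the def level of the first repeated value" to mean the nearest repeated ancestor of the leaf, which is only one possible reading.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical schema node, for illustration only; this is not the Arrow or
// parquet-cpp schema API.
enum class Repetition { kRequired, kOptional, kRepeated };

struct Node {
  Repetition repetition;
  const Node* parent;  // nullptr for a top-level field
};

// Definition level of a node: the number of non-required nodes on the path
// from the top-level field down to and including the node itself.
int16_t DefLevel(const Node* node) {
  assert(node != nullptr);
  int16_t level = 0;
  for (const Node* n = node; n != nullptr; n = n->parent) {
    if (n->repetition != Repetition::kRequired) ++level;
  }
  return level;
}

// Walk up from a leaf to the nearest repeated ancestor and return that
// ancestor's definition level, or 0 when nothing on the path is repeated.
int16_t DefLevelOfNearestRepeatedAncestor(const Node* leaf) {
  for (const Node* n = leaf; n != nullptr; n = n->parent) {
    if (n->repetition == Repetition::kRepeated) return DefLevel(n);
  }
  return 0;
}

// One row of an optional struct with a single optional child.
struct Row {
  bool struct_valid;
  bool child_valid;  // ignored when struct_valid is false
};

// Per-row definition levels for the child column:
// 0 = struct is null, 1 = struct present but child null, 2 = value present.
std::vector<int16_t> StructChildDefLevels(const std::vector<Row>& rows) {
  std::vector<int16_t> levels;
  levels.reserve(rows.size());
  for (const Row& row : rows) {
    if (!row.struct_valid) {
      levels.push_back(0);
    } else if (!row.child_valid) {
      levels.push_back(1);
    } else {
      levels.push_back(2);
    }
  }
  return levels;
}
```

For the two-row table in the original report (a null struct followed by {int: 2}), `StructChildDefLevels` yields 0 and then 2, and since nothing in that schema is repeated, `DefLevelOfNearestRepeatedAncestor` returns 0 for the leaf.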