I created https://issues.apache.org/jira/browse/ARROW-9598 to track.
On Wed, Jul 29, 2020 at 9:13 PM Micah Kornfield <emkornfi...@gmail.com> wrote:

> So I think the problem is within WriteLevelSpaced [1]; specifically, how we
> calculate "min_spaced_def_level" seems incorrect (I think this only worked
> for single nested lists). This value probably needs to be calculated by
> walking up the tree to find the def level of the first repeated value.
>
> [1]
> https://github.com/apache/arrow/blob/3586292d62c8c348e9fb85676eb524cde53179cf/cpp/src/parquet/column_writer.cc#L1141
>
> On Wed, Jul 29, 2020 at 8:01 PM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
>> Hi Radu,
>> This appears to be a bug; would you mind filing it in JIRA?
>>
>> I'm looking into it to see if I can figure out what is going on.
>>
>> Thanks,
>> Micah
>>
>> On Wed, Jul 29, 2020 at 1:07 PM Radu Teodorescu
>> <radukay...@yahoo.com.invalid> wrote:
>>
>>> Is the current version supposed to allow struct columns with null values
>>> to be written to Parquet?
>>>
>>> I narrowed it down to a table with one column and two rows, and the
>>> resulting Parquet file is broken according to both parquet-tools and
>>> our own reader (it looks like a buffer is not written in full, but I
>>> haven't dug much deeper).
>>>
>>> This is the table:
>>>
>>> struct: struct<int: int64>
>>>   child 0, int: int64
>>> ----
>>> struct:
>>>   [
>>>     -- is_valid:
>>>       [
>>>         false,
>>>         true
>>>       ]
>>>     -- child 0 type: int64
>>>       [
>>>         null,
>>>         2
>>>       ]
>>>   ]
>>>
>>> and this is my repro table generation:
>>>
>>> std::shared_ptr<arrow::Table> generate_table2() {
>>>   auto i64builder = std::make_shared<arrow::Int64Builder>();
>>>   const std::shared_ptr<arrow::DataType> structType =
>>>       arrow::struct_({arrow::field("int", arrow::int64())});
>>>   arrow::StructBuilder structBuilder(
>>>       structType, arrow::default_memory_pool(),
>>>       {std::static_pointer_cast<arrow::ArrayBuilder>(i64builder)});
>>>   PARQUET_THROW_NOT_OK(structBuilder.AppendNull());
>>>   PARQUET_THROW_NOT_OK(structBuilder.Append());
>>>   PARQUET_THROW_NOT_OK(i64builder->Append(2));
>>>   std::shared_ptr<arrow::Array> structArray;
>>>   PARQUET_THROW_NOT_OK(structBuilder.Finish(&structArray));
>>>   std::shared_ptr<arrow::Schema> schema =
>>>       arrow::schema({arrow::field("struct", structType)});
>>>   return arrow::Table::Make(schema, {structArray});
>>> }
>>>
>>> Is this a bug, a known limitation, or am I doing something dumb?
>>>
>>> Thank you,
>>> Radu