I created https://issues.apache.org/jira/browse/ARROW-9598 to track this.

On Wed, Jul 29, 2020 at 9:13 PM Micah Kornfield <emkornfi...@gmail.com>
wrote:

> So I think the problem is within WriteLevelSpaced [1]: the way we calculate
> "min_spaced_def_level" seems incorrect (I think it only worked for singly
> nested lists).  This value probably needs to be calculated by walking up
> the tree to find the def level of the first repeated value.
>
> [1]
> https://github.com/apache/arrow/blob/3586292d62c8c348e9fb85676eb524cde53179cf/cpp/src/parquet/column_writer.cc#L1141
>
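A minimal sketch of the "walk up the tree" idea, not the actual change in
column_writer.cc. The helper name MinSpacedDefLevel and the vector-of-repetition
representation of a schema path are hypothetical and exist only for
illustration. It assumes the usual Parquet rule that every optional or repeated
node on the path adds one definition level, and the Arrow layout rule that a
null struct still occupies a slot in its child arrays, while a null or empty
list does not.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Repetition type of one schema node, as in the Parquet schema model.
enum class Repetition { kRequired, kOptional, kRepeated };

// Hypothetical helper (not an Arrow/Parquet API): given the repetition types
// along the path from the top-level field down to the leaf column, return the
// smallest definition level at which the leaf's Arrow array still has a slot.
int MinSpacedDefLevel(const std::vector<Repetition>& path_from_root) {
  assert(!path_from_root.empty());

  // Definition level reached at each node: every optional or repeated node
  // on the path adds one level.
  std::vector<int> def_level_at(path_from_root.size());
  int level = 0;
  for (std::size_t i = 0; i < path_from_root.size(); ++i) {
    if (path_from_root[i] != Repetition::kRequired) ++level;
    def_level_at[i] = level;
  }

  // Walk up from the leaf to the nearest repeated node. Entries whose
  // definition level is below that node's level belong to a null or empty
  // list, so they take no slot in the leaf array.
  for (std::size_t i = path_from_root.size(); i-- > 0;) {
    if (path_from_root[i] == Repetition::kRepeated) return def_level_at[i];
  }

  // No repeated node on the path (plain struct nesting): every row has a
  // slot, including rows where an enclosing struct is null.
  return 0;
}
```

For the struct<int: int64> column in this thread, assuming both the struct and
its child are nullable, the path is (optional, optional) and the sketch returns
0, so the null struct in row 0 still counts as a slot. For an optional list of
optional ints, the path is (optional, repeated, optional) and it returns 2.
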
> On Wed, Jul 29, 2020 at 8:01 PM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
>> Hi Radu,
>> This appears to be a bug. Would you mind filing an issue in JIRA?
>>
>> I'm looking into it to see if I can figure out what is going on.
>>
>> Thanks,
>> Micah
>>
>> On Wed, Jul 29, 2020 at 1:07 PM Radu Teodorescu
>> <radukay...@yahoo.com.invalid> wrote:
>>
>>> Is the current version supposed to allow struct columns with null values
>>> to be written to Parquet?
>>>
>>> I narrowed it down to a table with one column and two rows. The resulting
>>> Parquet file is broken according to both parquet-tools and our own reader
>>> (it looks like a buffer is not written in full, but I haven’t dug much
>>> deeper).
>>>
>>> This is the table:
>>>
>>> struct: struct<int: int64>
>>>   child 0, int: int64
>>> ----
>>> struct:
>>>   [
>>>     -- is_valid:
>>>           [
>>>         false,
>>>         true
>>>       ]
>>>     -- child 0 type: int64
>>>       [
>>>         null,
>>>         2
>>>       ]
>>>   ]
>>>
>>> and this is my repro table generation:
>>>
>>> #include <arrow/api.h>
>>> #include <parquet/exception.h>  // PARQUET_THROW_NOT_OK
>>>
>>> // Builds a one-column table: struct<int: int64> with rows [null, {int: 2}].
>>> std::shared_ptr<arrow::Table> generate_table2() {
>>>     auto i64builder = std::make_shared<arrow::Int64Builder>();
>>>     const std::shared_ptr<arrow::DataType> structType =
>>>         arrow::struct_({arrow::field("int", arrow::int64())});
>>>     arrow::StructBuilder structBuilder(
>>>         structType, arrow::default_memory_pool(),
>>>         {std::static_pointer_cast<arrow::ArrayBuilder>(i64builder)});
>>>     // Row 0: null struct.
>>>     PARQUET_THROW_NOT_OK(structBuilder.AppendNull());
>>>     // Row 1: valid struct with int = 2.
>>>     PARQUET_THROW_NOT_OK(structBuilder.Append());
>>>     PARQUET_THROW_NOT_OK(i64builder->Append(2));
>>>     std::shared_ptr<arrow::Array> structArray;
>>>     PARQUET_THROW_NOT_OK(structBuilder.Finish(&structArray));
>>>     std::shared_ptr<arrow::Schema> schema =
>>>         arrow::schema({arrow::field("struct", structType)});
>>>     return arrow::Table::Make(schema, {structArray});
>>> }
>>> Is this a bug, a known limitation, or am I doing something dumb?
>>>
>>> Thank you
>>> Radu
>>>
>>>
