Hi Ying,

Hmm, yes, this may be related to the null bitmaps, or the offsets.
Can you try to inspect or pretty-print the offsets arrays for the two
list arrays?
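
To illustrate why the offsets are worth a look, here is a minimal sketch in
plain C++. It does not use Arrow's classes: ToyListArray, Render,
RenderOffsets, MakeExpected and MakeActual are hypothetical names made up for
this example, and the extra value hidden under the null list is an assumed
scenario, not something confirmed from your failing test. It only shows that
two list arrays can pretty-print identically while their offsets differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a list<int32> array. This is NOT Arrow's API.
struct ToyListArray {
  std::vector<int32_t> offsets;   // one entry per list, plus a final end offset
  std::vector<bool> list_valid;   // false means the list slot is null
  std::vector<int32_t> values;    // flattened child values
  std::vector<bool> value_valid;  // false means the child value is null
};

// Logical view, similar to what a pretty-printer shows.
// A null list prints as "null" no matter what range its offsets cover.
std::string Render(const ToyListArray& a) {
  assert(a.offsets.size() == a.list_valid.size() + 1);
  assert(a.values.size() == a.value_valid.size());
  std::ostringstream out;
  out << "[";
  for (std::size_t i = 0; i < a.list_valid.size(); ++i) {
    if (i > 0) out << ", ";
    if (!a.list_valid[i]) {
      out << "null";
      continue;
    }
    out << "[";
    const auto begin = static_cast<std::size_t>(a.offsets[i]);
    const auto end = static_cast<std::size_t>(a.offsets[i + 1]);
    for (std::size_t j = begin; j < end; ++j) {
      if (j > begin) out << ", ";
      if (a.value_valid[j]) {
        out << a.values[j];
      } else {
        out << "null";
      }
    }
    out << "]";
  }
  out << "]";
  return out.str();
}

// Physical view: the raw offsets, which the logical view hides for null lists.
std::string RenderOffsets(const ToyListArray& a) {
  std::ostringstream out;
  out << "[";
  for (std::size_t i = 0; i < a.offsets.size(); ++i) {
    if (i > 0) out << ", ";
    out << a.offsets[i];
  }
  out << "]";
  return out.str();
}

// The null list covers an empty range (offsets 4..4).
ToyListArray MakeExpected() {
  return {{0, 4, 4}, {true, false}, {0, 1074834796, 0, 0},
          {false, true, false, false}};
}

// Assumed scenario: the null list covers one hidden child value (offsets 4..5).
ToyListArray MakeActual() {
  return {{0, 4, 5}, {true, false}, {0, 1074834796, 0, 0, 7},
          {false, true, false, false, true}};
}
```

Calling Render on both toy arrays yields the same string, while RenderOffsets
gives [0, 4, 4] for one and [0, 4, 5] for the other. A comparison that looks
at offsets or child data directly would then report a difference that the
printed output cannot show.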

Regards

Antoine.


On 10/02/2021 at 03:26, Ying Zhou wrote:
> Hi,
> 
> This is an extremely weird phenomenon. Two 2*1 tables are reported as
> different, yet the confusing error message prints them identically:
> 
> [ RUN      ] TestAdapterWriteNested.writeList
> /Users/karlkatzen/Documents/code/arrow-dev/arrow/cpp/src/arrow/testing/gtest_util.cc:459: Failure
> Failed
> Unequal at absolute position 2
> Expected:
>   [
>     [
>       null,
>       1074834796,
>       null,
>       null
>     ],
>     null
>   ]
> Actual:
>   [
>     [
>       null,
>       1074834796,
>       null,
>       null
>     ],
>     null
>   ]
> [  FAILED  ] TestAdapterWriteNested.writeList (2 ms)
> 
> Here is the code that causes the issue:
> 
> TEST(TestAdapterWriteNested, writeList) {
>   std::shared_ptr<Schema> table_schema =
>       schema({field("list", list(int32()))});
>   int64_t num_rows = 2;
>   arrow::random::RandomArrayGenerator rand(kRandomSeed);
>   auto value_array = rand.ArrayOf(int32(), 2 * num_rows, 0.6);
>   std::shared_ptr<Array> array = rand.List(*value_array, num_rows + 1, 1);
>   std::shared_ptr<ChunkedArray> chunked_array =
>       std::make_shared<ChunkedArray>(array);
>   std::shared_ptr<Table> table = Table::Make(table_schema, {chunked_array});
>   AssertTableWriteReadEqual(table, table, kDefaultSmallMemStreamSize * 5);
> }
> 
> Here AssertTableWriteReadEqual is a function I use to test that 
> from_orc(to_orc(table_in)) == expected_table_out. The function did not have 
> issues before.
> 
> void AssertTableWriteReadEqual(const std::shared_ptr<Table>& input_table,
>                                const std::shared_ptr<Table>& expected_output_table,
>                                const int64_t max_size = kDefaultSmallMemStreamSize) {
>   std::shared_ptr<io::BufferOutputStream> buffer_output_stream =
>       io::BufferOutputStream::Create(max_size).ValueOrDie();
>   std::unique_ptr<adapters::orc::ORCFileWriter> writer =
>       adapters::orc::ORCFileWriter::Open(*buffer_output_stream).ValueOrDie();
>   ARROW_EXPECT_OK(writer->Write(*input_table));
>   ARROW_EXPECT_OK(writer->Close());
>   std::shared_ptr<Buffer> buffer = buffer_output_stream->Finish().ValueOrDie();
>   std::shared_ptr<io::RandomAccessFile> in_stream(new io::BufferReader(buffer));
>   std::unique_ptr<adapters::orc::ORCFileReader> reader;
>   ARROW_EXPECT_OK(
>       adapters::orc::ORCFileReader::Open(in_stream, default_memory_pool(), &reader));
>   std::shared_ptr<Table> actual_output_table;
>   ARROW_EXPECT_OK(reader->Read(&actual_output_table));
>   AssertTablesEqual(*actual_output_table, *expected_output_table, false, false);
> }
> 
> I strongly suspect that this is related to the null bitmaps. What do you guys 
> think?
> 
> Ying
> 
