Iceberg-arrow vectorized read bug

Lessard, Steve Wed, 26 Jun 2024 09:54:15 -0700

I have found unexpected behavior in iceberg-arrow’s vectorized read support.
After quite a bit of digging and collaboration with Eduard Tudenhoefner, we have
determined that there is a bug in iceberg-arrow, but we have not been able to
determine exactly what the bug is. Can you please help identify the root cause
of the problem I originally reported as issue
10275<https://github.com/apache/iceberg/issues/10275>?


Since I opened that issue, I’ve learned a bit more about the problem and now have
a clear reproduction case. The steps to reproduce the bug are:

  1.  Create a table
  2.  Add one row to the table
  3.  Alter the table’s schema by adding a new, optional column with no default 
value
  4.  Read all rows, all columns from the table
  5.  Blamo! The code currently in apache/iceberg will throw a 
NullPointerException
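
As a rough illustration of the steps above, here is a minimal, self-contained
Java sketch. It does not use the Iceberg or Arrow APIs, and it is not a
diagnosis of the actual bug, whose root cause is still unknown. Every name in
it (SchemaEvolutionSketch, writeFile, naiveRead, safeReadRow) is hypothetical.
It only models the general situation: a data file written before a column was
added has no data for that column, so a reader that assumes every projected
column exists in the file can hit a NullPointerException, while a reader that
fills the missing optional column with nulls does not.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model; NOT iceberg-arrow code and NOT the confirmed root cause.
class SchemaEvolutionSketch {

    // A "data file" maps column name -> values, using the schema in effect at write time.
    static Map<String, List<Object>> writeFile(List<String> schema, Object... row) {
        Map<String, List<Object>> file = new LinkedHashMap<>();
        for (int i = 0; i < schema.size(); i++) {
            List<Object> values = new ArrayList<>();
            values.add(row[i]);
            file.put(schema.get(i), values);
        }
        return file;
    }

    // Naive reader: assumes every projected column is present in the file.
    static int naiveRead(Map<String, List<Object>> file, List<String> projection) {
        int cells = 0;
        for (String column : projection) {
            List<Object> values = file.get(column); // null for a column added after the write
            cells += values.size();                 // throws NullPointerException in that case
        }
        return cells;
    }

    // Tolerant reader: a column missing from the file is read as null.
    static List<Object> safeReadRow(Map<String, List<Object>> file,
                                    List<String> projection, int rowIndex) {
        List<Object> row = new ArrayList<>();
        for (String column : projection) {
            List<Object> values = file.get(column);
            row.add(values == null ? null : values.get(rowIndex));
        }
        return row;
    }

    public static void main(String[] args) {
        List<String> schema = new ArrayList<>(List.of("id"));      // step 1: create a table
        Map<String, List<Object>> file = writeFile(schema, 1);     // step 2: add one row
        schema.add("note");                                        // step 3: add an optional column
        try {
            naiveRead(file, schema);                               // step 4: read all columns
        } catch (NullPointerException e) {
            System.out.println("naive read failed with NPE");      // step 5
        }
        System.out.println("tolerant read: " + safeReadRow(file, schema, 0));
    }
}
```

In this toy model the tolerant reader returns the original value plus a null
for the new column. Whether the real fix in iceberg-arrow looks anything like
that is exactly the open question.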

I have written a unit test that reproduces this bug. You can view the test at 
https://github.com/apache/iceberg/pull/10284/files#diff-c3da34dcdb02c2db690c86a2b8356a405c899dec410bdb0b9bcee79fd8c63dc7

Initially I tried to fix the bug by preventing the NullPointerException, but
all the while I suspected that the NPE was just a symptom of a larger bug. When
I submitted a pull request containing my fix for the NPE, Eduard Tudenhoefner
reviewed the PR and came to the same conclusion: the NPE is a symptom of a
larger bug within iceberg-arrow. The problem is that neither of us can identify
the actual bug.

Again, I ask, can you please help identify the root cause of the issue I 
originally reported as issue 
10275<https://github.com/apache/iceberg/issues/10275>?

-Steve Lessard, Teradata
