I have found unexpected behavior in iceberg-arrow’s vectorized read support. After quite a bit of digging and collaboration with Eduard Tudenhoefner, we have determined that there is a bug in iceberg-arrow, but we have not been able to determine exactly what the bug is. Can you please help identify the root cause of the issue I originally reported as issue 10275<https://github.com/apache/iceberg/issues/10275>?
Since I opened that issue I’ve learned a bit more about the problem and now have a clear reproduction case. The steps to reproduce the bug are:

1. Create a table
2. Add one row to the table
3. Alter the table’s schema by adding a new, optional column with no default value
4. Read all rows, all columns from the table
5. Blamo! The code currently in apache/iceberg will throw a NullPointerException

I have written a unit test that reproduces this bug. You can view the test at https://github.com/apache/iceberg/pull/10284/files#diff-c3da34dcdb02c2db690c86a2b8356a405c899dec410bdb0b9bcee79fd8c63dc7

Initially I tried to fix the bug by preventing the NullPointerException, but all the while I suspected that the NPE is just a symptom of a larger bug. When I submitted a pull request containing my fix for the NPE, Eduard Tudenhoefner reviewed the PR and came to the same conclusion: the NPE is a symptom of a larger bug within iceberg-arrow. The problem is that neither of us can identify the actual bug.

Again, I ask, can you please help identify the root cause of the issue I originally reported as issue 10275<https://github.com/apache/iceberg/issues/10275>?

-Steve Lessard, Teradata