Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/24141 )

Change subject: IMPALA-13273: Disallow writing NULL values in non-nullable columns
......................................................................

IMPALA-13273: Disallow writing NULL values in non-nullable columns

Before this patch we didn't enforce NOT NULL constraints during writing.
This caused issues with Iceberg tables with non-nullable columns. E.g.:

  create table t_ice_constr(c1 int not null) stored as iceberg;
  insert into t_ice_constr select null;
  select c1 from t_ice_constr;

The above select returned a value instead of NULL, because the slot
descriptor associated with column 'c1' was not nullable, so it didn't
even have a null indicator bit.

The fix is to forbid writing NULLs in non-nullable columns in the first
place. This is now enforced in the Parquet writer's FinalizeCurrentPage()
function, where we have statistics about the number of NULLs written.

Schema evolution concerns
* Iceberg allows making a required column optional (via
  UpdateSchema.makeColumnOptional()). This is a compatible change,
  because if a reader expects optional values then it is not a problem
  if the values are always there in the data files.
* Iceberg has UpdateSchema.requireColumn(), but it is only allowed if
  users call allowIncompatibleChanges() as well, as it can break reading
  older data.
* Iceberg also has UpdateSchema.addRequiredColumn(), but users should
  also set a default value to not break readers. Iceberg only allows
  adding new required columns without default values if users explicitly
  call allowIncompatibleChanges().
* Iceberg says users should only call allowIncompatibleChanges() if they
  have validated that all of their old data files are compatible, e.g.
  if a column was optional (nullable), but all old data files contain
  values for that column.
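
The enforcement described above (rejecting a page at finalization time when the accumulated NULL count of a non-nullable column is non-zero) can be illustrated with a minimal sketch. This is not the code in this patch: ColumnWriterSketch, PageStats and Status are hypothetical, simplified stand-ins, and only the name FinalizeCurrentPage() is taken from the commit message.

```cpp
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical, simplified per-page statistics (not Impala's real stats class).
struct PageStats {
  int64_t num_values = 0;
  int64_t null_count = 0;
};

// Hypothetical status type: an empty error string means success.
struct Status {
  std::string error;
  bool ok() const { return error.empty(); }
};

// Hypothetical column writer that only tracks what the check needs.
class ColumnWriterSketch {
 public:
  ColumnWriterSketch(std::string name, bool nullable)
      : name_(std::move(name)), nullable_(nullable) {}

  // Records one value for the current page; nullptr stands for SQL NULL.
  void Append(const int32_t* value) {
    ++stats_.num_values;
    if (value == nullptr) ++stats_.null_count;
  }

  // Closes the current page. Fails if the column is non-nullable and the
  // page statistics show that at least one NULL was appended.
  Status FinalizeCurrentPage() {
    Status status;
    if (!nullable_ && stats_.null_count > 0) {
      status.error =
          "NULL value written to non-nullable column '" + name_ + "'";
    }
    stats_ = PageStats();  // Start fresh statistics for the next page.
    return status;
  }

 private:
  std::string name_;
  bool nullable_;
  PageStats stats_;
};
```

In this sketch the check costs one comparison per page rather than one per row, because the NULL count is already being collected for statistics; whether the real writer reports the error in the same way is an assumption.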

Testing
* e2e tests added

Change-Id: I1189da3094beee615a5d4600576febde4be8473d
---
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/parquet/parquet-column-stats.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M common/thrift/Descriptors.thrift
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/IcebergColumn.java
M fe/src/main/java/org/apache/impala/catalog/KuduColumn.java
M fe/src/main/java/org/apache/impala/catalog/paimon/PaimonColumn.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-create.test
12 files changed, 69 insertions(+), 47 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/24141/1
--
To view, visit http://gerrit.cloudera.org:8080/24141
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I1189da3094beee615a5d4600576febde4be8473d
Gerrit-Change-Number: 24141
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
