Daniel Becker has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/22777 )
Change subject: IMPALA-13963: Crash when setting 'write.parquet.page-size-bytes' to a higher value ...................................................................... IMPALA-13963: Crash when setting 'write.parquet.page-size-bytes' to a higher value When setting the Iceberg table property 'write.parquet.page-size-bytes' to a higher value, inserting into the table crashes Impala: create table lineitem_iceberg_comment(l_comment STRING) stored as iceberg; alter table lineitem_iceberg_comment set tblproperties("write.parquet.page-size-bytes"="1048576"); insert into lineitem_iceberg_comment select l_comment from tpch_parquet.lineitem; The impala executors crash because of memory corruption caused by buffer overflow in HdfsParquetTableWriter::ColumnWriter::ProcessValue(). Before attempting to write the next value, it checks whether the total byte size would exceed 'plain_page_size_', but the buffer into which it writes ('values_buffer_') has length 'values_buffer_len_'. 'values_buffer_len_' is initialised in the constructor to 'DEFAULT_DATA_PAGE_SIZE', irrespective of the value of 'plain_page_size_'. However, it is intended to have at least the same size, as can be seen from the check in ProcessValue() or the GrowPageSize() method. The error does not usually surface because 'plain_page_size_' has the same default value, 'DEFAULT_DATA_PAGE_SIZE'. 'values_buffer_' is also used for DICTIONARY encoding, but that takes care of growing it as necessary. This change fixes the problem by initialising 'values_buffer_len_' to the value of 'plain_page_size_' in the constructor. This leads to exposing another bug: in BitWriter::PutValue(), when we check whether the next element fits in the buffer, we multiply 'max_bytes_' by 8, which overflows because 'max_bytes_' is a 32-bit int. This happens with values that we already use in our tests. This change changes the type of 'max_bytes_' to int64_t, so multiplying it by 8 (converting from bytes to bits) is now safe. Testing: - Added an EE test in iceberg-insert.test that reproduced the error. Change-Id: Icb94df8ac3087476ddf1613a1285297f23a54c76 --- M be/src/exec/parquet/hdfs-parquet-table-writer.cc M be/src/util/bit-stream-utils.h M testdata/workloads/functional-query/queries/QueryTest/iceberg-insert.test 3 files changed, 17 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/22777/4 -- To view, visit http://gerrit.cloudera.org:8080/22777 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icb94df8ac3087476ddf1613a1285297f23a54c76 Gerrit-Change-Number: 22777 Gerrit-PatchSet: 4 Gerrit-Owner: Daniel Becker <daniel.bec...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Noemi Pap-Takacs <npaptak...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>