Daniel Becker has uploaded this change for review. ( http://gerrit.cloudera.org:8080/22777 )


Change subject: IMPALA-13963: Crash when setting 
'write.parquet.page-size-bytes' to a higher value
......................................................................

IMPALA-13963: Crash when setting 'write.parquet.page-size-bytes' to a higher 
value

When setting the Iceberg table property 'write.parquet.page-size-bytes'
to a value higher than the default data page size, inserting into the
table crashes Impala:

  create table lineitem_iceberg_comment stored as iceberg as
    select l_comment from tpch_parquet.lineitem union all
    select l_comment from tpch_parquet.lineitem;

  alter table lineitem_iceberg_comment
    set tblproperties("write.parquet.page-size-bytes"="6000000");

  insert into lineitem_iceberg_comment
    select l_comment from tpch_parquet.lineitem union all
    select l_comment from tpch_parquet.lineitem;

The Impala executors crash because of memory corruption caused by a
buffer overflow in HdfsParquetTableWriter::ColumnWriter::ProcessValue().
Before attempting to write the next value, it checks whether the total
byte size would exceed 'plain_page_size_', but the buffer into which it
writes ('values_buffer_') only has length 'values_buffer_len_'.
'values_buffer_len_' is initialised in the constructor to
'DEFAULT_DATA_PAGE_SIZE', irrespective of the value of
'plain_page_size_', even though it is expected to be at least as large
as 'plain_page_size_', as both the check in ProcessValue() and the
GrowPageSize() method assume. The error does not usually surface because
'plain_page_size_' defaults to the same value, 'DEFAULT_DATA_PAGE_SIZE'.
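The following is a minimal sketch of the mismatch (not the actual Impala
code): the member names mirror ColumnWriter, but the surrounding class,
the DEFAULT_DATA_PAGE_SIZE value and the PLAIN-encoding write path are
simplified and illustrative only.

  #include <cstdint>
  #include <cstring>

  // Illustrative stand-in; the real constant lives in the Parquet writer.
  static const int64_t DEFAULT_DATA_PAGE_SIZE = 64 * 1024;

  class ColumnWriterSketch {
   public:
    explicit ColumnWriterSketch(int64_t plain_page_size)
      : plain_page_size_(plain_page_size),
        // Pre-fix: the buffer is sized to the default, not plain_page_size_.
        values_buffer_len_(DEFAULT_DATA_PAGE_SIZE),
        values_buffer_(new uint8_t[values_buffer_len_]) {}
    ~ColumnWriterSketch() { delete[] values_buffer_; }

    // Simplified stand-in for the PLAIN-encoding path of ProcessValue().
    bool ProcessValue(const uint8_t* val, int64_t encoded_len) {
      // The "does the value still fit?" check uses plain_page_size_ ...
      if (page_bytes_ + encoded_len > plain_page_size_) return false;
      // ... but the write lands in a buffer of values_buffer_len_ bytes.
      // When plain_page_size_ > values_buffer_len_, this overruns the buffer.
      memcpy(values_buffer_ + page_bytes_, val, encoded_len);
      page_bytes_ += encoded_len;
      return true;
    }

   private:
    int64_t plain_page_size_;
    int64_t values_buffer_len_;
    uint8_t* values_buffer_;
    int64_t page_bytes_ = 0;
  };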

'values_buffer_' is also used for DICTIONARY encoding, but that takes
care of growing it as necessary.

This change fixes the problem by initialising 'values_buffer_len_' to
the value of 'plain_page_size_' in the constructor.
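In terms of the sketch above, the fix amounts to sizing the buffer from
the configured page size in the initializer list (again a hedged sketch;
the real ColumnWriter constructor initialises many more members):

  explicit ColumnWriterSketch(int64_t plain_page_size)
    : plain_page_size_(plain_page_size),
      // Post-fix: the buffer length follows the configured page size, so the
      // check in ProcessValue() and the buffer length can no longer diverge.
      values_buffer_len_(plain_page_size),
      values_buffer_(new uint8_t[values_buffer_len_]) {}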

Fixing this exposes another bug: in BitWriter::PutValue(), when we
check whether the next element fits in the buffer, we multiply
'max_bytes_' by 8, which overflows because 'max_bytes_' is a 32-bit int.
This already happens with values used in our existing tests.

This change also makes 'max_bytes_' an int64_t, so multiplying it by 8
(converting bytes to bits) no longer overflows.
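
A self-contained illustration of the overflow and of why widening the
type fixes it, using a hypothetical buffer size above INT_MAX / 8
(roughly 256 MiB); the demo truncates explicitly instead of performing
the undefined 32-bit multiply:

  #include <cstdint>
  #include <iostream>

  int main() {
    int32_t max_bytes = 300 * 1024 * 1024;  // 300 MiB, still fits in int32

    // Pre-fix: 'max_bytes_ * 8' in BitWriter::PutValue() was a 32-bit signed
    // multiply; above ~256 MiB it overflows and the result wraps negative,
    // breaking the "does the next value fit?" bounds check.
    int32_t wrapped = static_cast<int32_t>(int64_t{max_bytes} * 8);
    std::cout << "32-bit result: " << wrapped << '\n';          // negative

    // Post-fix: 'max_bytes_' is int64_t, so converting bytes to bits is safe.
    int64_t max_bytes64 = max_bytes;
    std::cout << "64-bit result: " << max_bytes64 * 8 << '\n';  // 2516582400
    return 0;
  }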

Testing:
  - Added an EE test in iceberg-insert.test that reproduced the error.

Change-Id: Icb94df8ac3087476ddf1613a1285297f23a54c76
---
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/util/bit-stream-utils.h
M testdata/workloads/functional-query/queries/QueryTest/iceberg-insert.test
3 files changed, 19 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/22777/2
--
To view, visit http://gerrit.cloudera.org:8080/22777
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Icb94df8ac3087476ddf1613a1285297f23a54c76
Gerrit-Change-Number: 22777
Gerrit-PatchSet: 2
Gerrit-Owner: Daniel Becker <daniel.bec...@cloudera.com>
