[ https://issues.apache.org/jira/browse/IMPALA-11134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17953331#comment-17953331 ]
ASF subversion and git services commented on IMPALA-11134:
----------------------------------------------------------
Commit a0b3ae4e028330d26893fe1baeb715f425444b75 in impala's branch
refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=a0b3ae4e0 ]
IMPALA-11396: Deflake test_low_mem_limit_orderby_all
test_low_mem_limit_orderby_all flakes when test_mem_limit is 100 or 120
in the test vector. The minimum mem_limit needed to run this test is 120MB
+ 30MB = 150MB, so these test vectors expect one of
MEM_LIMIT_ERROR_MSGS to be thrown because mem_limit (test_mem_limit)
is too low.
A Parquet scan under such a low mem_limit sometimes throws a "Couldn't skip
rows in column" error instead. This possibly indicates that memory exhaustion
happened while reading the Parquet page index or during late materialization
(see IMPALA-5843, IMPALA-9873, IMPALA-11134). This patch attempts to deflake
the test by adding "Couldn't skip rows in column" to
MEM_LIMIT_ERROR_MSGS.
Change-Id: I43a953bc19b40256e3a8fe473b1498bbe477c54d
Reviewed-on: http://gerrit.cloudera.org:8080/22932
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
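
For illustration, here is a minimal Python sketch of the idea described in the commit message above. It is not the actual test code: the first list entry is a placeholder, and the helper names expects_mem_limit_error and is_expected_mem_limit_error are hypothetical. Only the "Couldn't skip rows in column" string and the 120MB + 30MB arithmetic come from the commit message.

```python
# Hypothetical sketch, not the real Impala test module.
MEM_LIMIT_ERROR_MSGS = [
    "Memory limit exceeded",         # placeholder for the pre-existing entries (assumed)
    "Couldn't skip rows in column",  # the entry this commit describes adding
]

# Minimum mem_limit stated in the commit message: 120MB + 30MB.
MIN_MEM_LIMIT_MB = 120 + 30


def expects_mem_limit_error(test_mem_limit_mb):
    """True when the test vector's mem_limit is below the stated minimum."""
    return test_mem_limit_mb < MIN_MEM_LIMIT_MB


def is_expected_mem_limit_error(error_text):
    """True when the error text contains any of the accepted messages."""
    return any(msg in error_text for msg in MEM_LIMIT_ERROR_MSGS)
```

With this shape, a test vector of 100 or 120 expects a failure, and a failure whose text contains "Couldn't skip rows in column" counts as an acceptable memory-limit error instead of a test failure.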
> Impala returns "Couldn't skip rows in file" error for old Parquet file
> ----------------------------------------------------------------------
>
> Key: IMPALA-11134
> URL: https://issues.apache.org/jira/browse/IMPALA-11134
> Project: IMPALA
> Issue Type: Bug
> Reporter: Zoltán Borók-Nagy
> Assignee: Zoltán Borók-Nagy
> Priority: Major
> Fix For: Impala 4.1.0
>
>
> Impala returns a "Couldn't skip rows in file" error for old Parquet files
> written by an old Impala (e.g. Impala 2.5, 2.6).
> In a DEBUG build, Impala crashes on a DCHECK:
> {noformat}
> F0217 18:21:34.449540 24288 parquet-column-readers.cc:1611]
> d3407555528be8a8:5ea3fceb00000001] Check failed: num_buffered_values_ > 0 (-1
> vs. 0)
> {noformat}
> The problem is that in some old Parquet files there can be a mismatch between
> 'num_values' in a page and the encoded def/rep levels. There is usually one
> more def/rep level encoded in these files.
> In SkipTopLevelRows() we skip values based on how many def levels are left:
> https://github.com/apache/impala/blob/92ce6fe48e75d7780efe9a275122554e59aac916/be/src/exec/parquet/parquet-column-readers.cc#L1308-L1314
> Since there are more def levels than values, {{num_buffered_values_}} becomes
> {{-1}}. I looked at Parquet files written by newer Impala and the number of
> def levels matches the number of values.
> The workaround is fairly easy: we could also take the value of
> num_buffered_values_ into account when calculating 'read_count', i.e.
> min(min(num_buffered_values_, num_rows - i), repeated_run_length), so we can
> deal with such files.
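
The effect of the proposed clamp can be sketched with a simplified Python model of the skip loop. This is an assumption-laden illustration, not the C++ code in parquet-column-readers.cc: the function name, the list-of-run-lengths representation of def levels, and the clamp_to_buffered flag are all hypothetical. Only the read_count formula comes from the issue description.

```python
def skip_top_level_rows(num_rows, num_buffered_values, def_level_runs,
                        clamp_to_buffered):
    """Simplified model: returns (rows_skipped, remaining num_buffered_values).

    def_level_runs is a list of repeated-run lengths for the page's def levels.
    """
    runs = list(def_level_runs)
    run_idx = 0
    i = 0
    while i < num_rows and run_idx < len(runs):
        repeated_run_length = runs[run_idx]
        if repeated_run_length == 0:
            run_idx += 1
            continue
        # Without the clamp, only the def levels bound the skip.
        read_count = min(num_rows - i, repeated_run_length)
        if clamp_to_buffered:
            # Proposed workaround: also bound by the values left in the page.
            read_count = min(num_buffered_values, read_count)
            if read_count == 0:
                break  # page exhausted; this model simply stops here
        runs[run_idx] -= read_count
        num_buffered_values -= read_count
        i += read_count
    return i, num_buffered_values


# A page declaring 3 values but carrying 4 encoded def levels:
print(skip_top_level_rows(4, 3, [4], clamp_to_buffered=False))
print(skip_top_level_rows(4, 3, [4], clamp_to_buffered=True))
```

In this model the unclamped call drives the buffered-value counter to -1, which is the state the DCHECK rejects, while the clamped call stops at 0 after skipping the 3 values the page actually holds.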