[jira] [Commented] (IMPALA-11134) Impala returns "Couldn't skip rows in file" error for old Parquet file

ASF subversion and git services (Jira) Wed, 21 May 2025 23:28:04 -0700


    [ https://issues.apache.org/jira/browse/IMPALA-11134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17953331#comment-17953331 ]


ASF subversion and git services commented on IMPALA-11134:
----------------------------------------------------------

Commit a0b3ae4e028330d26893fe1baeb715f425444b75 in impala's branch 
refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=a0b3ae4e0 ]

IMPALA-11396: Deflake test_low_mem_limit_orderby_all

test_low_mem_limit_orderby_all is flaky when test_mem_limit equals 100
or 120 in the test vector. The minimum mem_limit to run this test is
120MB + 30MB = 150MB. Thus, these test vectors expect one of
MEM_LIMIT_ERROR_MSGS to be thrown, because mem_limit (test_mem_limit)
is not enough.

A Parquet scan under this low mem_limit sometimes throws a "Couldn't skip
rows in column" error instead. This possibly indicates that memory
exhaustion happens while reading the Parquet page index or during late
materialization (see IMPALA-5843, IMPALA-9873, IMPALA-11134). This patch
attempts to deflake the test by adding "Couldn't skip rows in column" to
MEM_LIMIT_ERROR_MSGS.

Change-Id: I43a953bc19b40256e3a8fe473b1498bbe477c54d
Reviewed-on: http://gerrit.cloudera.org:8080/22932
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
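The expectation described in the commit message can be sketched as follows. This is a minimal C++ illustration under stated assumptions, not the actual test, which lives in Impala's Python test suite. The 150MB minimum and the 100/120 test_mem_limit values come from the commit message. Everything else is hypothetical: the helper names `ExpectsMemLimitError` and `IsAcceptedMemLimitError`, the substring-matching rule, and every list entry except "Couldn't skip rows in column".

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for MEM_LIMIT_ERROR_MSGS. Only the second entry comes
// from the commit; "Memory limit exceeded" is an assumed placeholder.
const std::vector<std::string> kMemLimitErrorMsgs = {
    "Memory limit exceeded",
    "Couldn't skip rows in column",  // entry added by IMPALA-11396
};

// Minimum mem_limit for test_low_mem_limit_orderby_all, per the commit message.
constexpr int kMinMemLimitMb = 120 + 30;  // 150MB

// A test vector whose mem_limit is below the minimum is expected to fail
// with one of the accepted memory-limit errors instead of succeeding.
bool ExpectsMemLimitError(int test_mem_limit_mb) {
  return test_mem_limit_mb < kMinMemLimitMb;
}

// Assumed matching rule: an error is accepted if its text contains any of
// the messages in kMemLimitErrorMsgs.
bool IsAcceptedMemLimitError(const std::string& error_text) {
  for (const std::string& msg : kMemLimitErrorMsgs) {
    if (error_text.find(msg) != std::string::npos) return true;
  }
  return false;
}
```

With 100MB or 120MB, `ExpectsMemLimitError` returns true. Before the patch, a run that failed with "Couldn't skip rows in column" would not have matched any accepted message, so the test flaked. Once that message is in the list, the test passes.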


> Impala returns "Couldn't skip rows in file" error for old Parquet file
> ----------------------------------------------------------------------
>
>                 Key: IMPALA-11134
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11134
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>             Fix For: Impala 4.1.0
>
>
> Impala returns a "Couldn't skip rows in file" error for old Parquet files
> written by old Impala versions (e.g. Impala 2.5, 2.6).
> In a DEBUG build, Impala crashes on a DCHECK:
> {noformat}
> F0217 18:21:34.449540 24288 parquet-column-readers.cc:1611] d3407555528be8a8:5ea3fceb00000001] Check failed: num_buffered_values_ > 0 (-1 vs. 0)
> {noformat}
> The problem is that some old Parquet files have a mismatch between
> 'num_values' in a page and the encoded def/rep levels: these files usually
> encode one more def/rep level than 'num_values'.
> In SkipTopLevelRows() we skip values based on how many def levels are left:
> https://github.com/apache/impala/blob/92ce6fe48e75d7780efe9a275122554e59aac916/be/src/exec/parquet/parquet-column-readers.cc#L1308-L1314
> Since there are more def levels than values, {{num_buffered_values_}} becomes
> {{-1}}. I looked at Parquet files written by newer Impala versions, and in
> those the number of def levels matches the number of values.
> The workaround is fairly easy: we could also take the value of
> num_buffered_values_ into account when calculating 'read_count', i.e.
> min(min(num_buffered_values_, num_rows - i), repeated_run_length), so that
> we can handle such files.
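The off-by-one can be made concrete with a minimal C++ model of the skip loop. It rests on several assumptions: a flat (non-repeated) column, def levels represented only by the lengths of their runs, and a page whose header says 10 values while 11 def levels are encoded. `PageState` and `SkipTopLevelRowsSketch` are hypothetical names, and this is not Impala's actual SkipTopLevelRows() code. Only the two `read_count` formulas are taken from the issue.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified page state for a flat column.
struct PageState {
  int64_t num_buffered_values;    // values left in the page ('num_values')
  std::vector<int64_t> def_runs;  // lengths of the remaining def-level runs
};

// Skips up to 'num_rows' top-level rows and returns how many were skipped.
// clamp == false: read_count = min(num_rows - i, repeated_run_length),
//   i.e. the skip is driven only by the def levels that are left.
// clamp == true: the proposed workaround,
//   read_count = min(min(num_buffered_values, num_rows - i), repeated_run_length).
int64_t SkipTopLevelRowsSketch(PageState& page, int64_t num_rows, bool clamp) {
  int64_t i = 0;
  size_t run = 0;
  while (i < num_rows && run < page.def_runs.size()) {
    // With the clamp, stop once the page's values are used up, even if an
    // extra def level is still encoded.
    if (clamp && page.num_buffered_values <= 0) break;
    int64_t repeated_run_length = page.def_runs[run];
    int64_t read_count = std::min(num_rows - i, repeated_run_length);
    if (clamp) read_count = std::min(read_count, page.num_buffered_values);
    page.def_runs[run] -= read_count;
    if (page.def_runs[run] == 0) ++run;  // current run fully consumed
    page.num_buffered_values -= read_count;
    i += read_count;
  }
  return i;
}
```

Take a page built as `{10, {11}}` and skip 20 rows. The unclamped loop skips 11 rows and leaves `num_buffered_values` at -1, the state the DCHECK above reports. The clamped loop stops after 10 rows and leaves 0. For a well-formed page such as `{10, {10}}`, both variants skip 10 rows and end at 0. This is why the extra `min()` is a low-risk workaround: it only changes behavior when the def levels outnumber the values.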



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
