[ 
https://issues.apache.org/jira/browse/IMPALA-14619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18044724#comment-18044724
 ] 

ASF subversion and git services commented on IMPALA-14619:
----------------------------------------------------------

Commit d54b75ccf14a42471214926b2ba7e217cf7e3f1f in impala's branch 
refs/heads/master from Xuebin Su
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=d54b75ccf ]

IMPALA-14619: Reset levels_readahead_ for late materialization

Previously, `BaseScalarColumnReader::levels_readahead_` was not reset
when the reader did not do page filtering. If a query selected the last
row containing a collection value in a row group, `levels_readahead_`
would be set and would not be reset when advancing to the next row
group without page filtering. As a result, trying to skip collection
values at the start of the next row group would cause a check failure.

This patch fixes the failure by resetting `levels_readahead_` in
`BaseScalarColumnReader::Reset()`, which is always called when advancing
to the next row group.
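
The failure mode and the fix can be illustrated with a toy model. This is a
minimal sketch, not Impala's code: `ToyColumnReader`, `NextLevel()`,
`ReadValueWithReadahead()`, `SkipValue()`, `SkipAtStartOfNextRowGroup()`, and
the `clear_readahead` switch are all hypothetical names invented for
illustration. Only the idea of a `levels_readahead` flag that `Reset()` must
clear is taken from the description above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for ParquetLevel::INVALID_LEVEL.
constexpr int kInvalidLevel = -1;

// Toy reader, NOT Impala's BaseScalarColumnReader. It only models the
// interaction between a "readahead" flag and a per-row-group reset.
struct ToyColumnReader {
  const std::vector<int>* levels = nullptr;  // definition levels of one row group
  std::size_t pos = 0;
  int def_level = kInvalidLevel;
  bool levels_readahead = false;  // true: def_level already holds the next level

  // Loads the next level; at the end of the row group it leaves def_level invalid.
  bool NextLevel() {
    if (pos >= levels->size()) {
      def_level = kInvalidLevel;
      return false;
    }
    def_level = (*levels)[pos++];
    return true;
  }

  // Reads one collection value, then peeks at the level that follows it.
  void ReadValueWithReadahead() {
    if (!levels_readahead) NextLevel();
    NextLevel();  // the peek; hits the end if this was the last value
    levels_readahead = true;
  }

  // Skips one value. Returns false where a real reader would hit a check
  // failure: the flag claims a level was read ahead, but none is loaded.
  bool SkipValue() {
    if (levels_readahead) {
      if (def_level == kInvalidLevel) return false;
      levels_readahead = false;
      return true;
    }
    return NextLevel();
  }

  // Called when advancing to the next row group. clear_readahead toggles
  // between the assumed old behavior (false) and the fixed one (true).
  void Reset(const std::vector<int>* new_levels, bool clear_readahead) {
    levels = new_levels;
    pos = 0;
    def_level = kInvalidLevel;
    if (clear_readahead) levels_readahead = false;
  }
};

// Selects the last value of one row group, then skips at the start of the next.
bool SkipAtStartOfNextRowGroup(bool clear_readahead) {
  const std::vector<int> group1 = {1, 1};
  const std::vector<int> group2 = {1, 1, 1};
  ToyColumnReader reader;
  reader.Reset(&group1, true);
  reader.SkipValue();               // first value is filtered out
  reader.ReadValueWithReadahead();  // last value is selected; flag is now set
  reader.Reset(&group2, clear_readahead);
  return reader.SkipValue();
}
```

In this model, `SkipAtStartOfNextRowGroup(false)` returns false because the
stale flag survives the reset, while `SkipAtStartOfNextRowGroup(true)` returns
true because the reset clears it.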

`levels_readahead_` is also moved out of the "Members used for page
filtering" section as the variable is also used in late materialization.

Testing:
- Added an E2E test for the fix.

Change-Id: Idac138ffe4e1a9260f9080a97a1090b467781d00
Reviewed-on: http://gerrit.cloudera.org:8080/23779
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Impala query fails on collection with late materialization enabled
> ------------------------------------------------------------------
>
>                 Key: IMPALA-14619
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14619
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 5.0.0
>            Reporter: Michael Smith
>            Assignee: Xuebin Su
>            Priority: Critical
>             Fix For: Impala 5.0.0
>
>
> I have an example where reading data from a Parquet file in S3 fails with
> {code}
> Query 6a4eda2d908edc3d:5877957900000000 failed: Error in skipping 3072 values 
> to row -1 in column list of file 
> s3a://table/date=2025-11-21/part-0004.snappy.parquet. Detail:
> {code}
> Running it with a debug build of Impala fails at
> {code}
> parquet-column-readers.h:603] 1c41522501b199dc:252ddea100000001] Check 
> failed: def_level_ != ParquetLevel::INVALID_LEVEL (-1 vs. -1)
> {code}
> I don't have this as a publicly sharable example yet, but I've shared it with 
> a few people to help triage why it's failing.
> This started to happen after IMPALA-3841.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
