Joe McDonnell created IMPALA-13898:
--------------------------------------

             Summary: Tuple cache produces incorrect result when querying scale_db.num_partitions_1234_blocks_per_partition_1
                 Key: IMPALA-13898
                 URL: https://issues.apache.org/jira/browse/IMPALA-13898
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
    Affects Versions: Impala 5.0.0
            Reporter: Joe McDonnell

Tuple caching generates the same key for these two queries:
{noformat}
select * from scale_db.num_partitions_1234_blocks_per_partition_1 where j=1;
select * from scale_db.num_partitions_1234_blocks_per_partition_1 where j=1 or j=2;
{noformat}
This is a scenario from catalog_service/test_large_num_partitions.py. It is a correctness issue.

scale_db.num_partitions_1234_blocks_per_partition_1 is an exotic table where all of the partitions point to the same location / file. It also has only partition columns, so the contents of the file don't matter. This means that the j=1 and j=2 partitions both point to the same file. The partition information is not included in the cache key, so the two queries are indistinguishable. We'll need to expand what we put in the cache key to handle this scenario.
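
To illustrate the collision, here is a toy model in Python. It is not Impala's actual key computation: the cache_key function, the file path, and the assumption that the key is derived from the table name plus the distinct files scanned are all hypothetical and chosen only to mirror the description above. It shows that a key built from the files alone cannot tell the two queries apart, while mixing in the partition values can.

```python
import hashlib

# Table name taken from the issue; the file path is a made-up placeholder.
TABLE = "scale_db.num_partitions_1234_blocks_per_partition_1"
SHARED_FILE = "/hypothetical/shared/location/data_file"


def cache_key(table, files, partitions=None):
    """Hypothetical cache key: a hash over the table name and the distinct
    files scanned, optionally extended with the scanned partition values."""
    h = hashlib.sha256()
    h.update(table.encode())
    for path in sorted(set(files)):
        h.update(b"\0file:" + path.encode())
    if partitions is not None:
        for part in sorted(partitions):
            h.update(b"\0part:" + part.encode())
    return h.hexdigest()


# "where j=1" scans one partition; "where j=1 or j=2" scans two.
# Every partition points at the same file.
q1_files, q1_parts = [SHARED_FILE], ["j=1"]
q2_files, q2_parts = [SHARED_FILE, SHARED_FILE], ["j=1", "j=2"]

# Files only: the two queries collide.
collides = cache_key(TABLE, q1_files) == cache_key(TABLE, q2_files)

# Files plus partition values: the keys differ.
distinct = (cache_key(TABLE, q1_files, q1_parts)
            != cache_key(TABLE, q2_files, q2_parts))

print("collision without partition info:", collides)
print("distinct with partition info:", distinct)
```

Whether the real fix should hash partition values, partition ids, or something else is left open here; the sketch only shows why file identity alone is not enough for a table like this one.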