Joe McDonnell created IMPALA-13898:
--------------------------------------

             Summary: Tuple cache produces incorrect result when querying scale_db.num_partitions_1234_blocks_per_partition_1
                 Key: IMPALA-13898
                 URL: https://issues.apache.org/jira/browse/IMPALA-13898
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
    Affects Versions: Impala 5.0.0
            Reporter: Joe McDonnell


Tuple caching generates the same key for these two queries:
{noformat}
select * from scale_db.num_partitions_1234_blocks_per_partition_1 where j=1

select * from scale_db.num_partitions_1234_blocks_per_partition_1 where j=1 or j=2;{noformat}
This scenario comes from catalog_service/test_large_num_partitions.py. It is a 
correctness issue: because the keys match, one query can be answered from the 
other's cached result.

scale_db.num_partitions_1234_blocks_per_partition_1 is an exotic table where 
all of the partitions point to the same location/file. It also has only 
partition columns, so the contents of the file don't matter. This means that 
j=1 and j=2 both point to the same file. The partition information (e.g. the 
partition key values) is not included in the cache key, so the two queries are 
indistinguishable. We'll need to expand what we put in the cache key to handle 
this scenario.
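A minimal Python sketch of the collision, under stated assumptions. This is not Impala's actual key-building code. `Partition`, `naive_key`, `partition_aware_key` and the file path are hypothetical. The sketch assumes, as the report implies, that the predicate on j does not itself reach the key. It also assumes the naive key hashes only the identities of the distinct files being scanned. Folding each partition's key values into the hash is one possible way to expand the key.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    # Hypothetical model of a partition: its key values plus the file behind it.
    key_values: tuple  # e.g. (("j", 1),)
    file_path: str
    file_len: int
    mtime: int


def _digest(items):
    # Hash a sequence of items in order, with a separator between them.
    h = hashlib.sha256()
    for item in items:
        h.update(repr(item).encode("utf-8"))
        h.update(b"\x00")
    return h.hexdigest()


def naive_key(table, partitions):
    # Hypothetical naive key: only distinct file identities contribute,
    # so partitions sharing one file collapse into a single entry.
    files = sorted({(p.file_path, p.file_len, p.mtime) for p in partitions})
    return _digest([table] + files)


def partition_aware_key(table, partitions):
    # Hypothetical expanded key: every partition contributes its key values
    # together with its file, so different partition sets hash differently.
    entries = sorted((p.key_values, p.file_path, p.file_len, p.mtime)
                     for p in partitions)
    return _digest([table] + entries)


TABLE = "scale_db.num_partitions_1234_blocks_per_partition_1"
SHARED_FILE = ("/hypothetical/path/shared_file", 10, 0)  # one file for all partitions


def part(j):
    return Partition((("j", j),), *SHARED_FILE)


q1 = [part(1)]           # where j=1
q2 = [part(1), part(2)]  # where j=1 or j=2

print(naive_key(TABLE, q1) == naive_key(TABLE, q2))                      # True
print(partition_aware_key(TABLE, q1) == partition_aware_key(TABLE, q2))  # False
```

Keeping duplicate files instead of a set would separate these two particular queries. It would still give where j=1 and where j=2 the same key, so the partition values themselves need to be part of the key.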



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
