[ https://issues.apache.org/jira/browse/HIVE-11634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14939580#comment-14939580 ]
Jesus Camacho Rodriguez commented on HIVE-11634: ------------------------------------------------ 1. That's what I thought, no problem. 2. OK, this case should be solved or studied as part of this JIRA. 3. That's fine; we can create a new JIRA case for that. But maybe I would remove then the changes in PcrExprProcFactory.java and I would follow-up in the new JIRA, as that code is not working as expected. What do you think? 4. Maybe I didn't explain it properly. The idea is that you would only prepend non-partition columns, and without clustering them, but iff the NDV in the IN clause is reduced. In any case, we can create a new JIRA for this too, and maybe assign it to me? As I see it, the modification to the original optimization that you have just created should not be too complicated. > Support partition pruning for IN(STRUCT(partcol, nonpartcol..)...) > ------------------------------------------------------------------ > > Key: HIVE-11634 > URL: https://issues.apache.org/jira/browse/HIVE-11634 > Project: Hive > Issue Type: Bug > Components: CBO > Reporter: Hari Sankar Sivarama Subramaniyan > Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-11634.1.patch, HIVE-11634.2.patch, > HIVE-11634.3.patch, HIVE-11634.4.patch, HIVE-11634.5.patch, > HIVE-11634.6.patch, HIVE-11634.7.patch, HIVE-11634.8.patch, > HIVE-11634.9.patch, HIVE-11634.91.patch, HIVE-11634.92.patch, > HIVE-11634.93.patch, HIVE-11634.94.patch, HIVE-11634.95.patch, > HIVE-11634.96.patch, HIVE-11634.97.patch > > > Currently, we do not support partition pruning for the following scenario > {code} > create table pcr_t1 (key int, value string) partitioned by (ds string); > insert overwrite table pcr_t1 partition (ds='2000-04-08') select * from src > where key < 20 order by key; > insert overwrite table pcr_t1 partition (ds='2000-04-09') select * from src > where key < 20 order by key; > insert overwrite table pcr_t1 partition (ds='2000-04-10') select * from src > where key < 20 order by key; > explain extended select ds from pcr_t1 where struct(ds, key) in > (struct('2000-04-08',1), struct('2000-04-09',2)); > {code} > If we run the above query, we see that all the partitions of table pcr_t1 are > present in the filter predicate where as we can prune partition > (ds='2000-04-10'). > The optimization is to rewrite the above query into the following. > {code} > explain extended select ds from pcr_t1 where (struct(ds)) IN > (struct('2000-04-08'), struct('2000-04-09')) and struct(ds, key) in > (struct('2000-04-08',1), struct('2000-04-09',2)); > {code} > The predicate (struct(ds)) IN (struct('2000-04-08'), struct('2000-04-09')) > is used by partition pruner to prune the columns which otherwise will not be > pruned. > This is an extension of the idea presented in HIVE-11573. -- This message was sent by Atlassian JIRA (v6.3.4#6332)