[ https://issues.apache.org/jira/browse/HIVE-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16046080#comment-16046080 ]
Vineet Garg commented on HIVE-6348:
-----------------------------------

[~ashutoshc] The plan generated after the subquery remove rule/decorrelation does not produce a HiveSortLimit on top of a HiveSortLimit. For example, for the query
{code:sql}
select * from part where p_size IN (select p_size from part p where p.p_type <> part.p_name order by p_size)
{code}
the plan just after decorrelation looks like
{code:sql}
HiveProject(p_partkey=[$0], p_name=[$1], p_mfgr=[$2], p_brand=[$3], p_type=[$4], p_size=[$5], p_container=[$6], p_retailprice=[$7], p_comment=[$8])
  HiveProject(p_partkey=[$0], p_name=[$1], p_mfgr=[$2], p_brand=[$3], p_type=[$4], p_size=[$5], p_container=[$6], p_retailprice=[$7], p_comment=[$8], BLOCK__OFFSET__INSIDE__FILE=[$9], INPUT__FILE__NAME=[$10], ROW__ID=[$11])
    LogicalJoin(condition=[AND(<>($1, $13), =($5, $12))], joinType=[inner])
      HiveTableScan(table=[[default.part]], table:alias=[part])
      HiveAggregate(group=[{0, 1}])
        HiveProject(p_size=[$0], p_type0=[$1])
          HiveProject(p_size=[$0], p_type0=[$13])
            HiveSortLimit(sort0=[$0], dir0=[ASC-nulls-first])
              HiveProject(p_size=[$5], p_partkey=[$0], p_name=[$1], p_mfgr=[$2], p_brand=[$3], p_type=[$4], p_size1=[$5], p_container=[$6], p_retailprice=[$7], p_comment=[$8], block__offset__inside__file=[$9], input__file__name=[$10], row__id=[$11], p_type0=[$4])
                LogicalFilter(condition=[IS NOT NULL($4)])
                  HiveTableScan(table=[[default.part]], table:alias=[p])
{code}
So there is a single sort limit, and it sits on the right side of the join. One possible rule: if the top project doesn't project any column/expression from the right side, remove the HiveSortLimit from the right side of the join.

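The proposed rule can be illustrated with a small, self-contained sketch. This is a toy model written for this note, not Hive or Calcite code: `Node`, `strip_sorts`, `remove_right_sort`, and `ops` are hypothetical names, and `project_refs`/`left_width` stand in for the column indexes referenced by the top project and the number of fields coming from the left join input (12 in the plan above, since the join condition addresses the right side as $12 and $13). As an extra assumption beyond what the comment states, a sort limit that carries a fetch (LIMIT) is left untouched, because dropping it would change which rows are returned.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Toy plan node (hypothetical); NOT a Calcite RelNode."""
    op: str
    inputs: List["Node"] = field(default_factory=list)
    fetch: Optional[int] = None  # set only on a SortLimit that carries a LIMIT


def strip_sorts(node: Node) -> Node:
    """Return the subtree with pure-sort SortLimit nodes (no fetch) removed."""
    if node.op == "SortLimit":
        if node.fetch is not None:
            # A LIMIT depends on the ordering beneath it: keep this subtree as is.
            return node
        return strip_sorts(node.inputs[0])
    return Node(node.op, [strip_sorts(child) for child in node.inputs], node.fetch)


def remove_right_sort(join: Node, project_refs: List[int], left_width: int) -> Node:
    """Drop sorts on the right join input when the top project ignores that side."""
    if any(ref >= left_width for ref in project_refs):
        return join  # a right-side column is projected: leave the plan alone
    left, right = join.inputs
    return Node(join.op, [left, strip_sorts(right)])


def ops(node: Node) -> List[str]:
    """Pre-order list of operator names, for inspecting a plan."""
    names = [node.op]
    for child in node.inputs:
        names.extend(ops(child))
    return names


# Shape of the join subtree from the plan above (operator names only).
right = Node("Aggregate", [Node("Project", [Node("Project", [Node("SortLimit", [
    Node("Project", [Node("Filter", [Node("TableScan")])])])])])])
join = Node("Join", [Node("TableScan"), right])

# The project above the join references only $0..$11, all from the left input.
rewritten = remove_right_sort(join, project_refs=list(range(12)), left_width=12)
print(ops(rewritten))
```

An actual implementation would presumably be a planner rule matching a project over a join, and it would also need to confirm that no operator between the join and the sort depends on input order; the sketch ignores that.
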
> Order by/Sort by in subquery
> ----------------------------
>
> Key: HIVE-6348
> URL: https://issues.apache.org/jira/browse/HIVE-6348
> Project: Hive
> Issue Type: Bug
> Reporter: Gunther Hagleitner
> Assignee: Rui Li
> Priority: Minor
> Labels: sub-query
> Attachments: HIVE-6348.1.patch, HIVE-6348.2.patch, HIVE-6348.3.patch
>
>
> select * from (select * from foo order by c asc) bar order by c desc;
> in hive sorts the data set twice. The optimizer should probably remove any
> order by/sort by in the sub query unless you use 'limit '. Could even go so
> far as barring it at the semantic level.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)