[ https://issues.apache.org/jira/browse/HIVE-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16055603#comment-16055603 ]

Rui Li commented on HIVE-6348:
------------------------------

Hi [~ashutoshc], [~vgarg], I'm fine with doing this either in CBO or outside of it.
But I'm not sure how to handle cases like input20.q. Consider the following two
queries.
In this one, the ORDER BY should be removed:
{code}
from (select key from src order by key) tmap
insert overwrite table dest select tmap.key;
{code}
But in this one it shouldn't be removed, since the script may depend on the order of its input:
{code}
from (select key, value from src order by key, value) tmap
insert overwrite table dest1
reduce tmap.key, tmap.value using 'python input20_script.py';
{code}
The two queries have very similar ASTs. Any suggestions on how we can distinguish
them? Maybe we could skip the optimization whenever scripts or UDFs are involved.
And is it correct to expect a specific order from a subquery in the first place?
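To make the distinction concrete, here is a minimal Python sketch over a toy plan tree. The `Node` class, the helper functions, and the operator kinds ("sort", "script", "insert", and so on) are hypothetical illustrations, not Hive's actual operators or CBO rules. The rule drops a subquery sort unless some downstream consumer may observe its order (a LIMIT, a script, or a UDF), which is the conservative "skip scripts and UDFs" idea; an outer ORDER BY or an INSERT into a table is assumed not to depend on its input order.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical operator kinds for a toy plan tree; these are NOT Hive's
# real operator classes, just labels for this sketch.
ORDER_SENSITIVE = {"limit", "script", "udf"}  # consumers that may observe input order
ORDER_AGNOSTIC = {"sort", "insert"}           # consumers assumed to ignore input order
# Any other kind (e.g. "select", "scan") is treated as order-preserving:
# it passes its parent's requirement through to its children.


@dataclass
class Node:
    kind: str
    children: List["Node"] = field(default_factory=list)


def prune_sorts(node: Node, order_required: bool = True) -> Node:
    """Remove "sort" nodes whose output order no downstream operator observes."""
    if node.kind == "sort" and not order_required:
        # In this toy model a sort has exactly one input: splice it out
        # and keep pruning below it with the same (False) requirement.
        return prune_sorts(node.children[0], order_required)
    if node.kind in ORDER_SENSITIVE:
        child_required = True
    elif node.kind in ORDER_AGNOSTIC:
        child_required = False
    else:
        child_required = order_required
    node.children = [prune_sorts(c, child_required) for c in node.children]
    return node


def kinds(node: Node) -> List[str]:
    """Pre-order list of operator kinds, for inspecting the result."""
    out = [node.kind]
    for child in node.children:
        out.extend(kinds(child))
    return out


def chain(*names: str) -> Node:
    """Build a single-input chain of nodes, outermost operator first."""
    node = Node(names[-1])
    for name in reversed(names[:-1]):
        node = Node(name, [node])
    return node


if __name__ == "__main__":
    # First query: insert <- select <- sort <- select <- scan; the sort is dropped.
    q1 = chain("insert", "select", "sort", "select", "scan")
    print(kinds(prune_sorts(q1)))  # ['insert', 'select', 'select', 'scan']
    # Second query: the script consumes the sorted rows, so the sort stays.
    q2 = chain("insert", "script", "sort", "select", "scan")
    print(kinds(prune_sorts(q2)))  # ['insert', 'script', 'sort', 'select', 'scan']
```

The two queries differ only in what consumes the subquery's output, which is why the sketch propagates an "order required" flag top-down through the plan rather than inspecting the subquery in isolation. The same walk also removes the inner sort in the issue's `order by c asc` / `order by c desc` example, because the outer sort is treated as order-agnostic.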

> Order by/Sort by in subquery
> ----------------------------
>
>                 Key: HIVE-6348
>                 URL: https://issues.apache.org/jira/browse/HIVE-6348
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Gunther Hagleitner
>            Assignee: Rui Li
>            Priority: Minor
>              Labels: sub-query
>         Attachments: HIVE-6348.1.patch, HIVE-6348.2.patch, HIVE-6348.3.patch
>
>
> select * from (select * from foo order by c asc) bar order by c desc;
> in Hive sorts the data set twice. The optimizer should probably remove any
> ORDER BY/SORT BY in the subquery unless it uses LIMIT. It could even go so
> far as to disallow it at the semantic level.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
