[jira] [Commented] (HIVE-15544) Support scalar subqueries

Vineet Garg (JIRA) Thu, 05 Jan 2017 13:09:11 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-15544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802524#comment-15802524
 ]


Vineet Garg commented on HIVE-15544:
------------------------------------

Doing a quick search in Postgres code shows that they handle this case during 
execution instead of planning. Postgres avoid re-writing scalar subqueries into 
join and execute them separately as subplan and so are able to handle this at 
runtime

Also according to MariaDB's this documentation page 
(https://mariadb.com/kb/en/sql-99/scalar-subqueries/) {noformat}It is an error 
if, at runtime, the DBMS discovers that a scalar subquery returns more than one 
row{noformat}

Current patch leverages Calcite's functionality to plan scalar subqueries, 
which re-write them into LEFT JOIN. Calcite tries to introduce SINGLE_VALUE 
function to make sure subquery expression produces single value but HIVE avoids 
that since it doesn't support SINGLE_VALUE.

I am not yet sure how HIVE is going to handle this yet. Most likely this will 
have to be handled at runtime but that'll probably involve introducing new 
execution operator to execute subplans.

> Support scalar subqueries
> -------------------------
>
>                 Key: HIVE-15544
>                 URL: https://issues.apache.org/jira/browse/HIVE-15544
>             Project: Hive
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Vineet Garg
>            Assignee: Vineet Garg
>              Labels: sub-query
>         Attachments: HIVE-15544.1.patch
>
>
> Currently HIVE only support IN/EXISTS/NOT IN/NOT EXISTS subqueries. HIVE 
> doesn't allow sub-queries such as:
> {code}
> explain select  a.ca_state state, count(*) cnt
>  from customer_address a
>      ,customer c
>      ,store_sales s
>      ,date_dim d
>      ,item i
>  where       a.ca_address_sk = c.c_current_addr_sk
>       and c.c_customer_sk = s.ss_customer_sk
>       and s.ss_sold_date_sk = d.d_date_sk
>       and s.ss_item_sk = i.i_item_sk
>       and d.d_month_seq = 
>            (select distinct (d_month_seq)
>             from date_dim
>                where d_year = 2000
>               and d_moy = 2 )
>       and i.i_current_price > 1.2 * 
>              (select avg(j.i_current_price) 
>            from item j 
>            where j.i_category = i.i_category)
>  group by a.ca_state
>  having count(*) >= 10
>  order by cnt 
>  limit 100;
> {code}
> We initially plan to support such scalar subqueries in filter i.e. WHERE and 
> HAVING



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-15544) Support scalar subqueries

Reply via email to