[jira] [Commented] (HIVE-9534) incorrect result set for query that projects a windowed aggregate

Chaoyu Tang (JIRA) Tue, 05 May 2015 10:52:54 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-9534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14528903#comment-14528903
 ]


Chaoyu Tang commented on HIVE-9534:
-----------------------------------

Actually as of Oracle 11.2 (the version I tested was Oracle Database 11g 
Express Edition Release 11.2.0.2.0 - 64bit Production), the window function is 
not supported. Though the query "select avg(distinct col1) over() from 
testwindow;", where avg acts on all rows, works, query with windowing_clause 
does actually not work. for example:
{code}
Query: 
select avg(distinct col1) over(order by col2 rows between 1 preceding and 1 
following) from testwindow;
---
Error:
ORA-30487: ORDER BY not allowed here
30487. 00000 -  "ORDER BY not allowed here"
*Cause:    DISTINCT functions and RATIO_TO_REPORT cannot have an ORDER BY
{code}
Based on Oracle document 
http://docs.oracle.com/cd/E11882_01/server.112/e25554/analysis.htm
{code}
Note that the DISTINCT keyword is not supported in windowing functions except 
for MAX and MIN.
{code}

Based on Hive plan for "select avg(distinct col1) over() from testwindow;" it 
looks like Hive was computing the distinct value of col1:
{code}
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: testwindow
            Statistics: Num rows: 6 Data size: 18 Basic stats: COMPLETE Column 
stats: NONE
            Select Operator
              expressions: col1 (type: int)
              outputColumnNames: col1
              Statistics: Num rows: 6 Data size: 18 Basic stats: COMPLETE 
Column stats: NONE
              Group By Operator
                aggregations: avg(DISTINCT col1)
                keys: col1 (type: int)
                mode: hash
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 6 Data size: 18 Basic stats: COMPLETE 
Column stats: NONE
                Reduce Output Operator
                  key expressions: _col0 (type: int)
                  sort order: +
                  Statistics: Num rows: 6 Data size: 18 Basic stats: COMPLETE 
Column stats: NONE
      Reduce Operator Tree:
        Group By Operator
          aggregations: avg(DISTINCT KEY._col0:0._col0)
          mode: mergepartial
          outputColumnNames: _col0
          Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column 
stats: NONE
          File Output Operator
            compressed: false
            Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column 
stats: NONE
            table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
{code}



> incorrect result set for query that projects a windowed aggregate
> -----------------------------------------------------------------
>
>                 Key: HIVE-9534
>                 URL: https://issues.apache.org/jira/browse/HIVE-9534
>             Project: Hive
>          Issue Type: Bug
>          Components: SQL
>            Reporter: N Campbell
>            Assignee: Chaoyu Tang
>
> Result set returned by Hive has one row instead of 5
> {code}
> select avg(distinct tsint.csint) over () from tsint 
> create table  if not exists TSINT (RNUM int , CSINT smallint)
>  ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' 
>  STORED AS TEXTFILE;
> 0|\N
> 1|-1
> 2|0
> 3|1
> 4|10
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HIVE-9534) incorrect result set for query that projects a windowed aggregate

Reply via email to