[ https://issues.apache.org/jira/browse/HIVE-9534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14528903#comment-14528903 ]
Chaoyu Tang commented on HIVE-9534: ----------------------------------- Actually as of Oracle 11.2 (the version I tested was Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit Production), the window function is not supported. Though the query "select avg(distinct col1) over() from testwindow;", where avg acts on all rows, works, query with windowing_clause does actually not work. for example: {code} Query: select avg(distinct col1) over(order by col2 rows between 1 preceding and 1 following) from testwindow; --- Error: ORA-30487: ORDER BY not allowed here 30487. 00000 - "ORDER BY not allowed here" *Cause: DISTINCT functions and RATIO_TO_REPORT cannot have an ORDER BY {code} Based on Oracle document http://docs.oracle.com/cd/E11882_01/server.112/e25554/analysis.htm {code} Note that the DISTINCT keyword is not supported in windowing functions except for MAX and MIN. {code} Based on Hive plan for "select avg(distinct col1) over() from testwindow;" it looks like Hive was computing the distinct value of col1: {code} STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree: TableScan alias: testwindow Statistics: Num rows: 6 Data size: 18 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: col1 (type: int) outputColumnNames: col1 Statistics: Num rows: 6 Data size: 18 Basic stats: COMPLETE Column stats: NONE Group By Operator aggregations: avg(DISTINCT col1) keys: col1 (type: int) mode: hash outputColumnNames: _col0, _col1 Statistics: Num rows: 6 Data size: 18 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Statistics: Num rows: 6 Data size: 18 Basic stats: COMPLETE Column stats: NONE Reduce Operator Tree: Group By Operator aggregations: avg(DISTINCT KEY._col0:0._col0) mode: mergepartial outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: ListSink {code} > incorrect result set for query that projects a windowed aggregate > ----------------------------------------------------------------- > > Key: HIVE-9534 > URL: https://issues.apache.org/jira/browse/HIVE-9534 > Project: Hive > Issue Type: Bug > Components: SQL > Reporter: N Campbell > Assignee: Chaoyu Tang > > Result set returned by Hive has one row instead of 5 > {code} > select avg(distinct tsint.csint) over () from tsint > create table if not exists TSINT (RNUM int , CSINT smallint) > ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' > STORED AS TEXTFILE; > 0|\N > 1|-1 > 2|0 > 3|1 > 4|10 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)