[jira] [Commented] (HIVE-896) Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.

Harish Butani (JIRA) Wed, 30 Jan 2013 18:43:19 -0800

    [ https://issues.apache.org/jira/browse/HIVE-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567271#comment-13567271 ]


Harish Butani commented on HIVE-896:
------------------------------------

Yes, exactly. I will start introducing the new Spec classes described in the 
DataStructs attachment, and refactor the Def classes to remove the antlr 
dependency. 

But before doing this I had to handle the following issue. The plan we 
generate has the form 
  ... -> ReduceSink -> Extract -> PTF Op -> ...
The ReduceSink RowResolver (RR) contains the Virtual Columns from its input 
Operators. During translation we set the RR of the Extract Op to be the same 
as the ReduceSink RR, and this same RR was used to set up the ExprNodeDescs in 
PTF translation. But at runtime the Extract Op doesn't contain the Virtual 
Columns, so the internal column names can be different. For example, our 
testJoinWithLeadLag test case is a self join on part that also has a 
Windowing expression. The RR of the RS op at translation time looks something 
like this:
  (_col1,_col2,..,_col7, 
   _col8(vc=true),_col9(vc=true),_col10,_col11,.._col15(vc=true),_col16(vc=true),..)
At runtime the Virtual Columns are removed and all the columns after _col7 are 
shifted 1 or 2 positions. So in child Operators the ColumnExprNodeDescs no 
longer refer to the right columns.
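
To make the mismatch concrete, here is a toy model in Python. It is not Hive 
code: the Col type, the "dataN"/"vcN" labels, the choice of _col8 and _col9 as 
the only Virtual Columns, and the assumption that surviving columns are simply 
renumbered from _col0 at runtime are all illustrative.

```python
from typing import NamedTuple


class Col(NamedTuple):
    internal: str   # internal name assigned at translation time, e.g. "_col8"
    source: str     # hypothetical label for what the column actually holds
    virtual: bool   # True for a Virtual Column


def translation_rr():
    """Toy ReduceSink RR: 14 columns, with _col8 and _col9 virtual (assumed)."""
    cols = []
    for i in range(14):
        virtual = i in (8, 9)
        label = f"vc{i}" if virtual else f"data{i}"
        cols.append(Col(f"_col{i}", label, virtual))
    return cols


def runtime_layout(rr):
    """Assumed runtime behaviour: Virtual Columns are dropped and the
    remaining columns are renumbered consecutively."""
    kept = [c for c in rr if not c.virtual]
    return {f"_col{pos}": c.source for pos, c in enumerate(kept)}


rr = translation_rr()
at_translation = {c.internal: c.source for c in rr}
at_runtime = runtime_layout(rr)

# Columns before the Virtual Columns keep their meaning ...
print(at_translation["_col7"], at_runtime["_col7"])
# ... but a later name now points two positions further along.
print(at_translation["_col10"], at_runtime["_col10"])
```

In this model an expression built against "_col10" at translation time would 
read the data that translation knew as "_col12", which is the kind of drift 
described above.
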
We were handling this issue by recreating the ExprNodeDescs from the ASTNodes 
at runtime. 
To avoid carrying the ASTNodes forward, we now build a new RR for the Extract 
Op with the Virtual Columns removed, and hand it to the PTFTranslator as the 
starting RR for translating a PTF Chain. 
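
A minimal sketch of that idea, again as a Python toy model rather than the 
actual patch: ColumnInfo, build_extract_rr and resolve are hypothetical 
stand-ins (resolve plays the role of creating a column expression), and the 
column list is made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnInfo:
    alias: str             # user-visible column name
    internal: str          # internal name such as "_col3"
    virtual: bool = False  # True for a Virtual Column


def build_extract_rr(reduce_sink_rr):
    """Copy the ReduceSink columns, skipping Virtual Columns and renumbering
    the rest so the names match the assumed runtime layout."""
    kept = [c for c in reduce_sink_rr if not c.virtual]
    return [ColumnInfo(c.alias, f"_col{i}") for i, c in enumerate(kept)]


def resolve(rr, alias):
    """Stand-in for building a column expression: alias -> internal name."""
    for c in rr:
        if c.alias == alias:
            return c.internal
    raise KeyError(alias)


# Illustrative ReduceSink RR with two Virtual Columns in the middle.
rs_rr = [
    ColumnInfo("p_partkey", "_col0"),
    ColumnInfo("p_name", "_col1"),
    ColumnInfo("BLOCK__OFFSET__INSIDE__FILE", "_col2", virtual=True),
    ColumnInfo("INPUT__FILE__NAME", "_col3", virtual=True),
    ColumnInfo("p_size", "_col4"),
]
extract_rr = build_extract_rr(rs_rr)

# A runtime row as the Extract Op would emit it in this model.
runtime_row = {"_col0": 101, "_col1": "widget", "_col2": 7}

stale = resolve(rs_rr, "p_size")       # name taken from the ReduceSink RR
fresh = resolve(extract_rr, "p_size")  # name taken from the new Extract RR
print(stale, runtime_row.get(stale))   # stale name is absent from the row
print(fresh, runtime_row[fresh])       # fresh name reads the p_size value
```

Because the expression is resolved against the RR without Virtual Columns, the 
name fixed at translation time is still valid at runtime, so nothing has to be 
rebuilt from the AST.
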

With the above change, it should now be possible to use the ExprNodeDescs 
created during translation when executing the PTF Op. So I will now start a 
sequence of steps to move to the new data structures and avoid recreating 
ExprNodeDescs at runtime. 

I apologize if I am not being clear. This is a little hard to explain without 
walking through an example. Happy to go over this in detail offline.

                
> Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.
> ---------------------------------------------------------------
>
>                 Key: HIVE-896
>                 URL: https://issues.apache.org/jira/browse/HIVE-896
>             Project: Hive
>          Issue Type: New Feature
>          Components: OLAP, UDF
>            Reporter: Amr Awadallah
>            Priority: Minor
>         Attachments: DataStructs.pdf, HIVE-896.1.patch.txt, 
> Hive-896.2.patch.txt
>
>
> Windowing functions are very useful for click stream processing and similar 
> time-series/sliding-window analytics.
> More details at:
> http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1006709
> http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007059
> http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007032
> -- amr

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
