[ https://issues.apache.org/jira/browse/HIVE-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567271#comment-13567271 ]
Harish Butani commented on HIVE-896:
------------------------------------

Yes, exactly. I will start introducing the new Spec classes as noted in the DataStructs attachment, and refactor the Def classes to remove the antlr dependency. But before doing this I had to handle the following issue.

The plan we generate has the form:

... -> ReduceSink -> Extract -> PTF Op -> ...

The ReduceSink (RS) RowResolver (RR) contains the Virtual Columns from its input Operators. During translation we set the RR of the Extract Op to be the same as the ReduceSink RR, and this same RR was used to set up the ExprNodeDescs in PTF translation. But at runtime the Extract Op doesn't contain the Virtual Columns, so the internal column names can be different.

For example, our testJoinWithLeadLag test case is a self join on part that also has a Windowing expression. The RR of the RS Op at translation time looks something like this:

(_col1,_col2,..,_col7, _col8(vc=true),_col9(vc=true),_col10,_col11,.._col15(vc=true),_col16(vc=true),..)

At runtime the Virtual Columns are removed and all the columns after _col7 are shifted by 1 or 2 positions. So in child Operators the ColumnExprNodeDescs no longer refer to the right columns. We were handling this issue by recreating the ExprNodeDescs from the ASTNodes at runtime.

To avoid carrying forward the ASTNodes, we now build a new RR for the Extract Op with the Virtual Columns removed. We hand this to the PTFTranslator as the starting RR to use when translating a PTF chain.

With the above change, it should now be possible to use the ExprNodeDescs created during translation in the execution of the PTF Op. So I will now start a sequence of steps to move to the new data structures and avoid recreating ExprNodeDescs at runtime.

I apologize if I am not being clear. This is a little hard to explain without walking through an example. Happy to go over this in detail offline.

> Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.
> ---------------------------------------------------------------
>
> Key: HIVE-896
> URL: https://issues.apache.org/jira/browse/HIVE-896
> Project: Hive
> Issue Type: New Feature
> Components: OLAP, UDF
> Reporter: Amr Awadallah
> Priority: Minor
> Attachments: DataStructs.pdf, HIVE-896.1.patch.txt, Hive-896.2.patch.txt
>
>
> Windowing functions are very useful for click stream processing and similar time-series/sliding-window analytics.
> More details at:
> http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1006709
> http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007059
> http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007032
> -- amr
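
The column shift described in the comment can be illustrated with a small standalone sketch. It does not use Hive's RowResolver or ExprNodeDesc classes: `RowResolverSketch`, `Column`, `numberAll`, and `numberWithoutVirtual` are hypothetical names, the column aliases are made up, and the sketch assumes internal names are simply `_col` plus the column's position. It only shows why an expression resolved against the ReduceSink numbering points at the wrong column once the Virtual Columns are gone, and how renumbering without them avoids that.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these types are Hive classes.
class RowResolverSketch {

    /** Stand-in for one resolver entry: a column alias plus a virtual-column flag. */
    static final class Column {
        final String alias;
        final boolean virtual;

        Column(String alias, boolean virtual) {
            this.alias = alias;
            this.virtual = virtual;
        }
    }

    /** Numbers every column by position, as the ReduceSink resolver is assumed to do. */
    static Map<String, String> numberAll(List<Column> columns) {
        Map<String, String> names = new LinkedHashMap<>();
        int position = 0;
        for (Column c : columns) {
            names.put(c.alias, "_col" + position++);
        }
        return names;
    }

    /** Numbers only the non-virtual columns, mirroring the rows seen at runtime. */
    static Map<String, String> numberWithoutVirtual(List<Column> columns) {
        Map<String, String> names = new LinkedHashMap<>();
        int position = 0;
        for (Column c : columns) {
            if (c.virtual) {
                continue; // virtual columns are absent at runtime, so they get no slot
            }
            names.put(c.alias, "_col" + position++);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Column> columns = Arrays.asList(
                new Column("p_name", false),
                new Column("p_size", false),
                new Column("virtual_a", true),
                new Column("virtual_b", true),
                new Column("p_mfgr", false));

        // Translation-time view: virtual columns occupy positions.
        System.out.println(numberAll(columns));
        // Runtime view: columns after the virtual ones move up.
        System.out.println(numberWithoutVirtual(columns));
    }
}
```

In this sketch an expression bound to `p_mfgr` through the first mapping would carry a name that no longer exists in the second, which is the mismatch the new Extract Op RR is meant to prevent.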