[ 
https://issues.apache.org/jira/browse/HIVE-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567856#comment-13567856
 ] 

Prajakta Kalmegh commented on HIVE-896:
---------------------------------------

This is not exactly a bug. In the existing trunk, the ExtractOperator is 
followed by a FileSinkOperator and hence does not have this problem. For 
queries like the one below:

select p1.p_mfgr, p1.p_name, 
p1.p_size 
from part p1 join part p2 on p1.p_partkey = p2.p_partkey 
distribute by p1.p_mfgr 
sort by p1.p_name;

a SelectOperator after the JoinOperator solves this problem by filtering out 
the virtual columns (VCs) and setting up a correct RowResolver (RR) for the 
ReduceSinkOperator. We cannot insert a SelectOperator in our case because the 
PTF chain is a black box to us. 

In queries with the PTFOperator, we use the RowResolver of the ExtractOperator 
to construct ExprNodeDescs during translation. The problem is this: if we do 
not filter the VCs out of the ExtractOperator's RowResolver and instead use 
them during translation, the ColumnPrunerTableScanProc adds these VCs to the 
newVirtualCols list. This leaves TableScanDesc with a non-empty virtualCols 
list. At runtime, the MapOperator then sets its 'hasVC' boolean to true, which 
eventually results in a ClassCastException in ReduceSinkOperator during row 
evaluation. This problem occurs specifically for queries that combine a join 
with a PTF (we can walk through some examples offline to explain why it is not 
a problem for queries with a PTF and no join). So currently, we filter out the 
VCs and set up a new RowResolver for the ExtractOperator during translation, 
so that the columns at runtime match those seen during translation. 
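
The filtering step described above can be sketched roughly as follows. This is
only an illustration, not the code in the patch: ColumnEntry and
VirtualColumnFilter are hypothetical stand-ins and do not correspond to Hive's
RowResolver or ColumnInfo classes. The column names are taken from the example
query plus two of Hive's virtual columns.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one entry of a row resolver (NOT Hive's ColumnInfo).
final class ColumnEntry {
    final String name;
    final boolean virtual; // true for virtual columns such as INPUT__FILE__NAME

    ColumnEntry(String name, boolean virtual) {
        this.name = name;
        this.virtual = virtual;
    }
}

// Hypothetical helper: builds the column list for a new resolver without VCs.
final class VirtualColumnFilter {
    static List<ColumnEntry> dropVirtualColumns(List<ColumnEntry> columns) {
        List<ColumnEntry> kept = new ArrayList<>();
        for (ColumnEntry column : columns) {
            if (!column.virtual) {
                kept.add(column); // keep real columns in their original order
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<ColumnEntry> input = new ArrayList<>();
        input.add(new ColumnEntry("p_mfgr", false));
        input.add(new ColumnEntry("BLOCK__OFFSET__INSIDE__FILE", true));
        input.add(new ColumnEntry("p_name", false));
        input.add(new ColumnEntry("INPUT__FILE__NAME", true));

        for (ColumnEntry column : dropVirtualColumns(input)) {
            System.out.println(column.name);
        }
    }
}
```

The point of the sketch is only that translation sees the same set of columns
the operator will receive at runtime; how the new RowResolver is actually
built in the patch is not shown here.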
                
> Add LEAD/LAG/FIRST/LAST analytical windowing functions to Hive.
> ---------------------------------------------------------------
>
>                 Key: HIVE-896
>                 URL: https://issues.apache.org/jira/browse/HIVE-896
>             Project: Hive
>          Issue Type: New Feature
>          Components: OLAP, UDF
>            Reporter: Amr Awadallah
>            Priority: Minor
>         Attachments: DataStructs.pdf, HIVE-896.1.patch.txt, 
> Hive-896.2.patch.txt
>
>
> Windowing functions are very useful for click stream processing and similar 
> time-series/sliding-window analytics.
> More details at:
> http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1006709
> http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007059
> http://download-west.oracle.com/docs/cd/B13789_01/server.101/b10736/analysis.htm#i1007032
> -- amr

