Zsolt Miskolczi created HIVE-28694:
--------------------------------------

             Summary: Implement Lineage information for windowing functions
                 Key: HIVE-28694
                 URL: https://issues.apache.org/jira/browse/HIVE-28694
             Project: Hive
          Issue Type: Task
          Components: HiveServer2
    Affects Versions: 4.0.1
            Reporter: Zsolt Miskolczi

Source of this ticket: [https://jira.cloudera.com/browse/BUG-111368]

In the current implementation, Generator.java handles windowing functions with the default lineage implementation (I would say it is the fallback behaviour used when the functionality is not implemented):
[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/lineage/Generator.java#L165]

That implementation simply picks up ALL the columns available in the input and nothing more. A windowing function has two expressions, partition and order, and neither of them is analysed.

The expected behaviour is to include only the columns that are involved in the windowing function.

Some examples to reproduce the current behaviour:

{code:java}
create table source_tbl2(col_001 int, col_002 int, col_003 int, p1 int);

create view b_v_4 as
  select * from (select col_001, row_number() over (partition by src.p1) as r_num from source_tbl2 src) v1;

create view b_v_5 as
  select * from (select col_001, row_number() over (order by src.p1) as r_num from source_tbl2 src) v1;

create view b_v_6 as
  select * from (select col_001, rank() over (partition by src.p1) as r_num from source_tbl2 src) v1;

create view b_v_7 as
  select * from (select col_001, avg(src.col_002) over (partition by src.p1) as r_num from source_tbl2 src) v1;
{code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
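
To illustrate the expected behaviour, here is a minimal sketch of how the dependencies of a windowing call could be narrowed. It is not Hive code: WindowLineageSketch, WindowCall and dependencies are hypothetical names, and it assumes that the "involved" columns are the union of the columns referenced by the function arguments, the partition expression and the order expression.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model for illustration only; none of these types exist in Hive.
final class WindowLineageSketch {

    // Column names referenced by each part of one windowing call.
    static final class WindowCall {
        final List<String> argumentColumns;
        final List<String> partitionColumns;
        final List<String> orderColumns;

        WindowCall(List<String> argumentColumns,
                   List<String> partitionColumns,
                   List<String> orderColumns) {
            this.argumentColumns = argumentColumns;
            this.partitionColumns = partitionColumns;
            this.orderColumns = orderColumns;
        }
    }

    // Assumed rule: lineage is the union of the three column lists,
    // instead of every column available in the input.
    static Set<String> dependencies(WindowCall call) {
        Set<String> deps = new TreeSet<>(); // sorted for stable output
        deps.addAll(call.argumentColumns);
        deps.addAll(call.partitionColumns);
        deps.addAll(call.orderColumns);
        return deps;
    }

    public static void main(String[] args) {
        // b_v_4: row_number() over (partition by src.p1)
        WindowCall rowNumber = new WindowCall(List.of(), List.of("p1"), List.of());
        // b_v_7: avg(src.col_002) over (partition by src.p1)
        WindowCall avg = new WindowCall(List.of("col_002"), List.of("p1"), List.of());

        System.out.println(dependencies(rowNumber)); // prints [p1]
        System.out.println(dependencies(avg));       // prints [col_002, p1]
    }
}
```

Under this assumption, r_num in b_v_4, b_v_5 and b_v_6 would depend on p1 only, and r_num in b_v_7 on col_002 and p1, rather than on all four columns of source_tbl2.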