[ https://issues.apache.org/jira/browse/HIVE-24245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247120#comment-17247120 ]
László Bodor commented on HIVE-24245: ------------------------------------- merged to master, thanks [~pgaref] for the review and [~chiran54321] for the repro steps! > Vectorized PTF with count and distinct over partition producing incorrect > results. > ---------------------------------------------------------------------------------- > > Key: HIVE-24245 > URL: https://issues.apache.org/jira/browse/HIVE-24245 > Project: Hive > Issue Type: Bug > Components: Hive, PTF-Windowing, Vectorization > Affects Versions: 3.1.0, 3.1.2 > Reporter: Chiran Ravani > Assignee: László Bodor > Priority: Critical > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 2h > Remaining Estimate: 0h > > Vectorized PTF for count and distinct over partition is broken. It produces > incorrect results. > Below is the test case. > {code} > CREATE TABLE bigd781b_new ( > id int, > txt1 string, > txt2 string, > cda_date int, > cda_job_name varchar(12)); > INSERT INTO bigd781b_new VALUES > (1,'2010005759','7164335675012038',20200528,'load1'), > (2,'2010005759','7164335675012038',20200528,'load2'); > {code} > Running below query produces incorrect results > {code} > SELECT > txt1, > txt2, > count(distinct txt1) over(partition by txt1) as n, > count(distinct txt2) over(partition by txt2) as m > FROM bigd781b_new > {code} > as below. > {code} > +-------------+-------------------+----+----+ > | txt1 | txt2 | n | m | > +-------------+-------------------+----+----+ > | 2010005759 | 7164335675012038 | 2 | 2 | > | 2010005759 | 7164335675012038 | 2 | 2 | > +-------------+-------------------+----+----+ > {code} > While the correct output would be > {code} > +-------------+-------------------+----+----+ > | txt1 | txt2 | n | m | > +-------------+-------------------+----+----+ > | 2010005759 | 7164335675012038 | 1 | 1 | > | 2010005759 | 7164335675012038 | 1 | 1 | > +-------------+-------------------+----+----+ > {code} > The problem does not appear after setting below property > set hive.vectorized.execution.ptf.enabled=false; -- This message was sent by Atlassian Jira (v8.3.4#803005)