[ 
https://issues.apache.org/jira/browse/HIVE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-9000:
------------------------
    Description: 
LAST_VALUE Windowing function has been returning bad results, as far as I can 
tell from day 1.

And, it seems like the tests are also asserting that LAST_VALUE gives the wrong 
result.

Here's the test output:
https://github.com/apache/hive/blob/branch-0.14/ql/src/test/results/clientpositive/windowing_navfn.q.out#L587

The query is:
{code}
select t, s, i, last_value(i) over (partition by t order by s) from over10k 
where (s = 'oscar allen' or s = 'oscar carson') and t = 10
{code}

The result is:
{code}
t              s                    i          last_value(i)
-------------------------------------------------------
10      oscar allen     65662   65662
10      oscar carson    65549   65549
{code}

{{LAST_VALUE( i )}} should have returned 65549 in both records, instead it 
simply ends up returning i.

Another way you can make sure LAST_VALUE is bad is to verify it's result 
against LEAD(i,1) over (partition by t order by s). LAST_VALUE being last value 
should always be more (in terms of the specified 'order by s') than the lead by 
1. While this doesn't directly apply to the above query, if the result set had 
more rows, you would clearly see records where lead is higher than last_value 
which is semantically incorrect.

  was:
LAST_VALUE Windowing function has been returning bad results, as far as I can 
tell from day 1.

And, it seems like the tests are also asserting that LAST_VALUE gives the wrong 
result.

Here's the test output:
https://github.com/apache/hive/blob/branch-0.14/ql/src/test/results/clientpositive/windowing_navfn.q.out#L587

The query is:
{code}
select t, s, i, last_value(i) over (partition by t order by s) 
{code}

The result is:
{code}
t              s                    i          last_value(i)
-------------------------------------------------------
10      oscar allen     65662   65662
10      oscar carson    65549   65549
{code}

{{LAST_VALUE( i )}} should have returned 65549 in both records, instead it 
simply ends up returning i.

Another way you can make sure LAST_VALUE is bad is to verify it's result 
against LEAD(i,1) over (partition by t order by s). LAST_VALUE being last value 
should always be more (in terms of the specified 'order by s') than the lead by 
1. While this doesn't directly apply to the above query, if the result set had 
more rows, you would clearly see records where lead is higher than last_value 
which is semantically incorrect.


> LAST_VALUE Window function returns wrong results
> ------------------------------------------------
>
>                 Key: HIVE-9000
>                 URL: https://issues.apache.org/jira/browse/HIVE-9000
>             Project: Hive
>          Issue Type: Bug
>          Components: PTF-Windowing
>    Affects Versions: 0.13.1
>            Reporter: Mark Grover
>            Priority: Critical
>             Fix For: 0.14.1
>
>
> LAST_VALUE Windowing function has been returning bad results, as far as I can 
> tell from day 1.
> And, it seems like the tests are also asserting that LAST_VALUE gives the 
> wrong result.
> Here's the test output:
> https://github.com/apache/hive/blob/branch-0.14/ql/src/test/results/clientpositive/windowing_navfn.q.out#L587
> The query is:
> {code}
> select t, s, i, last_value(i) over (partition by t order by s) from over10k 
> where (s = 'oscar allen' or s = 'oscar carson') and t = 10
> {code}
> The result is:
> {code}
> t              s                    i          last_value(i)
> -------------------------------------------------------
> 10    oscar allen     65662   65662
> 10    oscar carson    65549   65549
> {code}
> {{LAST_VALUE( i )}} should have returned 65549 in both records, instead it 
> simply ends up returning i.
> Another way you can make sure LAST_VALUE is bad is to verify it's result 
> against LEAD(i,1) over (partition by t order by s). LAST_VALUE being last 
> value should always be more (in terms of the specified 'order by s') than the 
> lead by 1. While this doesn't directly apply to the above query, if the 
> result set had more rows, you would clearly see records where lead is higher 
> than last_value which is semantically incorrect.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to