[ 
https://issues.apache.org/jira/browse/HIVE-24762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288201#comment-17288201
 ] 

László Bodor commented on HIVE-24762:
-------------------------------------

while investigating HIVE-24804, I found that string-based windows should not be 
supported, so instead of handling this as a correctness issue, we should throw 
an error compile-time
in postgres:
{code}
create table vector_ptf_part_simple_text(p_mfgr varchar(10), p_name 
varchar(10), p_retailprice integer, rowindex varchar(10));

select p_mfgr, p_name, rowindex,
count(*) over(partition by p_mfgr order by p_name range between 1 preceding and 
current row) as cs1,
count(*) over(partition by p_mfgr order by p_name range between 3 preceding and 
current row) as cs2
from vector_ptf_part_simple_text;
{code}
*RANGE with offset PRECEDING/FOLLOWING is not supported for column type text*

>  StringValueBoundaryScanner ignores boundary which leads to incorrect results
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-24762
>                 URL: https://issues.apache.org/jira/browse/HIVE-24762
>             Project: Hive
>          Issue Type: Bug
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/ValueBoundaryScanner.java#L901
> {code}
>   public boolean isDistanceGreater(Object v1, Object v2, int amt) {
>     ...
>     return s1 != null && s2 != null && s1.compareTo(s2) > 0;
> {code}
> Like other boundary scanners, StringValueBoundaryScanner should take amt into 
> account, otherwise it'll result in the same range regardless of the given 
> window size. This typically affects queries where the range is defined on a 
> string column:
> {code}
> select p_mfgr, p_name, p_retailprice,
> count(*) over(partition by p_mfgr order by p_name range between 1 preceding 
> and current row) as cs1,
> count(*) over(partition by p_mfgr order by p_name range between 3 preceding 
> and current row) as cs2
> from vector_ptf_part_simple_orc;
> {code} 
> with "> 0" cs1 and cs2 will be calculated on the same window, so cs1 == cs2, 
> but actually it should be different, this is the correct result (see "almond 
> antique olive coral navajo"):
> {code}
> +-----------------+---------------------------------------------+------+------+
> |     p_mfgr      |                   p_name                    | cs1  | cs2  
> |
> +-----------------+---------------------------------------------+------+------+
> | Manufacturer#1  | almond antique burnished rose metallic      | 2    | 2    
> |
> | Manufacturer#1  | almond antique burnished rose metallic      | 2    | 2    
> |
> | Manufacturer#1  | almond antique chartreuse lavender yellow   | 6    | 6    
> |
> | Manufacturer#1  | almond antique chartreuse lavender yellow   | 6    | 6    
> |
> | Manufacturer#1  | almond antique chartreuse lavender yellow   | 6    | 6    
> |
> | Manufacturer#1  | almond antique chartreuse lavender yellow   | 6    | 6    
> |
> | Manufacturer#1  | almond antique salmon chartreuse burlywood  | 1    | 1    
> |
> | Manufacturer#1  | almond aquamarine burnished black steel     | 1    | 8    
> |
> | Manufacturer#1  | almond aquamarine pink moccasin thistle     | 4    | 4    
> |
> | Manufacturer#1  | almond aquamarine pink moccasin thistle     | 4    | 4    
> |
> | Manufacturer#1  | almond aquamarine pink moccasin thistle     | 4    | 4    
> |
> | Manufacturer#1  | almond aquamarine pink moccasin thistle     | 4    | 4    
> |
> | Manufacturer#2  | almond antique violet chocolate turquoise   | 1    | 1    
> |
> | Manufacturer#2  | almond antique violet turquoise frosted     | 3    | 3    
> |
> | Manufacturer#2  | almond antique violet turquoise frosted     | 3    | 3    
> |
> | Manufacturer#2  | almond antique violet turquoise frosted     | 3    | 3    
> |
> | Manufacturer#2  | almond aquamarine midnight light salmon     | 1    | 5    
> |
> | Manufacturer#2  | almond aquamarine rose maroon antique       | 2    | 2    
> |
> | Manufacturer#2  | almond aquamarine rose maroon antique       | 2    | 2    
> |
> | Manufacturer#2  | almond aquamarine sandy cyan gainsboro      | 3    | 3    
> |
> | Manufacturer#3  | almond antique chartreuse khaki white       | 1    | 1    
> |
> | Manufacturer#3  | almond antique forest lavender goldenrod    | 4    | 5    
> |
> | Manufacturer#3  | almond antique forest lavender goldenrod    | 4    | 5    
> |
> | Manufacturer#3  | almond antique forest lavender goldenrod    | 4    | 5    
> |
> | Manufacturer#3  | almond antique forest lavender goldenrod    | 4    | 5    
> |
> | Manufacturer#3  | almond antique metallic orange dim          | 1    | 1    
> |
> | Manufacturer#3  | almond antique misty red olive              | 1    | 1    
> |
> | Manufacturer#3  | almond antique olive coral navajo           | 1    | 3    
> |
> | Manufacturer#4  | almond antique gainsboro frosted violet     | 1    | 1    
> |
> | Manufacturer#4  | almond antique violet mint lemon            | 1    | 1    
> |
> | Manufacturer#4  | almond aquamarine floral ivory bisque       | 2    | 4    
> |
> | Manufacturer#4  | almond aquamarine floral ivory bisque       | 2    | 4    
> |
> | Manufacturer#4  | almond aquamarine yellow dodger mint        | 1    | 1    
> |
> | Manufacturer#4  | almond azure aquamarine papaya violet       | 1    | 1    
> |
> | Manufacturer#5  | almond antique blue firebrick mint          | 1    | 1    
> |
> | Manufacturer#5  | almond antique medium spring khaki          | 2    | 2    
> |
> | Manufacturer#5  | almond antique medium spring khaki          | 2    | 2    
> |
> | Manufacturer#5  | almond antique sky peru orange              | 1    | 1    
> |
> | Manufacturer#5  | almond aquamarine dodger light gainsboro    | 1    | 5    
> |
> | Manufacturer#5  | almond azure blanched chiffon midnight      | 1    | 1    
> |
> +-----------------+---------------------------------------------+------+------+
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to