[ https://issues.apache.org/jira/browse/HIVE-24762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288201#comment-17288201 ]
László Bodor commented on HIVE-24762: ------------------------------------- while investigating HIVE-24804, I found that string-based windows should not be supported, so instead of handling this as a correctness issue, we should throw an error compile-time in postgres: {code} create table vector_ptf_part_simple_text(p_mfgr varchar(10), p_name varchar(10), p_retailprice integer, rowindex varchar(10)); select p_mfgr, p_name, rowindex, count(*) over(partition by p_mfgr order by p_name range between 1 preceding and current row) as cs1, count(*) over(partition by p_mfgr order by p_name range between 3 preceding and current row) as cs2 from vector_ptf_part_simple_text; {code} *RANGE with offset PRECEDING/FOLLOWING is not supported for column type text* > StringValueBoundaryScanner ignores boundary which leads to incorrect results > ----------------------------------------------------------------------------- > > Key: HIVE-24762 > URL: https://issues.apache.org/jira/browse/HIVE-24762 > Project: Hive > Issue Type: Bug > Reporter: László Bodor > Assignee: László Bodor > Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/ValueBoundaryScanner.java#L901 > {code} > public boolean isDistanceGreater(Object v1, Object v2, int amt) { > ... > return s1 != null && s2 != null && s1.compareTo(s2) > 0; > {code} > Like other boundary scanners, StringValueBoundaryScanner should take amt into > account, otherwise it'll result in the same range regardless of the given > window size. This typically affects queries where the range is defined on a > string column: > {code} > select p_mfgr, p_name, p_retailprice, > count(*) over(partition by p_mfgr order by p_name range between 1 preceding > and current row) as cs1, > count(*) over(partition by p_mfgr order by p_name range between 3 preceding > and current row) as cs2 > from vector_ptf_part_simple_orc; > {code} > with "> 0" cs1 and cs2 will be calculated on the same window, so cs1 == cs2, > but actually it should be different, this is the correct result (see "almond > antique olive coral navajo"): > {code} > +-----------------+---------------------------------------------+------+------+ > | p_mfgr | p_name | cs1 | cs2 > | > +-----------------+---------------------------------------------+------+------+ > | Manufacturer#1 | almond antique burnished rose metallic | 2 | 2 > | > | Manufacturer#1 | almond antique burnished rose metallic | 2 | 2 > | > | Manufacturer#1 | almond antique chartreuse lavender yellow | 6 | 6 > | > | Manufacturer#1 | almond antique chartreuse lavender yellow | 6 | 6 > | > | Manufacturer#1 | almond antique chartreuse lavender yellow | 6 | 6 > | > | Manufacturer#1 | almond antique chartreuse lavender yellow | 6 | 6 > | > | Manufacturer#1 | almond antique salmon chartreuse burlywood | 1 | 1 > | > | Manufacturer#1 | almond aquamarine burnished black steel | 1 | 8 > | > | Manufacturer#1 | almond aquamarine pink moccasin thistle | 4 | 4 > | > | Manufacturer#1 | almond aquamarine pink moccasin thistle | 4 | 4 > | > | Manufacturer#1 | almond aquamarine pink moccasin thistle | 4 | 4 > | > | Manufacturer#1 | almond aquamarine pink moccasin thistle | 4 | 4 > | > | Manufacturer#2 | almond antique violet chocolate turquoise | 1 | 1 > | > | Manufacturer#2 | almond antique violet turquoise frosted | 3 | 3 > | > | Manufacturer#2 | almond antique violet turquoise frosted | 3 | 3 > | > | Manufacturer#2 | almond antique violet turquoise frosted | 3 | 3 > | > | Manufacturer#2 | almond aquamarine midnight light salmon | 1 | 5 > | > | Manufacturer#2 | almond aquamarine rose maroon antique | 2 | 2 > | > | Manufacturer#2 | almond aquamarine rose maroon antique | 2 | 2 > | > | Manufacturer#2 | almond aquamarine sandy cyan gainsboro | 3 | 3 > | > | Manufacturer#3 | almond antique chartreuse khaki white | 1 | 1 > | > | Manufacturer#3 | almond antique forest lavender goldenrod | 4 | 5 > | > | Manufacturer#3 | almond antique forest lavender goldenrod | 4 | 5 > | > | Manufacturer#3 | almond antique forest lavender goldenrod | 4 | 5 > | > | Manufacturer#3 | almond antique forest lavender goldenrod | 4 | 5 > | > | Manufacturer#3 | almond antique metallic orange dim | 1 | 1 > | > | Manufacturer#3 | almond antique misty red olive | 1 | 1 > | > | Manufacturer#3 | almond antique olive coral navajo | 1 | 3 > | > | Manufacturer#4 | almond antique gainsboro frosted violet | 1 | 1 > | > | Manufacturer#4 | almond antique violet mint lemon | 1 | 1 > | > | Manufacturer#4 | almond aquamarine floral ivory bisque | 2 | 4 > | > | Manufacturer#4 | almond aquamarine floral ivory bisque | 2 | 4 > | > | Manufacturer#4 | almond aquamarine yellow dodger mint | 1 | 1 > | > | Manufacturer#4 | almond azure aquamarine papaya violet | 1 | 1 > | > | Manufacturer#5 | almond antique blue firebrick mint | 1 | 1 > | > | Manufacturer#5 | almond antique medium spring khaki | 2 | 2 > | > | Manufacturer#5 | almond antique medium spring khaki | 2 | 2 > | > | Manufacturer#5 | almond antique sky peru orange | 1 | 1 > | > | Manufacturer#5 | almond aquamarine dodger light gainsboro | 1 | 5 > | > | Manufacturer#5 | almond azure blanched chiffon midnight | 1 | 1 > | > +-----------------+---------------------------------------------+------+------+ > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)