[ https://issues.apache.org/jira/browse/FLINK-28212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
luoyuxia updated FLINK-28212:
-----------------------------
    Description:
Can be reproduced with the following SQL:

{code:java}
CREATE TABLE alltypesorc(
    ctinyint TINYINT,
    csmallint SMALLINT,
    cint INT,
    cbigint BIGINT,
    cfloat FLOAT,
    cdouble DOUBLE,
    cstring1 STRING,
    cstring2 STRING,
    ctimestamp1 TIMESTAMP,
    ctimestamp2 TIMESTAMP,
    cboolean1 BOOLEAN,
    cboolean2 BOOLEAN);

select a.ctinyint, a.cint, count(a.cdouble)
  over(partition by a.ctinyint order by a.cint desc
    rows between 1 preceding and 1 following)
from alltypesorc a {code}

Then it will throw the exception "Caused by: java.lang.IndexOutOfBoundsException: index (7) must be less than size (1)".

The reason is that, for such SQL, the Hive dialect generates the following RelNode:

{code:java}
LogicalSink(table=[*anonymous_collect$1*], fields=[ctinyint, cint, _o__c2])
  LogicalProject(ctinyint=[$0], cint=[$2], _o__c2=[$12])
    LogicalProject(ctinyint=[$0], csmallint=[$1], cint=[$2], cbigint=[$3], cfloat=[$4], cdouble=[$5], cstring1=[$6], cstring2=[$7], ctimestamp1=[$8], ctimestamp2=[$9], cboolean1=[$10], cboolean2=[$11], _o__col13=[COUNT($5) OVER (PARTITION BY $0 ORDER BY $2 DESC NULLS LAST ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)])
      LogicalTableScan(table=[[test-catalog, default, alltypesorc]]) {code}

Note: the first Project node, counting from bottom to top, contains all fields.

The literal offsets in "1 PRECEDING AND 1 FOLLOWING" of the window are converted to field accesses, so the window becomes:

{code:java}
COUNT($5) OVER (PARTITION BY $0 ORDER BY $2 DESC NULLS LAST ROWS BETWEEN $12 PRECEDING AND $12 FOLLOWING){code}

But in the rule "ProjectWindowTransposeRule", the unnecessary fields (those not referred to by the top project or the window) are removed, so the input of the window contains only 3 fields (ctinyint, cint, cdouble).
Finally, in RelExplainUtil, when the boundString is explained, $12 cannot be found, so the exception is thrown.

{code:java}
val ref = bound.getOffset.asInstanceOf[RexInputRef]
// ref.getIndex will be 12, but the input size of the window is 3
val boundIndex = ref.getIndex - calcOriginInputRows(window)
// the window's constants contain only one element, "1"
val offset = window.constants.get(boundIndex).getValue2
val offsetKind = if (bound.isPreceding) "PRECEDING" else "FOLLOWING"
s"$offset $offsetKind" {code}

  was:
Can be reproduced with the following SQL:

{code:java}
CREATE TABLE alltypesorc(
    ctinyint TINYINT,
    csmallint SMALLINT,
    cint INT,
    cbigint BIGINT,
    cfloat FLOAT,
    cdouble DOUBLE,
    cstring1 STRING,
    cstring2 STRING,
    ctimestamp1 TIMESTAMP,
    ctimestamp2 TIMESTAMP,
    cboolean1 BOOLEAN,
    cboolean2 BOOLEAN);

select a.ctinyint, a.cint, count(a.cdouble)
  over(partition by a.ctinyint order by a.cint desc
    rows between 1 preceding and 1 following)
from alltypesorc a {code}

Then it will throw
Caused by: java.lang.IndexOutOfBoundsException: index (7) must be less than size (1)


> IndexOutOfBoundsException is thrown when project contains window which doesn't refer to all fields of input when using Hive dialect
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-28212
>                 URL: https://issues.apache.org/jira/browse/FLINK-28212
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Hive
>            Reporter: luoyuxia
>            Priority: Major
>             Fix For: 1.16.0
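
The index arithmetic behind the failure can be illustrated with a minimal, self-contained sketch. It is not Flink code: the class `WindowBoundSketch` and its helpers `constantIndex` and `boundString` are hypothetical names, and the model simply assumes that a bound reference is resolved by subtracting the window's input field count from the reference index and looking the result up in the constants list. The real `calcOriginInputRows` may count fields differently, so the sketch does not reproduce the exact index (7) from the reported exception; it only shows why a stale reference such as $12 falls outside a one-element constants list once the input has been pruned.

```java
import java.util.List;

final class WindowBoundSketch {

    // Hypothetical model: bound references past the input fields point into the constants list.
    static int constantIndex(int refIndex, int inputFieldCount) {
        return refIndex - inputFieldCount;
    }

    // Resolves the bound to a string such as "1 PRECEDING", or fails if the reference is stale.
    static String boundString(List<Integer> constants, int refIndex,
                              int inputFieldCount, boolean preceding) {
        int idx = constantIndex(refIndex, inputFieldCount);
        if (idx < 0 || idx >= constants.size()) {
            throw new IndexOutOfBoundsException(
                "index (" + idx + ") must be less than size (" + constants.size() + ")");
        }
        return constants.get(idx) + (preceding ? " PRECEDING" : " FOLLOWING");
    }

    public static void main(String[] args) {
        // The window's constants hold the single literal offset 1.
        List<Integer> constants = List.of(1);

        // Before pruning: 12 input fields, so $12 is the first constant.
        System.out.println(boundString(constants, 12, 12, true));

        // After pruning: 3 input fields, but the bound still refers to $12.
        try {
            boundString(constants, 12, 3, true);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("stale reference: " + e.getMessage());
        }

        // If the reference were remapped to $3 along with the pruned input, it would resolve again.
        System.out.println(boundString(constants, 3, 3, false));
    }
}
```

In this model, the lookup works again once the bound reference is shifted together with the pruned input, which suggests that the rule and the explain logic need to agree on how offsets are indexed. The actual fix in Flink may take a different approach.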