This has nothing to do with the "Split" function. It is all about indexing in arrays.
The list of Dialect-specific operators in calcite includes OFFSET and ORDINAL for BigQuery. https://calcite.apache.org/docs/reference.html#dialect-specific-operators So you want to be able to write a[x] and it should be interpreted as a[OFFSET (x)] when using BigQuery? Mihai -----Original Message----- From: P.F. ZHAN Sent: Tuesday, August 08, 2023 11:37 AM To: [email protected] Subject: Re: [Discussion] Split function has some wrong and strange behaviors Hi, mbudiu. f.checkScalar can not reproduce this problem, it happens when converting SqlNode to RelNode. Maybe you could check this test case: > @Test void testIndexWithoutOperatorBigQuery() { String query = "SELECT > SPLIT('h,e,l,l,o')[1] FROM \"employee\""; String error = "BigQuery > requires an array subscript operator to index an array"; > sql(query).withBigQuery().withLibrary(SqlLibrary.BIG_QUERY).throws_(er > ror) > ; > } > I checked the doc reference of Calcite, the array index does start with 1. However, for BigQuery, if omit the array subscript operators, it is equivalent to offset(0), that is to say split('aaa-bbb-ccc', '-')[0] is equivalent to split('aaa-bbb-ccc', '-')[offset(0)]. The usage of omitting the array subscript operators doesn't appear on the BigQuery doc, but it works in this query engine. This seems by design for Calcite. However, my question is "can we remove this limitation and support the usage of omitting the array subscript operators to get the same result with BigQuery?" In other words, split('aaa-bbb-ccc', '-')[offset(0)] could be replaced by split('aaa-bbb-ccc', '-')[0] ? On Wed, Aug 9, 2023 at 2:13 AM <[email protected]> wrote: > I added the following tests to SqlOperatorTest and they passed: > > f.checkScalar("split('aaa-bbb-ccc', '-')[1]", "aaa", "VARCHAR"); > f.checkNull("split('aaa-bbb-ccc', '-')[0]"); > > Please note that in Calcite SQL the array indexes start from 1: > https://calcite.apache.org/docs/reference.html#value-constructors > > If you try to index out of bounds (with index 0) you will get a null > value. In BigQuery it seems to be illegal to index with an integer > https://cloud.google.com/bigquery/docs/arrays, you seem to need to use > either OFFSET or ORDINAL. > > But I don't see any exception, certainly not a parse exception. > > I don't understand what you mean by "Calcite does not support split > function without array subscript operators." > > Mihai > > -----Original Message----- > From: P.F. ZHAN > Sent: Tuesday, August 08, 2023 10:53 AM > To: [email protected] > Subject: Re: [Discussion] Split function has some wrong and strange > behaviors > > Sorry, > I executed the following four queries in BigQuery, and the results > displayed after the token "=>" > > query1. select split('aaa-bbb-ccc', '-')[0] => aaa query2. select > split('aaa-bbb-ccc', '-')[offset(0)] => aaa query3. select > split('aaa-bbb-ccc', '-')[3] => Array index 3 is out of bounds > (overflow) query4. select split('aaa-bbb-ccc', '-')[offset(3)] => > Array index 3 is out of bounds (overflow) > > If we use Calcite to parse query1 and query3, it will throw exceptions. > That's to say, Calcite does not support split function without array > subscript operators. > I wonder if we should support this kind of usage of this function? > > > > On Wed, Aug 9, 2023 at 1:23 AM <[email protected]> wrote: > > > The mailing list strips images from emails. > > > > > > > > Mihai > > > > > > > > From: P.F. ZHAN > > Sent: Tuesday, August 08, 2023 10:23 AM > > To: [email protected] > > Subject: Re: [Discussion] Split function has some wrong and strange > > behaviors > > > > > > > > Sorry, maybe I didn't describe it very well, which caused you to not > > understand what I meant. > > > > Here, I gives some query results to describe this problem. > > > > > > > > > > > > > > > > > > > > > > Calcite does not support split function without array subscript > > operator, I wonder should we support this kind of usage of this function? > > > > > > > > > > > > The BigQuery doc: > > > > > > https://cloud.google.com/bigquery/docs/reference/standard-sql/operat > > or > > s#array_subscript_operator > > > > > > > > > > > > On Wed, Aug 9, 2023 at 12:43 AM <[email protected] > > <mailto:[email protected]> > > > wrote: > > > > I don' t understand from this message what the problem with split is. > > > > Mihai > > > > -----Original Message----- > > From: P.F. ZHAN > > Sent: Tuesday, August 08, 2023 8:21 AM > > To: [email protected] <mailto:[email protected]> > > Subject: Re: [Discussion] Split function has some wrong and strange > > behaviors > > > > I rechecked Calcite's implementation of the split function today > > again, and I found that this behavior appears to be a by design behavior. > > > > > > > > > org.apache.calcite.rel.rel2sql.RelToSqlConverterTest#testIndexOper > > > at > > > or > > > sBigQuery > > > > > > org.apache.calcite.rel.rel2sql.RelToSqlConverterTest#testIndexWithou > > tO > > peratorBigQuery > > > > > > I also checked the BigQuery doc which only mentions these standard > > usages in our test case of *testIndexOperatorsBigQuery*, but in fact > > when we use SQL like > > > > > select split('aaa-bbb-ccc', '-')[0] > > > > the results are available in BigQuery and the behavior is the same > > as > > > > > select split('aaa-bbb-ccc', '-')[ offset(0)] > > > > Maybe Calcite should support this kind of usage rather than treat > > it as a limitation. > > > > > > > > On Tue, Aug 8, 2023 at 10:31 AM P.F. ZHAN <[email protected] <mailto: > > [email protected]> > wrote: > > > > > Thank you mbudiu, Ran Tao and Lake Shen. > > > > > > I have tried your suggestions and debugged the code, this kind of > > > sql seems not support > > > > > >> select split('aaa-bbb-ccc', '-')[1] > > >> > > > > > > I have checked it in both Bigquery and Spark, it works. This looks > > > like a bug at present. Can I open a jira issue to follow this problem? > > > > > > > > > On Tue, Aug 8, 2023 at 1:09 AM <[email protected] <mailto: > > [email protected]> > wrote: > > > > > >> See also this thread: > > >> https://issues.apache.org/jira/browse/CALCITE-5760 > > >> > > >> You can run CalciteSqlOperatorTest instead. > > >> If you want to quickly debug only this test in the IDE you can > > >> copy-paste this test into the CalciteSqlOperatorTest class and > > >> run it from there, then delete it when you are done. > > >> > > >> Mihai > > >> > > >> -----Original Message----- > > >> From: Ran Tao > > >> Sent: Monday, August 07, 2023 4:04 AM > > >> To: [email protected] <mailto:[email protected]> > > >> Subject: Re: [Discussion] Split function has some wrong and > > >> strange behaviors > > >> > > >> > I found that when I use "*./gradlew build*" to build the > > >> > project happens > > >> with some errors, but I execute this case in IDEA alone and it is > > >> successful. > > >> > > >> The default `fixture()` does not execute in unit tests (but works > > >> in ./gradlew build), you should use > > >> Fixture.forOperators(true) to execute in unit tests. You can > > >> check this usage, hope it helps you. > > >> > > >> > > >> Best Regards, > > >> Ran Tao > > >> https://github.com/chucheng92 > > >> > > >> > > >> P.F. ZHAN <[email protected] <mailto:[email protected]> > > > 于2023年8月7日周一 15:22写道: > > >> > > >> > Sorry, I made a mistake, used Chinese to describe my question. > > >> > > > >> > Hi, When I was testing the split function with a case as follows: > > >> > org.apache.calcite.test.SqlOperatorTest#testSplitFunction > > >> > > > >> > > f.checkScalar("SPLIT('h-e-l-l-o', '-')[0]", "h", "VARCHAR"); > > >> > > > > >> > > > >> > I found that when I use "*./gradlew build*" to build the > > >> > project happens with some errors, but I execute this case in > > >> > IDEA alone and it is successful. > > >> > > > >> > FAILURE 0.3sec, org.apache.calcite.test.CalciteSqlOperatorTest > > > >> > > testSplitFunction() > > >> > > java.lang.AssertionError: Query: values > > >> > > (SPLIT('h-e-l-l-o', > > >> '-')[0]) > > >> > > Expected: is "h" > > >> > > but: was null > > >> > > at > > >> org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18) > > >> > > at > > >> > > > > >> > org.apache.calcite.sql.test.ResultCheckers.compareResultSetWith > > >> > Ma > > >> > tc > > >> > her > > >> > (ResultCheckers.java:252) > > >> > > at > > >> > > > > >> > org.apache.calcite.sql.test.ResultCheckers$MatcherResultChecker > > >> > .c > > >> > he > > >> > ckR > > >> > esult(ResultCheckers.java:321) > > >> > > at > > >> > > > > >> > org.apache.calcite.test.SqlOperatorTest$TesterImpl.check(SqlOpe > > >> > ra > > >> > to > > >> > rTe > > >> > st.java:12150) > > >> > > at > > >> > org.apache.calcite.sql.test.SqlTester.check(SqlTester.java:160) > > >> > > at > > >> > > > > >> > org.apache.calcite.test.SqlOperatorFixtureImpl.lambda$checkScal > > >> > ar > > >> > $2 > > >> > (Sq > > >> > lOperatorFixtureImpl.java:224) > > >> > > at > > >> > > > > >> > org.apache.calcite.sql.test.AbstractSqlTester.forEachQuery(Abst > > >> > ra > > >> > ct > > >> > Sql > > >> > Tester.java:446) > > >> > > at > > >> > > > > >> > org.apache.calcite.test.SqlOperatorFixtureImpl.checkScalar(SqlO > > >> > pe > > >> > ra > > >> > tor > > >> > FixtureImpl.java:223) > > >> > > at > > >> > > > > >> > org.apache.calcite.sql.test.SqlOperatorFixture.checkScalar(SqlO > > >> > pe > > >> > ra > > >> > tor > > >> > Fixture.java:238) > > >> > > at > > >> > > > > >> > org.apache.calcite.test.SqlOperatorTest.testSplitFunction(SqlOp > > >> > er > > >> > at > > >> > orT > > >> > est.java:7657) > > >> > > at > > >> > > > > >> > org.junit.platform.commons.util.ReflectionUtils.invokeMethod(Re > > >> > fl > > >> > ec > > >> > tio > > >> > nUtils.java:727) > > >> > > > > >> > > > >> > I also add a case in the file big-query.iq > > >> > <http://big-query.iq> > > >> > > > >> > SELECT SPLIT("h,e,l,l,o", ',')[1] as result; > > >> > > +------- + > > >> > > | result | > > >> > > +------- + > > >> > > | e | > > >> > > +------- + > > >> > > (1 row) > > >> > > > > >> > > !ok > > >> > > > > >> > > > >> > The error log is as follows: > > >> > > > >> > > FAILURE 27.6sec, org.apache.calcite.test.BabelQuidemTest > > > >> > > test(String)[3], [3] sql/big-query.iq <http://big-query.iq> > > >> > > org.opentest4j.AssertionFailedError: Files differ: > > >> > > ~/calcite/babel/build/quidem/test/sql/big-query.iq < > > http://big-query.iq> > > >> > > ~/calcite/babel/build/resources/test/sql/big-query.iq < > > http://big-query.iq> > > >> > > 759c759 > > >> > > < +------- + > > >> > > --- > > >> > > > +--------+ > > >> > > 761,763c761,763 > > >> > > < +------- + > > >> > > < | e | > > >> > > < +------- + > > >> > > --- > > >> > > > +--------+ > > >> > > > | h | > > >> > > > +--------+ > > >> > > > > >> > > at > > >> > > org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:38) > > >> > > at > > org.junit.jupiter.api.Assertions.fail(Assertions.java:135) > > >> > > at > > >> > org.apache.calcite.test.QuidemTest.checkRun(QuidemTest.java:183 > > >> > ) > > >> > > at > > >> > > org.apache.calcite.test.QuidemTest.test(QuidemTest.java:236) > > >> > > > > >> > > > > >> > On Mon, Aug 7, 2023 at 3:13 PM P.F. ZHAN <[email protected] > <mailto: > > [email protected]> > wrote: > > >> > > > >> > > 大家好。我在测试 split 函数功能的时候,添加了个 case > > >> > > org.apache.calcite.test.SqlOperatorTest#testSplitFunction > > >> > > > > >> > >> f.checkScalar("SPLIT('h-e-l-l-o', '-')[0]", "h", "VARCHAR"); > > >> > >> > > >> > > > > >> > > 我发现我使用 gradlew build 整个项目时会出错,但我单独在 IDEA 里面执行这个 case 是成功的。 > > >> > > > > >> > >> FAILURE 0.3sec, org.apache.calcite.test.CalciteSqlOperatorTest > > > > >> > >> testSplitFunction() > > >> > >> java.lang.AssertionError: Query: values > > >> > >> (SPLIT('h-e-l-l-o', > > >> '-')[0]) > > >> > >> Expected: is "h" > > >> > >> but: was null > > >> > >> at > > >> org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18) > > >> > >> at > > >> > >> > > >> > org.apache.calcite.sql.test.ResultCheckers.compareResultSetWith > > >> > Ma > > >> > tc > > >> > her > > >> > (ResultCheckers.java:252) > > >> > >> at > > >> > >> > > >> > org.apache.calcite.sql.test.ResultCheckers$MatcherResultChecker > > >> > .c > > >> > he > > >> > ckR > > >> > esult(ResultCheckers.java:321) > > >> > >> at > > >> > >> > > >> > org.apache.calcite.test.SqlOperatorTest$TesterImpl.check(SqlOpe > > >> > ra > > >> > to > > >> > rTe > > >> > st.java:12150) > > >> > >> at > > >> > org.apache.calcite.sql.test.SqlTester.check(SqlTester.java:160) > > >> > >> at > > >> > >> > > >> > org.apache.calcite.test.SqlOperatorFixtureImpl.lambda$checkScal > > >> > ar > > >> > $2 > > >> > (Sq > > >> > lOperatorFixtureImpl.java:224) > > >> > >> at > > >> > >> > > >> > org.apache.calcite.sql.test.AbstractSqlTester.forEachQuery(Abst > > >> > ra > > >> > ct > > >> > Sql > > >> > Tester.java:446) > > >> > >> at > > >> > >> > > >> > org.apache.calcite.test.SqlOperatorFixtureImpl.checkScalar(SqlO > > >> > pe > > >> > ra > > >> > tor > > >> > FixtureImpl.java:223) > > >> > >> at > > >> > >> > > >> > org.apache.calcite.sql.test.SqlOperatorFixture.checkScalar(SqlO > > >> > pe > > >> > ra > > >> > tor > > >> > Fixture.java:238) > > >> > >> at > > >> > >> > > >> > org.apache.calcite.test.SqlOperatorTest.testSplitFunction(SqlOp > > >> > er > > >> > at > > >> > orT > > >> > est.java:7657) > > >> > >> at > > >> > >> > > >> > org.junit.platform.commons.util.ReflectionUtils.invokeMethod(Re > > >> > fl > > >> > ec > > >> > tio > > >> > nUtils.java:727) > > >> > >> > > >> > > > > >> > > 我在big-query.iq <http://big-query.iq> 中加入下面这段 > > >> > > > > >> > >> SELECT SPLIT("h,e,l,l,o", ',')[1] as result; > > >> > >> +------- + > > >> > >> | result | > > >> > >> +------- + > > >> > >> | e | > > >> > >> +------- + > > >> > >> (1 row) > > >> > >> > > >> > >> !ok > > >> > >> > > >> > > > > >> > > 报错如下: > > >> > > > > >> > >> FAILURE 27.6sec, org.apache.calcite.test.BabelQuidemTest > > > >> > >> test(String)[3], [3] sql/big-query.iq <http://big-query.iq> > > >> > >> org.opentest4j.AssertionFailedError: Files differ: > > >> > >> ~/calcite/babel/build/quidem/test/sql/big-query.iq < > > http://big-query.iq> > > >> > >> ~/calcite/babel/build/resources/test/sql/big-query.iq < > > http://big-query.iq> > > >> > >> 759c759 > > >> > >> < +------- + > > >> > >> --- > > >> > >> > +--------+ > > >> > >> 761,763c761,763 > > >> > >> < +------- + > > >> > >> < | e | > > >> > >> < +------- + > > >> > >> --- > > >> > >> > +--------+ > > >> > >> > | h | > > >> > >> > +--------+ > > >> > >> > > >> > >> at > > >> > >> org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:38) > > >> > >> at > > org.junit.jupiter.api.Assertions.fail(Assertions.java:135) > > >> > >> at > > >> > >> org.apache.calcite.test.QuidemTest.checkRun(QuidemTest.java:183) > > >> > >> at > > >> > >> org.apache.calcite.test.QuidemTest.test(QuidemTest.java:236) > > >> > >> > > >> > > > > >> > > > >> > > >> > > > > > >
