You should probably start a new discussion with this title. I don't know if this can be done.
Mihai -----Original Message----- From: P.F. ZHAN Sent: Tuesday, August 08, 2023 4:29 PM To: [email protected] Subject: Re: [Discussion] Split function has some wrong and strange behaviors Hi mbudiu, thanks for your patience. Yes, a[x] interpret as a[OFFSET(x)] for BigQuery. I tried this, and it is okay too. On Wed, Aug 9, 2023 at 02:48 <[email protected]> wrote: > This has nothing to do with the "Split" function. It is all about > indexing in arrays. > > The list of Dialect-specific operators in calcite includes OFFSET and > ORDINAL for BigQuery. > https://calcite.apache.org/docs/reference.html#dialect-specific-operat > ors > > So you want to be able to write a[x] and it should be interpreted as > a[OFFSET (x)] when using BigQuery? > > Mihai > > -----Original Message----- > From: P.F. ZHAN > Sent: Tuesday, August 08, 2023 11:37 AM > To: [email protected] > Subject: Re: [Discussion] Split function has some wrong and strange > behaviors > > Hi, mbudiu. > > f.checkScalar can not reproduce this problem, it happens when > converting SqlNode to RelNode. > Maybe you could check this test case: > > > @Test void testIndexWithoutOperatorBigQuery() { String query = > > "SELECT SPLIT('h,e,l,l,o')[1] FROM \"employee\""; String error = > > "BigQuery requires an array subscript operator to index an array"; > > sql(query).withBigQuery().withLibrary(SqlLibrary.BIG_QUERY).throws_( > > er > > ror) > > ; > > } > > > > I checked the doc reference of Calcite, the array index does start with 1. > However, for BigQuery, if omit the array subscript operators, it is > equivalent to offset(0), that is to say split('aaa-bbb-ccc', '-')[0] > is equivalent to split('aaa-bbb-ccc', '-')[offset(0)]. The usage of > omitting the array subscript operators doesn't appear on the BigQuery > doc, but it works in this query engine. > > This seems by design for Calcite. However, my question is "can we > remove this limitation and support the usage of omitting the array > subscript operators to get the same result with BigQuery?" In other > words, split('aaa-bbb-ccc', '-')[offset(0)] could be replaced by > split('aaa-bbb-ccc', '-')[0] ? > > On Wed, Aug 9, 2023 at 2:13 AM <[email protected]> wrote: > > > I added the following tests to SqlOperatorTest and they passed: > > > > f.checkScalar("split('aaa-bbb-ccc', '-')[1]", "aaa", "VARCHAR"); > > f.checkNull("split('aaa-bbb-ccc', '-')[0]"); > > > > Please note that in Calcite SQL the array indexes start from 1: > > https://calcite.apache.org/docs/reference.html#value-constructors > > > > If you try to index out of bounds (with index 0) you will get a null > > value. In BigQuery it seems to be illegal to index with an integer > > https://cloud.google.com/bigquery/docs/arrays, you seem to need to > > use either OFFSET or ORDINAL. > > > > But I don't see any exception, certainly not a parse exception. > > > > I don't understand what you mean by "Calcite does not support split > > function without array subscript operators." > > > > Mihai > > > > -----Original Message----- > > From: P.F. ZHAN > > Sent: Tuesday, August 08, 2023 10:53 AM > > To: [email protected] > > Subject: Re: [Discussion] Split function has some wrong and strange > > behaviors > > > > Sorry, > > I executed the following four queries in BigQuery, and the results > > displayed after the token "=>" > > > > query1. select split('aaa-bbb-ccc', '-')[0] => aaa query2. select > > split('aaa-bbb-ccc', '-')[offset(0)] => aaa query3. select > > split('aaa-bbb-ccc', '-')[3] => Array index 3 is out of bounds > > (overflow) query4. select split('aaa-bbb-ccc', '-')[offset(3)] => > > Array index 3 is out of bounds (overflow) > > > > If we use Calcite to parse query1 and query3, it will throw exceptions. > > That's to say, Calcite does not support split function without array > > subscript operators. > > I wonder if we should support this kind of usage of this function? > > > > > > > > On Wed, Aug 9, 2023 at 1:23 AM <[email protected]> wrote: > > > > > The mailing list strips images from emails. > > > > > > > > > > > > Mihai > > > > > > > > > > > > From: P.F. ZHAN > > > Sent: Tuesday, August 08, 2023 10:23 AM > > > To: [email protected] > > > Subject: Re: [Discussion] Split function has some wrong and > > > strange behaviors > > > > > > > > > > > > Sorry, maybe I didn't describe it very well, which caused you to > > > not understand what I meant. > > > > > > Here, I gives some query results to describe this problem. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Calcite does not support split function without array subscript > > > operator, I wonder should we support this kind of usage of this > function? > > > > > > > > > > > > > > > > > > The BigQuery doc: > > > > > > > > > https://cloud.google.com/bigquery/docs/reference/standard-sql/oper > > > at > > > or > > > s#array_subscript_operator > > > > > > > > > > > > > > > > > > On Wed, Aug 9, 2023 at 12:43 AM <[email protected] > > > <mailto:[email protected]> > > > > wrote: > > > > > > I don' t understand from this message what the problem with split is. > > > > > > Mihai > > > > > > -----Original Message----- > > > From: P.F. ZHAN > > > Sent: Tuesday, August 08, 2023 8:21 AM > > > To: [email protected] <mailto:[email protected]> > > > Subject: Re: [Discussion] Split function has some wrong and > > > strange behaviors > > > > > > I rechecked Calcite's implementation of the split function today > > > again, and I found that this behavior appears to be a by design > behavior. > > > > > > > > > > > > org.apache.calcite.rel.rel2sql.RelToSqlConverterTest#testIndexOp > > > > er > > > > at > > > > or > > > > sBigQuery > > > > > > > > > org.apache.calcite.rel.rel2sql.RelToSqlConverterTest#testIndexWith > > > ou > > > tO > > > peratorBigQuery > > > > > > > > > I also checked the BigQuery doc which only mentions these standard > > > usages in our test case of *testIndexOperatorsBigQuery*, but in > > > fact when we use SQL like > > > > > > > select split('aaa-bbb-ccc', '-')[0] > > > > > > the results are available in BigQuery and the behavior is the same > > > as > > > > > > > select split('aaa-bbb-ccc', '-')[ offset(0)] > > > > > > Maybe Calcite should support this kind of usage rather than treat > > > it as a limitation. > > > > > > > > > > > > On Tue, Aug 8, 2023 at 10:31 AM P.F. ZHAN <[email protected] <mailto: > > > [email protected]> > wrote: > > > > > > > Thank you mbudiu, Ran Tao and Lake Shen. > > > > > > > > I have tried your suggestions and debugged the code, this kind > > > > of sql seems not support > > > > > > > >> select split('aaa-bbb-ccc', '-')[1] > > > >> > > > > > > > > I have checked it in both Bigquery and Spark, it works. This > > > > looks like a bug at present. Can I open a jira issue to follow > > > > this > problem? > > > > > > > > > > > > On Tue, Aug 8, 2023 at 1:09 AM <[email protected] <mailto: > > > [email protected]> > wrote: > > > > > > > >> See also this thread: > > > >> https://issues.apache.org/jira/browse/CALCITE-5760 > > > >> > > > >> You can run CalciteSqlOperatorTest instead. > > > >> If you want to quickly debug only this test in the IDE you can > > > >> copy-paste this test into the CalciteSqlOperatorTest class and > > > >> run it from there, then delete it when you are done. > > > >> > > > >> Mihai > > > >> > > > >> -----Original Message----- > > > >> From: Ran Tao > > > >> Sent: Monday, August 07, 2023 4:04 AM > > > >> To: [email protected] <mailto:[email protected]> > > > >> Subject: Re: [Discussion] Split function has some wrong and > > > >> strange behaviors > > > >> > > > >> > I found that when I use "*./gradlew build*" to build the > > > >> > project happens > > > >> with some errors, but I execute this case in IDEA alone and it > > > >> is successful. > > > >> > > > >> The default `fixture()` does not execute in unit tests (but > > > >> works in ./gradlew build), you should use > > > >> Fixture.forOperators(true) to execute in unit tests. You can > > > >> check this usage, hope it helps you. > > > >> > > > >> > > > >> Best Regards, > > > >> Ran Tao > > > >> https://github.com/chucheng92 > > > >> > > > >> > > > >> P.F. ZHAN <[email protected] <mailto:[email protected]> > > > > 于2023年8月7日周一 15:22写道: > > > >> > > > >> > Sorry, I made a mistake, used Chinese to describe my question. > > > >> > > > > >> > Hi, When I was testing the split function with a case as follows: > > > >> > org.apache.calcite.test.SqlOperatorTest#testSplitFunction > > > >> > > > > >> > > f.checkScalar("SPLIT('h-e-l-l-o', '-')[0]", "h", > > > >> > > "VARCHAR"); > > > >> > > > > > >> > > > > >> > I found that when I use "*./gradlew build*" to build the > > > >> > project happens with some errors, but I execute this case in > > > >> > IDEA alone and it is successful. > > > >> > > > > >> > FAILURE 0.3sec, org.apache.calcite.test.CalciteSqlOperatorTest > > > > >> > > testSplitFunction() > > > >> > > java.lang.AssertionError: Query: values > > > >> > > (SPLIT('h-e-l-l-o', > > > >> '-')[0]) > > > >> > > Expected: is "h" > > > >> > > but: was null > > > >> > > at > > > >> org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18) > > > >> > > at > > > >> > > > > > >> > org.apache.calcite.sql.test.ResultCheckers.compareResultSetWi > > > >> > th > > > >> > Ma > > > >> > tc > > > >> > her > > > >> > (ResultCheckers.java:252) > > > >> > > at > > > >> > > > > > >> > org.apache.calcite.sql.test.ResultCheckers$MatcherResultCheck > > > >> > er > > > >> > .c > > > >> > he > > > >> > ckR > > > >> > esult(ResultCheckers.java:321) > > > >> > > at > > > >> > > > > > >> > org.apache.calcite.test.SqlOperatorTest$TesterImpl.check(SqlO > > > >> > pe > > > >> > ra > > > >> > to > > > >> > rTe > > > >> > st.java:12150) > > > >> > > at > > > >> > org.apache.calcite.sql.test.SqlTester.check(SqlTester.java:16 > > > >> > 0) > > > >> > > at > > > >> > > > > > >> > org.apache.calcite.test.SqlOperatorFixtureImpl.lambda$checkSc > > > >> > al > > > >> > ar > > > >> > $2 > > > >> > (Sq > > > >> > lOperatorFixtureImpl.java:224) > > > >> > > at > > > >> > > > > > >> > org.apache.calcite.sql.test.AbstractSqlTester.forEachQuery(Ab > > > >> > st > > > >> > ra > > > >> > ct > > > >> > Sql > > > >> > Tester.java:446) > > > >> > > at > > > >> > > > > > >> > org.apache.calcite.test.SqlOperatorFixtureImpl.checkScalar(Sq > > > >> > lO > > > >> > pe > > > >> > ra > > > >> > tor > > > >> > FixtureImpl.java:223) > > > >> > > at > > > >> > > > > > >> > org.apache.calcite.sql.test.SqlOperatorFixture.checkScalar(Sq > > > >> > lO > > > >> > pe > > > >> > ra > > > >> > tor > > > >> > Fixture.java:238) > > > >> > > at > > > >> > > > > > >> > org.apache.calcite.test.SqlOperatorTest.testSplitFunction(Sql > > > >> > Op > > > >> > er > > > >> > at > > > >> > orT > > > >> > est.java:7657) > > > >> > > at > > > >> > > > > > >> > org.junit.platform.commons.util.ReflectionUtils.invokeMethod( > > > >> > Re > > > >> > fl > > > >> > ec > > > >> > tio > > > >> > nUtils.java:727) > > > >> > > > > > >> > > > > >> > I also add a case in the file big-query.iq > > > >> > <http://big-query.iq> > > > >> > > > > >> > SELECT SPLIT("h,e,l,l,o", ',')[1] as result; > > > >> > > +------- + > > > >> > > | result | > > > >> > > +------- + > > > >> > > | e | > > > >> > > +------- + > > > >> > > (1 row) > > > >> > > > > > >> > > !ok > > > >> > > > > > >> > > > > >> > The error log is as follows: > > > >> > > > > >> > > FAILURE 27.6sec, org.apache.calcite.test.BabelQuidemTest > > > > >> > > test(String)[3], [3] sql/big-query.iq <http://big-query.iq> > > > >> > > org.opentest4j.AssertionFailedError: Files differ: > > > >> > > ~/calcite/babel/build/quidem/test/sql/big-query.iq < > > > http://big-query.iq> > > > >> > > ~/calcite/babel/build/resources/test/sql/big-query.iq < > > > http://big-query.iq> > > > >> > > 759c759 > > > >> > > < +------- + > > > >> > > --- > > > >> > > > +--------+ > > > >> > > 761,763c761,763 > > > >> > > < +------- + > > > >> > > < | e | > > > >> > > < +------- + > > > >> > > --- > > > >> > > > +--------+ > > > >> > > > | h | > > > >> > > > +--------+ > > > >> > > > > > >> > > at > > > >> > > > org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:38) > > > >> > > at > > > org.junit.jupiter.api.Assertions.fail(Assertions.java:135) > > > >> > > at > > > >> > org.apache.calcite.test.QuidemTest.checkRun(QuidemTest.java:1 > > > >> > 83 > > > >> > ) > > > >> > > at > > > >> > > org.apache.calcite.test.QuidemTest.test(QuidemTest.java:236 > > > >> > > ) > > > >> > > > > > >> > > > > > >> > On Mon, Aug 7, 2023 at 3:13 PM P.F. ZHAN <[email protected] > > <mailto: > > > [email protected]> > wrote: > > > >> > > > > >> > > 大家好。我在测试 split 函数功能的时候,添加了个 case > > > >> > > org.apache.calcite.test.SqlOperatorTest#testSplitFunction > > > >> > > > > > >> > >> f.checkScalar("SPLIT('h-e-l-l-o', '-')[0]", "h", > > > >> > >> "VARCHAR"); > > > >> > >> > > > >> > > > > > >> > > 我发现我使用 gradlew build 整个项目时会出错,但我单独在 IDEA 里面执行这个 case 是成功的。 > > > >> > > > > > >> > >> FAILURE 0.3sec, > org.apache.calcite.test.CalciteSqlOperatorTest > > > > > > >> > >> testSplitFunction() > > > >> > >> java.lang.AssertionError: Query: values > > > >> > >> (SPLIT('h-e-l-l-o', > > > >> '-')[0]) > > > >> > >> Expected: is "h" > > > >> > >> but: was null > > > >> > >> at > > > >> org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18) > > > >> > >> at > > > >> > >> > > > >> > org.apache.calcite.sql.test.ResultCheckers.compareResultSetWi > > > >> > th > > > >> > Ma > > > >> > tc > > > >> > her > > > >> > (ResultCheckers.java:252) > > > >> > >> at > > > >> > >> > > > >> > org.apache.calcite.sql.test.ResultCheckers$MatcherResultCheck > > > >> > er > > > >> > .c > > > >> > he > > > >> > ckR > > > >> > esult(ResultCheckers.java:321) > > > >> > >> at > > > >> > >> > > > >> > org.apache.calcite.test.SqlOperatorTest$TesterImpl.check(SqlO > > > >> > pe > > > >> > ra > > > >> > to > > > >> > rTe > > > >> > st.java:12150) > > > >> > >> at > > > >> > org.apache.calcite.sql.test.SqlTester.check(SqlTester.java:16 > > > >> > 0) > > > >> > >> at > > > >> > >> > > > >> > org.apache.calcite.test.SqlOperatorFixtureImpl.lambda$checkSc > > > >> > al > > > >> > ar > > > >> > $2 > > > >> > (Sq > > > >> > lOperatorFixtureImpl.java:224) > > > >> > >> at > > > >> > >> > > > >> > org.apache.calcite.sql.test.AbstractSqlTester.forEachQuery(Ab > > > >> > st > > > >> > ra > > > >> > ct > > > >> > Sql > > > >> > Tester.java:446) > > > >> > >> at > > > >> > >> > > > >> > org.apache.calcite.test.SqlOperatorFixtureImpl.checkScalar(Sq > > > >> > lO > > > >> > pe > > > >> > ra > > > >> > tor > > > >> > FixtureImpl.java:223) > > > >> > >> at > > > >> > >> > > > >> > org.apache.calcite.sql.test.SqlOperatorFixture.checkScalar(Sq > > > >> > lO > > > >> > pe > > > >> > ra > > > >> > tor > > > >> > Fixture.java:238) > > > >> > >> at > > > >> > >> > > > >> > org.apache.calcite.test.SqlOperatorTest.testSplitFunction(Sql > > > >> > Op > > > >> > er > > > >> > at > > > >> > orT > > > >> > est.java:7657) > > > >> > >> at > > > >> > >> > > > >> > org.junit.platform.commons.util.ReflectionUtils.invokeMethod( > > > >> > Re > > > >> > fl > > > >> > ec > > > >> > tio > > > >> > nUtils.java:727) > > > >> > >> > > > >> > > > > > >> > > 我在big-query.iq <http://big-query.iq> 中加入下面这段 > > > >> > > > > > >> > >> SELECT SPLIT("h,e,l,l,o", ',')[1] as result; > > > >> > >> +------- + > > > >> > >> | result | > > > >> > >> +------- + > > > >> > >> | e | > > > >> > >> +------- + > > > >> > >> (1 row) > > > >> > >> > > > >> > >> !ok > > > >> > >> > > > >> > > > > > >> > > 报错如下: > > > >> > > > > > >> > >> FAILURE 27.6sec, org.apache.calcite.test.BabelQuidemTest > > > >> > >> > test(String)[3], [3] sql/big-query.iq <http://big-query.iq> > > > >> > >> org.opentest4j.AssertionFailedError: Files differ: > > > >> > >> ~/calcite/babel/build/quidem/test/sql/big-query.iq < > > > http://big-query.iq> > > > >> > >> ~/calcite/babel/build/resources/test/sql/big-query.iq < > > > http://big-query.iq> > > > >> > >> 759c759 > > > >> > >> < +------- + > > > >> > >> --- > > > >> > >> > +--------+ > > > >> > >> 761,763c761,763 > > > >> > >> < +------- + > > > >> > >> < | e | > > > >> > >> < +------- + > > > >> > >> --- > > > >> > >> > +--------+ > > > >> > >> > | h | > > > >> > >> > +--------+ > > > >> > >> > > > >> > >> at > > > >> > >> > org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:38) > > > >> > >> at > > > org.junit.jupiter.api.Assertions.fail(Assertions.java:135) > > > >> > >> at > > > >> > >> > org.apache.calcite.test.QuidemTest.checkRun(QuidemTest.java:183) > > > >> > >> at > > > >> > >> org.apache.calcite.test.QuidemTest.test(QuidemTest.java:23 > > > >> > >> 6) > > > >> > >> > > > >> > > > > > >> > > > > >> > > > >> > > > > > > > > > > > >
