For udf pushdown, I think we can refer to spark sql jdbc[1]. I tried it with spark sql, which pushed my desired udf down to the corresponding jdbc data source, which required some minor changes.
Best wishes, Cancai Cai [1] https://github.com/apache/spark/tree/master/sql/core/src/main/scala/org/apache/spark/sql/jdbc Bertil Chapuis <bchap...@gmail.com> 于2024年2月27日周二 14:28写道: > Having the ability to write tests with testcontainers would greatly help > at improving the PostGIS dialect. > > Until recently, the official docker image for PostGIS lacked support for > arm64. However, one can easily change the underlying image used by > testcontainers [1]. I guess that spawning a Spark SQL container shouldn’t > be too difficult. > > As a side note, I recently opened CALCITE-6281 [2]. From what I > understand, it is currently not possible to push down UDFs (such as the ST > functions) to the database. More work in operator tables seems to be > required and additional tests would probably help at clarifying the > situation. > > [1] > https://github.com/bchapuis/calcite-postgis-tests/blob/f7d7df1dbff67d5beed2872f44f92b85ed18c18f/src/test/java/com/github/bchapuis/calcite_postgis_tests/AbstractPostgisTest.java#L38 > [2] https://issues.apache.org/jira/browse/CALCITE-6271 > > > > > On 26 Feb 2024, at 23:48, Julian Hyde <jh...@apache.org> wrote: > > > > I have logged https://issues.apache.org/jira/browse/CALCITE-6281. > > > > On Mon, Feb 26, 2024 at 12:06 PM Hanumath Maduri > > <hanumathmad...@gmail.com> wrote: > >> > >> It seems to have support for PostGIS, but I didn't test it though. > >> > >> > https://github.com/testcontainers/testcontainers-java/blob/main/modules/postgresql/src/main/java/org/testcontainers/containers/PostgisContainerProvider.java > >> https://testcontainers.com/modules/postgis/ > >> > >> Thanks > >> Hanu > >> > >> On Mon, Feb 26, 2024 at 11:20 AM Julian Hyde <jh...@apache.org> wrote: > >> > >>> One more thing. The geospatial library, including Bertil's proposed > >>> PostGIS dialect [5], would also benefit from this effort. We would > >>> want to ensure that "ST_Contains(ST_Point(0.0, 0.0), ST_Point(0.0, > >>> 0.0))" returns the same value on PostGIS as Calcite. Does > >>> testContainers support PostGIS? > >>> > >>> Julian > >>> > >>> [5] https://issues.apache.org/jira/browse/CALCITE-6239 > >>> > >>> On Mon, Feb 26, 2024 at 11:07 AM Julian Hyde <jh...@apache.org> wrote: > >>>> > >>>> Thank you, everyone for your thoughts so far. (And let's keep > talking!) > >>>> > >>>> I don't yet see a full, perfect solution. But I see a number of > >>>> techniques/tools we can use. For example, I was not aware of > >>>> testContainers but it seems to very quickly address our most common > >>>> use cases (MySQL + Postgres). Also, while I like Quidem it probably > >>>> should not be our first solution to this problem. > >>>> > >>>> I have two concerns about testContainers: > >>>> 1. it makes our test suite heavier: it adds latency, memory > >>>> requirements, flakiness, and the requirement for Docker; > >>>> 2. it doesn't solve other important dialects, such as BigQuery. > >>>> > >>>> To solve (1), we could add the tests to an optional module, such as > >>>> 'plus'. To solve (2) we need to provide another way to get a > >>>> connection to a reference database. (We need that for the other huge > >>>> project, the converse of this project, to test whether Calcite's > >>>> generated SQL is compliant [1].) > >>>> > >>>> I think we should start by merging Stamatis' PR into the 'plus' > >>>> module, and add some tests for CONCAT, which we have already > >>>> implemented, and has different behaviors in MySQL, Postgres, Oracle > >>>> [2][3]. Let's enable that test in CI and make sure that it doesn't > >>>> introduce flakiness. > >>>> > >>>> Next, we could make that test invoke tests that are already defined in > >>>> SqlOperatorTest, in addition to/instead of, the CSV data in Stamatis' > >>>> PR. > >>>> > >>>> Next, devise a way to test against Spark SQL and BigQuery, and check > >>>> our implementation of SOUNDEX (different in Spark than > >>>> Postgres/Oracle) and SUBSTR/SUBSTRING (different on Postgres/BigQuery) > >>>> [4]. > >>>> > >>>> After that, Quidem, somehow. > >>>> > >>>> Julian > >>>> > >>>> [1] https://issues.apache.org/jira/browse/CALCITE-5529 > >>>> [2] https://issues.apache.org/jira/browse/CALCITE-5771 > >>>> [3] https://issues.apache.org/jira/browse/CALCITE-5745 > >>>> [4] https://issues.apache.org/jira/browse/CALCITE-4427 > >>>> > >>>> On Mon, Feb 26, 2024 at 10:14 AM Mihai Budiu <mbu...@gmail.com> > wrote: > >>>>> > >>>>> I have already contributed an implementation of Sql Logic Test for > >>> Calcite, it is part of the "plus" project of Calcite. > >>>>> > >>>>> SLT is completely complementary with the problem described here. SLT > >>> tests only the very core of the SQL language, which should be common > to all > >>> implementations. It only uses 3 data types (integer, string, decimal), > and > >>> it never calls any non-standard functions. > >>>>> > >>>>> If you configure Calcite in a specific way, it can pass all SLT > tests. > >>> But some optimization passes are unsound, and some of these are used > in the > >>> default configuration (when you use Calcite through a JDBC connection). > >>> That's the reason why currently the SLT tests are not being regularly > run. > >>>>> > >>>>> In our project we took a very similar approach to the one described > by > >>> Julian: when we adopt a SQL function from a dialect (we attempt to > support > >>> functions from many dialects), we test it thoroughly on the original > >>> database and write all the same tests using our Calcite > implementation. If > >>> something doesn't match, I file an issue with Calcite. > >>>>> > >>>>> We test > >>>>> > >>>>> * > >>>>> compile-time evaluation (for constant expressions) > >>>>> * > >>>>> run-time evaluation (including of the same expressions) > >>>>> * > >>>>> tests that should fail at compilation time (e.g. type errors) > >>>>> * > >>>>> tests that should fail at runtime (e.g., division by zero) > >>>>> > >>>>> We try to take the tests from the official test suites, e.g.: > Postgres > >>> regress: > >>> > https://github.com/postgres/postgres/tree/bcd5b4bcbefc8b99be3678cf9c6c990a0c28f68e/src/test/regress/expected > >>>>> We have written parsers that read the expected output from the > >>> official tests so that we can validate the results (for positive tests) > >>> while reusing as much as possible the original tests without > introducing > >>> bugs in the translation of the tests. > >>>>> > >>>>> As Yiwen Wu says, in general it is impossible to execute a function > >>> for a specific dialect on a different runtime and get the same result. > But > >>> this does not mean we shouldn't implement the functions precisely. In > our > >>> runtime we reimplement all the functions in Rust, and we can support > >>> functions from different runtimes with no problem. But we still need > the > >>> Java implementation to match the Rust implementation and with the > original > >>> database implementation. > >>>>> > >>>>> Quidem may be the best way to do it, but it needs negative tests too. > >>>>> > >>>>> Mihai > >>>>> > >>>>> ________________________________ > >>>>> From: Stamatis Zampetakis <zabe...@gmail.com> > >>>>> Sent: Monday, February 26, 2024 5:31 AM > >>>>> To: dev@calcite.apache.org <dev@calcite.apache.org> > >>>>> Subject: Re: [DISCUSS] Ensuring that Calcite is consistent with other > >>> SQL systems > >>>>> > >>>>> The Quidem approach definitely makes sense. > >>>>> > >>>>> Alternatively, since we are focusing on the behavior of functions > with > >>>>> certain inputs it may be simpler to go with a Java parameterized test > >>>>> backed by a CSV/JSON file ([1] outlines the general idea). > >>>>> As others mentioned, using the org.testcontainers framework can > >>>>> significantly alleviate the burden of setting up and managing > >>>>> different DBMS. > >>>>> > >>>>> Best, > >>>>> Stamatis > >>>>> > >>>>> [1] https://github.com/apache/calcite/pull/3704 > >>>>> > >>>>> > >>>>> On Mon, Feb 26, 2024 at 4:01 AM Yiwen Wu <yiwenwu1...@gmail.com> > >>> wrote: > >>>>>> > >>>>>> I think it is difficult and impossible to make calcite function > >>> results > >>>>>> completely consistent with other SQL systems. In practice, when > >>>>>> calculations are pushed down to different data source adapters for > >>>>>> execution, there is no guarantee that the final results will be > >>> completely > >>>>>> consistent in calcite. > >>>>>> > >>>>>> I think, for the built-in functions defined in > >>> `SqlStdOperatorTable`, we > >>>>>> can choose to follow one of the definitive SQL systems, such as > >>> Oracle. For > >>>>>> the engine-related extension functions in `SqlLibraryOperators`, we > >>> can > >>>>>> follow the behavior related to the specific engine. If an already > >>> defined > >>>>>> function behaves inconsistently with the new engine, we can add a > new > >>>>>> Function to the definition, such as the `CONCAT` function. > >>>>>> > >>>>>> At the same time, I agree it is a great suggestion to add execution > >>> tests > >>>>>> of different engines in Quidem, which is very effective in verifying > >>> the > >>>>>> engine execution behavior. > >>>>>> > >>>>>> Cancai Cai <caic68...@gmail.com> 于2024年2月26日周一 09:59写道: > >>>>>> > >>>>>>> Thank you very much to the calcite community for raising these > >>> questions. > >>>>>>> This is what I have been doubting. I am very sorry that this doubt > >>> has been > >>>>>>> discussed for so long. > >>>>>>> > >>>>>>> Maybe we also need to consider another issue, that is, the > >>> database version > >>>>>>> issue. Versions like mysql and postgres are very stable, but > >>> components > >>>>>>> like spark still seem to have function bugs. exists, then how > >>> should we > >>>>>>> consider version issues? > >>>>>>> > >>>>>>> I don't know what I can do. Recently I am sorting out some > >>> documents about > >>>>>>> the use of some functions of mysql and postgres in calcite. I > >>> don't know if > >>>>>>> this is helpful. > >>>>>>> > >>>>>>> Best wishes, > >>>>>>> Cancai Cai > >>>>>>> > >>> > >