Re: [DISCUSS] Ensuring that Calcite is consistent with other SQL systems

Cancai Cai Tue, 27 Feb 2024 18:15:01 -0800

For udf pushdown, I think we can refer to spark sql jdbc[1]. I tried it
with spark sql, which pushed my desired udf down to the corresponding jdbc
data source, which required some minor changes.


Best wishes,
Cancai Cai

[1]
https://github.com/apache/spark/tree/master/sql/core/src/main/scala/org/apache/spark/sql/jdbc

Bertil Chapuis <bchap...@gmail.com> 于2024年2月27日周二 14:28写道：

> Having the ability to write tests with testcontainers would greatly help
> at improving the PostGIS dialect.
>
> Until recently, the official docker image for PostGIS lacked support for
> arm64. However, one can easily change the underlying image used by
> testcontainers [1]. I guess that spawning a Spark SQL container shouldn’t
> be too difficult.
>
> As a side note, I recently opened CALCITE-6281 [2]. From what I
> understand, it is currently not possible to push down UDFs (such as the ST
> functions) to the database. More work in operator tables seems to be
> required and additional tests would probably help at clarifying the
> situation.
>
> [1]
> https://github.com/bchapuis/calcite-postgis-tests/blob/f7d7df1dbff67d5beed2872f44f92b85ed18c18f/src/test/java/com/github/bchapuis/calcite_postgis_tests/AbstractPostgisTest.java#L38
> [2] https://issues.apache.org/jira/browse/CALCITE-6271
>
>
>
> > On 26 Feb 2024, at 23:48, Julian Hyde <jh...@apache.org> wrote:
> >
> > I have logged https://issues.apache.org/jira/browse/CALCITE-6281.
> >
> > On Mon, Feb 26, 2024 at 12:06 PM Hanumath Maduri
> > <hanumathmad...@gmail.com> wrote:
> >>
> >> It seems to have support for PostGIS, but I didn't test it though.
> >>
> >>
> https://github.com/testcontainers/testcontainers-java/blob/main/modules/postgresql/src/main/java/org/testcontainers/containers/PostgisContainerProvider.java
> >> https://testcontainers.com/modules/postgis/
> >>
> >> Thanks
> >> Hanu
> >>
> >> On Mon, Feb 26, 2024 at 11:20 AM Julian Hyde <jh...@apache.org> wrote:
> >>
> >>> One more thing. The geospatial library, including Bertil's proposed
> >>> PostGIS dialect [5], would also benefit from this effort. We would
> >>> want to ensure that "ST_Contains(ST_Point(0.0, 0.0), ST_Point(0.0,
> >>> 0.0))" returns the same value on PostGIS as Calcite. Does
> >>> testContainers support PostGIS?
> >>>
> >>> Julian
> >>>
> >>> [5] https://issues.apache.org/jira/browse/CALCITE-6239
> >>>
> >>> On Mon, Feb 26, 2024 at 11:07 AM Julian Hyde <jh...@apache.org> wrote:
> >>>>
> >>>> Thank you, everyone for your thoughts so far. (And let's keep
> talking!)
> >>>>
> >>>> I don't yet see a full, perfect solution. But I see a number of
> >>>> techniques/tools we can use. For example, I was not aware of
> >>>> testContainers but it seems to very quickly address our most common
> >>>> use cases (MySQL + Postgres). Also, while I like Quidem it probably
> >>>> should not be our first solution to this problem.
> >>>>
> >>>> I have two concerns about testContainers:
> >>>> 1. it makes our test suite heavier: it adds latency, memory
> >>>> requirements, flakiness, and the requirement for Docker;
> >>>> 2. it doesn't solve other important dialects, such as BigQuery.
> >>>>
> >>>> To solve (1), we could add the tests to an optional module, such as
> >>>> 'plus'. To solve (2) we need to provide another way to get a
> >>>> connection to a reference database. (We need that for the other huge
> >>>> project, the converse of this project, to test whether Calcite's
> >>>> generated SQL is compliant [1].)
> >>>>
> >>>> I think we should start by merging Stamatis' PR into the 'plus'
> >>>> module, and add some tests for CONCAT, which we have already
> >>>> implemented, and has different behaviors in MySQL, Postgres, Oracle
> >>>> [2][3]. Let's enable that test in CI and make sure that it doesn't
> >>>> introduce flakiness.
> >>>>
> >>>> Next, we could make that test invoke tests that are already defined in
> >>>> SqlOperatorTest, in addition to/instead of, the CSV data in Stamatis'
> >>>> PR.
> >>>>
> >>>> Next, devise a way to test against Spark SQL and BigQuery, and check
> >>>> our implementation of SOUNDEX (different in Spark than
> >>>> Postgres/Oracle) and SUBSTR/SUBSTRING (different on Postgres/BigQuery)
> >>>> [4].
> >>>>
> >>>> After that, Quidem, somehow.
> >>>>
> >>>> Julian
> >>>>
> >>>> [1] https://issues.apache.org/jira/browse/CALCITE-5529
> >>>> [2] https://issues.apache.org/jira/browse/CALCITE-5771
> >>>> [3] https://issues.apache.org/jira/browse/CALCITE-5745
> >>>> [4] https://issues.apache.org/jira/browse/CALCITE-4427
> >>>>
> >>>> On Mon, Feb 26, 2024 at 10:14 AM Mihai Budiu <mbu...@gmail.com>
> wrote:
> >>>>>
> >>>>> I have already contributed an implementation of Sql Logic Test for
> >>> Calcite, it is part of the "plus" project of Calcite.
> >>>>>
> >>>>> SLT is completely complementary with the problem described here. SLT
> >>> tests only the very core of the SQL language, which should be common
> to all
> >>> implementations. It only uses 3 data types (integer, string, decimal),
> and
> >>> it never calls any non-standard functions.
> >>>>>
> >>>>> If you configure Calcite in a specific way, it can pass all SLT
> tests.
> >>> But some optimization passes are unsound, and some of these are used
> in the
> >>> default configuration (when you use Calcite through a JDBC connection).
> >>> That's the reason why currently the SLT tests are not being regularly
> run.
> >>>>>
> >>>>> In our project we took a very similar approach to the one described
> by
> >>> Julian: when we adopt a SQL function from a dialect (we attempt to
> support
> >>> functions from many dialects), we test it thoroughly on the original
> >>> database and write all the same tests using our Calcite
> implementation. If
> >>> something doesn't match, I file an issue with Calcite.
> >>>>>
> >>>>> We test
> >>>>>
> >>>>>  *
> >>>>> compile-time evaluation (for constant expressions)
> >>>>>  *
> >>>>> run-time evaluation (including of the same expressions)
> >>>>>  *
> >>>>> tests that should fail at compilation time (e.g. type errors)
> >>>>>  *
> >>>>> tests that should fail at runtime (e.g., division by zero)
> >>>>>
> >>>>> We try to take the tests from the official test suites, e.g.:
> Postgres
> >>> regress:
> >>>
> https://github.com/postgres/postgres/tree/bcd5b4bcbefc8b99be3678cf9c6c990a0c28f68e/src/test/regress/expected
> >>>>> We have written parsers that read the expected output from the
> >>> official tests so that we can validate the results (for positive tests)
> >>> while reusing as much as possible the original tests without
> introducing
> >>> bugs in the translation of the tests.
> >>>>>
> >>>>> As Yiwen Wu says, in general it is impossible to execute a function
> >>> for a specific dialect on a different runtime and get the same result.
> But
> >>> this does not mean we shouldn't implement the functions precisely. In
> our
> >>> runtime we reimplement all the functions in Rust, and we can support
> >>> functions from different runtimes with no problem. But we still need
> the
> >>> Java implementation to match the Rust implementation and with the
> original
> >>> database implementation.
> >>>>>
> >>>>> Quidem may be the best way to do it, but it needs negative tests too.
> >>>>>
> >>>>> Mihai
> >>>>>
> >>>>> ________________________________
> >>>>> From: Stamatis Zampetakis <zabe...@gmail.com>
> >>>>> Sent: Monday, February 26, 2024 5:31 AM
> >>>>> To: dev@calcite.apache.org <dev@calcite.apache.org>
> >>>>> Subject: Re: [DISCUSS] Ensuring that Calcite is consistent with other
> >>> SQL systems
> >>>>>
> >>>>> The Quidem approach definitely makes sense.
> >>>>>
> >>>>> Alternatively, since we are focusing on the behavior of functions
> with
> >>>>> certain inputs it may be simpler to go with a Java parameterized test
> >>>>> backed by a CSV/JSON file ([1] outlines the general idea).
> >>>>> As others mentioned, using the org.testcontainers framework can
> >>>>> significantly alleviate the burden of setting up and managing
> >>>>> different DBMS.
> >>>>>
> >>>>> Best,
> >>>>> Stamatis
> >>>>>
> >>>>> [1] https://github.com/apache/calcite/pull/3704
> >>>>>
> >>>>>
> >>>>> On Mon, Feb 26, 2024 at 4:01 AM Yiwen Wu <yiwenwu1...@gmail.com>
> >>> wrote:
> >>>>>>
> >>>>>> I think it is difficult and impossible to make calcite function
> >>> results
> >>>>>> completely consistent with other SQL systems. In practice, when
> >>>>>> calculations are pushed down to different data source adapters for
> >>>>>> execution, there is no guarantee that the final results will be
> >>> completely
> >>>>>> consistent in calcite.
> >>>>>>
> >>>>>> I think, for the built-in functions defined in
> >>> `SqlStdOperatorTable`, we
> >>>>>> can choose to follow one of the definitive SQL systems, such as
> >>> Oracle. For
> >>>>>> the engine-related extension functions in `SqlLibraryOperators`, we
> >>> can
> >>>>>> follow the behavior related to the specific engine. If an already
> >>> defined
> >>>>>> function behaves inconsistently with the new engine, we can add a
> new
> >>>>>> Function to the definition, such as the `CONCAT` function.
> >>>>>>
> >>>>>> At the same time, I agree it is a great suggestion to add execution
> >>> tests
> >>>>>> of different engines in Quidem, which is very effective in verifying
> >>> the
> >>>>>> engine execution behavior.
> >>>>>>
> >>>>>> Cancai Cai <caic68...@gmail.com> 于2024年2月26日周一 09:59写道：
> >>>>>>
> >>>>>>> Thank you very much to the calcite community for raising these
> >>> questions.
> >>>>>>> This is what I have been doubting. I am very sorry that this doubt
> >>> has been
> >>>>>>> discussed for so long.
> >>>>>>>
> >>>>>>> Maybe we also need to consider another issue, that is, the
> >>> database version
> >>>>>>> issue. Versions like mysql and postgres are very stable, but
> >>> components
> >>>>>>> like spark still seem to have function bugs. exists, then how
> >>> should we
> >>>>>>> consider version issues?
> >>>>>>>
> >>>>>>> I don't know what I can do. Recently I am sorting out some
> >>> documents about
> >>>>>>> the use of some functions of mysql and postgres in calcite. I
> >>> don't know if
> >>>>>>> this is helpful.
> >>>>>>>
> >>>>>>> Best wishes,
> >>>>>>> Cancai Cai
> >>>>>>>
> >>>
>
>

Re: [DISCUSS] Ensuring that Calcite is consistent with other SQL systems

Reply via email to