Hi Xuanwo, Renjie,

I think sqllogictest is a good replacement for the JSON spec, and I'm definitely not recommending the JSON spec, as I think it would be too complex to execute.
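One encouraging data point on adoption cost: the core sqllogictest format is simple enough that a client without a SQL engine could still parse and replay it with very little code. Below is a rough Python sketch (`parse_records` is a hypothetical helper name, and the format handling is simplified compared to real runners like sqllogictest-rs, which also support conditionals, hash thresholds, and result sorting):

```python
# Minimal sketch of a sqllogictest-style parser (hypothetical helper;
# simplified relative to real runners such as sqllogictest-rs).

def parse_records(text):
    """Split a sqllogictest script into (kind, sql, expected) records."""
    records = []
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        header = lines[i].strip()
        if header.startswith("statement"):
            # "statement ok" / "statement error": SQL follows until a blank line.
            i += 1
            sql = []
            while i < len(lines) and lines[i].strip():
                sql.append(lines[i])
                i += 1
            records.append(("statement", "\n".join(sql), None))
        elif header.startswith("query"):
            # "query II": SQL until the "----" separator, then expected rows
            # until a blank line.
            i += 1
            sql = []
            while i < len(lines) and lines[i].strip() != "----":
                sql.append(lines[i])
                i += 1
            i += 1  # skip the "----" separator
            expected = []
            while i < len(lines) and lines[i].strip():
                expected.append(lines[i].strip())
                i += 1
            records.append(("query", "\n".join(sql), expected))
        else:
            i += 1  # skip blank lines and comments
    return records
```

With something like this, each client would only need to supply the execution side, i.e. a callback that runs the parsed SQL against its own engine or against Spark.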
As Renjie pointed out, sqllogictest is only suitable when a SQL engine is available, and right now not all of the client implementations have engine integration (iceberg-go, for example). That said, sqllogictest could still be useful for provisioning tables and validating results via Iceberg Spark, and it can be extended later once engine integration is added.

A few concerns I have right now:

- How complex would it be to integrate sqllogictest into non-Rust clients?
- Should we centralize the shared Docker images and test suites, or let each client repo manage its own setup, with the flexibility to evolve as development progresses?

Will do some experiments with sqllogictest offline and happy to discuss more!

Best,
Leon

On Mon, Jun 9, 2025 at 2:34 AM Renjie Liu <liurenjie2...@gmail.com> wrote:

> Hi, Leon:
>
> Thanks for raising this.
>
> In Rust we also have a similar plan to run integration tests against the
> Rust and Java implementations: https://github.com/apache/iceberg-rust/pull/581
>
> This approach is purely data-driven and, as Xuanwo mentioned, motivated by
> sqllogictest. That is to say, we define a set of SQL statements that can
> be executed by both Spark SQL and a Rust engine (DataFusion in this case).
> The downside of this method is that it requires integration with a SQL
> engine. Luckily in Rust we have DataFusion, but I'm not sure if this is
> the case for Python and Go.
>
> On Sat, Jun 7, 2025 at 9:47 AM Xuanwo <xua...@apache.org> wrote:
>
>> Thank you Leon for starting this.
>>
>> It's very important for open formats like Iceberg to be interoperable
>> across different implementations. And it's at the top of the list for
>> iceberg-rust.
>>
>> My only concern is about the JSON spec. I'm wondering whether it's a
>> good idea for us to adopt the sqllogictest format:
>> https://sqlite.org/sqllogictest/doc/trunk/about.wiki and
>> https://github.com/risinglightdb/sqllogictest-rs.
>> It was used by SQLite first and is now widely borrowed by many other SQL
>> engines to build their test suites.
>>
>> It's something like:
>>
>> statement ok
>> INSERT INTO a VALUES (42, 84);
>>
>> query II
>> SELECT * FROM a;
>> ----
>> 42 84
>>
>> Basically, we have a way to define the SQL we are running, the result we
>> are expecting, and a way to hint at result types.
>>
>> What do you think?
>>
>> On Sat, Jun 7, 2025, at 07:46, Leon Lin wrote:
>>
>> Hi Kevin,
>>
>> Thanks for bringing up the Arrow integration tests as a reference! I've
>> looked into that setup as well. However, I found it difficult to apply
>> the same model to Iceberg since Arrow and Iceberg are very different.
>> Arrow tests are centered around in-memory serialization and
>> deserialization using JSON-defined schema types, whereas Iceberg
>> operates on persisted table state and requires more extensive
>> infrastructure, like a catalog and storage, to run integration tests.
>>
>> One of the alternative approaches listed in the doc follows a
>> producer/consumer strategy similar to Arrow's: defining producer and
>> consumer spec files in JSON that describe the actions clients should
>> perform. Each client would then implement a runner that parses and
>> executes those actions. However, mapping out every Iceberg capability
>> with its inputs and expected outputs becomes quite complex, and I'm
>> concerned it won't scale well over time.
>>
>> Feel free to leave comments in the doc and let me know what you think.
>> I'm happy to explore and experiment with other ideas!
>>
>> Thanks,
>> Leon
>>
>> On Fri, Jun 6, 2025 at 12:39 PM Kevin Liu <kevinjq...@apache.org> wrote:
>>
>> Hi Leon,
>>
>> Thanks for starting this thread! I think this is a great idea. Happy to
>> support this in any way I can.
>>
>> Matt Topol and I have previously discussed cross-client testing for the
>> iceberg-go and iceberg-python implementations. There is a class of bugs
>> that can be caught this way.
>> We somewhat do this today by copying over the integration test suite
>> from iceberg-python to iceberg-go. I think even supporting a single
>> verification step, through Spark, can provide a lot of value in terms of
>> testing for correctness.
>>
>> BTW, Matt mentioned that the Arrow ecosystem has similar integration
>> tests across its clients. I haven't been able to look further, but he
>> pointed me to
>> https://github.com/apache/arrow/tree/main/dev/archery/archery/integration
>>
>> Looking forward to this!
>>
>> Best,
>> Kevin Liu
>>
>> On Thu, Jun 5, 2025 at 4:56 PM Leon Lin <lianglin....@gmail.com> wrote:
>>
>> Hello all,
>>
>> I would like to start a discussion on standardizing cross-client
>> integration testing in the Iceberg projects. With all the active
>> development across the different client implementations (Python, Rust,
>> Go, etc.), it is important to make sure the implementations are
>> interoperable with one another: tables created by one client should be
>> readable and writable by another client without incompatibilities, and
>> this also helps detect divergence between implementations early.
>>
>> There is already some great work done in PyIceberg to verify
>> compatibility with the Iceberg Java implementation via Spark, and we
>> could easily extend this to do two-step verification. I've outlined the
>> details in the doc attached below, but the idea is to:
>>
>> - Write tables using PySpark and verify them with client-side read
>> tests.
>> - Write using the client and validate using PySpark scripts with
>> assertions.
>>
>> While full matrix testing would be ideal to verify interoperability
>> between any combination of clients, I haven't been able to find a clean
>> way to do this without adding too much complexity or operational burden.
>> I'd really appreciate any thoughts or ideas from the community, and I'm
>> happy to contribute to moving this forward.
>> Best,
>> Leon Lin
>>
>> *References:*
>> https://github.com/apache/iceberg-python/blob/main/tests/conftest.py#L2429
>> Issue: https://github.com/apache/iceberg/issues/13229
>> Standardize Cross Client Integration Testing
>> <https://drive.google.com/open?id=1vZfVzGZucsDc35uoRrbn5CKHGd0muzcY7FAwmKt-KNI>
>>
>> Xuanwo
>>
>> https://xuanwo.io/