Hi Leon,

Thanks for starting this thread! I think this is a great idea. Happy to support this in any way I can.

Matt Topol and I have previously discussed cross-client testing for the iceberg-go and iceberg-python implementations. There is a class of bugs that can be caught this way. We do this to some extent today by copying the integration test suite from iceberg-python over to iceberg-go. I think even supporting a single verification step, through Spark, can provide us a lot of value in terms of testing for correctness.

BTW, Matt mentioned that the Arrow ecosystem has similar integration tests across its clients. I haven't been able to look further, but he pointed me to
https://github.com/apache/arrow/tree/main/dev/archery/archery/integration

Looking forward to this!

Best,
Kevin Liu

On Thu, Jun 5, 2025 at 4:56 PM Leon Lin <lianglin....@gmail.com> wrote:

> Hello all,
>
> I would like to start a discussion on standardizing cross-client
> integration testing in the Iceberg projects. With all the active
> development among the different client implementations (Python, Rust,
> Go, etc.), it will be important to make sure the implementations are
> interoperable with one another, so that tables created by one client
> can be read and written by another client without any
> incompatibilities, and to help detect divergence between
> implementations early.
>
> There is already some great work done in PyIceberg to verify
> compatibility with the Iceberg Java implementation through Spark; we
> could easily extend this to do two-step verification. I’ve outlined the
> details in the doc attached below, but the idea is to:
>
> - Write tables using PySpark and verify them with client-side read
>   tests.
> - Write using the client and validate using PySpark scripts with
>   assertions.
>
> While full matrix testing would be ideal to verify interoperability
> between any combination of clients, I haven’t been able to find a clean
> way to do this without adding too much complexity or operational burden.
> I’d really appreciate any thoughts or ideas from the community, and I’m
> happy to contribute to moving this forward.
>
> Best,
> Leon Lin
>
> *References:*
> https://github.com/apache/iceberg-python/blob/main/tests/conftest.py#L2429
> Issue: https://github.com/apache/iceberg/issues/13229
> Standardize Cross Client Integration Testing
> <https://drive.google.com/open?id=1vZfVzGZucsDc35uoRrbn5CKHGd0muzcY7FAwmKt-KNI>