Hello all, I would like to start a discussion on standardizing cross-client integration testing across the Iceberg projects. With all the active development among the different client implementations (Python, Rust, Go, etc.), it will be important to make sure the implementations are interoperable: tables created by one client should be readable and writable by any other client without incompatibilities. Such testing would also help detect divergence between implementations early.
There is already some great work in PyIceberg that verifies compatibility with the Iceberg Java implementation through Spark; we could extend this into a two-step verification. I've outlined the details in the doc linked below, but the idea is to:

- Write tables using PySpark and verify them with client-side read tests.
- Write tables using the client and validate them with PySpark scripts that run assertions.

(A rough sketch of what these two steps could look like is appended after the references.)

While full matrix testing would be ideal for verifying interoperability between every combination of clients, I haven't been able to find a clean way to do this without adding too much complexity or operational burden. I'd really appreciate any thoughts or ideas from the community, and I'm happy to contribute to moving this forward.

Best,
Leon Lin

*References:*
PyIceberg integration test fixtures: https://github.com/apache/iceberg-python/blob/main/tests/conftest.py#L2429
Issue: https://github.com/apache/iceberg/issues/13229
Standardize Cross Client Integration Testing: https://drive.google.com/open?id=1vZfVzGZucsDc35uoRrbn5CKHGd0muzcY7FAwmKt-KNI
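To make the two steps concrete, here is a minimal sketch of what the tests could look like, assuming a REST catalog that both PySpark (with the Iceberg Spark runtime on the classpath) and PyIceberg can reach. The catalog name, endpoint, runtime version, and table identifiers are illustrative placeholders, not part of the proposal:

# A minimal sketch of the two verification steps. The endpoint, catalog
# name, and table names below are hypothetical.
import pyarrow as pa
from pyiceberg.catalog import load_catalog
from pyspark.sql import SparkSession

CATALOG_URI = "http://localhost:8181"  # hypothetical REST catalog endpoint

spark = (
    SparkSession.builder
    # The exact runtime version here is only for illustration.
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.1")
    .config("spark.sql.catalog.it", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.it.type", "rest")
    .config("spark.sql.catalog.it.uri", CATALOG_URI)
    .getOrCreate()
)
catalog = load_catalog("it", type="rest", uri=CATALOG_URI)
spark.sql("CREATE NAMESPACE IF NOT EXISTS it.db")

def test_spark_writes_client_reads():
    # Step 1: write the table with PySpark, verify with a client-side read.
    spark.sql("CREATE TABLE IF NOT EXISTS it.db.t1 (id BIGINT, name STRING) USING iceberg")
    spark.sql("INSERT INTO it.db.t1 VALUES (1, 'a'), (2, 'b')")
    scanned = catalog.load_table("db.t1").scan().to_arrow()
    assert scanned.num_rows == 2
    assert sorted(scanned["id"].to_pylist()) == [1, 2]

def test_client_writes_spark_reads():
    # Step 2: write with the client, validate with a PySpark assertion script.
    schema = pa.schema([("id", pa.int64()), ("name", pa.string())])
    table = catalog.create_table("db.t2", schema=schema)
    table.append(pa.table({"id": [1, 2], "name": ["a", "b"]}))
    assert spark.sql("SELECT count(*) FROM it.db.t2").first()[0] == 2

The same two test functions could then be parameterized per client (iceberg-rust, iceberg-go, etc.), so every implementation is exercised against the same Spark-written tables and validated by the same Spark assertions.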