Hello all,

I would like to start a discussion on standardizing cross-client integration
testing across the Iceberg projects. With all the active development among
the different client implementations (Python, Rust, Go, etc.), it is
important to make sure the implementations are interoperable with one
another: tables created by one client should be readable and writable by
another without incompatibilities, and divergence between implementations
should be detected early.

There is already some great work in PyIceberg that verifies compatibility
with the Iceberg Java implementation through Spark; we could extend this
into a two-step verification. I've outlined the details in the doc linked
below, but the idea is to:

   - Write tables using PySpark and verify them with client-side read tests.
   - Write tables using the client and validate them with PySpark scripts
   and assertions (both directions are sketched below).

While full matrix testing would be ideal for verifying interoperability
between any combination of clients, I haven't been able to find a clean way
to do this without adding too much complexity or operational burden. I'd
really appreciate any thoughts or ideas from the community, and I'm happy to
contribute to moving this forward.
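
For reference, a full matrix would amount to parametrizing tests over every
ordered (writer, reader) pair, along these lines; the client list and the
per-client harness are hypothetical, and the quadratic growth of the matrix
is exactly where the operational burden comes from:

import itertools

import pytest

CLIENTS = ["pyspark", "pyiceberg", "iceberg-rust", "iceberg-go"]

# every ordered (writer, reader) pair: n clients yield n * (n - 1) cases,
# and each new client adds 2 * n more
PAIRS = list(itertools.permutations(CLIENTS, 2))


@pytest.mark.parametrize("writer,reader", PAIRS)
def test_round_trip(writer, reader):
    # each client would need its own write/read harness, plus shared
    # catalog and warehouse infrastructure for all of them
    pytest.skip(f"harness for {writer} -> {reader} not wired up")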

Best,
Leon Lin

*References:*
https://github.com/apache/iceberg-python/blob/main/tests/conftest.py#L2429
Issue: https://github.com/apache/iceberg/issues/13229
Standardize Cross Client Integration Testing:
https://drive.google.com/open?id=1vZfVzGZucsDc35uoRrbn5CKHGd0muzcY7FAwmKt-KNI
