Hi Leon,

Thanks for starting this thread! I think this is a great idea. Happy to
support this in any way I can.

Matt Topol and I have previously discussed cross-client testing for the
iceberg-go and iceberg-python implementations. There is a class of bugs
that can be caught this way. We partly do this today by copying the
integration test suite from iceberg-python over to iceberg-go. I think
even supporting a single verification step, through Spark, would give us a
lot of value in terms of testing for correctness.
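
To make the difference in scope concrete, here is a minimal sketch in plain
Python (no Iceberg or Spark calls) that only enumerates which writer/reader
pairs each approach would need to cover. The client names and the
hub_checks / full_matrix_checks helpers are hypothetical and exist purely
for illustration:

```python
from itertools import permutations


def hub_checks(clients, hub="spark"):
    """List (writer, reader) pairs when every client is verified only against a hub.

    Hypothetical helper: each non-hub client is checked in both directions.
    """
    checks = []
    for client in clients:
        if client == hub:
            continue
        checks.append((hub, client))  # hub writes, client reads
        checks.append((client, hub))  # client writes, hub reads
    return checks


def full_matrix_checks(clients):
    """List every ordered (writer, reader) pair of distinct clients."""
    return list(permutations(clients, 2))


# Assumed client list, for illustration only.
clients = ["spark", "python", "rust", "go"]

print(len(hub_checks(clients)))          # 2 * (N - 1) pairs
print(len(full_matrix_checks(clients)))  # N * (N - 1) pairs
```

With these four clients, routing everything through Spark needs 6 directed
checks versus 12 for the full matrix, and the gap widens with each new
client.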

BTW, Matt mentioned that the Arrow ecosystem has similar integration tests
across its clients. I haven't had a chance to look further, but he pointed me
to https://github.com/apache/arrow/tree/main/dev/archery/archery/integration

Looking forward to this!

Best,
Kevin Liu

On Thu, Jun 5, 2025 at 4:56 PM Leon Lin <lianglin....@gmail.com> wrote:

> Hello all,
>
> I would like to start a discussion on standardizing cross-client
> integration testing across the Iceberg projects. With all the active
> development among the different client implementations (Python, Rust, Go,
> etc.), it will be important to make sure the implementations are
> interoperable with one another: tables created by one client should be
> readable and writable by another client without any incompatibilities,
> and divergence between implementations should be detected early.
>
> There is already some great work in PyIceberg that verifies compatibility
> with the Iceberg Java implementation through Spark. We could easily extend
> this to a two-step verification. I’ve outlined the details in the doc
> linked below, but the idea is to:
>
>    - Write tables using PySpark and verify them with client-side read
>    tests.
>    - Write using the client and validate using PySpark scripts with
>    assertions.
>
> While full matrix testing would be ideal for verifying interoperability
> between any combination of clients, I haven’t been able to find a clean
> way to do this without adding too much complexity or operational burden.
> I’d really appreciate any thoughts or ideas from the community, and I’m
> happy to contribute to moving this forward.
>
> Best,
> Leon Lin
>
> *References:*
> https://github.com/apache/iceberg-python/blob/main/tests/conftest.py#L2429
> Issue: https://github.com/apache/iceberg/issues/13229
> Doc: Standardize Cross Client Integration Testing
> <https://drive.google.com/open?id=1vZfVzGZucsDc35uoRrbn5CKHGd0muzcY7FAwmKt-KNI>
>
