Hi all,

Today Iceberg has five language implementations (Java, Python, Rust, Go,
C++), each in its own repository. We naturally see divergence in how they
interpret the spec, even though we take great care to write it as
expressively as possible.

To reduce this divergence, I'd like to propose a shared fixture
repository, modeled on arrow-testing and parquet-testing. The repository
would focus on hosting fixtures for the edge cases where implementations
diverge. These fixtures would act as additive checks to the
subprojects' existing test frameworks, give the community a single place to
anchor spec discussions about literal values, and make it cheaper for
subprojects to validate their interpretation of the spec against a known
set of values.

As a proof of concept, I seeded a fork with existing, known issues, and
integrating the tests also surfaced new ones. Here is what that looks like
in my forks:
- iceberg-testing (proposed fixture repository):
https://github.com/sungwy/iceberg-testing
- pyiceberg against it: https://github.com/sungwy/iceberg-python/pull/1
- iceberg-rust against it: https://github.com/sungwy/iceberg-rust/pull/2

Let me know your thoughts. For those interested, here's a detailed doc [1]
where we can discuss the specifics of the project scope and layout.

Thanks,
Sung

[1]
https://docs.google.com/document/d/1diwxjG24IMW9jSkkyG8fet1YFDorDU1PNOFJ6tSWEWQ/edit?tab=t.0
