Hi Matt,

If you want to work on getting this change in, I'd be happy to review it. I
think it is fine to support older, incorrectly written data. I looked
briefly at the PR and I think it needs to be updated to at least add tests
and to justify why the changes are correct. It looks like the repetition
and definition level thresholds are calculated based on the outermost level
and then left unadjusted, rather than calculating from the correct path.
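
To illustrate what calculating from the path means, here is a minimal
Python sketch (not taken from the PR) of how the maximum definition and
repetition levels follow from every field on a column's path. The Field
class, the max_levels helper, and the example field names are hypothetical
and are not Iceberg or Parquet APIs. It assumes the usual Parquet rule:
each optional or repeated field on the path adds one definition level, and
each repeated field adds one repetition level.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Field:
    # Hypothetical stand-in for a Parquet schema node.
    name: str
    repetition: str  # "required", "optional", or "repeated"
    children: List["Field"] = field(default_factory=list)


def max_levels(root_fields: List[Field], path: List[str]) -> Tuple[int, int]:
    """Return (max definition level, max repetition level) for a column path."""
    definition = 0
    repetition = 0
    fields = root_fields
    for name in path:
        node = next(f for f in fields if f.name == name)
        if node.repetition != "required":
            definition += 1  # optional and repeated fields can be absent
        if node.repetition == "repeated":
            repetition += 1
        fields = node.children
    return definition, repetition


# 3-level list: optional group tags (LIST) { repeated group list { optional element } }
three_level = [
    Field("tags", "optional", [
        Field("list", "repeated", [Field("element", "optional")]),
    ])
]

# 2-level list: optional group tags (LIST) { repeated array }
two_level = [Field("tags", "optional", [Field("array", "repeated")])]

print(max_levels(three_level, ["tags", "list", "element"]))  # (3, 1)
print(max_levels(two_level, ["tags", "array"]))              # (2, 1)
```

The two layouts give different maximum definition levels for the same
logical column, so thresholds taken only from the outermost field would not
match both.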

Ryan

On Mon, Feb 3, 2025 at 2:30 PM Matt Wallace <matt.wall...@imc.com.invalid>
wrote:

> I’d like to open discussion about the handling and support for so-called
> “2-level” lists in Parquet files by the Iceberg libraries.  This issue has
> been raised in https://github.com/apache/iceberg/issues/9497 and a PR was
> submitted at https://github.com/apache/iceberg/pull/9515.  However, this
> PR was not merged because it was brought up that the Iceberg specification
> says that 2-level lists are not supported by Iceberg.  The Parquet spec
> indicates that 3-level lists should be used for writing new Parquet files,
> but it also says that libraries may implement backwards compatibility.  Is
> there any strong reason not to do this?
>
> I have tested the fix proposed in PR 9515 and it works for me.  A strength
> of the Iceberg spec is that it doesn't require re-writing Parquet files in
> order to efficiently store metadata about these Parquet files.  However, by
> not supporting 2-level lists, Iceberg is cutting off support for a large
> subset of existing Parquet data.
>
> Thanks,
>
> Matt
