Re: [DISCUSS] Introducing Iceberg Features ?

Jean-Baptiste Onofré Tue, 15 Apr 2025 23:24:20 -0700

Hi Xuanwo

Thanks for the feedback. Fair enough.


Regards
JB

On Wed, Apr 16, 2025 at 05:44, Xuanwo <[email protected]> wrote:

> Hi, JB
>
> Thank you for starting this discussion. Based on my experience with
> Parquet, when a specification allows readers and writers to freely choose
> which features to use, it often leads to the entire ecosystem relying on
> only the minimal feature set. As a result, many valuable features are
> overlooked. For example, Bloom filters in Parquet are extremely useful, but
> they are rarely supported by writers, which in turn leads to minimal
> support from readers as well.
>
> So I personally support the ON/OFF method, which means the engine must
> fully implement v3.
>
> On Wed, Apr 16, 2025, at 03:18, Jean-Baptiste Onofré wrote:
>
> Thanks for your feedback.
>
> I got your points. My question was more about the features that an engine
> (reader/writer) should support: for v3 it means that an engine will have to
> implement/support all features from v3 (required features). They can stay
> on v2 or fully update to v3. That makes sense to me for the engine. My
> question came because v3 includes a lot of changes, some requiring “checks”
> on metadata (a bit complex for the reader/writer).
>
> Thanks for the feedback again !
>
> Regards
> JB
>
> On Tue, Apr 15, 2025 at 20:54, Russell Spitzer <[email protected]>
> wrote:
>
> I'm not a big fan of this; I am currently a strong supporter of the "V3 is
> V3" approach. This is one of the reasons we decided to make row-lineage
> mandatory: we want to avoid encouraging engines to selectively adopt
> requirements.
>
> On Tue, Apr 15, 2025 at 1:42 PM Fokko Driesprong <[email protected]> wrote:
>
> Hey JB,
>
> Thanks for raising this. This would be another way of indicating (next to
> the format version) what's supported. At first glance, I'm reluctant to add
> this. For two reasons:
>
>    1. Because of the added complexity, both from a technical perspective
>    and because it might confuse downstream users: for example, an engine
>    that supports Iceberg V3 but not the variant type.
>    2. As you indicated, this is similar to what Delta has. One issue that
>    they are experiencing is that users expect to also be able to disable
>    features. For example, when you have row-lineage enabled and you want
>    to read the table with an engine that does not support row-lineage,
>    there is an expectation that row-lineage can be disabled. This is
>    different from what we support today with the format-version, which
>    only allows upgrades (and not downgrades). It would also add a lot of
>    complexity to the codebase.
>
> Curious to learn what others think.
>
> Kind regards,
> Fokko
>
> On Mon, Apr 14, 2025 at 19:56, Brian Hulette <[email protected]> wrote:
>
> As a consumer of Iceberg metadata I think something like this might be
> helpful. We used approach #2 for adding partial Iceberg V2 support to
> BigQuery external tables, but this was more straightforward as we just had
> to detect the existence of delete files. With V3 we will have to be very
> confident that we can detect all of the unsupported features before we add
> support for any one of them.
>
> That being said I don't think that will be *that* difficult. Would it be
> very hard for metadata producers to populate this?
>
> On Mon, Apr 14, 2025 at 8:48 AM Jean-Baptiste Onofré <[email protected]>
> wrote:
>
> Hi folks,
>
> I started to work on multi args transforms, and you probably saw
> Fokko's proposal about the way to deal with source-id/source-ids to
> ensure backward compatibility.
>
> While working on the changes on iceberg-core/iceberg-java, I'm
> wondering if we should not introduce Iceberg Features on metadata.
> Let me explain what I have in mind.
> In Table Spec V3, we have new functionalities: new types (timestamp
> ns, variant, ...), default values, row lineage, etc.
> For readers/writers, there are two ways to know if functionalities are
> available or not:
> 1. Reading the table version spec (v2, v3)
> 2. Reading if metadata contains some fields (for instance, regarding
> multi args transforms, we have source-id / source-ids).
> It means that we already have to "parse" the metadata and likely
> implement "complex" logic.
>
> In addition to the table spec version, I wonder if we should not introduce
> Iceberg Features in metadata, clearly listing/describing the supported
> features, decoupled from the table spec version:
>
> "features": ["row_lineage","variant","default_value"]
>
> Readers/writers can just check the features to know how to behave. This
> would give us more flexibility to support features, decoupling them from
> the table spec version.
>
> Afaik, Delta has something similar.
>
> Long term, it could be extended to the Data File format API proposed by
> Peter, e.g. some features related to data files (that would be a
> different layer, but a similar idea).
>
> Thoughts ?
>
> Regards
> JB
>
> Xuanwo
>
> https://xuanwo.io/
>
>
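
For reference, the check proposed in the original message could look roughly like the sketch below. It is only an illustration, not anything from iceberg-core: the "features" field is the hypothetical list suggested in the thread (it is not part of the table spec), and SUPPORTED_FEATURES, MAX_SUPPORTED_FORMAT_VERSION, unsupported_features and can_read are made-up names. Only "format-version" is an existing table metadata field.

```python
import json

# Hypothetical: the features this reader implements. The names come from the
# example list in the thread and are not defined by the Iceberg spec.
SUPPORTED_FEATURES = {"row_lineage", "default_value"}

# Hypothetical: the highest table format version this reader understands.
MAX_SUPPORTED_FORMAT_VERSION = 3


def unsupported_features(metadata: dict) -> set:
    """Return the declared features that this reader does not implement."""
    # "features" is the proposed (hypothetical) metadata field; a table
    # without it is treated as declaring no extra features.
    declared = set(metadata.get("features", []))
    return declared - SUPPORTED_FEATURES


def can_read(metadata: dict) -> bool:
    """Combine the existing version check with the proposed feature check."""
    if metadata["format-version"] > MAX_SUPPORTED_FORMAT_VERSION:
        return False
    return not unsupported_features(metadata)


metadata = json.loads(
    '{"format-version": 3,'
    ' "features": ["row_lineage", "variant", "default_value"]}'
)
print(can_read(metadata))                      # False
print(sorted(unsupported_features(metadata)))  # ['variant']
```

Under the "V3 is V3" approach favored in the replies, the second check would not exist: the format-version comparison alone would decide whether the table is readable.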
