Re: [DISCUSS] Introducing Iceberg Features ?

Eduard Tudenhöfner Wed, 16 Apr 2025 01:35:01 -0700

I fully agree with what Fokko said and I'm concerned that this adds a lot
of new complexity and also leads to engines only supporting a minimal set
of features for a given Spec version, which makes it even harder for users
to know what subset of features a V3 compliant engine actually supports.


Eduard

On Wed, Apr 16, 2025 at 8:23 AM Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

> Hi Xuanwo
>
> Thanks for the feedback. Fair enough.
>
> Regards
> JB
>
> On Wed, Apr 16, 2025 at 05:44, Xuanwo <xua...@apache.org> wrote:
>
>> Hi, JB
>>
>> Thank you for starting this discussion. Based on my experience with
>> Parquet, when a specification allows readers and writers to freely choose
>> which features to use, it often leads to the entire ecosystem relying on
>> only the minimal feature set. As a result, many valuable features are
>> overlooked. For example, Bloom filters in Parquet are extremely useful, but
>> they are rarely supported by writers, which in turn leads to minimal
>> support from readers as well.
>>
>> So I personally support the ON/OFF method, which means the engine must
>> fully implement v3.
>>
>> On Wed, Apr 16, 2025, at 03:18, Jean-Baptiste Onofré wrote:
>>
>> Thanks for your feedback.
>>
>> I got your points. My question was more about the features that an engine
>> (reader/writer) should support: for v3 it means that an engine will have to
>> implement/support all features from v3 (required features). They can stay
>> on v2 or fully update to v3. That makes sense to me for the engine. My
>> question came because v3 includes a lot of changes, some requiring “checks”
>> on metadata (a bit complex for the reader/writer).
>>
>> Thanks for the feedback again !
>>
>> Regards
>> JB
>>
>> On Tue, Apr 15, 2025 at 20:54, Russell Spitzer <russell.spit...@gmail.com>
>> wrote:
>>
>> I'm not a big fan of this. I am currently a strong supporter of the "V3 is
>> V3" approach. This is one of the reasons we decided to make row-lineage
>> mandatory: we want to avoid encouraging engines to selectively adopt
>> requirements.
>>
>> On Tue, Apr 15, 2025 at 1:42 PM Fokko Driesprong <fo...@apache.org>
>> wrote:
>>
>> Hey JB,
>>
>> Thanks for raising this. This would be another way of indicating (next to
>> the format version) what's supported. At first glance, I'm reluctant to add
>> this. For two reasons:
>>
>>    1. Because of the added complexity, both from a technical
>>    perspective and because it might confuse downstream users, for
>>    example when an engine supports Iceberg V3 but not the variant type.
>>    2. As you indicated, this is similar to what Delta has. One issue
>>    that they are experiencing is that users expect to also be able to
>>    disable features. For example, when you have row-lineage enabled,
>>    and you want to read the table with an engine that does not support
>>    row-lineage, there is an expectation to disable row-lineage. This is
>>    different from what we support today with the format-version, which only
>>    allows upgrades (and not downgrades). It would also add a lot of
>>    complexity to the codebase.
>>
>> Curious to learn what others think.
>>
>> Kind regards,
>> Fokko
>>
>> On Mon, Apr 14, 2025 at 19:56, Brian Hulette <bhule...@apache.org> wrote:
>>
>> As a consumer of Iceberg metadata I think something like this might be
>> helpful. We used approach #2 for adding partial Iceberg V2 support to
>> BigQuery external tables, but this was more straightforward as we just had
>> to detect the existence of delete files. With V3 we will have to be very
>> confident that we can detect all of the unsupported features before we add
>> support for any one of them.
>>
>> That being said I don't think that will be *that* difficult. Would it be
>> very hard for metadata producers to populate this?
>>
>> On Mon, Apr 14, 2025 at 8:48 AM Jean-Baptiste Onofré <j...@nanthrax.net>
>> wrote:
>>
>> Hi folks,
>>
>> I started to work on multi args transforms, and you probably saw
>> Fokko's proposal about the way to deal with source-id/source-ids to
>> ensure backward compatibility.
>>
>> While working on the changes on iceberg-core/iceberg-java, I'm
>> wondering if we should not introduce Iceberg Features on metadata.
>> Let me explain what I have in mind.
>> In Table Spec V3, we have new functionalities: new types (timestamp
>> nz, variant, ...), default values, row lineage, etc.
>> For readers/writers, there are two ways to know if functionalities are
>> available or not:
>> 1. Reading the table version spec (v2, v3)
>> 2. Reading if metadata contains some fields (for instance, regarding
>> multi args transforms, we have source-id / source-ids).
>> It means that we already have to "parse" the metadata and likely
>> implement "complex" logic.
>>
>> In addition to the table spec version, I wonder if we should not introduce
>> Iceberg Features in metadata, clearly listing/describing the supported
>> features, decoupled from the table spec version:
>>
>> "features": ["row_lineage","variant","default_value"]
>>
>> Readers/writers can just check the features to know how to behave. This
>> would give us more flexibility to support features, decoupled from the
>> table spec version.
>>
>> Afaik, Delta has something similar.
>>
>> Long term, it could be extended to Data File format API proposed by
>> Peter, e.g. some features related to data files (that would be a
>> different layer, but similar idea).
>>
>> Thoughts ?
>>
>> Regards
>> JB
>>
>> Xuanwo
>>
>> https://xuanwo.io/
>>
>>
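
For illustration only, the reader-side check described in JB's original message ("Reader/writer can just check the features to know how to behave") might look roughly like the sketch below. The `features` key is the field proposed in this thread and is not part of the Iceberg table spec. The `SUPPORTED_FEATURES` set and the `unsupported_features` helper are hypothetical names invented for this example. Only `format-version` is an existing table metadata field.

```python
from __future__ import annotations

import json

# Hypothetical: the feature names that this particular reader implements.
SUPPORTED_FEATURES = {"row_lineage", "default_value"}


def unsupported_features(metadata: dict, supported: set[str]) -> list[str]:
    """Return the declared features that this reader does not implement.

    Assumes the proposed (not yet specified) "features" list in the table
    metadata. Metadata without that key is treated as declaring no features.
    """
    declared = metadata.get("features", [])
    return sorted(f for f in declared if f not in supported)


# Example metadata using the feature list from the proposal in this thread.
metadata = json.loads(
    '{"format-version": 3,'
    ' "features": ["row_lineage", "variant", "default_value"]}'
)

missing = unsupported_features(metadata, SUPPORTED_FEATURES)
if missing:
    print("Cannot read table, unsupported features:", ", ".join(missing))
```

With such a list, a reader could refuse a table up front instead of inferring unsupported functionality from the presence of individual metadata fields. The objections raised in the replies (partial V3 support, the expectation that features can be disabled) apply to this sketch as well.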
