Re: [DISCUSS] Introducing Iceberg Features ?

Xuanwo Tue, 15 Apr 2025 20:45:37 -0700

Hi, JB

Thank you for starting this discussion. Based on my experience with Parquet, 
when a specification allows readers and writers to freely choose which features 
to use, it often leads to the entire ecosystem relying on only the minimal 
feature set. As a result, many valuable features are overlooked. For example, 
Bloom filters in Parquet are extremely useful, but they are rarely supported by 
writers, which in turn leads to minimal support from readers as well.


So I personally support the ON/OFF method, which means the engine must fully 
implement v3.
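
To make the two options concrete, here is a minimal sketch of how a reader might gate on each. It assumes the table metadata is already parsed into a dict. "format-version" is the existing metadata field; "features" is only the field proposed later in this thread and is not part of the spec. The helper names and the supported sets are hypothetical, not taken from any Iceberg library.

```python
# Hypothetical reader capabilities (illustrative values only).
SUPPORTED_FORMAT_VERSIONS = {1, 2, 3}
SUPPORTED_FEATURES = {"row_lineage", "default_value"}


def can_read_by_version(metadata: dict) -> bool:
    """ON/OFF method: supporting a format version means supporting all of it."""
    return metadata["format-version"] in SUPPORTED_FORMAT_VERSIONS


def unsupported_features(metadata: dict) -> set:
    """Feature-list method: return the listed features this reader lacks.

    "features" is the proposed (not yet specified) metadata field; a table
    without it is treated as listing no features.
    """
    return set(metadata.get("features", [])) - SUPPORTED_FEATURES


# Example metadata using the feature names from the proposal.
metadata = {
    "format-version": 3,
    "features": ["row_lineage", "variant", "default_value"],
}

version_ok = can_read_by_version(metadata)
missing = unsupported_features(metadata)
```

With the version gate, a single check decides everything. With the feature list, the same v3 table can be readable by one engine and rejected by another depending on `missing`, which is the fragmentation I am worried about.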

On Wed, Apr 16, 2025, at 03:18, Jean-Baptiste Onofré wrote:
> Thanks for your feedback. 
> 
> I got your points. My question was more about the features that an engine 
> (reader/writer) should support: for v3 it means that an engine will have to 
> implement/support all features from v3 (required features). They can stay on 
> v2 or fully update to v3. That makes sense to me for the engine. My question 
> came because v3 includes a lot of changes, some requiring “checks” on 
> metadata (a bit complex for the reader/writer).  
> 
> Thanks for the feedback again !
> 
> Regards
> JB
> 
> On Tue, Apr 15, 2025 at 20:54, Russell Spitzer <[email protected]> wrote:
>> I'm not a big fan of this; I am currently a strong supporter of the "V3 is 
>> V3" approach. This is one of the reasons we decided to make row-lineage 
>> mandatory: we want to avoid encouraging engines to selectively adopt 
>> requirements.
>> 
>> On Tue, Apr 15, 2025 at 1:42 PM Fokko Driesprong <[email protected]> wrote:
>>> Hey JB,
>>> 
>>> Thanks for raising this. This would be another way of indicating (next to 
>>> the format version) what's supported. At first glance, I'm reluctant to add 
>>> this, for two reasons: 
>>>  1. The added complexity, both from a technical perspective and because it 
>>> might confuse downstream users: for example, an engine could support 
>>> Iceberg V3 but not the variant type.
>>>  2. As you indicated, this is similar to what Delta has. One issue that 
>>> they are experiencing is that the users expect that you should also be able 
>>> to disable features. For example, when you have row-lineage enabled, and 
>>> you want to read the table with an engine that does not support 
>>> row-lineage, there is an expectation that row-lineage can be disabled. This 
>>> is different from what we support today with the format-version, which 
>>> only allows upgrades (and not downgrades). Supporting this would also add 
>>> a lot of complexity to the codebase.
>>> Curious to learn what others think.
>>> 
>>> Kind regards,
>>> Fokko
>>> 
>>> On Mon, Apr 14, 2025 at 19:56, Brian Hulette <[email protected]> wrote:
>>>> As a consumer of Iceberg metadata I think something like this might be 
>>>> helpful. We used approach #2 for adding partial Iceberg V2 support to 
>>>> BigQuery external tables, but this was more straightforward as we just had 
>>>> to detect the existence of delete files. With V3 we will have to be very 
>>>> confident that we can detect all of the unsupported features before we add 
>>>> support for any one of them.
>>>> 
>>>> That being said I don't think that will be *that* difficult. Would it be 
>>>> very hard for metadata producers to populate this?
>>>> 
>>>> On Mon, Apr 14, 2025 at 8:48 AM Jean-Baptiste Onofré <[email protected]> 
>>>> wrote:
>>>>> Hi folks,
>>>>> 
>>>>> I started to work on multi args transforms, and you probably saw
>>>>> Fokko's proposal about the way to deal with source-id/source-ids to
>>>>> ensure backward compatibility.
>>>>> 
>>>>> While working on the changes in iceberg-core/iceberg-java, I started
>>>>> wondering whether we should introduce Iceberg Features in metadata.
>>>>> Let me explain what I have in mind.
>>>>> In Table Spec V3, we have new functionalities: new types (timestamp
>>>>> nz, variant, ...), default values, row lineage, etc.
>>>>> For readers/writers, there are two ways to know if functionalities are
>>>>> available or not:
>>>>> 1. Reading the table version spec (v2, v3)
>>>>> 2. Reading if metadata contains some fields (for instance, regarding
>>>>> multi args transforms, we have source-id / source-ids).
>>>>> It means that we already have to "parse" the metadata and likely
>>>>> implement "complex" logic.
>>>>> 
>>>>> In addition to the table spec version, I wonder if we should introduce
>>>>> Iceberg Features in metadata, clearly listing/describing the supported
>>>>> features, decoupled from the table spec version:
>>>>> 
>>>>> "features": ["row_lineage","variant","default_value"]
>>>>> 
>>>>> Readers/writers can just check the features to know how to behave. This
>>>>> would give us more flexibility in supporting features, decoupled from the
>>>>> table spec version.
>>>>> 
>>>>> Afaik, Delta has something similar.
>>>>> 
>>>>> Long term, it could be extended to Data File format API proposed by
>>>>> Peter, e.g. some features related to data files (that would be a
>>>>> different layer, but similar idea).
>>>>> 
>>>>> Thoughts ?
>>>>> 
>>>>> Regards
>>>>> JB
Xuanwo

https://xuanwo.io/
