Re: [DISCUSS] Introducing Iceberg Features ?

Jean-Baptiste Onofré Wed, 16 Apr 2025 01:45:05 -0700

Hi Eduard

You all convinced me. I just wanted to get your feedback, as I remember
some "additional" complexity when adding view support to the JDBC
Catalog and when working on multi-arg transforms. A simple "flag" would
have simplified the code a bit (for instance, in the JDBC Catalog we
have to check the version to use the "right" RDBMS schema, etc.).


Thanks everyone for the discussion and input!

Regards
JB

On Wed, Apr 16, 2025 at 10:34 AM Eduard Tudenhöfner
<etudenhoef...@apache.org> wrote:
>
> I fully agree with what Fokko said and I'm concerned that this adds a lot of 
> new complexity and also leads to engines only supporting a minimal set of 
> features for a given Spec version, which makes it even harder for users to 
> know what subset of features a V3 compliant engine actually supports.
>
> Eduard
>
> On Wed, Apr 16, 2025 at 8:23 AM Jean-Baptiste Onofré <j...@nanthrax.net> 
> wrote:
>>
>> Hi Xuanwo
>>
>> Thanks for the feedback. Fair enough.
>>
>> Regards
>> JB
>>
>> On Wed, Apr 16, 2025 at 05:44, Xuanwo <xua...@apache.org> wrote:
>>>
>>> Hi, JB
>>>
>>> Thank you for starting this discussion. Based on my experience with 
>>> Parquet, when a specification allows readers and writers to freely choose 
>>> which features to use, it often leads to the entire ecosystem relying on 
>>> only the minimal feature set. As a result, many valuable features are 
>>> overlooked. For example, Bloom filters in Parquet are extremely useful, but 
>>> they are rarely supported by writers, which in turn leads to minimal 
>>> support from readers as well.
>>>
>>> So I personally support the ON/OFF method, which means the engine must 
>>> fully implement v3.
>>>
>>> On Wed, Apr 16, 2025, at 03:18, Jean-Baptiste Onofré wrote:
>>>
>>> Thanks for your feedback.
>>>
>>> I got your points. My question was more about the features that an engine 
>>> (reader/writer) should support: for v3 it means that an engine will have to 
>>> implement/support all features from v3 (required features). They can stay 
>>> on v2 or fully update to v3. That makes sense to me for the engine. My 
>>> question came because v3 includes a lot of changes, some requiring “checks” 
>>> on metadata (a bit complex for the reader/writer).
>>>
>>> Thanks for the feedback again !
>>>
>>> Regards
>>> JB
>>>
>>> On Tue, Apr 15, 2025 at 20:54, Russell Spitzer <russell.spit...@gmail.com>
>>> wrote:
>>>
>>> I'm not a big fan of this. I am currently a strong supporter of the "V3 is
>>> V3" approach. This is one of the reasons we decided to make row-lineage
>>> mandatory: we want to avoid encouraging engines to selectively adopt
>>> requirements.
>>>
>>> On Tue, Apr 15, 2025 at 1:42 PM Fokko Driesprong <fo...@apache.org> wrote:
>>>
>>> Hey JB,
>>>
>>> Thanks for raising this. This would be another way of indicating (next to 
>>> the format version) what's supported. At first glance, I'm reluctant to add 
>>> this. For two reasons:
>>>
>>> 1. Because of the added complexity, both from a technical perspective and
>>>    because it might confuse downstream users: for example, an engine
>>>    supports Iceberg V3, but not the variant type.
>>> 2. As you indicated, this is similar to what Delta has. One issue that they
>>>    are experiencing is that users expect to also be able to disable
>>>    features. For example, when you have row-lineage enabled and you want
>>>    to read the table with an engine that does not support row-lineage,
>>>    there is an expectation that you can disable row-lineage. This is
>>>    different from what we support today with the format-version, which
>>>    only allows upgrades (and not downgrades). This would also add a lot of
>>>    complexity to the codebase.
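
The upgrade-only behaviour of format-version mentioned above can be pictured with a small Python sketch. This is illustrative only and not taken from the Iceberg codebase: the function name `validate_format_version_change` and the error message are hypothetical.

```python
def validate_format_version_change(current: int, requested: int) -> int:
    """Allow staying on the same version or upgrading; reject downgrades.

    Hypothetical helper for illustration, not an Iceberg API.
    """
    if requested < current:
        raise ValueError(
            f"Cannot downgrade format-version from v{current} to v{requested}"
        )
    return requested


if __name__ == "__main__":
    print(validate_format_version_change(2, 3))  # upgrade is accepted
    try:
        validate_format_version_change(3, 2)     # downgrade is rejected
    except ValueError as err:
        print(err)
```

A per-feature list with an expectation of disabling features would need the reverse path as well, which is the extra complexity described in the second point.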
>>>
>>> Curious to learn what others think.
>>>
>>> Kind regards,
>>> Fokko
>>>
>>> On Mon, Apr 14, 2025 at 19:56, Brian Hulette <bhule...@apache.org> wrote:
>>>
>>> As a consumer of Iceberg metadata I think something like this might be 
>>> helpful. We used approach #2 for adding partial Iceberg V2 support to 
>>> BigQuery external tables, but this was more straightforward as we just had 
>>> to detect the existence of delete files. With V3 we will have to be very 
>>> confident that we can detect all of the unsupported features before we add 
>>> support for any one of them.
>>>
>>> That being said I don't think that will be *that* difficult. Would it be 
>>> very hard for metadata producers to populate this?
>>>
>>> On Mon, Apr 14, 2025 at 8:48 AM Jean-Baptiste Onofré <j...@nanthrax.net> 
>>> wrote:
>>>
>>> Hi folks,
>>>
>>> I started to work on multi args transforms, and you probably saw
>>> Fokko's proposal about the way to deal with source-id/source-ids to
>>> ensure backward compatibility.
>>>
>>> While working on the changes in iceberg-core/iceberg-java, I'm
>>> wondering if we should introduce Iceberg Features in metadata.
>>> Let me explain what I have in mind.
>>> In Table Spec V3, we have new functionalities: new types (timestamp
>>> nz, variant, ...), default values, row lineage, etc.
>>> For readers/writers, there are two ways to know if functionalities are
>>> available or not:
>>> 1. Reading the table version spec (v2, v3)
>>> 2. Reading if metadata contains some fields (for instance, regarding
>>> multi args transforms, we have source-id / source-ids).
>>> It means that we already have to "parse" the metadata and likely
>>> implement "complex" logic.
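
To illustrate the two detection approaches listed above, here is a minimal Python sketch (not taken from iceberg-core). It reads `format-version` and then looks for `source-ids` on partition fields. The helper names `uses_multi_arg_transforms` and `detect_capabilities` are hypothetical, and the trimmed sample metadata, including a `bucket[16]` transform over two source columns, is made up for the example.

```python
import json

# Hypothetical, heavily trimmed table metadata for illustration only.
SAMPLE_METADATA = """
{
  "format-version": 3,
  "partition-specs": [
    {
      "spec-id": 0,
      "fields": [
        {"source-ids": [1, 2], "field-id": 1000,
         "name": "id_region_bucket", "transform": "bucket[16]"}
      ]
    }
  ]
}
"""


def uses_multi_arg_transforms(metadata: dict) -> bool:
    """Approach 2: True if any partition field carries a `source-ids` list."""
    for spec in metadata.get("partition-specs", []):
        for field in spec.get("fields", []):
            if "source-ids" in field:
                return True
    return False


def detect_capabilities(metadata_json: str) -> dict:
    """Combine approach 1 (format version) with approach 2 (field presence)."""
    metadata = json.loads(metadata_json)
    return {
        "format_version": metadata.get("format-version"),
        "multi_arg_transforms": uses_multi_arg_transforms(metadata),
    }


if __name__ == "__main__":
    print(detect_capabilities(SAMPLE_METADATA))
```

Each additional capability needs its own field-level probe of this kind, which is the "complex" logic referred to above.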
>>>
>>> In addition to the table spec version, I wonder if we should introduce
>>> Iceberg Features in metadata, clearly listing/describing the supported
>>> features, decoupled from the table spec version:
>>>
>>> "features": ["row_lineage","variant","default_value"]
>>>
>>> A reader/writer can just check the features to know how to behave. This
>>> would make feature support more flexible by decoupling it from the table
>>> spec version.
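
A minimal sketch of what such a check could look like on the reader side, in Python. The `features` key is the proposal from this thread and is not part of the Iceberg table spec. The function names and the set of features the engine supports are assumptions made for the example.

```python
# Features this hypothetical engine knows how to handle.
ENGINE_SUPPORTED = {"row_lineage", "default_value"}


def unsupported_features(metadata: dict, supported: set) -> list:
    """Return the declared features that the engine does not support."""
    declared = metadata.get("features", [])  # proposed key, not in the spec
    return sorted(f for f in declared if f not in supported)


def can_read(metadata: dict, supported: set) -> bool:
    """A table is readable only if every declared feature is supported."""
    return not unsupported_features(metadata, supported)


if __name__ == "__main__":
    table = {"features": ["row_lineage", "variant", "default_value"]}
    print(unsupported_features(table, ENGINE_SUPPORTED))
    print(can_read(table, ENGINE_SUPPORTED))
```

With the feature list from the example above, this engine would refuse the table because of `variant`, without having to inspect schemas or partition specs.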
>>>
>>> Afaik, Delta has something similar.
>>>
>>> Long term, it could be extended to the Data File format API proposed by
>>> Peter, e.g. some features related to data files (that would be a
>>> different layer, but a similar idea).
>>>
>>> Thoughts ?
>>>
>>> Regards
>>> JB
>>>
>>> Xuanwo
>>>
>>> https://xuanwo.io/
>>>
