I fully agree with what Fokko said. I'm concerned that this adds a lot of new complexity and leads to engines supporting only a minimal set of features for a given spec version, which makes it even harder for users to know which subset of features a V3-compliant engine actually supports.
Eduard

On Wed, Apr 16, 2025 at 8:23 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

> Hi Xuanwo
>
> Thanks for the feedback. Fair enough.
>
> Regards
> JB
>
> On Wed, Apr 16, 2025 at 05:44, Xuanwo <xua...@apache.org> wrote:
>
>> Hi, JB
>>
>> Thank you for starting this discussion. Based on my experience with
>> Parquet, when a specification allows readers and writers to freely choose
>> which features to use, it often leads to the entire ecosystem relying on
>> only the minimal feature set. As a result, many valuable features are
>> overlooked. For example, Bloom filters in Parquet are extremely useful, but
>> they are rarely supported by writers, which in turn leads to minimal
>> support from readers as well.
>>
>> So I personally support the ON/OFF method, which means the engine must
>> fully implement v3.
>>
>> On Wed, Apr 16, 2025, at 03:18, Jean-Baptiste Onofré wrote:
>>
>> Thanks for your feedback.
>>
>> I got your points. My question was more about the features that an engine
>> (reader/writer) should support: for v3 it means that an engine will have to
>> implement/support all features from v3 (required features). They can stay
>> on v2 or fully update to v3. That makes sense to me for the engine. My
>> question came because v3 includes a lot of changes, some requiring “checks”
>> on metadata (a bit complex for the reader/writer).
>>
>> Thanks again for the feedback!
>>
>> Regards
>> JB
>>
>> On Tue, Apr 15, 2025 at 20:54, Russell Spitzer <russell.spit...@gmail.com>
>> wrote:
>>
>> I'm not a big fan of this; I am currently a strong supporter of the V3 is
>> V3 approach. This is one of the reasons we decided to make row-lineage
>> mandatory: we want to avoid encouraging engines to selectively adopt
>> requirements.
>>
>> On Tue, Apr 15, 2025 at 1:42 PM Fokko Driesprong <fo...@apache.org>
>> wrote:
>>
>> Hey JB,
>>
>> Thanks for raising this. This would be another way of indicating (next to
>> the format version) what's supported.
>> At first glance, I'm reluctant to add this. For two reasons:
>>
>> 1. Because of the added complexity, both from a technical
>>    perspective and because it might confuse downstream users: for
>>    example, an engine that supports Iceberg V3 but not the variant type.
>> 2. As you indicated, this is similar to what Delta has. One issue
>>    that they are experiencing is that users expect that they should also
>>    be able to disable features. For example, when you have row-lineage
>>    enabled, and you want to read the table with an engine that does not
>>    support row-lineage, there is an expectation to disable row-lineage.
>>    This is different from what we support today with the format-version,
>>    which only allows upgrades (and not downgrades). This would also add a
>>    lot of complexity to the codebase.
>>
>> Curious to learn what others think.
>>
>> Kind regards,
>> Fokko
>>
>> On Mon, Apr 14, 2025 at 19:56, Brian Hulette <bhule...@apache.org> wrote:
>>
>> As a consumer of Iceberg metadata I think something like this might be
>> helpful. We used approach #2 for adding partial Iceberg V2 support to
>> BigQuery external tables, but this was more straightforward as we just had
>> to detect the existence of delete files. With V3 we will have to be very
>> confident that we can detect all of the unsupported features before we add
>> support for any one of them.
>>
>> That being said I don't think that will be *that* difficult. Would it be
>> very hard for metadata producers to populate this?
>>
>> On Mon, Apr 14, 2025 at 8:48 AM Jean-Baptiste Onofré <j...@nanthrax.net>
>> wrote:
>>
>> Hi folks,
>>
>> I started to work on multi args transforms, and you probably saw
>> Fokko's proposal about the way to deal with source-id/source-ids to
>> ensure backward compatibility.
>>
>> While working on the changes on iceberg-core/iceberg-java, I'm
>> wondering if we should not introduce Iceberg Features on metadata.
>> Let me explain what I have in mind.
>> In Table Spec V3, we have new functionalities: new types (timestamp
>> nz, variant, ...), default values, row lineage, etc.
>> For readers/writers, there are two ways to know if functionalities are
>> available or not:
>> 1. Reading the table version spec (v2, v3)
>> 2. Reading if metadata contains some fields (for instance, regarding
>>    multi args transforms, we have source-id / source-ids).
>> It means that we already have to "parse" the metadata and likely
>> implement "complex" logic.
>>
>> In addition to the table spec version, I wonder if we should not introduce
>> Iceberg Features in metadata, clearly listing/describing the supported
>> features, decoupled from table spec version:
>>
>> "features": ["row_lineage","variant","default_value"]
>>
>> Reader/writer can just check the features to know how to behave. This
>> would give us more flexibility in supporting features, decoupled from
>> the table spec version.
>>
>> AFAIK, Delta has something similar.
>>
>> Long term, it could be extended to Data File format API proposed by
>> Peter, e.g. some features related to data files (that would be a
>> different layer, but similar idea).
>>
>> Thoughts?
>>
>> Regards
>> JB
>>
>> Xuanwo
>>
>> https://xuanwo.io/
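
For anyone skimming the thread, here is a rough sketch of the two checks being compared: the all-or-nothing check on the format version ("V3 is V3") and the per-feature check from JB's proposal. It is only an illustration under assumptions. The "features" field exists only as a proposal in this thread and is not part of the Iceberg spec. The names SUPPORTED_FEATURES, MAX_SUPPORTED_VERSION, can_read_by_version and unsupported_features are made up for this example, and the metadata is a hand-written fragment rather than a real table metadata file.

```python
import json

# Hypothetical engine capabilities. The feature names mirror the example list
# in the proposal; they are not defined by the Iceberg spec.
SUPPORTED_FEATURES = {"row_lineage", "default_value"}
MAX_SUPPORTED_VERSION = 2


def can_read_by_version(metadata: dict) -> bool:
    """All-or-nothing check: the engine either supports the format version or not."""
    return metadata["format-version"] <= MAX_SUPPORTED_VERSION


def unsupported_features(metadata: dict) -> set:
    """Per-feature check against the proposed (hypothetical) 'features' list."""
    return set(metadata.get("features", [])) - SUPPORTED_FEATURES


# Hand-written metadata fragment using the example list from the proposal.
metadata = json.loads(
    '{"format-version": 3, "features": ["row_lineage", "variant", "default_value"]}'
)

print(can_read_by_version(metadata))           # False
print(sorted(unsupported_features(metadata)))  # ['variant']
```

With the version check the engine simply refuses the V3 table. With the feature list it can report that only "variant" is missing, which is the extra flexibility JB describes and also the partial-support situation that Fokko, Russell, Xuanwo and Eduard are worried about.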