Hi Eduard,

You all convinced me. I just wanted to get your feedback, as I remember some "additional" complexity when adding view support to the JDBC Catalog or working on multi-arg transforms. A simple "flag" would have simplified the code a bit (for instance, in the JDBC Catalog we have to check the version to use the "right" RDBMS schema, etc.).

Thanks everyone for the discussion and input!

Regards
JB

On Wed, Apr 16, 2025 at 10:34 AM Eduard Tudenhöfner <etudenhoef...@apache.org> wrote:
>
> I fully agree with what Fokko said, and I'm concerned that this adds a lot of new complexity and also leads to engines only supporting a minimal set of features for a given spec version, which makes it even harder for users to know what subset of features a V3-compliant engine actually supports.
>
> Eduard
>
> On Wed, Apr 16, 2025 at 8:23 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>
>> Hi Xuanwo,
>>
>> Thanks for the feedback. Fair enough.
>>
>> Regards
>> JB
>>
>> On Wed, Apr 16, 2025 at 05:44, Xuanwo <xua...@apache.org> wrote:
>>>
>>> Hi JB,
>>>
>>> Thank you for starting this discussion. Based on my experience with Parquet, when a specification allows readers and writers to freely choose which features to use, it often leads to the entire ecosystem relying on only the minimal feature set. As a result, many valuable features are overlooked. For example, Bloom filters in Parquet are extremely useful, but they are rarely supported by writers, which in turn leads to minimal support from readers as well.
>>>
>>> So I personally support the ON/OFF approach, which means the engine must fully implement v3.
>>>
>>> On Wed, Apr 16, 2025, at 03:18, Jean-Baptiste Onofré wrote:
>>>
>>> Thanks for your feedback.
>>>
>>> I get your points. My question was more about the features that an engine (reader/writer) should support: for v3, it means that an engine will have to implement/support all (required) v3 features. Engines can either stay on v2 or fully upgrade to v3. That makes sense to me for engines. My question came up because v3 includes a lot of changes, some of which require “checks” on metadata (a bit complex for readers/writers).
>>>
>>> Thanks again for the feedback!
>>>
>>> Regards
>>> JB
>>>
>>> On Tue, Apr 15, 2025 at 20:54, Russell Spitzer <russell.spit...@gmail.com> wrote:
>>>
>>> I'm not a big fan of this; I am currently a strong supporter of the "V3 is V3" approach. This is one of the reasons we decided to make row-lineage mandatory: we want to avoid encouraging engines to selectively adopt requirements.
>>>
>>> On Tue, Apr 15, 2025 at 1:42 PM Fokko Driesprong <fo...@apache.org> wrote:
>>>
>>> Hey JB,
>>>
>>> Thanks for raising this. This would be another way of indicating (next to the format version) what's supported. At first glance, I'm reluctant to add this, for two reasons:
>>>
>>> 1. The added complexity, both from a technical perspective and because it might confuse downstream users: for example, an engine that supports Iceberg V3 but not the variant type.
>>> 2. As you indicated, this is similar to what Delta has. One issue they are experiencing is that users expect to also be able to disable features. For example, when row-lineage is enabled and you want to read the table with an engine that does not support row-lineage, users expect to be able to disable row-lineage. This is different from what we support today with format-version, which only allows upgrades (not downgrades), and it would also add a lot of complexity to the codebase.
>>>
>>> Curious to learn what others think.
>>>
>>> Kind regards,
>>> Fokko
>>>
>>> On Mon, Apr 14, 2025 at 19:56, Brian Hulette <bhule...@apache.org> wrote:
>>>
>>> As a consumer of Iceberg metadata, I think something like this might be helpful. We used approach #2 for adding partial Iceberg V2 support to BigQuery external tables, but this was more straightforward, as we just had to detect the existence of delete files. With V3, we will have to be very confident that we can detect all of the unsupported features before we add support for any one of them.
>>>
>>> That being said, I don't think that will be *that* difficult. Would it be very hard for metadata producers to populate this?
>>>
>>> On Mon, Apr 14, 2025 at 8:48 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>>
>>> Hi folks,
>>>
>>> I started working on multi-arg transforms, and you probably saw Fokko's proposal about how to deal with source-id/source-ids to ensure backward compatibility.
>>>
>>> While working on the changes in iceberg-core/iceberg-java, I'm wondering whether we should introduce Iceberg Features in the metadata. Let me explain what I have in mind.
>>> In Table Spec V3, we have new functionality: new types (timestamp_ns, variant, ...), default values, row lineage, etc.
>>> For readers/writers, there are two ways to know whether functionality is available:
>>> 1. Reading the table spec version (v2, v3)
>>> 2. Checking whether the metadata contains certain fields (for instance, for multi-arg transforms, we have source-id / source-ids).
>>> This means we already have to "parse" the metadata and likely implement "complex" logic.
>>>
>>> In addition to the table spec version, I wonder whether we should introduce Iceberg Features in the metadata, clearly listing/describing the supported features, decoupled from the table spec version:
>>>
>>> "features": ["row_lineage","variant","default_value"]
>>>
>>> Readers/writers can then just check the features to know how to behave. We would be more flexible in supporting features, unbinding them from the table spec version.
>>>
>>> AFAIK, Delta has something similar.
>>>
>>> Long term, it could be extended to the Data File format API proposed by Peter, e.g. some features related to data files (that would be a different layer, but a similar idea).
>>>
>>> Thoughts?
>>>
>>> Regards
>>> JB
>>>
>>> Xuanwo
>>>
>>> https://xuanwo.io/
>>>
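
For illustration, here is a minimal Python sketch contrasting the two detection styles discussed in the original proposal: inferring features from the format version plus the presence of fields (such as `source-ids`), versus reading an explicit feature list. It works on parsed table-metadata JSON as plain dicts. It rests on stated assumptions: the `features` field is only the proposal from this thread, not part of the Iceberg spec; the feature names come from JB's example plus a hypothetical `multi_arg_transform`; the three v3 features listed are an abbreviated subset; and `SUPPORTED`, `features_from_list`, `features_by_inference` and `unsupported` are hypothetical names.

```python
# Sketch only: "features" is the metadata field *proposed* in this thread,
# not part of the Iceberg spec. Feature names are illustrative.

# Hypothetical set of features implemented by some engine.
SUPPORTED = {"row_lineage", "default_value"}

# Abbreviated, illustrative subset of features required by format v3.
V3_REQUIRED = {"row_lineage", "variant", "default_value"}


def features_from_list(metadata: dict) -> set[str]:
    """Proposed approach: read an explicit (hypothetical) "features" list."""
    return set(metadata.get("features", []))


def features_by_inference(metadata: dict) -> set[str]:
    """Current approach: infer from the format version and from field presence."""
    found: set[str] = set()
    # Approach 1: under "V3 is V3", format-version 3 implies all v3 features.
    if metadata.get("format-version", 1) >= 3:
        found |= V3_REQUIRED
    # Approach 2: look for fields that signal a feature, e.g. "source-ids"
    # on a partition field for multi-arg transforms.
    for spec in metadata.get("partition-specs", []):
        for field in spec.get("fields", []):
            if "source-ids" in field:
                found.add("multi_arg_transform")
    return found


def unsupported(required: set[str], supported: set[str]) -> set[str]:
    """Features the table needs that the engine does not implement."""
    return required - supported


if __name__ == "__main__":
    meta = {"format-version": 2, "features": ["row_lineage", "variant"]}
    print(sorted(unsupported(features_from_list(meta), SUPPORTED)))
```

With an explicit list, the check is a set difference; with inference, every reader must know each field that signals a feature, which is the parsing burden raised in the proposal and the detection confidence Brian mentions.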