> - Whether we should build S3 Tables catalog support similar to what we do for
> AWS Glue.

Yes, happy to have someone start that discussion separately, if it makes sense
to do so. Amazon has already provided such a catalog implementation in a
separate Apache 2.0-licensed project called Amazon S3 Tables Catalog for Apache
Iceberg [0]. I'm not familiar enough with the way the Iceberg project operates
to know whether it would make sense to package that implementation as part of
the official Iceberg distribution.

> - Continuing the discussion about the object storage-based catalog.

I'm happy to report that I got pointed at a project that is planning to build
exactly this. [1] The use case I was interested in is actually entirely solved
by S3 Tables, so I no longer plan to pursue this. But if someone else is
interested in picking this up, I'm sure Jan Kaul would be eager to collaborate.

[0]: https://github.com/awslabs/s3-tables-catalog
[1]: https://bsky.app/profile/jankaul.bsky.social/post/3lbutx7ju4k2c

On Wed, Dec 4, 2024 at 12:11 AM Xuanwo <xua...@apache.org> wrote:
>
> Hi, Nikhil
>
> Thank you very much for bringing S3 tables discussion here.
>
> However, I would like to point out that the S3 Table is not the same concept
> we are discussing here. It is not an object storage-based catalog; instead,
> it is a stateful service that provides dedicated APIs. It’s better to think
> of it as another AWS Glue, but internally backed by an S3 bucket.
>
> Therefore, I believe we should split this into two separate discussion
> threads:
>
> - Whether we should build S3 Tables catalog support similar to what we do for
> AWS Glue.
> - Continuing the discussion about the object storage-based catalog.
>
>
> On Wed, Dec 4, 2024, at 03:17, Nikhil Benesch wrote:
>
> > And I'm also looking forward to what Jack is alluding to.
>
> AWS just announced *native* S3 support for Iceberg buckets! [0] This is
> almost surely what Jack was alluding to.
>
> This is very cool. It's a much deeper integration than I was expecting but
> nonetheless one that fully satisfies my use case [1].
>
> In classic AWS fashion the documentation for the feature has not yet been
> published. I also can't find the "Amazon S3 Tables Catalog for Apache
> Iceberg" package that Jeff Barr references in his announcement post. I'll
> circle back with details once these materials are made available.
>
> [0]:
> https://aws.amazon.com/blogs/aws/new-amazon-s3-tables-storage-optimized-for-analytics-workloads/
> [1]: We're looking to add a native Iceberg-on-S3 export feature to
> Materialize (https://materialize.com), but without requiring users to manage
> a catalog.
>
> On Wed, Nov 27, 2024 at 1:52 PM rdb...@gmail.com <rdb...@gmail.com> wrote:
>
> > We deprecated this recently and we don't have to deprecate it if object
> > stores support atomic operations like this.
>
> I disagree because this misses many of the reasons for deprecation. It isn't
> just that S3 didn't support a `putIfAbsent` operation. Other object stores
> did and there are still several problems with this approach. The fundamental
> issue is that it is attempting to solve problems at the wrong level.
>
> One of the reasons why Iceberg exists is that we saw people doing the same
> thing with Parquet. People were trying to solve problems with their tables by
> attempting to modify Parquet in wacky ways, like wanting to replace the
> footer to make schema changes. Schema evolution needed to be solved at the
> table level and in this community we've always tried to solve problems more
> directly and elegantly by addressing them at the right layer of the stack.
>
> Iceberg tables scale up existing atomic operations to make transactional
> guarantees on very large tables. Object stores and file systems aren't well
> suited for this task. Just like they were not sufficient to provide
> transactional guarantees across files and partitions, the primitives you can
> use aren't sufficient for a database.
> Storage capabilities are also not the
> right place to deliver other catalog features, like basic CRUD operations.
>
> The addition of `putIfAbsent` to S3 doesn't support transactions where you
> need to modify multiple tables and it also doesn't address cases like the
> need to atomically rename and delete tables. Schemes that use `putIfAbsent`
> also rely either on consistently listing a large prefix or on maintaining a
> version-hint file. That version-hint file can be out of date, so even with
> one you still need to list or iteratively attempt to read metadata files to
> determine the latest.
>
> Getting a file-only scheme right is complicated and is specific to a
> particular store (both commits and version-hint handling). Local file systems
> would use an exclusive create operation to commit, Hadoop uses atomic rename,
> and object stores use different `putIfAbsent` operations. Making this work
> across languages and engines requires a lot of work to specify requirements
> and document issues, only to get to single-table functionality that doesn't
> deliver the catalog-level primitives like atomic rename that are commonly
> used.
>
> In the end, catalog problems are best solved at the catalog layer, not
> through an elaborate scheme that uses storage-layer primitives, just as it
> was not a good idea to deliver table behaviors using similar storage-layer
> schemes. Adding `putIfAbsent` to S3 doesn't change that design principle.
>
> I sympathize with the idea that it would be great if you didn't need a
> catalog. Simpler infrastructure is generally better.
>
> But trying to avoid a catalog limits the capabilities of this infrastructure,
> while setting people up for later failure.
> When I talk with people that have
> been trying to avoid having a catalog, they tend to have tables scattered
> across buckets that they need to track down, they lack observability to know
> what is being used, don't know if they are deleting data in compliance
> with regulations, and they often lack simple and usable access controls.
>
> I think that the solution is to make it easier to run or use a catalog, not
> to try to build without one.
>
> And I'm also looking forward to what Jack is alluding to.
>
> On Tue, Nov 26, 2024 at 11:05 PM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>
> Interesting.
>
> We already have file system tables [1] in Iceberg (HadoopCatalog implements
> this spec).
> We deprecated this recently and we don't have to deprecate it if object
> stores support atomic operations like this.
>
> [1] https://iceberg.apache.org/spec/#file-system-tables
>
> - Ajantha
>
> On Wed, Nov 27, 2024 at 2:53 AM Nikhil Benesch <nikhil.bene...@gmail.com>
> wrote:
>
> Ah, fascinating. Thanks very much for the pointer.
>
> Here's the thread introducing the proposal [0], for anyone else curious.
>
> [0]: https://lists.apache.org/thread/kh4n98w4z22sc8h2vot4q8n44vdtnltg
>
> On Tue, Nov 26, 2024 at 3:27 PM Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
> >
> > Hi Vignesh
> >
> > Thanks for the reminder, I remember we quickly discussed this during a
> > community meeting.
> >
> > I will take a new look at the doc.
> >
> > Regards
> > JB
> >
> > On Tue, Nov 26, 2024 at 9:19 PM Vignesh <vignesh.v...@gmail.com> wrote:
> > >
> > > Hi,
> > > There was a proposal along the same lines, for the read portion, a few
> > > weeks back by Ashvin.
> > >
> > > https://docs.google.com/document/d/1yzLXSOtzBXyaWHfeVsWsMu4xmOH8rV6QyM5ZAnJZjMQ/edit?usp=drivesdk
> > >
> > > Thanks,
> > > Vignesh.
> > >
> > >
> > > On Tue, Nov 26, 2024, 11:59 AM Jean-Baptiste Onofré <j...@nanthrax.net>
> > > wrote:
> > >>
> > >> Hi Nikhil
> > >>
> > >> Thanks for your message, very interesting.
> > >>
> > >> I think it would be great to involve the Polaris project here as well,
> > >> as a REST Catalog implementation.
> > >> The Polaris community is discussing storage/backend right now, so it
> > >> would be the perfect timing to consider leveraging S3 conditional
> > >> writes (as a plugin for instance first).
> > >>
> > >> I would be happy to connect and know more about your perspective about
> > >> that.
> > >>
> > >> Thanks,
> > >> Regards
> > >> JB
> > >>
> > >> PS: I will be at AWS re:Invent next week, so maybe we can connect there.
> > >>
> > >> On Tue, Nov 26, 2024 at 6:35 PM Nikhil Benesch
> > >> <nikhil.bene...@gmail.com> wrote:
> > >> >
> > >> > Hi all,
> > >> >
> > >> > With Amazon S3 announcing support for the If-Match header yesterday
> > >> > [0], all the
> > >> > major object store implementations now support a compare-and-swap
> > >> > operation.
> > >> >
> > >> > As far as I can tell, this opens up the possibility of storing Iceberg
> > >> > catalogs directly on object storage, without the need for a separate
> > >> > metastore,
> > >> > and without violating any of Iceberg's ACID guarantees.
> > >> >
> > >> > It seems the immediate next step is to build an independent Java or
> > >> > REST catalog
> > >> > backend to prove this concept out. Long term, though, the ideal would
> > >> > be to
> > >> > have such a catalog backend be a first class citizen in the Iceberg
> > >> > project.
> > >> >
> > >> > Is anyone else in the Iceberg community barking up this tree? I'm a
> > >> > long term
> > >> > Iceberg enthusiast, but new to the community. I'd very much appreciate
> > >> > any
> > >> > pointers to current or past discussions on the topic. So far all I've
> > >> > been
> > >> > able to turn up is some light chatter from myself and others on
> > >> > Bluesky and
> > >> > Hacker News ([1][2][3]).
> > >> >
> > >> > Cheers,
> > >> > Nikhil
> > >> >
> > >> > [0]:
> > >> > https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-s3-functionality-conditional-writes/
> > >> > [1]: https://bsky.app/profile/benesch.bsky.social/post/3lauesxg3ic2c
> > >> > [2]: https://bsky.app/profile/eatonphil.bsky.social/post/3lbskq3jwk22e
> > >> > [3]: https://news.ycombinator.com/item?id=42240370
>
> Xuanwo
>
> https://xuanwo.io/
>
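
For readers following the thread, the single-table commit scheme debated above
(commit by `putIfAbsent`, plus a version-hint file that may be out of date) can
be sketched roughly as follows. This is only an illustration under stated
assumptions, not code from Iceberg or from any of the linked projects:
`InMemoryObjectStore`, `put_if_absent`, `latest_version`, and `commit` are
hypothetical names, the store is an in-memory stand-in for a real object store
with conditional writes, and the key layout is only loosely modeled on the
file-system tables spec. It uses create-if-absent rather than the `If-Match`
compare-and-swap mentioned in the original message.

```python
import threading


class InMemoryObjectStore:
    """Toy stand-in for an object store with a conditional put (hypothetical API)."""

    def __init__(self):
        self._objects = {}
        self._lock = threading.Lock()

    def put_if_absent(self, key, data):
        """Atomically create `key`; return False if it already exists."""
        with self._lock:
            if key in self._objects:
                return False
            self._objects[key] = data
            return True

    def put(self, key, data):
        """Unconditional overwrite, used only for the best-effort hint."""
        with self._lock:
            self._objects[key] = data

    def get(self, key):
        """Return the stored value, or None if the key does not exist."""
        with self._lock:
            return self._objects.get(key)


def metadata_key(table, version):
    # Assumed layout: one immutable metadata object per table version.
    return f"{table}/metadata/v{version}.metadata.json"


def hint_key(table):
    return f"{table}/metadata/version-hint.text"


def latest_version(store, table):
    """Start from the (possibly stale) hint and probe forward for newer versions."""
    hint = store.get(hint_key(table))
    version = int(hint) if hint is not None else 0
    while store.get(metadata_key(table, version + 1)) is not None:
        version += 1
    return version


def commit(store, table, base_version, new_metadata):
    """Try to commit base_version + 1; False means another writer got there first."""
    new_version = base_version + 1
    if not store.put_if_absent(metadata_key(table, new_version), new_metadata):
        return False
    # Best effort: a crash or a slow writer can leave the hint stale,
    # which is why latest_version() never trusts it blindly.
    store.put(hint_key(table), str(new_version))
    return True
```

A writer would call `latest_version`, build new metadata on top of that
version, call `commit`, and retry from the start when it returns False. The
sketch covers single-table commits only; it does nothing for the multi-table
transactions or atomic rename and delete operations raised earlier in the
thread.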