Hi Nikhil,

Thank you very much for bringing the S3 Tables discussion here.

However, I would like to point out that S3 Tables is not the same concept we are discussing here. It is not an object storage-based catalog; instead, it is a stateful service that provides dedicated APIs. It’s better to think of it as another AWS Glue, but internally backed by an S3 bucket.

Therefore, I believe we should split this into two separate discussion threads:

- Whether we should build S3 Tables catalog support similar to what we do for AWS Glue.
- Continuing the discussion about the object storage-based catalog.

On Wed, Dec 4, 2024, at 03:17, Nikhil Benesch wrote:
> > And I'm also looking forward to what Jack is alluding to.
>
> AWS just announced *native* S3 support for Iceberg buckets! [0] This is
> almost surely what Jack was alluding to.
>
> This is very cool. It's a much deeper integration than I was expecting but
> nonetheless one that fully satisfies my use case [1].
>
> In classic AWS fashion the documentation for the feature has not yet been
> published. I also can't find the "Amazon S3 Tables Catalog for Apache
> Iceberg" package that Jeff Barr references in his announcement post. I'll
> circle back with details once these materials are made available.
>
> [0]:
> https://aws.amazon.com/blogs/aws/new-amazon-s3-tables-storage-optimized-for-analytics-workloads/
> [1]: We're looking to add a native Iceberg-on-S3 export feature to
> Materialize (https://materialize.com), but without requiring users to manage
> a catalog.
>
> On Wed, Nov 27, 2024 at 1:52 PM rdb...@gmail.com <rdb...@gmail.com> wrote:
>> > We deprecated this recently and we don't have to deprecate it if object
>> > stores support atomic operations like this.
>>
>> I disagree because this misses many of the reasons for deprecation. It isn't
>> just that S3 didn't support a `putIfAbsent` operation. Other object stores
>> did and there are still several problems with this approach. The fundamental
>> issue is that it is attempting to solve problems at the wrong level.
>>
>> One of the reasons why Iceberg exists is that we saw people doing the same
>> thing with Parquet. People were trying to solve problems with their tables
>> by attempting to modify Parquet in wacky ways, like wanting to replace the
>> footer to make schema changes. Schema evolution needed to be solved at the
>> table level and in this community we've always tried to solve problems more
>> directly and elegantly by addressing them at the right layer of the stack.
>>
>> Iceberg tables scale up existing atomic operations to make transactional
>> guarantees on very large tables. Object stores and file systems aren't well
>> suited for this task. Just like they were not sufficient to provide
>> transactional guarantees across files and partitions, the primitives you can
>> use aren't sufficient for a database. Storage capabilities are also not the
>> right place to deliver other catalog features, like basic CRUD operations.
>>
>> The addition of `putIfAbsent` to S3 doesn't support transactions where you
>> need to modify multiple tables and it also doesn't address cases like the
>> need to atomically rename and delete tables. Schemes that use `putIfAbsent`
>> also rely either on consistent listing of a large prefix or on maintaining a
>> version-hint file. That version-hint file can be out of date, so even with
>> one you still need to list or iteratively attempt to read metadata files to
>> determine the latest.
>>
>> Getting a file-only scheme right is complicated and is specific to a
>> particular store (both commits and version-hint handling). Local file
>> systems would use an exclusive create operation to commit, Hadoop uses
>> atomic rename, and object stores use different `putIfAbsent` operations.
>> Making this work across languages and engines requires a lot of work to
>> specify requirements and document issues, only to get to single-table
>> functionality that doesn't deliver the catalog-level primitives like atomic
>> rename that are commonly used.
>>
>> In the end, catalog problems are best solved at the catalog layer, not
>> through an elaborate scheme that uses storage-layer primitives, just as it
>> was not a good idea to deliver table behaviors using similar storage-layer
>> schemes. Adding `putIfAbsent` to S3 doesn't change that design principle.
>>
>> I sympathize with the idea that it would be great if you didn't need a
>> catalog. Simpler infrastructure is generally better.
>>
>> But trying to avoid a catalog limits the capabilities of this
>> infrastructure, while setting people up for later failure. When I talk with
>> people that have been trying to avoid having a catalog, they tend to have
>> tables scattered across buckets that they need to track down, they lack
>> observability to know what is being used, don't know if they are deleting
>> data in compliance with regulations, and they often lack simple and usable
>> access controls.
>>
>> I think that the solution is to make it easier to run or use a catalog, not
>> to try to build without one.
>>
>> And I'm also looking forward to what Jack is alluding to.
>>
>> On Tue, Nov 26, 2024 at 11:05 PM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>> Interesting.
>>>
>>> We already have file system tables [1] in Iceberg (HadoopCatalog implements
>>> this spec).
>>> We deprecated this recently and we don't have to deprecate it if object
>>> stores support atomic operations like this.
>>>
>>> [1] https://iceberg.apache.org/spec/#file-system-tables
>>>
>>> - Ajantha
>>>
>>> On Wed, Nov 27, 2024 at 2:53 AM Nikhil Benesch <nikhil.bene...@gmail.com>
>>> wrote:
>>>> Ah, fascinating. Thanks very much for the pointer.
>>>>
>>>> Here's the thread introducing the proposal [0], for anyone else curious.
>>>>
>>>> [0]: https://lists.apache.org/thread/kh4n98w4z22sc8h2vot4q8n44vdtnltg
>>>>
>>>> On Tue, Nov 26, 2024 at 3:27 PM Jean-Baptiste Onofré <j...@nanthrax.net>
>>>> wrote:
>>>> >
>>>> > Hi Vignesh
>>>> >
>>>> > Thanks for the reminder, I remember we quickly discussed this during a
>>>> > community meeting.
>>>> >
>>>> > I will take a new look at the doc.
>>>> >
>>>> > Regards
>>>> > JB
>>>> >
>>>> > On Tue, Nov 26, 2024 at 9:19 PM Vignesh <vignesh.v...@gmail.com> wrote:
>>>> > >
>>>> > > Hi,
>>>> > > There was a proposal along the same lines, for the read portion, a few
>>>> > > weeks back by Ashvin.
>>>> > >
>>>> > > https://docs.google.com/document/d/1yzLXSOtzBXyaWHfeVsWsMu4xmOH8rV6QyM5ZAnJZjMQ/edit?usp=drivesdk
>>>> > >
>>>> > > Thanks,
>>>> > > Vignesh.
>>>> > >
>>>> > >
>>>> > > On Tue, Nov 26, 2024, 11:59 AM Jean-Baptiste Onofré
>>>> > > <j...@nanthrax.net> wrote:
>>>> > >>
>>>> > >> Hi Nikhil
>>>> > >>
>>>> > >> Thanks for your message, very interesting.
>>>> > >>
>>>> > >> I think it would be great to involve the Polaris project here as well,
>>>> > >> as a REST Catalog implementation.
>>>> > >> The Polaris community is discussing storage/backend right now, so it
>>>> > >> would be the perfect timing to consider leveraging S3 conditional
>>>> > >> writes (as a plugin for instance first).
>>>> > >>
>>>> > >> I would be happy to connect and know more about your perspective
>>>> > >> about that.
>>>> > >>
>>>> > >> Thanks,
>>>> > >> Regards
>>>> > >> JB
>>>> > >>
>>>> > >> PS: I will be at AWS re:Invent next week, so maybe we can connect
>>>> > >> there.
>>>> > >>
>>>> > >> On Tue, Nov 26, 2024 at 6:35 PM Nikhil Benesch
>>>> > >> <nikhil.bene...@gmail.com> wrote:
>>>> > >> >
>>>> > >> > Hi all,
>>>> > >> >
>>>> > >> > With Amazon S3 announcing support for the If-Match header yesterday
>>>> > >> > [0], all the major object store implementations now support a
>>>> > >> > compare-and-swap operation.
>>>> > >> >
>>>> > >> > As far as I can tell, this opens up the possibility of storing
>>>> > >> > Iceberg catalogs directly on object storage, without the need for a
>>>> > >> > separate metastore, and without violating any of Iceberg's ACID
>>>> > >> > guarantees.
>>>> > >> >
>>>> > >> > It seems the immediate next step is to build an independent Java or
>>>> > >> > REST catalog backend to prove this concept out. Long term, though,
>>>> > >> > the ideal would be to have such a catalog backend be a first class
>>>> > >> > citizen in the Iceberg project.
>>>> > >> >
>>>> > >> > Is anyone else in the Iceberg community barking up this tree? I'm a
>>>> > >> > long term Iceberg enthusiast, but new to the community. I'd very
>>>> > >> > much appreciate any pointers to current or past discussions on the
>>>> > >> > topic. So far all I've been able to turn up is some light chatter
>>>> > >> > from myself and others on Bluesky and Hacker News ([1][2][3]).
>>>> > >> >
>>>> > >> > Cheers,
>>>> > >> > Nikhil
>>>> > >> >
>>>> > >> > [0]:
>>>> > >> > https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-s3-functionality-conditional-writes/
>>>> > >> > [1]: https://bsky.app/profile/benesch.bsky.social/post/3lauesxg3ic2c
>>>> > >> > [2]:
>>>> > >> > https://bsky.app/profile/eatonphil.bsky.social/post/3lbskq3jwk22e
>>>> > >> > [3]: https://news.ycombinator.com/item?id=42240370

Xuanwo

https://xuanwo.io/