Re: Storing catalog directly on object store

Nikhil Benesch Thu, 05 Dec 2024 14:10:30 -0800

> - Whether we should build S3 Tables catalog support similar to what we do for
> AWS Glue.


Yes, happy to have someone start that discussion separately, if it makes sense
to do so. Amazon has already provided such a catalog implementation in
a separate Apache 2.0-licensed project called Amazon S3 Tables Catalog for
Apache Iceberg [0].

I'm not familiar enough with the way the Iceberg project operates to know
whether it would make sense to package that implementation as part of the
official Iceberg distribution.

> - Continuing the discussion about the object storage-based catalog.

I'm happy to report that I got pointed at a project that is planning to build
exactly this. [1]

The use case I was interested in is actually entirely solved by S3 Tables,
so I no longer plan to pursue this. But if someone else is interested in picking
this up, I'm sure Jan Kaul would be eager to collaborate.

[0]: https://github.com/awslabs/s3-tables-catalog
[1]: https://bsky.app/profile/jankaul.bsky.social/post/3lbutx7ju4k2c

On Wed, Dec 4, 2024 at 12:11 AM Xuanwo <[email protected]> wrote:
>
> Hi, Nikhil
>
> Thank you very much for bringing the S3 Tables discussion here.
>
> However, I would like to point out that S3 Tables is not the same concept 
> we are discussing here. It is not an object storage-based catalog; instead, 
> it is a stateful service that provides dedicated APIs. It’s better to think 
> of it as another AWS Glue, but internally backed by an S3 bucket.
>
> Therefore, I believe we should split this into two separate discussion 
> threads:
>
> - Whether we should build S3 Tables catalog support similar to what we do for 
> AWS Glue.
> - Continuing the discussion about the object storage-based catalog.
>
>
> On Wed, Dec 4, 2024, at 03:17, Nikhil Benesch wrote:
>
> > And I'm also looking forward to what Jack is alluding to.
>
> AWS just announced *native* S3 support for Iceberg buckets! [0] This is 
> almost surely what Jack was alluding to.
>
> This is very cool. It's a much deeper integration than I was expecting but 
> nonetheless one that fully satisfies my use case [1].
>
> In classic AWS fashion the documentation for the feature has not yet been 
> published. I also can't find the "Amazon S3 Tables Catalog for Apache 
> Iceberg" package that Jeff Barr references in his announcement post. I'll 
> circle back with details once these materials are made available.
>
> [0]: 
> https://aws.amazon.com/blogs/aws/new-amazon-s3-tables-storage-optimized-for-analytics-workloads/
> [1]: We're looking to add a native Iceberg-on-S3 export feature to 
> Materialize (https://materialize.com), but without requiring users to manage 
> a catalog.
>
> On Wed, Nov 27, 2024 at 1:52 PM [email protected] <[email protected]> wrote:
>
> > We deprecated this recently and we don't have to deprecate it if object 
> > stores support atomic operations like this.
>
> I disagree because this misses many of the reasons for deprecation. It isn't 
> just that S3 didn't support a `putIfAbsent` operation. Other object stores 
> did and there are still several problems with this approach. The fundamental 
> issue is that it is attempting to solve problems at the wrong level.
>
> One of the reasons why Iceberg exists is that we saw people doing the same 
> thing with Parquet. People were trying to solve problems with their tables by 
> attempting to modify Parquet in wacky ways, like wanting to replace the 
> footer to make schema changes. Schema evolution needed to be solved at the 
> table level and in this community we've always tried to solve problems more 
> directly and elegantly by addressing them at the right layer of the stack.
>
> Iceberg tables scale up existing atomic operations to make transactional 
> guarantees on very large tables. Object stores and file systems aren't well 
> suited for this task. Just like they were not sufficient to provide 
> transactional guarantees across files and partitions, the primitives you can 
> use aren't sufficient for a database. Storage capabilities are also not the 
> right place to deliver other catalog features, like basic CRUD operations.
>
> The addition of `putIfAbsent` to S3 doesn't support transactions where you 
> need to modify multiple tables and it also doesn't address cases like the 
> need to atomically rename and delete tables. Schemes that use `putIfAbsent` 
> also rely either on consistent listing of a large prefix or on maintaining a 
> version-hint file. That version-hint file can be out of date, so even with 
> one you still need to list or iteratively attempt to read metadata files to 
> determine the latest.
>
> Getting a file-only scheme right is complicated and is specific to a 
> particular store (both commits and version-hint handling). Local file systems 
> would use an exclusive create operation to commit, Hadoop uses atomic rename, 
> and object stores use different `putIfAbsent` operations. Making this work 
> across languages and engines requires a lot of work to specify requirements 
> and document issues, only to get to single-table functionality that doesn't 
> deliver the catalog-level primitives like atomic rename that are commonly 
> used.
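
For readers who want to see the local file system case concretely, here is a minimal Python sketch of a commit by exclusive create, plus the forward probing that a stale version hint forces on readers. The function names and the `v<N>.metadata.json` naming are assumptions made for illustration, not code from the Iceberg project. The sketch also ignores real concerns such as readers observing a partially written file.

```python
import os
import tempfile


def commit_version(table_dir: str, version: int, metadata: bytes) -> bool:
    """Try to commit `metadata` as version `version`.

    Returns False if another writer already created that version.
    """
    path = os.path.join(table_dir, f"v{version}.metadata.json")
    try:
        # O_EXCL makes the create fail if the file already exists,
        # which is what lets exactly one writer win a given version.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
    except FileExistsError:
        return False
    with os.fdopen(fd, "wb") as f:
        f.write(metadata)
    return True


def latest_version(table_dir: str, hint: int = 0) -> int:
    """Find the newest committed version, starting from a possibly stale hint."""
    version = hint
    # The hint may lag behind, so keep probing for the next version file.
    while os.path.exists(os.path.join(table_dir, f"v{version + 1}.metadata.json")):
        version += 1
    return version


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as table_dir:
        print(commit_version(table_dir, 1, b"{}"))  # True: first writer wins
        print(commit_version(table_dir, 1, b"{}"))  # False: version 1 already exists
        print(latest_version(table_dir, hint=0))    # 1
```

Even in this small form, the commit step depends on a store-specific primitive, and the reader has to probe for newer versions rather than trust the hint.
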
>
> In the end, catalog problems are best solved at the catalog layer, not 
> through an elaborate scheme that uses storage-layer primitives, just as it 
> was not a good idea to deliver table behaviors using similar storage-layer 
> schemes. Adding `putIfAbsent` to S3 doesn't change that design principle.
>
> I sympathize with the idea that it would be great if you didn't need a 
> catalog. Simpler infrastructure is generally better.
>
> But trying to avoid a catalog limits the capabilities of this infrastructure, 
> while setting people up for later failure. When I talk with people that have 
> been trying to avoid having a catalog, they tend to have tables scattered 
> across buckets that they need to track down, they lack observability to know 
> what is being used, don't know if they are deleting data in compliance 
> with regulations, and they often lack simple and usable access controls.
>
> I think that the solution is to make it easier to run or use a catalog, not 
> to try to build without one.
>
> And I'm also looking forward to what Jack is alluding to.
>
> On Tue, Nov 26, 2024 at 11:05 PM Ajantha Bhat <[email protected]> wrote:
>
> Interesting.
>
> We already have file system tables [1] in Iceberg (HadoopCatalog implements 
> this spec).
> We deprecated this recently and we don't have to deprecate it if object 
> stores support atomic operations like this.
>
> [1] https://iceberg.apache.org/spec/#file-system-tables
>
> - Ajantha
>
> On Wed, Nov 27, 2024 at 2:53 AM Nikhil Benesch <[email protected]> 
> wrote:
>
> Ah, fascinating. Thanks very much for the pointer.
>
> Here's the thread introducing the proposal [0], for anyone else curious.
>
> [0]: https://lists.apache.org/thread/kh4n98w4z22sc8h2vot4q8n44vdtnltg
>
> On Tue, Nov 26, 2024 at 3:27 PM Jean-Baptiste Onofré <[email protected]> 
> wrote:
> >
> > Hi Vignesh
> >
> > Thanks for the reminder, I remember we quickly discussed this during a
> > community meeting.
> >
> > I will take a new look at the doc.
> >
> > Regards
> > JB
> >
> > On Tue, Nov 26, 2024 at 9:19 PM Vignesh <[email protected]> wrote:
> > >
> > > Hi,
> > > There was a proposal along the same lines, for the read portion, a few weeks 
> > > back by Ashvin.
> > >
> > > https://docs.google.com/document/d/1yzLXSOtzBXyaWHfeVsWsMu4xmOH8rV6QyM5ZAnJZjMQ/edit?usp=drivesdk
> > >
> > > Thanks,
> > > Vignesh.
> > >
> > >
> > > On Tue, Nov 26, 2024, 11:59 AM Jean-Baptiste Onofré <[email protected]> 
> > > wrote:
> > >>
> > >> Hi Nikhil
> > >>
> > >> Thanks for your message, very interesting.
> > >>
> > >> I think it would be great to involve the Polaris project here as well,
> > >> as a REST Catalog implementation.
> > >> The Polaris community is discussing storage/backend right now, so it
> > >> would be the perfect timing to consider leveraging S3 conditional
> > >> writes (as a plugin for instance first).
> > >>
> > >> I would be happy to connect and learn more about your perspective on 
> > >> that.
> > >>
> > >> Thanks,
> > >> Regards
> > >> JB
> > >>
> > >> PS: I will be at AWS re:Invent next week, so maybe we can connect there.
> > >>
> > >> On Tue, Nov 26, 2024 at 6:35 PM Nikhil Benesch 
> > >> <[email protected]> wrote:
> > >> >
> > >> > Hi all,
> > >> >
> > >> > With Amazon S3 announcing support for the If-Match header yesterday 
> > >> > [0], all the
> > >> > major object store implementations now support a compare-and-swap 
> > >> > operation.
> > >> >
> > >> > As far as I can tell, this opens up the possibility of storing Iceberg
> > >> > catalogs directly on object storage, without the need for a separate 
> > >> > metastore,
> > >> > and without violating any of Iceberg's ACID guarantees.
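
As a rough illustration of the compare-and-swap idea, the Python sketch below models a conditional write against an in-memory stand-in for an object store. Everything here is hypothetical: `InMemoryObjectStore`, `set_table_location`, the `if_match` and `if_absent` parameters, and the single-object JSON catalog layout are assumptions for the example, and no real S3 API is called. Computing the ETag as a content hash is likewise only a convenience of the sketch.

```python
import hashlib
import json
from typing import Dict, Optional, Tuple


class PreconditionFailed(Exception):
    """Raised when a conditional write loses the race."""


class InMemoryObjectStore:
    """Hypothetical stand-in for an object store with conditional writes."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    @staticmethod
    def _etag(body: bytes) -> str:
        return hashlib.sha256(body).hexdigest()

    def get(self, key: str) -> Tuple[bytes, str]:
        body = self._objects[key]
        return body, self._etag(body)

    def put(self, key: str, body: bytes, if_match: Optional[str] = None,
            if_absent: bool = False) -> str:
        current = self._objects.get(key)
        if if_absent and current is not None:
            raise PreconditionFailed(key)
        if if_match is not None and (
            current is None or self._etag(current) != if_match
        ):
            raise PreconditionFailed(key)
        self._objects[key] = body
        return self._etag(body)


def set_table_location(store: InMemoryObjectStore, key: str, table: str,
                       location: str, max_attempts: int = 5) -> bool:
    """Point `table` at `location` with a read-modify-write retry loop."""
    for _ in range(max_attempts):
        body, etag = store.get(key)
        catalog = json.loads(body)
        catalog[table] = location
        new_body = json.dumps(catalog, sort_keys=True).encode()
        try:
            # The write succeeds only if nobody changed the object since we read it.
            store.put(key, new_body, if_match=etag)
            return True
        except PreconditionFailed:
            continue  # Another writer won; re-read and try again.
    return False
```

A caller would first create the catalog object with `store.put("catalog.json", b"{}", if_absent=True)` and then call `set_table_location` for each commit. Whether a scheme like this is sufficient for a full catalog is the open question of this thread.
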
> > >> >
> > >> > It seems the immediate next step is to build an independent Java or 
> > >> > REST catalog
> > >> > backend to prove this concept out. Long term, though, the ideal would 
> > >> > be to
> > >> > have such a catalog backend be a first class citizen in the Iceberg 
> > >> > project.
> > >> >
> > >> > Is anyone else in the Iceberg community barking up this tree? I'm a 
> > >> > long term
> > >> > Iceberg enthusiast, but new to the community. I'd very much appreciate 
> > >> > any
> > >> > pointers to current or past discussions on the topic. So far all I've 
> > >> > been
> > >> > able to turn up is some light chatter from myself and others on 
> > >> > Bluesky and
> > >> > Hacker News ([1][2][3]).
> > >> >
> > >> > Cheers,
> > >> > Nikhil
> > >> >
> > >> > [0]: 
> > >> > https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-s3-functionality-conditional-writes/
> > >> > [1]: https://bsky.app/profile/benesch.bsky.social/post/3lauesxg3ic2c
> > >> > [2]: https://bsky.app/profile/eatonphil.bsky.social/post/3lbskq3jwk22e
> > >> > [3]: https://news.ycombinator.com/item?id=42240370
>
> Xuanwo
>
> https://xuanwo.io/
>
