Re: Storing catalog directly on object store

Xuanwo Tue, 03 Dec 2024 21:10:48 -0800

Hi, Nikhil

Thank you very much for bringing the S3 Tables discussion here.


However, I would like to point out that S3 Tables is not the same concept we 
are discussing here. It is not an object storage-based catalog; instead, it is 
a stateful service that provides dedicated APIs. It’s better to think of it as 
another AWS Glue, but internally backed by an S3 bucket.

Therefore, I believe we should split this into two separate discussion threads:

- Whether we should build S3 Tables catalog support similar to what we do for 
AWS Glue.
- Continuing the discussion about the object storage-based catalog. 


On Wed, Dec 4, 2024, at 03:17, Nikhil Benesch wrote:
> > And I'm also looking forward to what Jack is alluding to.
> 
> AWS just announced *native* S3 support for Iceberg buckets! [0] This is 
> almost surely what Jack was alluding to.
> 
> This is very cool. It's a much deeper integration than I was expecting but 
> nonetheless one that fully satisfies my use case [1]. 
> 
> In classic AWS fashion the documentation for the feature has not yet been 
> published. I also can't find the "Amazon S3 Tables Catalog for Apache 
> Iceberg" package that Jeff Barr references in his announcement post. I'll 
> circle back with details once these materials are made available. 
> 
> [0]: 
> https://aws.amazon.com/blogs/aws/new-amazon-s3-tables-storage-optimized-for-analytics-workloads/
> [1]: We're looking to add a native Iceberg-on-S3 export feature to 
> Materialize (https://materialize.com), but without requiring users to manage 
> a catalog.
> 
> On Wed, Nov 27, 2024 at 1:52 PM rdb...@gmail.com <rdb...@gmail.com> wrote:
>> > We deprecated this recently and we don't have to deprecate it if object 
>> > stores support atomic operations like this.
>> 
>> I disagree because this misses many of the reasons for deprecation. It isn't 
>> just that S3 didn't support a `putIfAbsent` operation. Other object stores 
>> did and there are still several problems with this approach. The fundamental 
>> issue is that it is attempting to solve problems at the wrong level.
>> 
>> One of the reasons why Iceberg exists is that we saw people doing the same 
>> thing with Parquet. People were trying to solve problems with their tables 
>> by attempting to modify Parquet in wacky ways, like wanting to replace the 
>> footer to make schema changes. Schema evolution needed to be solved at the 
>> table level and in this community we've always tried to solve problems more 
>> directly and elegantly by addressing them at the right layer of the stack.
>> 
>> Iceberg tables scale up existing atomic operations to make transactional 
>> guarantees on very large tables. Object stores and file systems aren't well 
>> suited for this task. Just like they were not sufficient to provide 
>> transactional guarantees across files and partitions, the primitives you can 
>> use aren't sufficient for a database. Storage capabilities are also not the 
>> right place to deliver other catalog features, like basic CRUD operations.
>> 
>> The addition of `putIfAbsent` to S3 doesn't support transactions where you 
>> need to modify multiple tables and it also doesn't address cases like the 
>> need to atomically rename and delete tables. Schemes that use `putIfAbsent` 
>> also rely either on consistent listing of a large prefix or on maintaining a 
>> version-hint file. That version-hint file can be out of date, so even with 
>> one you still need to list or iteratively attempt to read metadata files to 
>> determine the latest.
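
To make the version-hint problem concrete, here is a minimal Python sketch. It is not taken from Iceberg's code. `InMemoryStore`, `put_if_absent`, `latest_version`, `commit`, and the key layout are all hypothetical names chosen for illustration, and the store is a plain dictionary standing in for an object store with a `putIfAbsent`-style primitive.

```python
class InMemoryStore:
    """Toy stand-in for an object store with a put-if-absent primitive."""

    def __init__(self):
        self._objects = {}

    def put_if_absent(self, key, value):
        # Succeeds only if the key does not exist yet.
        if key in self._objects:
            return False
        self._objects[key] = value
        return True

    def put(self, key, value):
        # Unconditional overwrite, used for the hint file.
        self._objects[key] = value

    def get(self, key):
        return self._objects.get(key)


HINT_KEY = "metadata/version-hint.text"  # assumed layout, for illustration


def metadata_key(version):
    return f"metadata/v{version}.metadata.json"


def latest_version(store):
    # Start from the hint (possibly stale or missing), then probe forward
    # until the next metadata file is absent. Version 0 means "no commits".
    hint = store.get(HINT_KEY)
    version = int(hint) if hint is not None else 0
    while store.get(metadata_key(version + 1)) is not None:
        version += 1
    return version


def commit(store, base_version, metadata):
    # The commit wins only if nobody else has written base_version + 1.
    new_version = base_version + 1
    if not store.put_if_absent(metadata_key(new_version), metadata):
        return False
    # Best-effort hint update; a crash before this line leaves the hint stale.
    store.put(HINT_KEY, str(new_version))
    return True
```

Even in this toy version, a reader cannot trust the hint alone and has to probe forward, and nothing here covers renaming a table or committing to several tables at once.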
>> 
>> Getting a file-only scheme right is complicated and is specific to a 
>> particular store (both commits and version-hint handling). Local file 
>> systems would use an exclusive create operation to commit, Hadoop uses 
>> atomic rename, and object stores use different `putIfAbsent` operations. 
>> Making this work across languages and engines requires a lot of work to 
>> specify requirements and document issues, only to get to single-table 
>> functionality that doesn't deliver the catalog-level primitives like atomic 
>> rename that are commonly used.
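
As one illustration of the store-specific commit primitives mentioned above, the following Python sketch shows an exclusive-create commit on a local file system. The function name `commit_exclusive` and the `v<N>.metadata.json` naming are assumptions made for this example, not a description of any Iceberg implementation.

```python
import os
import tempfile


def commit_exclusive(table_dir, version, metadata):
    """Try to commit `metadata` as the given version; return True on success."""
    path = os.path.join(table_dir, f"v{version}.metadata.json")
    try:
        # Mode "x" fails with FileExistsError if the file already exists,
        # so only one writer can create a given version.
        with open(path, "x", encoding="utf-8") as f:
            f.write(metadata)
    except FileExistsError:
        return False
    # Caveat: a concurrent reader may observe the file before the write
    # completes; a real implementation would need to handle that.
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as table_dir:
        print(commit_exclusive(table_dir, 1, "{}"))  # first writer wins
        print(commit_exclusive(table_dir, 1, "{}"))  # second writer loses
```

An object store or HDFS would need a different primitive for the same step, which is the portability cost described above.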
>> 
>> In the end, catalog problems are best solved at the catalog layer, not 
>> through an elaborate scheme that uses storage-layer primitives, just as it 
>> was not a good idea to deliver table behaviors using similar storage-layer 
>> schemes. Adding `putIfAbsent` to S3 doesn't change that design principle.
>> 
>> I sympathize with the idea that it would be great if you didn't need a 
>> catalog. Simpler infrastructure is generally better.
>> 
>> But trying to avoid a catalog limits the capabilities of this 
>> infrastructure, while setting people up for later failure. When I talk with 
>> people that have been trying to avoid having a catalog, they tend to have 
>> tables scattered across buckets that they need to track down, they lack 
>> observability to know what is being used, don't know if they are deleting 
>> data in compliance with regulations, and they often lack simple and usable 
>> access controls.
>> 
>> I think that the solution is to make it easier to run or use a catalog, not 
>> to try to build without one.
>> 
>> And I'm also looking forward to what Jack is alluding to.
>> 
>> On Tue, Nov 26, 2024 at 11:05 PM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>>> Interesting. 
>>> 
>>> We already have file system tables [1] in Iceberg (HadoopCatalog implements 
>>> this spec). 
>>> We deprecated this recently and we don't have to deprecate it if object 
>>> stores support atomic operations like this.
>>> 
>>> [1] https://iceberg.apache.org/spec/#file-system-tables 
>>> 
>>> - Ajantha
>>> 
>>> On Wed, Nov 27, 2024 at 2:53 AM Nikhil Benesch <nikhil.bene...@gmail.com> 
>>> wrote:
>>>> Ah, fascinating. Thanks very much for the pointer.
>>>> 
>>>> Here's the thread introducing the proposal [0], for anyone else curious.
>>>> 
>>>> [0]: https://lists.apache.org/thread/kh4n98w4z22sc8h2vot4q8n44vdtnltg
>>>> 
>>>> On Tue, Nov 26, 2024 at 3:27 PM Jean-Baptiste Onofré <j...@nanthrax.net> 
>>>> wrote:
>>>> >
>>>> > Hi Vignesh
>>>> >
>>>> > Thanks for the reminder, I remember we quickly discussed this during a
>>>> > community meeting.
>>>> >
>>>> > I will take a new look at the doc.
>>>> >
>>>> > Regards
>>>> > JB
>>>> >
>>>> > On Tue, Nov 26, 2024 at 9:19 PM Vignesh <vignesh.v...@gmail.com> wrote:
>>>> > >
>>>> > > Hi,
>>>> > > There was a proposal along the same lines, for the read portion, a few 
>>>> > > weeks back by Ashvin.
>>>> > >
>>>> > > https://docs.google.com/document/d/1yzLXSOtzBXyaWHfeVsWsMu4xmOH8rV6QyM5ZAnJZjMQ/edit?usp=drivesdk
>>>> > >
>>>> > > Thanks,
>>>> > > Vignesh.
>>>> > >
>>>> > >
>>>> > > On Tue, Nov 26, 2024, 11:59 AM Jean-Baptiste Onofré 
>>>> > > <j...@nanthrax.net> wrote:
>>>> > >>
>>>> > >> Hi Nikhil
>>>> > >>
>>>> > >> Thanks for your message, very interesting.
>>>> > >>
>>>> > >> I think it would be great to involve the Polaris project here as well,
>>>> > >> as a REST Catalog implementation.
>>>> > >> The Polaris community is discussing storage/backend right now, so it
>>>> > >> would be the perfect timing to consider leveraging S3 conditional
>>>> > >> writes (as a plugin for instance first).
>>>> > >>
>>>> > >> I would be happy to connect and know more about your perspective 
>>>> > >> about that.
>>>> > >>
>>>> > >> Thanks,
>>>> > >> Regards
>>>> > >> JB
>>>> > >>
>>>> > >> PS: I will be at AWS re:Invent next week, so maybe we can connect 
>>>> > >> there.
>>>> > >>
>>>> > >> On Tue, Nov 26, 2024 at 6:35 PM Nikhil Benesch 
>>>> > >> <nikhil.bene...@gmail.com> wrote:
>>>> > >> >
>>>> > >> > Hi all,
>>>> > >> >
>>>> > >> > With Amazon S3 announcing support for the If-Match header yesterday 
>>>> > >> > [0], all the
>>>> > >> > major object store implementations now support a compare-and-swap 
>>>> > >> > operation.
>>>> > >> >
>>>> > >> > As far as I can tell, this opens up the possibility of storing 
>>>> > >> > Iceberg
>>>> > >> > catalogs directly on object storage, without the need for a 
>>>> > >> > separate metastore,
>>>> > >> > and without violating any of Iceberg's ACID guarantees.
>>>> > >> >
>>>> > >> > It seems the immediate next step is to build an independent Java or 
>>>> > >> > REST catalog
>>>> > >> > backend to prove this concept out. Long term, though, the ideal 
>>>> > >> > would be to
>>>> > >> > have such a catalog backend be a first-class citizen in the Iceberg 
>>>> > >> > project.
>>>> > >> >
>>>> > >> > Is anyone else in the Iceberg community barking up this tree? I'm a 
>>>> > >> > long term
>>>> > >> > Iceberg enthusiast, but new to the community. I'd very much 
>>>> > >> > appreciate any
>>>> > >> > pointers to current or past discussions on the topic. So far all 
>>>> > >> > I've been
>>>> > >> > able to turn up is some light chatter from myself and others on 
>>>> > >> > Bluesky and
>>>> > >> > Hacker News ([1][2][3]).
>>>> > >> >
>>>> > >> > Cheers,
>>>> > >> > Nikhil
>>>> > >> >
>>>> > >> > [0]: 
>>>> > >> > https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-s3-functionality-conditional-writes/
>>>> > >> > [1]: https://bsky.app/profile/benesch.bsky.social/post/3lauesxg3ic2c
>>>> > >> > [2]: 
>>>> > >> > https://bsky.app/profile/eatonphil.bsky.social/post/3lbskq3jwk22e
>>>> > >> > [3]: https://news.ycombinator.com/item?id=42240370
Xuanwo

https://xuanwo.io/
