Re: Storing catalog directly on object store

Vladimir Ozerov Tue, 03 Dec 2024 22:27:11 -0800

I second Ryan’s opinion that a production-grade catalog is a much broader
concept than just CAS-ing the pointer.


What we observe in practice at our company is that users want to work with
large schemas (sometimes with literally thousands of schemas and millions of
tables), have support for common DDL operations (including schema and table
renames), multi-table transactions, a hot metadata cache, CDC capabilities
for schema changes, at least basic integrity constraints between objects,
the ability to do consistent backups, integrated authentication and
authorization, centralized monitoring, etc.

Even the REST catalog cannot handle some common cases today (namespace
renames, object references in views, etc.).

With this in mind, it seems that while the new S3 capabilities are formally
sufficient to implement a basic catalog, they can address only a small
fraction of real user requirements.
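
For concreteness, "CAS-ing the pointer" can be sketched roughly as below.
This is only an illustration: `InMemoryObjectStore`, `put_if_match`, and
`commit_pointer` are hypothetical names, and the store is an in-memory
stand-in for conditional writes, not the S3 API.

```python
class PreconditionFailed(Exception):
    """The stored ETag did not match what the writer expected."""


class InMemoryObjectStore:
    """Hypothetical stand-in for an object store with conditional writes."""

    def __init__(self):
        self._objects = {}  # key -> (etag, body)
        self._counter = 0   # used to mint a fresh ETag on every write

    def get(self, key):
        """Return (etag, body) for an existing key."""
        return self._objects[key]

    def put_if_match(self, key, body, expected_etag):
        """Write only if the current ETag equals expected_etag.

        expected_etag=None means "create only if the key is absent".
        """
        current = self._objects.get(key)
        current_etag = current[0] if current is not None else None
        if current_etag != expected_etag:
            raise PreconditionFailed(key)
        self._counter += 1
        etag = f"etag-{self._counter}"
        self._objects[key] = (etag, body)
        return etag


def commit_pointer(store, table_key, expected_etag, new_metadata_location):
    """Swap one table's pointer; return the new ETag, or None on a lost race."""
    try:
        return store.put_if_match(
            table_key, new_metadata_location.encode(), expected_etag
        )
    except PreconditionFailed:
        return None
```

Each call covers exactly one key, so a rename or a multi-table transaction
would need coordination that this primitive does not provide by itself.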

*Vladimir Ozerov*
Founder
querifylabs.com


On Wed, Dec 4, 2024 at 08:11, Xuanwo <[email protected]> wrote:

> Hi, Nikhil
>
> Thank you very much for bringing the S3 Tables discussion here.
>
> However, I would like to point out that S3 Tables is not the same
> concept we are discussing here. It is not an object storage-based catalog;
> instead, it is a stateful service that provides dedicated APIs. It’s better
> to think of it as another AWS Glue, but internally backed by an S3 bucket.
>
> Therefore, I believe we should split this into two separate discussion
> threads:
>
> - Whether we should build S3 Tables catalog support similar to what we do
> for AWS Glue.
> - Continuing the discussion about the object storage-based catalog.
>
>
> On Wed, Dec 4, 2024, at 03:17, Nikhil Benesch wrote:
>
> > And I'm also looking forward to what Jack is alluding to.
>
> AWS just announced *native* S3 support for Iceberg buckets! [0] This is
> almost surely what Jack was alluding to.
>
> This is very cool. It's a much deeper integration than I was expecting but
> nonetheless one that fully satisfies my use case [1].
>
> In classic AWS fashion the documentation for the feature has not yet been
> published. I also can't find the "Amazon S3 Tables Catalog for Apache
> Iceberg" package that Jeff Barr references in his announcement post. I'll
> circle back with details once these materials are made available.
>
> [0]:
> https://aws.amazon.com/blogs/aws/new-amazon-s3-tables-storage-optimized-for-analytics-workloads/
> [1]: We're looking to add a native Iceberg-on-S3 export feature to
> Materialize (https://materialize.com), but without requiring users to
> manage a catalog.
>
> On Wed, Nov 27, 2024 at 1:52 PM [email protected] <[email protected]> wrote:
>
> > We deprecated this recently and we don't have to deprecate it if object
> > stores support atomic operations like this.
>
> I disagree because this misses many of the reasons for deprecation. It
> isn't just that S3 didn't support a `putIfAbsent` operation. Other object
> stores did and there are still several problems with this approach. The
> fundamental issue is that it is attempting to solve problems at the wrong
> level.
>
> One of the reasons why Iceberg exists is that we saw people doing the same
> thing with Parquet. People were trying to solve problems with their tables
> by attempting to modify Parquet in wacky ways, like wanting to replace
> the footer to make schema changes. Schema evolution needed to be solved at
> the table level and in this community we've always tried to solve problems
> more directly and elegantly by addressing them at the right layer of the
> stack.
>
> Iceberg tables scale up existing atomic operations to make transactional
> guarantees on very large tables. Object stores and file systems aren't well
> suited for this task. Just like they were not sufficient to provide
> transactional guarantees across files and partitions, the primitives you
> can use aren't sufficient for a database. Storage capabilities are also not
> the right place to deliver other catalog features, like basic CRUD
> operations.
>
> The addition of `putIfAbsent` to S3 doesn't support transactions where you
> need to modify multiple tables and it also doesn't address cases like the
> need to atomically rename and delete tables. Schemes that use `putIfAbsent`
> also rely either on consistently listing a large prefix or on maintaining a
> version-hint file. That version-hint file can be out of date, so even with
> one you still need to list or iteratively attempt to read metadata files to
> determine the latest.
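
The stale-hint case described above can be sketched as follows. The
`v<N>.metadata.json` naming and the `existing_keys` set (standing in for
per-object existence checks against the store) are assumptions made for
illustration only.

```python
def metadata_key(version):
    # Assumed naming scheme: one metadata file per committed version.
    return f"metadata/v{version}.metadata.json"


def latest_version(existing_keys, hint):
    """Return the newest committed version, starting from a possibly stale hint.

    existing_keys is a set of object keys; in a real store each membership
    test would be a separate existence check or read attempt.
    """
    if metadata_key(hint) not in existing_keys:
        raise LookupError(f"hinted version {hint} does not exist")
    version = hint
    # The hint may lag behind, so probe forward until the next file is missing.
    while metadata_key(version + 1) in existing_keys:
        version += 1
    return version
```

With versions 1 through 5 committed and a hint of 3, the probe walks
forward to 5, at the cost of one extra lookup per missed commit plus one
final miss.
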
>
> Getting a file-only scheme right is complicated and is specific to a
> particular store (both commits and version-hint handling). Local file
> systems would use an exclusive create operation to commit, Hadoop uses
> atomic rename, and object stores use different `putIfAbsent` operations.
> Making this work across languages and engines requires a lot of work to
> specify requirements and document issues, only to get to single-table
> functionality that doesn't deliver the catalog-level primitives like atomic
> rename that are commonly used.
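
As one instance of the store-specific commit primitives listed above, an
exclusive create on a local file system might look like this in Python.
It is a minimal sketch: `commit_exclusive` is a hypothetical helper, and
it ignores durability (fsync) and readers that observe a partially
written file.

```python
import os


def commit_exclusive(path, data):
    """Create `path` only if it does not exist yet; return True on success.

    O_CREAT | O_EXCL makes the create fail if another writer got there
    first, which is the "exclusive create" commit for local file systems.
    """
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except FileExistsError:
        return False  # lost the race: this version was already committed
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return True
```

An atomic rename or an object-store `putIfAbsent` would need a different
implementation of the same step, which is the portability cost the
paragraph above describes.
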
>
> In the end, catalog problems are best solved at the catalog layer, not
> through an elaborate scheme that uses storage-layer primitives, just as it
> was not a good idea to deliver table behaviors using similar storage-layer
> schemes. Adding `putIfAbsent` to S3 doesn't change that design principle.
>
> I sympathize with the idea that it would be great if you didn't need a
> catalog. Simpler infrastructure is generally better.
>
> But trying to avoid a catalog limits the capabilities of this
> infrastructure, while setting people up for later failure. When I talk with
> people that have been trying to avoid having a catalog, they tend to have
> tables scattered across buckets that they need to track down, they lack
> observability to know what is being used, don't know if they are
> deleting data in compliance with regulations, and they often lack simple
> and usable access controls.
>
> I think that the solution is to make it easier to run or use a catalog,
> not to try to build without one.
>
> And I'm also looking forward to what Jack is alluding to.
>
> On Tue, Nov 26, 2024 at 11:05 PM Ajantha Bhat <[email protected]>
> wrote:
>
> Interesting.
>
> We already have file system tables [1] in Iceberg (HadoopCatalog
> implements this spec).
> We deprecated this recently and we don't have to deprecate it if object
> stores support atomic operations like this.
>
> [1] https://iceberg.apache.org/spec/#file-system-tables
>
> - Ajantha
>
> On Wed, Nov 27, 2024 at 2:53 AM Nikhil Benesch <[email protected]>
> wrote:
>
> Ah, fascinating. Thanks very much for the pointer.
>
> Here's the thread introducing the proposal [0], for anyone else curious.
>
> [0]: https://lists.apache.org/thread/kh4n98w4z22sc8h2vot4q8n44vdtnltg
>
> On Tue, Nov 26, 2024 at 3:27 PM Jean-Baptiste Onofré <[email protected]>
> wrote:
> >
> > Hi Vignesh
> >
> > Thanks for the reminder, I remember we quickly discussed this during a
> > community meeting.
> >
> > I will take a new look at the doc.
> >
> > Regards
> > JB
> >
> > On Tue, Nov 26, 2024 at 9:19 PM Vignesh <[email protected]> wrote:
> > >
> > > Hi,
> > > There was a proposal along the same lines, for the read portion, a few
> > > weeks back by Ashvin.
> > >
> > > https://docs.google.com/document/d/1yzLXSOtzBXyaWHfeVsWsMu4xmOH8rV6QyM5ZAnJZjMQ/edit?usp=drivesdk
> > >
> > > Thanks,
> > > Vignesh.
> > >
> > >
> > > On Tue, Nov 26, 2024, 11:59 AM Jean-Baptiste Onofré <[email protected]> wrote:
> > >>
> > >> Hi Nikhil
> > >>
> > >> Thanks for your message, very interesting.
> > >>
> > >> I think it would be great to involve the Polaris project here as well,
> > >> as a REST Catalog implementation.
> > >> The Polaris community is discussing storage/backend right now, so it
> > >> would be the perfect timing to consider leveraging S3 conditional
> > >> writes (as a plugin for instance first).
> > >>
> > >> I would be happy to connect and know more about your perspective
> > >> about that.
> > >>
> > >> Thanks,
> > >> Regards
> > >> JB
> > >>
> > >> PS: I will be at AWS re:Invent next week, so maybe we can connect
> > >> there.
> > >>
> > >> On Tue, Nov 26, 2024 at 6:35 PM Nikhil Benesch <[email protected]> wrote:
> > >> >
> > >> > Hi all,
> > >> >
> > >> > With Amazon S3 announcing support for the If-Match header yesterday
> > >> > [0], all the major object store implementations now support a
> > >> > compare-and-swap operation.
> > >> >
> > >> > As far as I can tell, this opens up the possibility of storing Iceberg
> > >> > catalogs directly on object storage, without the need for a separate
> > >> > metastore, and without violating any of Iceberg's ACID guarantees.
> > >> >
> > >> > It seems the immediate next step is to build an independent Java or
> > >> > REST catalog backend to prove this concept out. Long term, though, the
> > >> > ideal would be to have such a catalog backend be a first class citizen
> > >> > in the Iceberg project.
> > >> >
> > >> > Is anyone else in the Iceberg community barking up this tree? I'm a
> > >> > long term Iceberg enthusiast, but new to the community. I'd very much
> > >> > appreciate any pointers to current or past discussions on the topic.
> > >> > So far all I've been able to turn up is some light chatter from myself
> > >> > and others on Bluesky and Hacker News ([1][2][3]).
> > >> >
> > >> > Cheers,
> > >> > Nikhil
> > >> >
> > >> > [0]: https://aws.amazon.com/about-aws/whats-new/2024/11/amazon-s3-functionality-conditional-writes/
> > >> > [1]: https://bsky.app/profile/benesch.bsky.social/post/3lauesxg3ic2c
> > >> > [2]: https://bsky.app/profile/eatonphil.bsky.social/post/3lbskq3jwk22e
> > >> > [3]: https://news.ycombinator.com/item?id=42240370
>
> Xuanwo
>
> https://xuanwo.io/
>
>
