Thanks Dennis for raising this! I had a similar discussion last year [1] that I definitely want to discuss further. But I feel the main focus of this discussion is less about external tables and more about federation vs. notification. For this topic, I have two questions:
(1) To support federation, what is still missing in the REST spec? It feels to me that we don't need anything new. As long as the REST server supports hooking up to another REST server and adds the proxy routing logic, it would work. For example, think about a Polaris catalog federating a Unity catalog: Unity catalog will show up as a "unity" namespace in Polaris, and a table ns1.table1 will be shown as unity.ns1.table1 through Polaris. A LoadTable(unity.ns1, table1) against this table will hit Polaris, and Polaris as a proxy calls Unity with LoadTable(ns1, table1). And you can imagine doing this proxy routing for every single API. One nit is that mapping a catalog to the namespace level feels a bit hacky, and we could add another catalog level in the REST spec with CRUD APIs for catalogs. That concept is already there in Polaris; maybe it is worth adding to Iceberg REST? But that is not strictly required for federation to work, since we have the namespace approach.
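As a rough illustration of the proxy routing described in (1), here is a minimal sketch; FederatedRouter, RestClient, and LoadTableResponse are hypothetical stand-ins, not actual Polaris or Unity Catalog APIs:

```java
import java.util.List;
import java.util.Map;

// Minimal sketch of prefix-based proxy routing in a federating REST catalog.
// All types here are hypothetical placeholders for whatever the server already uses.
public class FederatedRouter {

  /** Hypothetical client for a remote Iceberg REST catalog. */
  interface RestClient {
    LoadTableResponse loadTable(List<String> namespace, String table);
  }

  /** Placeholder for the standard LoadTableResult payload. */
  interface LoadTableResponse {}

  // Maps a top-level "mount" namespace (e.g. "unity") to the remote catalog's client.
  private final Map<String, RestClient> mounts;
  private final RestClient localCatalog;

  public FederatedRouter(Map<String, RestClient> mounts, RestClient localCatalog) {
    this.mounts = mounts;
    this.localCatalog = localCatalog;
  }

  public LoadTableResponse loadTable(List<String> namespace, String table) {
    RestClient remote = mounts.get(namespace.get(0));
    if (remote == null) {
      // Not a federated namespace: serve from local metadata as usual.
      return localCatalog.loadTable(namespace, table);
    }
    // Federated namespace: strip the mount prefix and forward, so
    // LoadTable(unity.ns1, table1) against Polaris becomes LoadTable(ns1, table1) against Unity.
    return remote.loadTable(namespace.subList(1, namespace.size()), table);
  }

  // listNamespaces, listTables, createTable, updateTable, etc. would follow the same
  // strip-prefix-and-forward pattern.
}
```

Whether the remote catalog is mounted as a namespace or as a first-class "catalog" level in the spec only changes how the mounts map is keyed; the routing itself stays the same.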
(2) If we all do federation, do we still need the notification approach? If the final goal is just to enable access to one catalog from another catalog, federation already serves that goal. There could be caching built at the proxy layer, and cache eviction could be improved with notifications, but that feels like an optimization. What is the value of having both approaches at the same time?

Best,
Jack Ye

[1] https://lists.apache.org/thread/ohqfvhf4wofzkhrvff1lxl58blh432o6

On Wed, Oct 9, 2024 at 5:51 PM Dennis Huo <huoi...@gmail.com> wrote:

> Summarizing discussion from today's Iceberg Catalog Community Sync, here were some of the key points:
>
> - General agreement on the need for some flavor of mechanism for catalog federation, in line with this proposal
> - We should come up with a more fitting name for the endpoint other than "notifications"
>   - Some debate over whether to just add behaviors to the updateTable or registerTable endpoints; ultimately agreed that the behavior of these tables is intended to be fundamentally different, and we want to avoid accidentally dangerous implementations, so it's better to have a different endpoint
>   - The idea of "Notifications" in itself is too general for this purpose; we might want something in the future that is more in line with much more generalized Notifications, and we don't want a conflict
> - This endpoint focuses on the semantics of "force-update" without a standard Iceberg commit protocol
> - The endpoint should potentially be a "bulk endpoint", since the use case is more likely to want to reflect batches at a time
>   - Some debate over whether this is strictly necessary, and whether there would be any implicit atomicity expectations
>   - For this use case the goal is explicitly *not* to perform a heavyweight commit protocol, so a bulk API is just an optimization to avoid making a bunch of individual calls; some or all of the requests in the bulk request could succeed or fail
> - The receiving side should not have structured failure modes relating to out-of-sync state -- e.g. the caller should not be depending on response state to determine consistency on the sending side
>   - This was debated with pros/cons of sending meaningful response errors
>   - Pro: useful for the caller to receive some amount of feedback to know whether the force-update made it through, whether there are other issues preventing syncing, etc.
>   - Con: this is likely a slippery slope of scope creep that still only partially addresses failure modes; instead, the overall system must be designed for idempotency of the declared updated state, and if consistency is desired, the caller must not rely only on responses to reconcile state anyway
> - We want to separate out the discussion of the relative merits of a push vs. pull model of federation, so the merits of pull/polling/read-through don't preclude adding this push-based endpoint
>   - In-depth discussion of relative pros/cons, but agreed that one doesn't necessarily preclude the other, and this push endpoint targets a particular use case
> - Keep the notion of "external tables" only "implicit" instead of having to plumb a new table type everywhere (for now?)
>   - We could document that tables that come into existence through this endpoint have a different "ownership" semantic than those created by createTable/registerTable, but the REST spec itself doesn't necessarily need to expose any specific syntax/properties/etc. about these tables
>
> Thanks everyone for the inputs to the discussion! Please feel free to chime in here if I missed anything or got anything wrong from today's discussion.
>
>
> On Fri, Sep 20, 2024 at 9:05 PM Dennis Huo <huoi...@gmail.com> wrote:
>
>> Thanks for the input, Christian!
>>
>> I agree a comprehensive solution would likely require some notion of pull-based approaches (and even federated read-through on demand). I see some pros/cons to both push and pull approaches, and they seem in part to relate to:
>>
>> - Whether only the "reflecting catalog" is an Iceberg REST server, only the "owning catalog" is an Iceberg REST server, or both
>> - Whether it's "easier" to put the complexity of connection/credential/state management in the "owning catalog" or in the "reflecting catalog"
>>
>> Though the "push" approach glosses over some of the potential complexity on the "owning catalog" side, it does seem like a more minimal starting point that doesn't require any additional state or data model within the Iceberg REST server, but can still be useful as a building block even where integrations aren't necessarily formally defined via the REST spec. For example, consider a single-tenant internal deployment of an Iceberg REST server whose goal is to reflect a subset of a large legacy Hive metastore (the "owning" catalog in this case), where engines are using the Iceberg HiveCatalog. It may not be easy to retrofit a shim that exposes a compatible "/changes" endpoint, but it might be possible to add a MetaStoreEventListener (https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreEventListener.java) with some one-off code that pushes to an Iceberg REST endpoint.
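A rough sketch of the listener-based shim described above, assuming Hive's MetaStoreEventListener/onAlterTable hooks; the target path ("/sync") and the JSON payload are hypothetical placeholders for whatever endpoint this thread ends up standardizing:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.metastore.MetaStoreEventListener;
import org.apache.hadoop.hive.metastore.api.MetaException;
import org.apache.hadoop.hive.metastore.api.Table;
import org.apache.hadoop.hive.metastore.events.AlterTableEvent;

// One-off listener that pushes the latest Iceberg metadata location to a
// reflecting REST catalog. The "/sync" path and the payload shape are
// hypothetical, not part of the current REST spec.
public class IcebergRestPushListener extends MetaStoreEventListener {

  private final HttpClient http = HttpClient.newHttpClient();
  private final String restCatalogBaseUri;

  public IcebergRestPushListener(Configuration config) {
    super(config);
    this.restCatalogBaseUri = config.get("iceberg.rest.push.uri", "http://localhost:8181");
  }

  @Override
  public void onAlterTable(AlterTableEvent event) throws MetaException {
    Table newTable = event.getNewTable();
    // Iceberg tables registered in HMS carry their current metadata file here.
    String metadataLocation = newTable.getParameters().get("metadata_location");
    if (metadataLocation == null) {
      return; // not an Iceberg table; nothing to push
    }
    String body = String.format("{\"metadata-location\": \"%s\"}", metadataLocation);
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create(String.format("%s/v1/namespaces/%s/tables/%s/sync",
            restCatalogBaseUri, newTable.getDbName(), newTable.getTableName())))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();
    try {
      // Fire-and-forget "force-update"; the reflecting catalog must treat it as
      // idempotent, since retries and ordering are not guaranteed.
      http.send(request, HttpResponse.BodyHandlers.discarding());
    } catch (Exception e) {
      throw new MetaException("Failed to push table update: " + e.getMessage());
    }
  }
}
```

This matches the framing from the sync summary above: the sender fires force-updates, and the receiving side stays idempotent rather than reporting structured out-of-sync errors.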
Any "periodically-scheduled" pull-based >> model would have the same lack of strong consistency though, so it seems >> like query-time read-through would be a necessary component of a long-term >> "strongly consistent" model. >> >> Ultimately I could see a hybrid approach being useful for production use >> cases, especially if there are things like caching/prefetching that are >> important for performance at the "reflecting" Iceberg REST catalog; >> push-based for a "fast-path", polling pull-based for scenarios where push >> isn't possible from the source or is unreliable, and read-through as the >> "authoritative" state. >> >> On Fri, Sep 20, 2024 at 1:23 AM Christian Thiel >> <christ...@hansetag.com.invalid> wrote: >> >>> Hi Dennis, >>> >>> >>> >>> thanks for your initiative! >>> I believe externally owned Tables / Federation would enable a variety of >>> new use-cases in Data Mesh scenarios. >>> >>> >>> >>> Personally, I currently favor pull based over push-based approaches >>> (think /changes Endpoint) for the following reasons: >>> >>> - Less ambiguity for failure / missing messages. What does the >>> sender do if a POST fails? How often is it retried? What is the fallback >>> behavior? If a message is missed, how would the reflecting catalog ever >>> get >>> back to the correct state?. In contrast, a pull-based approach is quite >>> clear: The reflecting catalog is responsible to store a pointer and can >>> handle retries internally. >>> - Changes are not only relevant for other catalogs, but for a >>> variety of other systems that might want to act based on them. They might >>> not have a REST API and certainly don’t want to implement the Iceberg >>> REST >>> protocol (i.e. /config). >>> - Pull-based approaches need less configuration – only the >>> reflecting catalog needs to be configured. This follows the behavior we >>> already implement in the other endpoints with other clients. I don’t >>> think >>> the “owning” catalog must know where it’s federated to – very much like >>> it >>> doesn’t need to know which query engines access it. >>> - The "Push" feature itself not part of spec, thus making it easier >>> for Catalogs to just implement the receiving end without the actual >>> "push" >>> and still be 100% spec compliant - without being fully integrable with >>> other catalogs. This is also a problem regarding my first point: push & >>> receive behaviors and expectations must match between sender and >>> receiver – >>> and we don’t have a good place to document the “push” part. >>> >>> >>> >>> I would design a /changes endpoint to only contain the information THAT >>> something changed, not WHAT changed – to keep it lightweight. >>> For full change tracking I believe event queues / streaming solutions >>> such as kafka, nats are better suited. Standardizing events could be a >>> second step. In our catalog we are just using CloudEvents wrapped around >>> `TableUpdates` enriched with a few extensions. >>> >>> >>> >>> For both pull and push based approaches, your building block 1) is >>> needed anyway – so that’s surely a common ground. >>> >>> >>> >>> I would be interested to hear some more motivation from your side >>> @Dennis to choose the pull-based approach – maybe I am looking at this too >>> specific for my own use-case. >>> >>> >>> >>> Thanks! >>> Christian >>> >>> >>> >>> >>> >>> *From: *Dennis Huo <huoi...@gmail.com> >>> *Date: *Thursday, 19. 
>>> For both pull- and push-based approaches, your building block 1) is needed anyway – so that's surely common ground.
>>>
>>> I would be interested to hear some more motivation from your side, @Dennis, for choosing the push-based approach – maybe I am looking at this too specifically for my own use case.
>>>
>>> Thanks!
>>> Christian
>>>
>>>
>>> *From: *Dennis Huo <huoi...@gmail.com>
>>> *Date: *Thursday, 19. September 2024 at 05:46
>>> *To: *dev@iceberg.apache.org <dev@iceberg.apache.org>
>>> *Subject: *[DISCUSS] Defining a concept of "externally owned" tables in the REST spec
>>>
>>> Hi all,
>>>
>>> I wanted to follow up on some discussions that came up in one of the Iceberg Catalog community syncs a while back relating to the concept of tables that can be registered in an Iceberg REST Catalog but which have their "source of truth" in some external catalog.
>>>
>>> The original context was that Apache Polaris currently adds a Polaris-specific method "sendNotification" on top of the otherwise standard Iceberg REST API (https://github.com/apache/polaris/blob/0547e8b3a9e38fedc466348d05f3d448f4a03930/spec/rest-catalog-open-api.yaml#L977), but the goal is to come up with something that the broader community can align on to ensure standardization long term.
>>>
>>> This relates closely to a couple of other, more ambitious areas of discussion that have also come up in community syncs:
>>>
>>> 1. Catalog Federation - defining the protocol(s) by which all of our different Iceberg REST Catalog implementations can talk to each other cooperatively, where entity metadata might be read through, pushed, or pulled in various ways
>>> 2. Generalized events and notifications - beyond serving the purpose of federation, folks have proposed a generalized model that could also be applied to things like workflow triggering
>>>
>>> In the narrowest formulation there are two building blocks to consider:
>>>
>>> 1. Expressing the concept of an "externally owned table" in an Iceberg REST Catalog
>>>    - At the most basic level, this could just mean that the target REST Catalog should refuse to perform mutation dances on the table (i.e. reject updateTable/commitTransaction calls on such tables), because it knows there's an external "source of truth" and wants to avoid causing a split-brain problem
>>> 2. An endpoint for doing a "simple" register/update of a table by "forcing" the table metadata to the latest incarnation
>>>    - Instead of updates being something for this target REST Catalog to perform a transaction protocol for, the semantic is that the "source of truth" transaction is already committed in the external source, so this target catalog's job is simply to "trust" the latest metadata (modulo some watermark semantics to deal with transient errors and out-of-order deliveries)
>>>
>>> Interestingly, it appears there was a GitHub issue filed a while back for some formulation of (2) that was closed silently: https://github.com/apache/iceberg/issues/7261
>>>
>>> It seems like there's an opportunity to find a good balance between breadth of scope, generalizability, and practicality in terms of which building blocks can be defined in the core spec and which broader, more ambitious features can be built on top of them.
>>>
>>> Would love to hear everyone's thoughts on this.
>>>
>>> Cheers,
>>>
>>> Dennis
>>
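To make building block (2) above concrete, a minimal sketch of the receiving side's "force-update" semantics; all names and the watermark field are hypothetical, showing only an idempotent apply-if-newer check rather than any real commit protocol:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical receiving-side handler for a "force-update" of an externally
// owned table: no Iceberg commit protocol, just an idempotent apply-if-not-stale
// update keyed on a monotonically increasing watermark supplied by the owning
// catalog (e.g. a sequence number or event timestamp).
public class ForceUpdateHandler {

  record ReflectedTable(String metadataLocation, long watermark) {}

  private final Map<String, ReflectedTable> tables = new ConcurrentHashMap<>();

  /**
   * Returns true if the update was applied, false if it was ignored as stale.
   * Re-delivering the same message is harmless, and out-of-order deliveries
   * simply lose to the newer watermark already recorded.
   */
  public boolean forceUpdate(String tableKey, String metadataLocation, long watermark) {
    ReflectedTable updated = new ReflectedTable(metadataLocation, watermark);
    ReflectedTable result = tables.merge(
        tableKey,
        updated,
        (current, incoming) -> incoming.watermark() > current.watermark() ? incoming : current);
    return result == updated;
  }
}
```

Combined with building block (1), i.e. rejecting normal updateTable/commitTransaction calls on such tables, the reflecting catalog never has to arbitrate between this path and its own commit protocol.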