Summarizing discussion from today's Iceberg Catalog Community Sync, here are some of the key points:
- General agreement on the need for some flavor of mechanism for catalog federation in line with this proposal
- We should come up with a more fitting name for the endpoint than "notifications"
  - Some debate over whether to just add behaviors to the updateTable or registerTable endpoints; ultimately agreed that the behavior of these tables is intended to be fundamentally different, and we want to avoid accidentally dangerous implementations, so it's better to have a separate endpoint
  - The idea of "notifications" in itself is too general for this purpose; we might want something in the future that is more in line with much more generalized notifications, and we don't want a conflict
  - This endpoint focuses on the semantics of a "force-update" without a standard Iceberg commit protocol
- The endpoint should potentially be a "bulk endpoint", since the use case is more likely to want to reflect batches at a time
  - Some debate over whether this is strictly necessary, and whether there would be any implicit atomicity expectations
  - For this use case the goal is explicitly *not* to perform a heavyweight commit protocol, so a bulk API is just an optimization to avoid making a bunch of individual calls; some or all of the requests in the bulk request could succeed or fail
- The receiving side should not have structured failure modes relating to out-of-sync state; e.g. the caller should not depend on response state to determine consistency on the sending side
  - This was debated, with pros/cons of sending meaningful response errors
  - Pro: useful for the caller to receive some amount of feedback to know whether the force-update made it through, whether there are other issues preventing syncing, etc.
  - Con: this is likely a slippery slope of scope creep that still only partially addresses failure modes; instead, the overall system must be designed for idempotency of declared updated state, and if consistency is desired, the caller must not rely only on responses to reconcile state anyway
- We want to separate out the discussion of the relative merits of a push vs. pull model of federation, so the merits of pull/polling/read-through don't preclude adding this push-based endpoint
  - In-depth discussion of the relative pros/cons, but agreed that one doesn't necessarily preclude the other, and this push endpoint targets a particular use case
- Keep the notion of "external tables" only "implicit" instead of having to plumb a new table type everywhere (for now?)
  - We could document that tables that come into existence via this endpoint have a different "ownership" semantic than those created by createTable/registerTable, but the REST spec itself doesn't necessarily need to expose any specific syntax/properties/etc. about these tables

Thanks everyone for the inputs to the discussion! Please feel free to chime in here if I missed anything or got anything wrong from today's discussion.

On Fri, Sep 20, 2024 at 9:05 PM Dennis Huo <huoi...@gmail.com> wrote:

> Thanks for the input, Christian!
>
> I agree a comprehensive solution would likely require some notion of pull-based approaches (and even federated read-through on-demand).
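To make the bulk, non-atomic "force-update" semantics from the sync summary concrete, here is a minimal sketch in Python. Everything is illustrative and not from any spec: a plain dict stands in for catalog state, and field names like `table-identifier` are invented for this example. Each item in the batch is applied independently with last-writer-wins semantics, and the response carries per-item outcomes with no atomicity across the batch.

```python
# Sketch only: a dict standing in for the reflecting catalog's table state.
def apply_force_update(catalog: dict, item: dict) -> dict:
    """Apply one declared table state; last-writer-wins, no commit protocol."""
    try:
        ident = item["table-identifier"]
        # "Trust" the metadata location declared by the owning catalog.
        catalog[ident] = item["metadata-location"]
        return {"table-identifier": ident, "status": "applied"}
    except KeyError as e:
        return {"table-identifier": item.get("table-identifier"),
                "status": "failed", "reason": f"missing field: {e}"}

def handle_bulk_notification(catalog: dict, items: list[dict]) -> list[dict]:
    """No atomicity: some items may succeed while others fail."""
    return [apply_force_update(catalog, item) for item in items]

catalog: dict = {}
results = handle_bulk_notification(catalog, [
    {"table-identifier": "db.t1",
     "metadata-location": "s3://bucket/t1/v3.metadata.json"},
    {"metadata-location": "s3://bucket/t2/v1.metadata.json"},  # malformed item
])
```

The per-item result list gives the caller coarse feedback without turning the response into a consistency-reconciliation channel, which matches the "no structured failure modes" position above.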
> I see some pros/cons to both push and pull approaches, and it seems in part to relate to:
>
> - Whether only the "reflecting catalog" is an Iceberg REST server, only the "owning catalog" is an Iceberg REST server, or both
> - Whether it's "easier" to put the complexity of connection/credential/state management in the "owning catalog" or in the "reflecting catalog"
>
> Though the "push" approach glosses over some of the potential complexity on the "owning catalog" side, it does seem like a more minimal starting point that doesn't require any additional state or data model within the Iceberg REST server, but can still be useful as a building block even where integrations aren't necessarily formally defined via the REST spec. For example, for a single-tenant internal deployment of an Iceberg REST server whose goal is to reflect a subset of a large legacy Hive metastore (which is the "owning" catalog in this case) where engines are using the Iceberg HiveCatalog, it may not be easy to retrofit a shim to expose a compatible "/changes" endpoint, but it might be possible to add a https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreEventListener.java with some one-off code that pushes to an Iceberg REST endpoint.
>
> You raise a good point about failure scenarios though; this flavor of federation doesn't provide strong consistency guarantees without some sophisticated cooperation within the query layer to somehow reconcile whether things are "up to date". Any "periodically-scheduled" pull-based model would have the same lack of strong consistency though, so it seems like query-time read-through would be a necessary component of a long-term "strongly consistent" model.
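The one-off listener-to-REST bridge described above boils down to a small best-effort push loop. The following is a hedged sketch, not Hive or Iceberg API code: the `send` callable stands in for an authenticated HTTP POST to the reflecting catalog, and the retry policy is arbitrary. Because the reflecting catalog is only eventually consistent anyway, a dropped event is tolerable so long as state updates are idempotent and a later event (or pull/read-through) can reconcile.

```python
# Best-effort push from the owning-catalog side; transport is a stand-in.
import time

def push_with_retry(send, payload: dict, attempts: int = 3,
                    backoff_s: float = 0.0) -> bool:
    """Push a declared table state, retrying transient failures a few times."""
    for i in range(attempts):
        try:
            send(payload)
            return True
        except ConnectionError:
            time.sleep(backoff_s * (2 ** i))  # simple exponential backoff
    return False  # give up; a later event or a pull cycle reconciles state

# Stub transport that fails once, then succeeds.
calls = {"n": 0}
def flaky_send(payload):
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("transient")

ok = push_with_retry(flaky_send, {
    "table-identifier": "db.t1",
    "metadata-location": "s3://bucket/t1/v4.metadata.json",
})
```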
> Ultimately I could see a hybrid approach being useful for production use cases, especially if there are things like caching/prefetching that are important for performance at the "reflecting" Iceberg REST catalog: push-based as a "fast path", polling/pull-based for scenarios where push isn't possible from the source or is unreliable, and read-through as the "authoritative" state.
>
> On Fri, Sep 20, 2024 at 1:23 AM Christian Thiel <christ...@hansetag.com.invalid> wrote:
>
>> Hi Dennis,
>>
>> Thanks for your initiative! I believe externally owned tables / federation would enable a variety of new use cases in Data Mesh scenarios.
>>
>> Personally, I currently favor pull-based over push-based approaches (think a /changes endpoint) for the following reasons:
>>
>> - Less ambiguity for failures / missed messages. What does the sender do if a POST fails? How often is it retried? What is the fallback behavior? If a message is missed, how would the reflecting catalog ever get back to the correct state? In contrast, a pull-based approach is quite clear: the reflecting catalog is responsible for storing a pointer and can handle retries internally.
>> - Changes are relevant not only for other catalogs but for a variety of other systems that might want to act based on them. Those systems might not have a REST API and certainly don't want to implement the Iceberg REST protocol (i.e. /config).
>> - Pull-based approaches need less configuration: only the reflecting catalog needs to be configured. This follows the behavior we already implement in the other endpoints with other clients. I don't think the "owning" catalog must know where it's federated to, very much like it doesn't need to know which query engines access it.
>> - The "push" feature itself is not part of the spec, making it easier for catalogs to just implement the receiving end without the actual "push" and still be 100% spec-compliant, without being fully integrable with other catalogs. This also ties back to my first point: push and receive behaviors and expectations must match between sender and receiver, and we don't have a good place to document the "push" part.
>>
>> I would design a /changes endpoint to only contain the information THAT something changed, not WHAT changed, to keep it lightweight. For full change tracking I believe event queues / streaming solutions such as Kafka or NATS are better suited. Standardizing events could be a second step. In our catalog we are just using CloudEvents wrapped around `TableUpdates` enriched with a few extensions.
>>
>> For both pull- and push-based approaches, your building block (1) is needed anyway, so that's surely common ground.
>>
>> I would be interested to hear some more motivation from your side @Dennis for choosing the push-based approach; maybe I am looking at this too specifically through the lens of my own use case.
>>
>> Thanks!
>> Christian
>>
>> *From: *Dennis Huo <huoi...@gmail.com>
>> *Date: *Thursday, 19. September 2024 at 05:46
>> *To: *dev@iceberg.apache.org <dev@iceberg.apache.org>
>> *Subject: *[DISCUSS] Defining a concept of "externally owned" tables in the REST spec
>>
>> Hi all,
>>
>> I wanted to follow up on some discussions that came up in one of the Iceberg Catalog community syncs a while back relating to the concept of tables that can be registered in an Iceberg REST Catalog but which have their "source of truth" in some external catalog.
>> The original context was that Apache Polaris currently adds a Polaris-specific method "sendNotification" on top of the otherwise standard Iceberg REST API (https://github.com/apache/polaris/blob/0547e8b3a9e38fedc466348d05f3d448f4a03930/spec/rest-catalog-open-api.yaml#L977), but the goal is to come up with something that the broader community can align on to ensure standardization long term.
>>
>> This relates closely to a couple of other, more ambitious areas of discussion that have also come up in community syncs:
>>
>> 1. Catalog federation: defining the protocol(s) by which all our different Iceberg REST Catalog implementations can talk to each other cooperatively, where entity metadata might be read through, pushed, or pulled in various ways
>> 2. Generalized events and notifications: beyond serving the purpose of federation, folks have proposed a generalized model that could also be applied to things like workflow triggering
>>
>> In the narrowest formulation there are two building blocks to consider:
>>
>> 1. Expressing the concept of an "externally owned table" in an Iceberg REST Catalog
>>    - At the most basic level, this could just mean that the target REST Catalog should refuse to perform mutation dances on the table (i.e. reject updateTable/commitTransaction calls on such tables) because it knows there's an external "source of truth" and wants to avoid causing a split-brain problem
>> 2. An endpoint for doing a "simple" register/update of a table by "forcing" the table metadata to the latest incarnation
>>    - Instead of updates being something for this target REST Catalog to perform a transaction protocol for, the semantic is that the "source of truth" transaction is already committed in the external source, so this target catalog's job is simply to "trust" the latest metadata (modulo some watermark semantics to deal with transient errors and out-of-order deliveries)
>>
>> Interestingly, it appears there was a GitHub issue filed a while back for some formulation of (2) that was closed silently: https://github.com/apache/iceberg/issues/7261
>>
>> It seems like there's an opportunity to find a good balance between breadth of scope, generalizability, and practicality in terms of what building blocks can be defined in the core spec and what broader/more ambitious features can be built on top of it.
>>
>> Would love to hear everyone's thoughts on this.
>>
>> Cheers,
>>
>> Dennis
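The "watermark semantics" mentioned for building block (2) could look something like the following sketch: the reflecting catalog applies a declared state only if it carries a strictly newer watermark (e.g. a sequence number or commit timestamp assigned by the owning catalog) than what it has stored. This makes force-updates idempotent and tolerant of duplicate and out-of-order delivery. All names and shapes here are illustrative, not part of any spec.

```python
# Sketch: table identifier -> (watermark, metadata-location)
state: dict[str, tuple[int, str]] = {}

def force_update(ident: str, watermark: int, metadata_location: str) -> bool:
    """Apply only if strictly newer; stale and duplicate deliveries are no-ops."""
    current = state.get(ident)
    if current is not None and watermark <= current[0]:
        return False  # out-of-order or duplicate: ignore silently
    state[ident] = (watermark, metadata_location)
    return True

force_update("db.t1", 5, "s3://b/t1/v5.metadata.json")  # newest state: applied
force_update("db.t1", 3, "s3://b/t1/v3.metadata.json")  # out-of-order: ignored
force_update("db.t1", 5, "s3://b/t1/v5.metadata.json")  # duplicate: ignored
```

Because a replayed or reordered delivery simply becomes a no-op, the sender can retry freely without the receiver needing structured failure responses, which lines up with the idempotency position from the sync summary.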