Hi Ryan,

Thanks for starting the thread! Just want to share some of my thoughts
related to this topic.

I think the AWS Glue, DynamoDB and JDBC catalogs will continue to live. I
don't see a unification through REST, as we are not going to build a REST
server between Iceberg and the related AWS services, and I think anyone can
continue to add more implementations in this route if they want (although
that is not recommended). In my opinion, everything in Iceberg will stay
client side. There is not going to be a server module, because the server is
exactly the extension point. The REST catalog is just a parallel
implementation to BaseMetastoreCatalog, but it does not enforce writing a
JSON metadata file. Instead, it only tells the server side the set of changes
and lets the server handle those changes. The server can choose to write the
JSON file in whatever way it wants, or even not write it at all. Just like
Glue, DynamoDB and JDBC all extend BaseMetastoreCatalog, other catalogs can
choose to "extend" the REST catalog, but the extension is through the
OpenAPI REST client rather than just Java inheritance.
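
To make the difference concrete, here is a rough sketch of the two commit
styles. Everything in it is made up for illustration: the class names, the
"set-property" and "add-snapshot" action names, and the in-memory storage are
assumptions, not the actual REST spec or any Iceberg API.

```python
import json


class FileBasedCommit:
    """Client-side style: the client serializes the full metadata itself."""

    def __init__(self):
        self.files = {}      # stands in for object storage
        self.pointer = None  # stands in for the catalog's current-metadata pointer

    def commit(self, full_metadata, version):
        path = f"metadata/v{version}.metadata.json"
        # The whole document is rewritten on every commit.
        self.files[path] = json.dumps(full_metadata)
        self.pointer = path
        return path


class ChangeBasedServer:
    """Server-side style: the client sends changes, the server decides storage."""

    def __init__(self):
        # This hypothetical server keeps state in memory and never writes JSON.
        self.state = {"properties": {}, "snapshots": []}

    def commit(self, changes):
        for change in changes:
            if change["action"] == "set-property":
                self.state["properties"][change["key"]] = change["value"]
            elif change["action"] == "add-snapshot":
                self.state["snapshots"].append(change["snapshot-id"])
            else:
                raise ValueError(f"unknown action: {change['action']}")
        return self.state
```

With the second style, a client would call something like
`ChangeBasedServer().commit([{"action": "add-snapshot", "snapshot-id": 1}])`
and never needs to know how, or whether, the server persists a metadata file.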

The biggest benefit I see out of this development is that catalog providers
can focus on the server side implementation to build really great catalog
services with all sorts of nice features, and no new integration or
maintenance is needed when Iceberg rolls out new catalog features or
support for new languages, because everything goes through the base REST
implementation. Because this simplifies open source compatibility so much,
I think most new catalog providers will prefer integration through REST. In
addition, systems that only have exposure to a non-Java/Python language can
also be used as a catalog provider using a client generated from the OpenAPI
spec. They do not need to have any Java compatibility. Just like there are
people who prefer the DynamoDB catalog over the Glue catalog, we also have
use cases in AWS for catalog implementations that would only be achievable
through a REST catalog, which I will contribute in the future after the REST
catalog is finalized.

The fact that the REST catalog server receives table changes instead of
rewriting the entire table metadata also means the catalog service can
optimize many aspects of performance. We have seen issues in streaming
where the table metadata JSON file gets too big and impacts reads. We
also generally agree that making a small table metadata update by rewriting
the entire metadata file is very inefficient. All these issues could be fixed
by moving to a client-server model in which a scalable service handles and
stores these changes.
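
As a back-of-the-envelope illustration of the size argument, the sketch below
compares the bytes a client would write when rewriting a whole metadata
document against the bytes needed to send a single change. The metadata shape,
field names, and helper functions are simplified assumptions for
illustration, not the real table metadata layout or REST payload.

```python
import json


def full_metadata(num_snapshots):
    """Build a toy metadata document that lists every snapshot so far."""
    return {
        "format-version": 1,
        "snapshots": [
            {"snapshot-id": i, "manifest-list": f"snap-{i}.avro"}
            for i in range(num_snapshots)
        ],
    }


def change_for(snapshot_id):
    """Build a toy change describing only the newly added snapshot."""
    return {
        "action": "add-snapshot",  # hypothetical action name
        "snapshot": {
            "snapshot-id": snapshot_id,
            "manifest-list": f"snap-{snapshot_id}.avro",
        },
    }


def payload_sizes(num_snapshots):
    """Return (bytes to rewrite the full file, bytes to send one change)
    for the commit that adds one snapshot on top of num_snapshots."""
    rewritten = full_metadata(num_snapshots + 1)
    change = change_for(num_snapshots)
    return len(json.dumps(rewritten)), len(json.dumps(change))
```

In this toy model the rewrite cost grows with the number of snapshots already
in the table, while the change payload stays roughly constant, which is the
case a long-running streaming job hits.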

Best,
Jack Ye

On Mon, Dec 13, 2021 at 12:28 PM Ryan Murray <rym...@dremio.com> wrote:

> Hi all,
>
>
> For those of you who haven't been following there has been some
> interesting discussion around the proposal for a REST based catalog[1].
>
>
> One of the primary questions I had while reading it was 'what is the
> overall goal of the API?'. Given the size of this question I thought it
> might be better to pose it on the mailing list than to clutter the PR.
>
>
> So I guess primarily for Kyle: what is the long term goal/vision for the
> REST catalog? Eg what are the use cases and who are the users? Do you see
> this unifying the other existing catalogs or do you see it as another
> catalog to complement existing choices?
>
>
> Additionally,
>
> * Is this a spec at the level that the table spec exists or is this an
> informative PR to agree on the REST api of _a_ catalog?
>
> * Is it meant to enshrine the `Catalog` interface into a spec? This came
> up on a python sync also
>
> * Will there be both server and client modules in the iceberg codebase? I
> would expect that at least a reference implementation of a server would be
> a good thing but this would be the first part of the codebase that runs as
> a server instead of as client code in an engine. On the other side an open
> api spec and a client impl w/o a server sounds like it's missing something.
>
> * It may be early to say for sure but does a server implementation imply
> authn/z, database backends, deployment artifacts and all the other fun
> things that go into a server side component?
>
>
> That's just a few things I have been thinking about. Curious to see if
> anyone else has been thinking similarly and very excited to hear your
> thoughts Kyle. Also very excited to see this catalog develop. The activity
> on the PR speaks to how excited people are about it landing.
>
>
> Best,
>
> Ryan
>
>
> [1] https://github.com/apache/iceberg/pull/3561
>
