By the way, I think it's a good time to think about REST Catalog API v2.
Actually, I would name this the Catalog RFC containing:
- the RFC description itself (documentation)
- the improved OpenAPI 3.0 spec
- possible OpenAPI extensions (allowing extra features, vendor specific, etc)
  https://swagger.io/docs/specification/openapi-extensions/
- TCK/ref impl to validate the RFC

I think it would be great to have it in a separate repo (let's say
iceberg-catalog) similar to iceberg-rust or iceberg-python.

I'm ready to start a document to share some ideas with a call for
action to anyone who wants to participate/contribute.

Thoughts?

Regards
JB

On Thu, Feb 29, 2024 at 11:56 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>
> Hi Ajantha,
>
> Thanks for sharing your thoughts.
>
> It makes sense for Gravitino to be a TLP (after the incubation period)
> because Gravitino is "more" than an Iceberg catalog. It implements the
> Iceberg REST Catalog API, but it's also a metadata catalog/repo with
> additional features.
>
> That said, I agree with what you said:
> 1. We have the openapi yaml in the Iceberg project, but no reference
> implementation in the project itself. I think REST Catalog is a good
> approach as a "central" Catalog API because any Iceberg engine/layer
> could use this API (even if written in Python, Rust, Go, whatever),
> and it allows new use cases (like easily moving data from one engine to
> another, as the catalog API would be the same).
> 2. From an ASF standpoint, I would not talk about "subprojects" but
> rather repositories. The reason is that, in terms of governance, it's
> still the Iceberg project (a PMC member or committer has the same
> permissions on all repositories in the Iceberg project; it's not
> possible to have a committer only on iceberg-rust, for instance).
> Generally speaking, we should limit the number of subprojects.
> 3. I think it would be fair to have REST Catalog resources (openapi
> yaml + a ref impl) in an iceberg-catalog repository.
> 4. However, it's important to have a more global discussion within the
> community about Iceberg 2.0 and the roadmap about catalogs: do we
> deprecate the Iceberg Java Catalog API in favor of the REST Catalog API?
> What do we do with the existing catalogs? etc. I think it's a fair
> discussion to have for Iceberg 2.0.
>
> It's an important discussion, community driven.
>
> Regards
> JB
>
> On Thu, Feb 29, 2024 at 9:44 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:
> >
> > I apologize for the delay in responding.
> >
> > I'm pleased to see the development of an open-source REST catalog
> > implementation, and the potential transition of Gravitino to an ASF project
> > is certainly promising.
> > But the REST catalog server implementation will be a small part of the
> > Gravitino ASF project, which has many other things along with the catalog.
> >
> > While I understand Iceberg's focus on the table format specification and
> > its implementation,
> > I would like to propose the creation of a sub-project for the REST catalog
> > server implementation under the Iceberg repository (similar to pyiceberg,
> > iceberg-rust, etc.).
> > This suggestion is based on several reasons:
> >
> > - Every time we make a change to the REST spec, there is no reference
> >   implementation to refer to or modify.
> > - Many companies such as AWS, Apple, Tabular, and Datastrato are each
> >   implementing their own REST servers.
> >   Consolidating efforts within a sub-project could lead to efficiency gains
> >   and potential collaboration opportunities.
> > - From the perspective of open-source users, the absence of an open-source
> >   implementation for the REST catalog within Iceberg may be inconvenient or
> >   frustrating.
> >
> > I believe creating a dedicated sub-project would address these concerns and
> > enhance the overall usability and collaborative nature of the Iceberg
> > ecosystem.
> > I also think we can have a sub-project for kafka-connect and iceberg tools
> > (delta converter, catalog migrator, etc.), as they need not depend on
> > the Iceberg release cycle
> > and they are independent of the table format spec.
> >
> > Let me know your thoughts on this. I can open a separate thread for
> > discussion if required.
> >
> > - Ajantha
> >
> >
> > On Wed, Jan 31, 2024 at 5:32 AM Jack Ye <yezhao...@gmail.com> wrote:
> >>
> >> +1 for using test-jar!
> >>
> >> -Jack
> >>
> >> On Fri, Jan 26, 2024 at 10:48 AM Ryan Blue <b...@tabular.io> wrote:
> >>>
> >>> I think I'd be fine exposing this through a test Jar, but it seems to me
> >>> that if we were to put it into a normal package it would turn into the
> >>> situation we want to avoid. People would use it for unintended purposes
> >>> and it would become a distraction.
> >>>
> >>> What do you think about using the tests Jar for this?
> >>>
> >>> On Thu, Jan 25, 2024 at 12:48 PM Jack Ye <yezhao...@gmail.com> wrote:
> >>>>
> >>>> Yes, sorry I did not make it clear, I also agree it is not the right
> >>>> direction to invest a lot of community effort. I am more talking about
> >>>> casual use cases like importing a server for unit tests outside Iceberg,
> >>>> running some local debugging, etc. I think it would be valuable to
> >>>> provide a server in Iceberg for that purpose, and maybe vend it as test
> >>>> utils. Thoughts?
> >>>>
> >>>> -Jack
> >>>>
> >>>> On Thu, Jan 25, 2024 at 11:35 AM Ryan Blue <b...@tabular.io> wrote:
> >>>>>
> >>>>> > I know we have the RESTCatalogAdapter and RESTCatalogServlet for unit
> >>>>> > tests, and technically we have a very similar Jetty server
> >>>>> > implementation in TestRESTCatalog. Should we think about moving those
> >>>>> > components out of the tests into an iceberg-rest-server module for
> >>>>> > this use case, and merge with the implementation that Gravitino has?
> >>>>>
> >>>>> I think that this would take the Iceberg project in the wrong
> >>>>> direction. Iceberg has always been a library and I think it should
> >>>>> continue to be. Concerns about runtime should be left to other projects
> >>>>> that need to fit into existing infrastructure or skillsets of people
> >>>>> maintaining them. The question of whether to use Jetty or Tomcat or
> >>>>> whatever else is a serious consideration, as is how to monitor that
> >>>>> application and send metrics.
> >>>>> I think it would slow down the core
> >>>>> purpose of Iceberg if we got distracted by these things.
> >>>>>
> >>>>> In fact, I think that this project shows that the library is getting
> >>>>> the balance right: it is using `CatalogHandlers` for their intended
> >>>>> purpose. It has opinions about how to run the actual HTTP service and
> >>>>> people that agree can use it. Other people could use `CatalogHandlers`
> >>>>> to build on a different foundation.
> >>>>>
> >>>>> On Thu, Jan 25, 2024 at 11:13 AM Jack Ye <yezhao...@gmail.com> wrote:
> >>>>>>
> >>>>>> Really cool project!
> >>>>>>
> >>>>>> I browsed a bit of the codebase, and see this implementation of the
> >>>>>> REST service backend:
> >>>>>> - https://github.com/datastrato/gravitino/blob/main/catalogs/catalog-lakehouse-iceberg/src/main/java/com/datastrato/gravitino/catalog/lakehouse/iceberg/IcebergRESTService.java#L39
> >>>>>> - https://github.com/datastrato/gravitino/blob/main/catalogs/catalog-lakehouse-iceberg/src/main/java/com/datastrato/gravitino/catalog/lakehouse/iceberg/ops/IcebergTableOps.java#L42-L51
> >>>>>>
> >>>>>> Looks like it is initializing a Jetty server that uses
> >>>>>> CatalogHandlers to delegate the execution to a specific Java Catalog
> >>>>>> implementation.
> >>>>>>
> >>>>>> I think this is actually something that is lacking today in Iceberg,
> >>>>>> which is an easy way for users to start an actual REST HTTP server.
> >>>>>>
> >>>>>> I know we have the RESTCatalogAdapter and RESTCatalogServlet for unit
> >>>>>> tests, and technically we have a very similar Jetty server
> >>>>>> implementation in TestRESTCatalog. Should we think about moving those
> >>>>>> components out of the tests into an iceberg-rest-server module for
> >>>>>> this use case, and merge with the implementation that Gravitino has?
> >>>>>>
> >>>>>> Best,
> >>>>>> Jack Ye
> >>>>>>
> >>>>>> On Thu, Jan 25, 2024 at 10:47 AM Yufei Gu <flyrain...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Thanks, Justin, for sharing.
> >>>>>>>
> >>>>>>> It's pretty cool to see an open source REST catalog implementation in
> >>>>>>> action. Having dabbled a bit in the early development of Gravitino
> >>>>>>> myself, I'm really excited about its potential with the Iceberg REST
> >>>>>>> catalog.
> >>>>>>>
> >>>>>>> The idea of Gravitino moving to an ASF project is promising. It’ll
> >>>>>>> surely boost its visibility and open up more doors for collaboration
> >>>>>>> and adoption.
> >>>>>>>
> >>>>>>> Looking forward to where this goes. Keep up the fantastic work!
> >>>>>>>
> >>>>>>> Yufei
> >>>>>>>
> >>>>>>>
> >>>>>>> On Thu, Jan 25, 2024 at 5:55 AM Jean-Baptiste Onofré
> >>>>>>> <j...@nanthrax.net> wrote:
> >>>>>>>>
> >>>>>>>> Hi Justin,
> >>>>>>>>
> >>>>>>>> I talked with Junping a couple of months ago about Gravitino. Thanks
> >>>>>>>> for sharing!
> >>>>>>>>
> >>>>>>>> Regards
> >>>>>>>> JB
> >>>>>>>>
> >>>>>>>> On Thu, Jan 25, 2024 at 12:15 AM Justin Mclean
> >>>>>>>> <jus...@classsoftware.com> wrote:
> >>>>>>>> >
> >>>>>>>> > Hi,
> >>>>>>>> >
> >>>>>>>> > We open-sourced a new project, Gravitino, in December and have
> >>>>>>>> > been working on growing the community and adding new
> >>>>>>>> > functionality. We plan to donate the project to the ASF this year.
> >>>>>>>> > Gravitino is a unified metadata lake solution offering a unified
> >>>>>>>> > approach to managing datasets from diverse sources and regions
> >>>>>>>> > across multiple cloud platforms. Its core is an Iceberg REST
> >>>>>>>> > catalog service implementation to manage Iceberg tables
> >>>>>>>> > efficiently.
> >>>>>>>> >
> >>>>>>>> > If this sounds like something you would be interested in, then the
> >>>>>>>> > following resources will help:
> >>>>>>>> > - Blog post: https://datastrato.ai/blog/gravitino-iceberg-rest-catalog-service/
> >>>>>>>> > - Gravitino documentation: https://datastrato.ai/docs/0.3.1/
> >>>>>>>> > - Iceberg REST service documentation: https://datastrato.ai/docs/0.3.1/iceberg-rest-service
> >>>>>>>> >
> >>>>>>>> > We welcome any feedback and suggestions you have, and as always,
> >>>>>>>> > all contributions are welcome. You can find the source code at
> >>>>>>>> > https://github.com/datastrato/gravitino.
> >>>>>>>> >
> >>>>>>>> > Kind Regards,
> >>>>>>>> > Justin
> >>>>>
> >>>>> --
> >>>>> Ryan Blue
> >>>>> Tabular
> >>>
> >>> --
> >>> Ryan Blue
> >>> Tabular