By the way, I think it's a good time to think about REST Catalog API v2.

Actually, I would name this the Catalog RFC containing:
- the RFC description itself (documentation)
- the improved OpenAPI 3.0 spec
- possible OpenAPI extensions (allowing extra features, vendor-specific
additions, etc.): https://swagger.io/docs/specification/openapi-extensions/
- TCK/ref impl to validate the RFC

I think it would be great to have it in a separate repo (let's say
iceberg-catalog) similar to iceberg-rust or iceberg-python.

I'm ready to start a document to share some ideas with a call for
action to anyone who wants to participate/contribute.

Thoughts ?

Regards
JB

On Thu, Feb 29, 2024 at 11:56 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>
> Hi Ajantha,
>
> Thanks for sharing your thoughts.
>
> It makes sense for Gravitino to be a TLP (after the incubation period)
> because Gravitino is "more" than an Iceberg catalog. It implements the
> Iceberg REST Catalog API, but it's also a metadata catalog/repo with
> additional features.
>
> That said, I agree with what you said:
> 1. We have the openapi yaml in the Iceberg project, but no reference
> implementation in the project itself. I think REST Catalog is a good
> approach as a "central" Catalog API because any Iceberg engine/layer
> could use this API (even if written in Python, Rust, Go, whatever),
> and it allows new use cases (like easily moving data from one engine to
> another, as the catalog API would be the same).
> 2. From an ASF standpoint, I would not talk about "subprojects" but
> rather repositories. The reason is that, in terms of governance, it's
> still the Iceberg project (a PMC member or committer has the same
> permissions on all repositories in the Iceberg project; it's not
> possible to have a committer only on iceberg-rust, for instance).
> Generally speaking, we should limit the number of subprojects.
> 3. I think it would be fair to have REST Catalog resources (openapi
> yaml + a ref impl) in an iceberg-catalog repository.
> 4. However, it's important to have a more global discussion within the
> community about Iceberg 2.0 and the roadmap about catalogs: do we
> deprecate Iceberg Java Catalog API in favor of the REST Catalog API ?
> What do we do with the existing catalogs ? etc. I think it's a fair
> discussion to have for Iceberg 2.0.
>
> It's an important discussion, community driven.
>
> Regards
> JB
>
> On Thu, Feb 29, 2024 at 9:44 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:
> >
> > I apologize for the delay in responding.
> >
> > I'm pleased to see the development of an open-source REST catalog 
> > implementation, and the potential transition of Gravitino to an ASF project 
> > is certainly promising.
> > But the REST catalog server implementation will be only a small part of the
> > Gravitino ASF project, which has many other things along with the catalog.
> >
> > While I understand Iceberg's focus on the table format specification and 
> > its implementation,
> > I would like to propose the creation of a sub-project for the REST catalog 
> > server implementation under the Iceberg repository (similar to pyiceberg, 
> > iceberg-rust, etc.).
> > This suggestion is based on several reasons:
> >
> > - Every time we make a change to the REST spec, there is no reference
> >   implementation to refer to or modify.
> > - Many companies such as AWS, Apple, Tabular, and Datastrato are each
> >   implementing their own REST servers.
> > - Consolidating efforts within a sub-project could lead to efficiency gains
> >   and potential collaboration opportunities.
> > - From the perspective of open-source users, the absence of an open-source
> >   implementation for the REST catalog within Iceberg may be inconvenient or
> >   frustrating.
> >
> > I believe creating a dedicated sub-project would address these concerns and 
> > enhance the overall usability and collaborative nature of the Iceberg 
> > ecosystem.
> > I also think we can have a sub-project for kafka-connect and Iceberg tools
> > (delta converter, catalog migrator, etc.), as they need not depend on
> > the Iceberg release cycle and are independent of the table format spec.
> >
> > Let me know your thoughts on this. I can open a separate thread for 
> > discussion if required.
> >
> > - Ajantha
> >
> >
> > On Wed, Jan 31, 2024 at 5:32 AM Jack Ye <yezhao...@gmail.com> wrote:
> >>
> >> +1 for using test-jar!
> >>
> >> -Jack
> >>
> >> On Fri, Jan 26, 2024 at 10:48 AM Ryan Blue <b...@tabular.io> wrote:
> >>>
> >>> I think I'd be fine exposing this through a test Jar, but it seems to me 
> >>> that if we were to put it into a normal package it would turn into the 
> >>> situation we want to avoid. People would use it for unintended purposes 
> >>> and it would become a distraction.
> >>>
> >>> What do you think about using the tests Jar for this?
> >>>
> >>> On Thu, Jan 25, 2024 at 12:48 PM Jack Ye <yezhao...@gmail.com> wrote:
> >>>>
> >>>> Yes, sorry I did not make it clear; I also agree it is not the right
> >>>> direction to invest a lot of community effort. I am talking more about
> >>>> casual use cases like importing a server for unit tests outside Iceberg, 
> >>>> running some local debugging, etc. I think it would be valuable to 
> >>>> provide a server in Iceberg for that purpose, and maybe vend it as test 
> >>>> utils. Thoughts?
> >>>>
> >>>> -Jack
> >>>>
> >>>> On Thu, Jan 25, 2024 at 11:35 AM Ryan Blue <b...@tabular.io> wrote:
> >>>>>
> >>>>> > I know we have the RESTCatalogAdapter and RESTCatalogServlet for unit
> >>>>> > tests, and technically we have a very similar Jetty server
> >>>>> > implementation in TestRESTCatalog. Should we think about moving those
> >>>>> > components out of the tests into an iceberg-rest-server module for
> >>>>> > this use case, and merge with the implementation that Gravitino has?
> >>>>>
> >>>>> I think that this would take the Iceberg project in the wrong 
> >>>>> direction. Iceberg has always been a library and I think it should 
> >>>>> continue to be. Concerns about runtime should be left to other projects 
> >>>>> that need to fit into existing infrastructure or skillsets of people 
> >>>>> maintaining them. The question of whether to use Jetty or Tomcat or 
> >>>>> whatever else is a serious consideration, as is how to monitor that 
> >>>>> application and send metrics. I think it would slow down the core 
> >>>>> purpose of Iceberg if we got distracted by these things.
> >>>>>
> >>>>> In fact, I think that this project shows that the library is getting 
> >>>>> the balance right: it is using `CatalogHandlers` for their intended 
> >>>>> purpose. It has opinions about how to run the actual HTTP service and 
> >>>>> people that agree can use it. Other people could use `CatalogHandlers` 
> >>>>> to build on a different foundation.
> >>>>>
> >>>>> On Thu, Jan 25, 2024 at 11:13 AM Jack Ye <yezhao...@gmail.com> wrote:
> >>>>>>
> >>>>>> Really cool project!
> >>>>>>
> >>>>>> I browsed a bit of the codebase, and see this implementation of the 
> >>>>>> REST service backend:
> >>>>>> - https://github.com/datastrato/gravitino/blob/main/catalogs/catalog-lakehouse-iceberg/src/main/java/com/datastrato/gravitino/catalog/lakehouse/iceberg/IcebergRESTService.java#L39
> >>>>>> - https://github.com/datastrato/gravitino/blob/main/catalogs/catalog-lakehouse-iceberg/src/main/java/com/datastrato/gravitino/catalog/lakehouse/iceberg/ops/IcebergTableOps.java#L42-L51
> >>>>>>
> >>>>>> Looks like it is initializing a Jetty server that uses
> >>>>>> CatalogHandlers to delegate the execution to a specific Java Catalog 
> >>>>>> implementation.
> >>>>>>
> >>>>>> I think this is actually something that is lacking today in Iceberg, 
> >>>>>> which is an easy way for users to start an actual REST HTTP server.
> >>>>>>
> >>>>>> I know we have the RESTCatalogAdapter and RESTCatalogServlet for unit
> >>>>>> tests, and technically we have a very similar Jetty server
> >>>>>> implementation in TestRESTCatalog. Should we think about moving those
> >>>>>> components out of the tests into an iceberg-rest-server module for
> >>>>>> this use case, and merge with the implementation that Gravitino has?
> >>>>>>
> >>>>>> Best,
> >>>>>> Jack Ye
> >>>>>>
> >>>>>> On Thu, Jan 25, 2024 at 10:47 AM Yufei Gu <flyrain...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Thanks, Justin, for sharing.
> >>>>>>>
> >>>>>>> It's pretty cool to see an open source REST catalog implementation in 
> >>>>>>> action. Having dabbled a bit in the early development of Gravitino 
> >>>>>>> myself, I'm really excited about its potential with the Iceberg REST 
> >>>>>>> catalog.
> >>>>>>>
> >>>>>>> The idea of Gravitino moving to an ASF project is promising. It’ll 
> >>>>>>> surely boost its visibility and open up more doors for collaboration 
> >>>>>>> and adoption.
> >>>>>>>
> >>>>>>> Looking forward to where this goes. Keep up the fantastic work!
> >>>>>>>
> >>>>>>> Yufei
> >>>>>>>
> >>>>>>>
> >>>>>>> On Thu, Jan 25, 2024 at 5:55 AM Jean-Baptiste Onofré 
> >>>>>>> <j...@nanthrax.net> wrote:
> >>>>>>>>
> >>>>>>>> Hi Justin,
> >>>>>>>>
> >>>>>>>> I talked with Junping a couple of months ago about Gravitino. Thanks
> >>>>>>>> for sharing !
> >>>>>>>>
> >>>>>>>> Regards
> >>>>>>>> JB
> >>>>>>>>
> >>>>>>>> On Thu, Jan 25, 2024 at 12:15 AM Justin Mclean 
> >>>>>>>> <jus...@classsoftware.com> wrote:
> >>>>>>>> >
> >>>>>>>> > Hi,
> >>>>>>>> >
> >>>>>>>> > We open-sourced a new project, Gravitino, in December and have 
> >>>>>>>> > been working on growing the community and adding new 
> >>>>>>>> > functionality. We plan to donate the project to the ASF this year. 
> >>>>>>>> > Gravitino is a unified metadata lake solution offering a unified 
> >>>>>>>> > approach to managing datasets from diverse sources and regions 
> >>>>>>>> > across multiple cloud platforms. Its core is an Iceberg REST 
> >>>>>>>> > catalog service implementation to manage Iceberg tables 
> >>>>>>>> > efficiently.
> >>>>>>>> >
> >>>>>>>> > If this sounds like something you would be interested in, then the 
> >>>>>>>> > following resources will help:
> >>>>>>>> > -  Blog post: 
> >>>>>>>> > https://datastrato.ai/blog/gravitino-iceberg-rest-catalog-service/
> >>>>>>>> > -  Gravitino documentation: https://datastrato.ai/docs/0.3.1/
> >>>>>>>> > -  Iceberg REST service documentation: 
> >>>>>>>> > https://datastrato.ai/docs/0.3.1/iceberg-rest-service
> >>>>>>>> >
> >>>>>>>> > We welcome any feedback and suggestions you have, and as always, 
> >>>>>>>> > all contributions are welcome. You can find the source code at 
> >>>>>>>> > https://github.com/datastrato/gravitino.
> >>>>>>>> >
> >>>>>>>> > Kind Regards,
> >>>>>>>> > Justin
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Ryan Blue
> >>>>> Tabular
> >>>
> >>>
> >>>
> >>> --
> >>> Ryan Blue
> >>> Tabular
