As members of the Amoro project, our team is thrilled to see the growing
attention towards Amoro.

We are excited about Polaris becoming open source, as it opens up greater
possibilities for future collaboration with the Amoro community.

Amoro focuses on data lake formats and aims to provide optimization
services and enhancements for the lake. Our primary goal is to offer
optimization services, such as small file optimization, Z-order sorting
optimization, and future index optimization, that support multiple table
formats (though Iceberg is currently the best supported).

Amoro provides both Internal Catalog and External Catalog methods to
optimize lake tables. To gather the information that optimization needs,
we have done some catalog management work of our own.

I often hear people comparing Gravitino and Polaris as potential
competitors to Amoro, which I think is a misconception (I noticed that some
previous discussions about Amoro's positioning seemed unclear, so I wanted
to clarify this).

While there might be some overlap between Amoro, Gravitino, and Polaris,
the other two projects have a different focus:

- Gravitino focuses on unified metadata management across various areas,
including Kafka and AI, not just on data lakes.
- Polaris is an interoperable, open-source catalog for Apache Iceberg.

If I have gotten anything wrong, please correct me.

Amoro plans to support both Polaris and Gravitino in the future.
Additionally, the Amoro community will continue to engage with the
Gravitino and Polaris communities to foster more collaborative efforts in
lake optimization.

[1] Amoro docs: https://amoro.apache.org/docs/latest/
[2] Gravitino docs: https://datastrato.ai/docs/0.5.1/
[3] Polaris docs: https://polaris.io/

Jack Ye <yezhao...@gmail.com> wrote on Wed, Jul 31, 2024 at 13:22:

> > What's the difference between this project and Amoro
>
> Here is my $0.01, please correct me if I am wrong, especially for people
> working on Amoro and Gravitino.
>
> I think Apache Amoro is focused more on being a self-contained complete
> data lakehouse management and ingestion system. It is a complete solution
> with its own connectors in engines like Spark [1], and customized
> mixed-format integrations in engines like Trino [2]. Polaris is mostly
> focused on the data catalog aspect of a data lakehouse, and offers an open
> source vendor-neutral Iceberg catalog with additional governance support.
> By integrating with the Iceberg REST catalog interface, the intention is
> for it to leverage Iceberg for all the engine integrations to begin with.
> Similarly, any table management or ingestion system that works with Iceberg
> REST API will be able to be plugged in to directly work with Polaris. So
> you could imagine it could be possible for an Iceberg table to be ingested
> and managed by Amoro, but cataloged using Polaris.
>
> This does make Polaris more similar to Apache Gravitino. However, I think
> the key difference between them is that the emphasis of Gravitino is more
> breadth-first on aspects like multi-format, multi-catalog, multi-datasource,
> different data catalog objects in AI [3], etc. It exposes different sets of
> APIs for different purposes, with Iceberg REST API being a part of it for
> the Iceberg tables, and other APIs for other data sources [4]. Polaris is
> more depth-first on Iceberg at this moment. Our future plan does say that
> it could extend to non-Iceberg data lakes, and there could be some overlap
> at that time. But even then, there could be different ways to achieve such
> support. For example, we could surface Hive Parquet tables as Iceberg
> tables, if the Iceberg REST catalog standard can be updated to accommodate
> that. There could also be potential collaborations between Polaris and
> Gravitino to achieve the goal together, and I am personally pretty excited
> about that opportunity.
>
> Best,
> Jack Ye
>
> [1] https://amoro.apache.org/docs/latest/spark-configuration/
> [2] https://amoro.apache.org/docs/latest/trino/#mixed-format
> [3] https://github.com/apache/gravitino-site/blob/10a967f18730c28018e064f3ee1ddd3cc32aa506/src/components/HomepageFeatures/index.tsx#L74
> [4] https://github.com/apache/gravitino/tree/main/catalogs
>
> On Tue, Jul 30, 2024 at 10:06 PM Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
> > Hi Manu
> >
> > Thanks for the details !
> > I agree with you. As mentor on Gravitino, I would be more than happy
> > to connect the two podlings.
> >
> > Regards
> > JB
> >
> > On Wed, Jul 31, 2024 at 7:00 AM Manu Zhang <owenzhang1...@gmail.com>
> > wrote:
> > >
> > > AFAIK, Amoro is a management system with optimization service, catalog
> > > service, etc. It has a built-in catalog but can also work with other
> > > catalogs like Polaris.
> > > I think Polaris is more comparable to Gravitino, which entered the
> > > incubator recently. It would be interesting to see how these two
> > > communities can collaborate.
> > >
> > > Regards,
> > > Manu
> > >
> > >
> > > On Wed, Jul 31, 2024 at 12:36 PM Jean-Baptiste Onofré <j...@nanthrax.net>
> > > wrote:
> > >
> > > > Hi
> > > >
> > > > The proposal is more generic: today it's Apache Iceberg, but after the
> > > > discussions with the initial community we agreed it could make sense
> > > > to address other use cases.
> > > >
> > > > I don't know Amoro in detail, but I am happy to bridge the
> > > > communities to work together.
> > > >
> > > > Regards
> > > > JB
> > > >
> > > > On Wed, Jul 31, 2024 at 5:16 AM Xuanwo <xua...@apache.org> wrote:
> > > > >
> > > > > Hi, JB
> > > > >
> > > > > > Thank you for starting this thread; it's great to see an increasing
> > > > > > number of projects being developed around Iceberg.
> > > > >
> > > > > I have two questions:
> > > > >
> > > > > > - The Polaris GitHub repo said it's "an open source catalog for Apache
> > > > > > Iceberg", but the proposal changed into "a catalog for data lakes".
> > > > > > Does it mean Polaris's scope has been changed?
> > > > > > - What's the difference between this project and Amoro:
> > > > > > https://github.com/apache/amoro? How do these two communities
> > > > > > collaborate?
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Jul 31, 2024, at 04:19, Dave Fisher wrote:
> > > > > > >> On Jul 30, 2024, at 11:34 AM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> > > > > >>
> > > > > >> Hi Dave,
> > > > > >>
> > > > > > >> That's a good question. The main reason is that we wanted people
> > > > > > >> with Apache experience in the PPMC to mentor the committers and
> > > > > > >> contributors heading to PPMC as well.
> > > > > > >> Also, the initial committers worked closely with PPMC guidance
> > > > > > >> (explaining the ICLA, good practice, etc).
> > > > > > >> So, we wanted to have PPMC acting more as mentor (both technically
> > > > > > >> and with their Apache experience) with committers.
> > > > > >
> > > > > > > That makes sense. Are any of the proposed PPMC members also ASF
> > > > > > > Members and/or potentially future Mentors?
> > > > > >
> > > > > > >> If it's problematic, we can start only with the PPMC group and
> > > > > > >> invite new committers/PPMC members during incubation period.
> > > > > >
> > > > > > > No problem. It will actually provide the Mentors and later the
> > > > > > > IPMC additional data to see if the PPMC is properly growing the
> > > > > > > PPMC and Committer base.
> > > > > >
> > > > > > Best,
> > > > > > Dave
> > > > > >
> > > > > >>
> > > > > >> Regards
> > > > > >> JB
> > > > > >>
> > > > > > >> On Tue, Jul 30, 2024 at 8:19 PM Dave Fisher <w...@apache.org> wrote:
> > > > > >>>
> > > > > >>> Hi JB,
> > > > > >>>
> > > > > >>> An interesting project that looks pretty mature.
> > > > > >>>
> > > > > > >>> I’m curious about the split between Initial PPMC and initial
> > > > > > >>> Committer. In the usual case a new podling will have all of the
> > > > > > >>> Initial Committers on the PPMC. Can you tell us why this is not
> > > > > > >>> the case with Polaris?
> > > > > >>>
> > > > > >>> Best,
> > > > > >>> Dave
> > > > > >>>
> > > > > > >>>> On Jul 30, 2024, at 10:33 AM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> > > > > >>>>
> > > > > >>>> Hi folks,
> > > > > >>>>
> > > > > > >>>> We would like to propose a new project to the ASF incubator:
> > > > > > >>>> Polaris.
> > > > > >>>>
> > > > > > >>>> Polaris is a catalog for data lakes. It provides new levels of
> > > > > > >>>> choice, flexibility and control over data, with full enterprise
> > > > > > >>>> security and Apache Iceberg interoperability across a multitude
> > > > > > >>>> of engines and infrastructure. Polaris builds on standards such
> > > > > > >>>> as those created by Apache Iceberg, providing the following
> > > > > > >>>> benefits for the ecosystem:
> > > > > > >>>> * Multi-engine interoperability over a single copy of data,
> > > > > > >>>> eliminating the need for moving and copying data across
> > > > > > >>>> different engines and catalogs.
> > > > > > >>>> * An interoperable security model providing a unified
> > > > > > >>>> authorization layer independent from the engines processing
> > > > > > >>>> analytical tables.
> > > > > > >>>> * For multi-catalog scenarios, a unified catalog level view of
> > > > > > >>>> data across multiple catalogs via catalog notification
> > > > > > >>>> integrations.
> > > > > > >>>> * The ability to host Polaris Catalog on the infrastructure of
> > > > > > >>>> your choice.
> > > > > >>>>
> > > > > >>>> Here is the proposal:
> > > > > >>>>
> > > > > > >>>> https://cwiki.apache.org/confluence/display/INCUBATOR/PolarisProposal
> > > > > >>>>
> > > > > >>>> Comments and feedback are welcome.
> > > > > >>>>
> > > > > >>>> Thanks!
> > > > > >>>> Regards
> > > > > >>>> JB
> > > > > >>>>
> > > > > >>>>
> > > > ---------------------------------------------------------------------
> > > > > >>>> To unsubscribe, e-mail:
> > general-unsubscr...@incubator.apache.org
> > > > > >>>> For additional commands, e-mail:
> > general-h...@incubator.apache.org
> > > > > >>>>
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > > --
> > > > > Xuanwo
> > > > >
> > > > > https://xuanwo.io/
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > >
> >
> >
> >
>


-- 
Best

ConradJam
