Very interesting.

I would like to join as a mentor, if needed.

Atri

On Fri, Feb 24, 2023 at 9:28 AM Yu Li <car...@gmail.com> wrote:
>
> Correction: the hyperlink on the first reference is incorrect; please use
> the website address directly instead of clicking the link (sorry for my mistake).
>
> For easier reference: https://github.com/apache/flink-table-store
>
> Best Regards,
> Yu
>
>
> On Fri, 24 Feb 2023 at 11:48, Yu Li <car...@gmail.com> wrote:
>
> > Hi All,
> >
> >
> > I would like to propose Paimon [1] as a new Apache Incubator project;
> > you can find more details in the Paimon proposal [2].
> >
> >
> > Paimon is a unified lake storage for building dynamic tables for both stream
> > and batch processing with big data compute engines (Apache Flink, Apache
> > Spark, Apache Hive, Trino, etc.), supporting high-speed data ingestion and
> > real-time data query.
> > With the adoption of stream processing in production, there is an
> > increasing demand for storage that simultaneously supports updates, deletes
> > and streaming reads, which cannot be fully satisfied by existing lake
> > storage systems. To tackle these new challenges, Paimon natively adopts LSM
> > (Log-Structured Merge-tree) as its underlying data structure, and provides
> > enhanced performance for data with primary keys (besides the common lake
> > storage capabilities). What's more, Paimon supports both batch and stream
> > operations (reads and writes), facilitating applications pursuing
> > batch-stream-unified semantics. Specifically:
> >
> >
> > 1. Paimon provides excellent performance on intensive update / delete
> > workloads, leveraging the append-write feature of the LSM data structure.
> >
> > 2. Paimon utilizes the ordered feature of LSM to support effective filter
> > pushdown, and can reduce the latency of queries with primary key
> > filtering to milliseconds.
> >
> > 3. Paimon supports various (row-based or row-columnar) file formats
> > including Apache Avro, Apache ORC and Apache Parquet (rows will be sorted
> > by the primary key before writing out).
> >
> > 4. Tables provided by Paimon can be queried by various engines, including
> > Apache Flink, Apache Spark, Apache Hive, Trino, etc.
> >
> > 5. Paimon's metadata is self-managed, stored on the distributed file system
> > and can be synchronized to Hive metastore (HMS).
> >
> > 6. Besides the common batch read and write support, Paimon also supports
> > streaming read and change data feed.
> >
> >
> >
> > Paimon has been used by various users and companies, including Alibaba, 
> > Bilibili, ByteDance and so on. Paimon is also integrated into Alibaba 
> > Cloud's E-MapReduce and Realtime Compute products to provide cloud services.
> >
> >
> > Paimon was founded in the Flink community in 2022 under the name "Flink
> > Table Store".
> > It has been under development for more than one year and has produced 4
> > formal releases. As its adoption expands to more computing engines, some
> > ecosystem users have expressed concerns about the neutrality of the project.
> > This has made us rethink the positioning of Flink Table Store, which can be
> > an independent lake storage.
> >
> >
> > After thorough discussion, we have the support of the Flink community
> > to enter Apache incubation [3] [4], with the following expectations:
> >
> > 1. Expand Paimon's ecosystem, providing independent Java APIs to support
> > reading and writing from more big data engines such as Apache Doris,
> > Apache Hive, Presto, Apache Spark, Trino, etc.
> >
> > 2. Supplement key capabilities, especially streaming reads and intensive
> > updates/deletes, for creating a unified and easy-to-use streaming data
> > warehouse (lakehouse).
> >
> > 3. Grow into a more vibrant and neutral open source community.
> >
> >
> > And we believe the Paimon project will provide tremendous value for the
> > community if it is introduced into the Apache incubator.
> >
> >
> > I will help this project as the champion and mentor the project together
> > with three other mentors (many thanks):
> >
> >
> > * Becket Qin (j...@apache.org)
> >
> > * Robert Metzger (rmetz...@apache.org)
> >
> > * Stephan Ewen (se...@apache.org)
> >
> >
> > Looking forward to your feedback. Thanks.
> >
> >
> > Best Regards,
> > Yu
> >
> > [1] https://github.com/apache/flink-table-store
> > <https://github.com/alibaba/RemoteShuffleService>
> >
> > [2] https://cwiki.apache.org/confluence/display/INCUBATOR/PaimonProposal
> >
> > [3] https://lists.apache.org/thread/2ybxfg3zrzn4l3tnq3w2w3xvkhk0f9jk
> >
> > [4] https://lists.apache.org/thread/kn7c08cr4l0ynt551yfjqvzh5ns226r6
> >
> >
> >

-- 
Regards,

Atri
Apache Concerted

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org
