Very interesting. I would like to join as a mentor, if needed.
Atri

On Fri, Feb 24, 2023 at 9:28 AM Yu Li <car...@gmail.com> wrote:
>
> Revision: the hyperlink of the first reference is incorrect and please use
> the website address directly instead of clicking it (sorry for my mistake).
>
> For easier reference: https://github.com/apache/flink-table-store
>
> Best Regards,
> Yu
>
>
> On Fri, 24 Feb 2023 at 11:48, Yu Li <car...@gmail.com> wrote:
>
> > Hi All,
> >
> > I would like to propose Paimon [1] as a new apache incubator project, and
> > you can find the proposal [2] of Paimon for more details.
> >
> > Paimon is a unified lake storage to build dynamic tables for both stream
> > and batch processing with big data compute engines (Apache Flink, Apache
> > Spark, Apache Hive, Trino, etc.), supporting high-speed data ingestion and
> > real-time data query. With the adoption of stream processing in
> > production, there is an increasing demand for storage to simultaneously
> > support updates, deletes and streaming reads, which cannot be fully
> > satisfied by existing lake storages. To tackle these new challenges,
> > Paimon natively adopts LSM (Log-Structured Merge-tree) as its underlying
> > data structure, and provides enhanced performance for data with primary
> > keys (besides the common lake storage capabilities). What's more, Paimon
> > supports both batch and stream operations (reads and writes), facilitating
> > applications pursuing batch-stream-unified semantics. Specifically:
> >
> > 1. Paimon provides excellent performance on the intensive update / delete
> > workload, leveraging the append-write feature of the LSM data structure.
> >
> > 2. Paimon utilizes the ordered feature of LSM to support effective filter
> > pushdown, and could reduce the latency of queries with primary key
> > filtering to milliseconds.
> >
> > 3. Paimon supports various (row-based or row-columnar) file formats
> > including Apache Avro, Apache ORC and Apache Parquet (rows will be sorted
> > by the primary key before writing out).
> >
> > 4. Tables provided by Paimon can be queried by various engines, including
> > Apache Flink, Apache Spark, Apache Hive, Trino, etc.
> >
> > 5. Paimon's metadata is self-managed, stored on the distributed file
> > system and can be synchronized to Hive metastore (HMS).
> >
> > 6. Besides the common batch read and write support, Paimon also supports
> > streaming read and change data feed.
> >
> > Paimon has been used by various users and companies, including Alibaba,
> > Bilibili, ByteDance and so on. Paimon is also integrated into Alibaba
> > Cloud's E-MapReduce and Realtime Compute products to provide cloud
> > services.
> >
> > Paimon was founded in the Flink community in 2022 with the name of "Flink
> > Table Store". It has been developed for more than one year and produced 4
> > formal releases. As its adoption expands to more computing engines, some
> > of the ecology users express their concerns about the neutrality of the
> > project. This makes us rethink the positioning of Flink Table Store, which
> > can be an independent lake storage.
> >
> > With adequate discussions, we have got the support from the Flink
> > community to enter Apache incubation [3] [4], with the below expectations:
> >
> > 1. Expand Paimon's ecosystem, providing independent Java APIs to support
> > reading and writing from more big data engines such as Apache Doris,
> > Apache Hive, Apache Presto, Apache Spark, Trino, etc.
> >
> > 2. Supplement key capabilities, especially streaming reads and intensive
> > updates/deletes, for creating a unified and easy-to-use streaming data
> > warehouse (lakehouse).
> >
> > 3. Grow into a more vibrant and neutral open source community.
> >
> > And we believe the Paimon project will provide tremendous value for the
> > community if it is introduced into the Apache incubator.
> >
> > I will help this project as the champion and mentor the project together
> > with three other mentors (many thanks):
> >
> > * Becket Qin (j...@apache.org)
> > * Robert Metzger (rmetz...@apache.org)
> > * Stephan Ewen (se...@apache.org)
> >
> > Look forward to your feedback. Thanks.
> >
> > Best Regards,
> > Yu
> >
> > [1] https://github.com/apache/flink-table-store
> > <https://github.com/alibaba/RemoteShuffleService>
> >
> > [2] https://cwiki.apache.org/confluence/display/INCUBATOR/PaimonProposal
> >
> > [3] https://lists.apache.org/thread/2ybxfg3zrzn4l3tnq3w2w3xvkhk0f9jk
> >
> > [4] https://lists.apache.org/thread/kn7c08cr4l0ynt551yfjqvzh5ns226r6
> >

--
Regards,

Atri
Apache Concerted