An interesting proposal. Since Paimon is already part of Apache Flink, does the podling intend to graduate as its own Top Level Project? Or is the plan currently to become a subproject of Flink? I’m just curious. Were there any discussions within the Flink community about incubating Paimon?
Best Regards,
Dave

> On Feb 23, 2023, at 7:58 PM, Yu Li <car...@gmail.com> wrote:
>
> Revision: the hyperlink of the first reference is incorrect and please use
> the website address directly instead of clicking it (sorry for my mistake).
>
> For easier reference: https://github.com/apache/flink-table-store
>
> Best Regards,
> Yu
>
>> On Fri, 24 Feb 2023 at 11:48, Yu Li <car...@gmail.com> wrote:
>>
>> Hi All,
>>
>> I would like to propose Paimon [1] as a new apache incubator project, and
>> you can find the proposal [2] of Paimon for more details.
>>
>> Paimon is a unified lake storage to build dynamic tables for both stream
>> and batch processing with big data compute engines (Apache Flink, Apache
>> Spark, Apache Hive, Trino, etc.), supporting high-speed data ingestion and
>> real-time data query. With the adoption of stream processing in
>> production, there is an increasing demand for storage to simultaneously
>> support updates, deletes and streaming reads, which cannot be fully
>> satisfied by existing lake storages. To tackle these new challenges,
>> Paimon natively adopts LSM (Log-Structured Merge-tree) as its underlying
>> data structure, and provides enhanced performance for data with primary
>> keys (besides the common lake storage capabilities). What's more, Paimon
>> supports both batch and stream operations (reads and writes), facilitating
>> applications pursuing batch-stream-unified semantics. Specifically:
>>
>> 1. Paimon provides excellent performance on the intensive update / delete
>> workload, leveraging the append-write feature of the LSM data structure.
>>
>> 2. Paimon utilizes the ordered feature of LSM to support effective filter
>> pushdown, and could reduce the latency of queries with primary key
>> filtering to milliseconds.
>>
>> 3. Paimon supports various (row-based or row-columnar) file formats
>> including Apache Avro, Apache ORC and Apache Parquet (rows will be sorted
>> by the primary key before writing out).
>>
>> 4. Tables provided by Paimon can be queried by various engines, including
>> Apache Flink, Apache Spark, Apache Hive, Trino, etc.
>>
>> 5. Paimon's metadata is self-managed, stored on the distributed file
>> system and can be synchronized to Hive metastore (HMS).
>>
>> 6. Besides the common batch read and write support, Paimon also supports
>> streaming read and change data feed.
>>
>> Paimon has been used by various users and companies, including Alibaba,
>> Bilibili, ByteDance and so on. Paimon is also integrated into Alibaba
>> Cloud's E-MapReduce and Realtime Compute products to provide cloud
>> services.
>>
>> Paimon was founded in the Flink community in 2022 with the name of "Flink
>> Table Store". It has been developed for more than one year and produced 4
>> formal releases. As its adoption expands to more computing engines, some
>> of the ecology users express their concerns about the neutrality of the
>> project. This makes us rethink the positioning of Flink Table Store, which
>> can be an independent lake storage.
>>
>> With adequate discussions, we have got the support from the Flink
>> community to enter Apache incubation [3] [4], with the below expectations:
>>
>> 1. Expand Paimon's ecosystem, providing independent Java APIs to support
>> reading and writing from more big data engines such as Apache Doris,
>> Apache Hive, Apache Presto, Apache Spark, Trino, etc.
>>
>> 2. Supplement key capabilities, especially streaming reads and intensive
>> updates/deletes, for creating a unified and easy-to-use streaming data
>> warehouse (lakehouse).
>>
>> 3. Grow into a more vibrant and neutral open source community.
>>
>> And we believe the Paimon project will provide tremendous value for the
>> community if it is introduced into the Apache incubator.
>>
>> I will help this project as the champion and mentor the project together
>> with three other mentors (many thanks):
>>
>> * Becket Qin (j...@apache.org)
>> * Robert Metzger (rmetz...@apache.org)
>> * Stephan Ewen (se...@apache.org)
>>
>> Look forward to your feedback. Thanks.
>>
>> Best Regards,
>> Yu
>>
>> [1] https://github.com/apache/flink-table-store
>> <https://github.com/alibaba/RemoteShuffleService>
>>
>> [2] https://cwiki.apache.org/confluence/display/INCUBATOR/PaimonProposal
>>
>> [3] https://lists.apache.org/thread/2ybxfg3zrzn4l3tnq3w2w3xvkhk0f9jk
>>
>> [4] https://lists.apache.org/thread/kn7c08cr4l0ynt551yfjqvzh5ns226r6