Re: [DISCUSS] Incubating Proposal for Paimon

Dave Fisher Thu, 23 Feb 2023 20:10:58 -0800

An interesting proposal. Since Paimon is already part of Apache Flink, does the 
podling intend to graduate as its own Top Level Project? Or is the plan 
currently to become a subproject of Flink? I’m just curious. Were there any 
discussions within the Flink community about incubating Paimon?


Best Regards,
Dave


> On Feb 23, 2023, at 7:58 PM, Yu Li <car...@gmail.com> wrote:
> 
> Correction: the hyperlink on the first reference is incorrect; please use
> the address as written instead of clicking it (sorry for my mistake).
> 
> For easier reference: https://github.com/apache/flink-table-store
> 
> Best Regards,
> Yu
> 
> 
>> On Fri, 24 Feb 2023 at 11:48, Yu Li <car...@gmail.com> wrote:
>> 
>> Hi All,
>> 
>> 
>> I would like to propose Paimon [1] as a new Apache Incubator project.
>> Please see the proposal [2] for more details.
>> 
>> 
>> Paimon is a unified lake storage for building dynamic tables for both stream
>> and batch processing with big data compute engines (Apache Flink, Apache
>> Spark, Apache Hive, Trino, etc.), supporting high-speed data ingestion and
>> real-time data queries.
>> With the adoption of stream processing in production, there is an increasing
>> demand for storage that simultaneously supports updates, deletes and
>> streaming reads, which cannot be fully satisfied by existing lake storages.
>> To tackle these new challenges, Paimon natively adopts LSM (Log-Structured
>> Merge-tree) as its underlying data structure, and provides enhanced
>> performance for data with primary keys (besides the common lake storage
>> capabilities). What's more, Paimon supports both batch and stream operations
>> (reads and writes), facilitating applications pursuing batch-stream-unified
>> semantics. Specifically:
>> 
>> 
>> 1. Paimon provides excellent performance on update- and delete-intensive
>> workloads, leveraging the append-only writes of the LSM data structure.
>> 
>> 2. Paimon utilizes the sorted order of LSM to support effective filter
>> pushdown, and can reduce the latency of queries with primary key filtering
>> to milliseconds.
>> 
>> 3. Paimon supports various (row-based or row-columnar) file formats,
>> including Apache Avro, Apache ORC and Apache Parquet (rows are sorted by
>> the primary key before being written out).
>> 
>> 4. Tables provided by Paimon can be queried by various engines, including
>> Apache Flink, Apache Spark, Apache Hive, Trino, etc.
>> 
>> 5. Paimon's metadata is self-managed, stored on the distributed file system,
>> and can be synchronized to the Hive metastore (HMS).
>> 
>> 6. Besides the common batch read and write support, Paimon also supports
>> streaming reads and change data feeds.
>> 
>> 
>> 
>> Paimon has been used by various users and companies, including Alibaba, 
>> Bilibili, ByteDance and so on. Paimon is also integrated into Alibaba 
>> Cloud's E-MapReduce and Realtime Compute products to provide cloud services.
>> 
>> 
>> Paimon was founded in the Flink community in 2022 under the name "Flink
>> Table Store". It has been developed for more than one year and has produced
>> 4 formal releases. As its adoption expands to more computing engines, some
>> users in the ecosystem have expressed concerns about the neutrality of the
>> project. This has led us to rethink the positioning of Flink Table Store,
>> which could be an independent lake storage.
>> 
>> 
>> After thorough discussion, we have the support of the Flink community to
>> enter Apache incubation [3] [4], with the following expectations:
>> 
>> 1. Expand Paimon's ecosystem, providing independent Java APIs to support
>> reads and writes from more big data engines such as Apache Doris, Apache
>> Hive, Presto, Apache Spark, Trino, etc.
>> 
>> 2. Supplement key capabilities, especially streaming reads and intensive
>> updates/deletes, for creating a unified and easy-to-use streaming data
>> warehouse (lakehouse).
>> 
>> 3. Grow into a more vibrant and neutral open source community.
>> 
>> 
>> We believe the Paimon project will provide tremendous value for the
>> community if it is accepted into the Apache Incubator.
>> 
>> 
>> I will serve as the champion of this project and mentor it together
>> with three other mentors (many thanks):
>> 
>> 
>> * Becket Qin (j...@apache.org)
>> 
>> * Robert Metzger (rmetz...@apache.org)
>> 
>> * Stephan Ewen (se...@apache.org)
>> 
>> 
>> Looking forward to your feedback. Thanks.
>> 
>> 
>> Best Regards,
>> Yu
>> 
>> [1] https://github.com/apache/flink-table-store
>> <https://github.com/alibaba/RemoteShuffleService>
>> 
>> [2] https://cwiki.apache.org/confluence/display/INCUBATOR/PaimonProposal
>> 
>> [3] https://lists.apache.org/thread/2ybxfg3zrzn4l3tnq3w2w3xvkhk0f9jk
>> 
>> [4] https://lists.apache.org/thread/kn7c08cr4l0ynt551yfjqvzh5ns226r6
>> 
>> 
>> 


