Dear Flink Community,

We want to process user data, including historical events, stored as JSON arrays in RocksDB-backed state so that the data stays local to each task during rescaling (Flink 1.17.2). When a new event arrives, our goal is to:
1. Fetch the historical data from RocksDB.
2. Run a SQL query on that data (we are still exploring how to achieve this; we need SQL to generalize our logic).
3. If data already exists for the key, append the new event; otherwise, insert it.

We aim to process each event independently, without retaining query state between events. However, querying large histories (thousands of events per key) and creating temporary tables per event raise concerns about scalability.

Could you advise on how to handle this SQL processing efficiently? Is there a way to perform GROUP BY or similar queries without retaining state, and will this approach scale to larger datasets? Any guidance would be greatly appreciated!
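For concreteness, here is a minimal, language-agnostic sketch of the three-step flow we have in mind. It is not Flink code: the `state` dict and `process_event` function are hypothetical stand-ins for RocksDB-backed keyed state and the per-event callback (in Flink this logic would live inside something like a KeyedProcessFunction), and the in-memory SQLite table stands in for whatever temporary-table mechanism we end up using:

```python
import json
import sqlite3

# Hypothetical per-key store standing in for RocksDB-backed keyed state:
# key -> JSON array of historical events.
state = {}

def process_event(key, event):
    """Per-event flow: load history, run SQL over history + new event,
    then append the new event. Nothing is retained between calls except
    the JSON array itself."""
    # 1. Fetch historical data for this key.
    history = json.loads(state.get(key, "[]"))

    # 2. Run a SQL query (here: a GROUP BY) over a throwaway in-memory
    #    table built from the history plus the incoming event.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE events (user TEXT, amount REAL)")
    con.executemany(
        "INSERT INTO events VALUES (:user, :amount)",
        history + [event],
    )
    result = con.execute(
        "SELECT user, SUM(amount) FROM events GROUP BY user"
    ).fetchall()
    con.close()

    # 3. Append the new event and write the array back.
    history.append(event)
    state[key] = json.dumps(history)
    return result
```

This makes the scalability worry explicit: every event pays the cost of deserializing and re-inserting the full history before the query runs, so per-event work grows linearly with history size.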