Dear Flink Community,

We want to process user data, including historical events, stored as JSON arrays in RocksDB-backed state so that the data stays local to each task during rescaling (Flink 1.17.2). When a new event arrives, our goal is to:
1. Fetch the historical data from RocksDB.
2. Run a SQL query on that data (we are still exploring how to achieve this; we need SQL to generalize our logic).
3. If data already exists for the key, append the new event; otherwise, insert it.

We aim to process each event independently, without retaining query state between events. However, querying large histories (thousands of events per key) and creating temporary tables per event raise concerns about scalability.

Could you advise on how to handle this SQL processing efficiently? Is there a way to perform GROUP BY or similar queries without retaining state, and will this approach scale to larger datasets? Any guidance would be greatly appreciated!
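For concreteness, here is a minimal, language-agnostic sketch of the three-step flow we have in mind. It is not Flink code: the `state` dict and `process_event` function are hypothetical stand-ins for RocksDB-backed keyed state and the per-event callback (in Flink this logic would live inside something like a KeyedProcessFunction), and the in-memory SQLite table stands in for whatever temporary-table mechanism we end up using:

```python
import json
import sqlite3

# Hypothetical per-key store standing in for RocksDB-backed keyed state:
# key -> JSON array of historical events.
state = {}

def process_event(key, event):
    """Per-event flow: load history, run SQL over history + new event,
    then append the new event. Nothing is retained between calls except
    the JSON array itself."""
    # 1. Fetch historical data for this key.
    history = json.loads(state.get(key, "[]"))

    # 2. Run a SQL query (here: a GROUP BY) over a throwaway in-memory
    #    table built from the history plus the incoming event.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE events (user TEXT, amount REAL)")
    con.executemany(
        "INSERT INTO events VALUES (:user, :amount)",
        history + [event],
    )
    result = con.execute(
        "SELECT user, SUM(amount) FROM events GROUP BY user"
    ).fetchall()
    con.close()

    # 3. Append the new event and write the array back.
    history.append(event)
    state[key] = json.dumps(history)
    return result
```

This makes the scalability worry explicit: every event pays the cost of deserializing and re-inserting the full history before the query runs, so per-event work grows linearly with history size.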