Hi Jingsong,

That sounds promising! +1 from my side to continue development under flink-dynamic-storage as a Flink subproject. I think having a more in-depth interface will benefit everyone.

Best regards,
Martijn

On Tue, 28 Dec 2021 at 04:23, Jingsong Li <jingsongl...@gmail.com> wrote:

> Hi all,
>
> After some experimentation, we felt no problem putting the dynamic
> storage outside of flink, and it also allowed us to design the
> interface in more depth.
>
> What do you think? If there is no problem, I am asking for PMC's help
> here: we want to propose flink-dynamic-storage as a flink subproject,
> and we want to build the project under apache.
>
> Best,
> Jingsong
>
>
> On Wed, Nov 24, 2021 at 8:10 PM Jingsong Li <jingsongl...@gmail.com> wrote:
> >
> > Hi Stephan,
> >
> > Thanks for your reply.
> >
> > Data never expires automatically.
> >
> > If there is a need for data retention, the user can choose one of the
> > following options:
> > - In the SQL for querying the managed table, users filter the data by
> >   themselves
> > - Define the time partition, and users can delete the expired
> >   partition by themselves. (DROP PARTITION ...)
> > - In the future version, we will support the "DELETE FROM" statement,
> >   users can delete the expired data according to the conditions.
> >
> > So to answer your question:
> >
> > > Will the VMQ send retractions so that the data will be removed from
> > > the table (via compactions)?
> >
> > The current implementation is not sending retraction, which I think
> > theoretically should be sent, currently the user can filter by
> > subsequent conditions.
> > And yes, the subscriber would not see strictly a correct result. I
> > think this is something we can improve for Flink SQL.
> >
> > > Do we want time retention semantics handled by the compaction?
> >
> > Currently, no, Data never expires automatically.
> >
> > > Do we want to declare those types of queries "out of scope" initially?
> >
> > I think we want users to be able to use three options above to
> > accomplish their requirements.
> >
> > I will update FLIP to make the definition clearer and more explicit.
> >
> > Best,
> > Jingsong
> >
> > On Wed, Nov 24, 2021 at 5:01 AM Stephan Ewen <ewenstep...@gmail.com> wrote:
> > >
> > > Thanks for digging into this.
> > > Regarding this query:
> > >
> > > INSERT INTO the_table
> > >   SELECT window_end, COUNT(*)
> > >     FROM (TUMBLE(TABLE interactions, DESCRIPTOR(ts), INTERVAL '5' MINUTES))
> > >   GROUP BY window_end
> > >   HAVING now() - window_end <= INTERVAL '14' DAYS;
> > >
> > > I am not sure I understand what the conclusion is on the data
> > > retention question, where the continuous streaming SQL query has
> > > retention semantics. I think we would need to answer the following
> > > questions (I will call the query that computed the managed table the
> > > "view materializer query" - VMQ).
> > >
> > > (1) I guess the VMQ will send no updates for windows beyond the
> > > "retention period" is over (14 days), as you said. That makes sense.
> > >
> > > (2) Will the VMQ send retractions so that the data will be removed
> > > from the table (via compactions)?
> > >   - if yes, this seems semantically better for users, but it will be
> > >     expensive to keep the timers for retractions.
> > >   - if not, we can still solve this by adding filters to queries
> > >     against the managed table, as long as these queries are in Flink.
> > >   - any subscriber to the changelog stream would not see strictly a
> > >     correct result if we are not doing the retractions
> > >
> > > (3) Do we want time retention semantics handled by the compaction?
> > >   - if we say that we lazily apply the deletes in the queries that
> > >     read the managed tables, then we could also age out the old data
> > >     during compaction.
> > >   - that is cheap, but it might be too much of a special case to be
> > >     very relevant here.
> > >
> > > (4) Do we want to declare those types of queries "out of scope"
> > > initially?
> > >   - if yes, how many users are we affecting? (I guess probably not
> > >     many, but would be good to hear some thoughts from others on this)
> > >   - should we simply reject such queries in the optimizer as "not
> > >     possible to support in managed tables"? I would suggest that,
> > >     always better to tell users exactly what works and what not,
> > >     rather than letting them be surprised in the end. Users can still
> > >     remove the HAVING clause if they want the query to run, and that
> > >     would be better than if the VMQ just silently ignores those
> > >     semantics.
> > >
> > > Thanks,
> > > Stephan
> > >
> > >
> >
> > --
> > Best, Jingsong Lee
>
> --
> Best, Jingsong Lee
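
The read-time filtering and compaction age-out discussed in questions (2) and (3) of the thread can be sketched in a few lines of Python. This is only an illustration under assumptions, not the flink-dynamic-storage implementation: the helper names (is_live, read_with_filter, compact), the tuple row layout, and the sample dates are all hypothetical.

```python
from datetime import datetime, timedelta

# Mirrors the INTERVAL '14' DAYS bound in the HAVING clause of the query.
RETENTION = timedelta(days=14)


def is_live(window_end, now):
    """True while the row still satisfies now() - window_end <= 14 days."""
    return now - window_end <= RETENTION


def read_with_filter(rows, now):
    """Lazy delete: the reader applies the retention predicate itself."""
    return [row for row in rows if is_live(row[0], now)]


def compact(rows, now):
    """Hypothetical compaction step that physically drops expired rows."""
    return [row for row in rows if is_live(row[0], now)]


now = datetime(2021, 11, 24)

# Rows are (window_end, count) pairs; no retractions were ever sent,
# so the expired row is still stored in the table.
table = [
    (datetime(2021, 11, 1), 7),   # 23 days old: expired
    (datetime(2021, 11, 10), 3),  # exactly 14 days old: still live
    (datetime(2021, 11, 23), 5),  # 1 day old: live
]

before = read_with_filter(table, now)                # filter only
after = read_with_filter(compact(table, now), now)   # compaction, then filter
```

In this sketch both reads return the same two rows, which is why aging out during compaction does not change what filtered readers see. A subscriber reading the raw table or changelog without the filter would still see all three rows until compaction runs, which is the "not strictly correct result" caveat raised in the thread.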