Hi Jingsong,

That sounds promising! +1 from my side to continue development under
flink-dynamic-storage as a Flink subproject. I think having a more in-depth
interface will benefit everyone.

Best regards,

Martijn

On Tue, 28 Dec 2021 at 04:23, Jingsong Li <jingsongl...@gmail.com> wrote:

> Hi all,
>
> After some experimentation, we found no problem with putting the dynamic
> storage outside of Flink, and doing so also allows us to design the
> interface in more depth.
>
> What do you think? If there are no objections, I would like to ask the
> PMC for help here: we want to propose flink-dynamic-storage as a Flink
> subproject and build the project under Apache.
>
> Best,
> Jingsong
>
>
> On Wed, Nov 24, 2021 at 8:10 PM Jingsong Li <jingsongl...@gmail.com>
> wrote:
> >
> > Hi Stephan,
> >
> > Thanks for your reply.
> >
> > Data never expires automatically.
> >
> > If there is a need for data retention, the user can choose one of the
> > following options:
> > - In the SQL that queries the managed table, users filter the data
> > themselves.
> > - Define a time partition; users can then drop expired partitions
> > themselves (DROP PARTITION ...).
> > - In a future version, we will support the "DELETE FROM" statement,
> > so users can delete expired data by condition.
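> >
> > To make these options concrete, here is a rough sketch in Flink SQL.
> > It assumes a hypothetical managed table the_table with columns
> > window_end, cnt and a string partition column dt (all names are
> > illustrative only). The DROP PARTITION line assumes Hive-style
> > partition syntax, and DELETE FROM is not supported yet, so that
> > statement only shows the intended shape.
> >
> >   -- Hypothetical partitioned managed table; names are illustrative.
> >   CREATE TABLE the_table (
> >     window_end TIMESTAMP(3),
> >     cnt BIGINT,
> >     dt STRING
> >   ) PARTITIONED BY (dt);
> >
> >   -- Option 1: filter out rows older than 14 days when reading.
> >   SELECT window_end, cnt
> >   FROM the_table
> >   WHERE window_end >= LOCALTIMESTAMP - INTERVAL '14' DAY;
> >
> >   -- Option 2: drop an expired partition by hand (syntax assumed).
> >   ALTER TABLE the_table DROP PARTITION (dt = '2021-11-01');
> >
> >   -- Option 3 (future version): delete expired rows by condition.
> >   DELETE FROM the_table
> >   WHERE window_end < TIMESTAMP '2021-11-10 00:00:00';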
> >
> > So to answer your question:
> >
> > > Will the VMQ send retractions so that the data will be removed from
> > > the table (via compactions)?
> >
> > The current implementation does not send retractions, although I
> > think in theory it should. For now, users can filter with conditions
> > in subsequent queries.
> > And yes, a subscriber would not see a strictly correct result. I
> > think this is something we can improve in Flink SQL.
> >
> > > Do we want time retention semantics handled by the compaction?
> >
> > Currently, no. Data never expires automatically.
> >
> > > Do we want to declare those types of queries "out of scope" initially?
> >
> > I think users can use the three options above to meet their
> > requirements.
> >
> > I will update the FLIP to make the definition clearer and more explicit.
> >
> > Best,
> > Jingsong
> >
> > On Wed, Nov 24, 2021 at 5:01 AM Stephan Ewen <ewenstep...@gmail.com>
> > wrote:
> > >
> > > Thanks for digging into this.
> > > Regarding this query:
> > >
> > > INSERT INTO the_table
> > >   SELECT window_end, COUNT(*)
> > >     FROM TABLE(TUMBLE(TABLE interactions, DESCRIPTOR(ts), INTERVAL '5' MINUTES))
> > > GROUP BY window_end
> > >   HAVING now() - window_end <= INTERVAL '14' DAYS;
> > >
> > > I am not sure I understand what the conclusion is on the data
> > > retention question, where the continuous streaming SQL query has
> > > retention semantics. I think we would need to answer the following
> > > questions (I will call the query that computes the managed table the
> > > "view materializer query" - VMQ).
> > >
> > > (1) I guess the VMQ will send no updates for windows once the
> > > "retention period" (14 days) is over, as you said. That makes sense.
> > >
> > > (2) Will the VMQ send retractions so that the data will be removed
> > > from the table (via compactions)?
> > >   - if yes, this seems semantically better for users, but it will be
> > >     expensive to keep the timers for retractions.
> > >   - if not, we can still solve this by adding filters to queries
> > >     against the managed table, as long as these queries are in Flink.
> > >   - any subscriber to the changelog stream would not see a strictly
> > >     correct result if we are not doing the retractions.
> > >
> > > (3) Do we want time retention semantics handled by the compaction?
> > >   - if we say that we lazily apply the deletes in the queries that
> > >     read the managed tables, then we could also age out the old data
> > >     during compaction.
> > >   - that is cheap, but it might be too much of a special case to be
> > >     very relevant here.
> > >
> > > (4) Do we want to declare those types of queries "out of scope"
> > > initially?
> > >   - if yes, how many users are we affecting? (I guess probably not
> > >     many, but it would be good to hear some thoughts from others on
> > >     this.)
> > >   - should we simply reject such queries in the optimizer as "not
> > >     possible to support in managed tables"? I would suggest that; it is
> > >     always better to tell users exactly what works and what does not,
> > >     rather than letting them be surprised in the end. Users can still
> > >     remove the HAVING clause if they want the query to run, and that
> > >     would be better than if the VMQ just silently ignored those
> > >     semantics.
> > >
> > > Thanks,
> > > Stephan
> > >
> >
> >
> > --
> > Best, Jingsong Lee
