Re: [DISCUSS] FLIP-188: Introduce Built-in Dynamic Table Storage

Till Rohrmann Tue, 28 Dec 2021 01:15:24 -0800

Hi Jingsong,

I think that developing flink-dynamic-storage as a separate subproject is
a very good idea, since it allows us to move a lot faster and to decouple
releases from Flink. Hence, a big +1.


Do we want to name it flink-dynamic-storage, or shall we use a more
descriptive name? "dynamic-storage" sounds a bit generic to me, and I wouldn't
know that it has something to do with letting Flink manage your tables
and their storage. I don't have a great suggestion, but maybe we could call it
flink-managed-tables, flink-warehouse, flink-olap or something similar.

Cheers,
Till

On Tue, Dec 28, 2021 at 9:49 AM Martijn Visser wrote:

> Hi Jingsong,
>
> That sounds promising! +1 from my side to continue development under
> flink-dynamic-storage as a Flink subproject. I think having a more in-depth
> interface will benefit everyone.
>
> Best regards,
>
> Martijn
>
> On Tue, 28 Dec 2021 at 04:23, Jingsong Li wrote:
>
>> Hi all,
>>
>> After some experimentation, we found no problem with putting the dynamic
>> storage outside of Flink, and doing so also allows us to design the
>> interface in more depth.
>>
>> What do you think? If there is no problem, I am asking for the PMC's help
>> here: we want to propose flink-dynamic-storage as a Flink subproject
>> and build the project under Apache.
>>
>> Best,
>> Jingsong
>>
>>
>> On Wed, Nov 24, 2021 at 8:10 PM Jingsong Li wrote:
>> >
>> > Hi Stephan,
>> >
>> > Thanks for your reply.
>> >
>> > Data never expires automatically.
>> >
>> > If there is a need for data retention, the user can choose one of the
>> > following options:
>> > - In the SQL that queries the managed table, users filter the data
>> >   themselves.
>> > - Define a time partition; users can then delete expired partitions
>> >   themselves (DROP PARTITION ...).
>> > - In a future version, we will support the "DELETE FROM" statement, so
>> >   users can delete expired data based on conditions.
>> >
>> > So to answer your question:
>> >
>> > > Will the VMQ send retractions so that the data will be removed from
>> > > the table (via compactions)?
>> >
>> > The current implementation does not send retractions, although I think
>> > in theory it should; for now, users can filter with conditions in
>> > subsequent queries.
>> > And yes, a subscriber would not see a strictly correct result. I
>> > think this is something we can improve in Flink SQL.
>> >
>> > > Do we want time retention semantics handled by the compaction?
>> >
>> > Currently, no. Data never expires automatically.
>> >
>> > > Do we want to declare those types of queries "out of scope" initially?
>> >
>> > I think we want users to be able to use the three options above to
>> > meet their requirements.
>> >
>> > I will update the FLIP to make the definition clearer and more explicit.
>> >
>> > Best,
>> > Jingsong
>> >
>> > On Wed, Nov 24, 2021 at 5:01 AM Stephan Ewen wrote:
>> > >
>> > > Thanks for digging into this.
>> > > Regarding this query:
>> > >
>> > > INSERT INTO the_table
>> > >   SELECT window_end, COUNT(*)
>> > >     FROM TABLE(TUMBLE(TABLE interactions, DESCRIPTOR(ts), INTERVAL '5' MINUTES))
>> > > GROUP BY window_start, window_end
>> > >   HAVING now() - window_end <= INTERVAL '14' DAYS;
>> > >
>> > > I am not sure I understand what the conclusion is on the data
>> > > retention question, where the continuous streaming SQL query has
>> > > retention semantics. I think we would need to answer the following
>> > > questions. (I will call the query that computes the managed table the
>> > > "view materializer query", or VMQ.)
>> > >
>> > > (1) I guess the VMQ will send no updates for windows once the
>> > > "retention period" (14 days) is over, as you said. That makes sense.
>> > >
>> > > (2) Will the VMQ send retractions so that the data will be removed
>> > > from the table (via compactions)?
>> > >   - if yes, this seems semantically better for users, but it will be
>> > >     expensive to keep the timers for retractions.
>> > >   - if not, we can still solve this by adding filters to queries
>> > >     against the managed table, as long as these queries are in Flink.
>> > >   - any subscriber to the changelog stream would not see a strictly
>> > >     correct result if we are not doing the retractions.
>> > >
>> > > (3) Do we want time retention semantics handled by the compaction?
>> > >   - if we say that we lazily apply the deletes in the queries that
>> > >     read the managed tables, then we could also age out the old data
>> > >     during compaction.
>> > >   - that is cheap, but it might be too much of a special case to be
>> > >     very relevant here.
>> > >
>> > > (4) Do we want to declare those types of queries "out of scope"
>> > > initially?
>> > >   - if yes, how many users are we affecting? (I guess probably not
>> > >     many, but it would be good to hear some thoughts from others on
>> > >     this.)
>> > >   - should we simply reject such queries in the optimizer as "not
>> > >     possible to support in managed tables"? I would suggest that; it
>> > >     is always better to tell users exactly what works and what does
>> > >     not, rather than letting them be surprised in the end. Users can
>> > >     still remove the HAVING clause if they want the query to run, and
>> > >     that would be better than if the VMQ just silently ignored those
>> > >     semantics.
>> > >
>> > > Thanks,
>> > > Stephan
>> > >
>> >
>> >
>> > --
>> > Best, Jingsong Lee
>>
>>
>>
>> --
>> Best, Jingsong Lee
>>
>
