Re: [External] [DISCUSS] FLIP-292: Support configuring state TTL at operator level for Table API & SQL programs

godfrey he Mon, 03 Apr 2023 00:17:07 -0700

Hi Jane,

Thanks for driving this FLIP.


I think the compiled plan solution and the hint solution do not
conflict, the two can exist at the same time.
The compiled plan solution can address the need of advanced users and
the platform users
which all stateful operators' state TTL can be defined by user. While
the hint solution can address some
 specific simple scenarios, which is very user-friendly, convenient,
and unambiguous to use.

Some stateful operators are not compiled from SQL directly, such as
ChangelogNormalize and
SinkUpsertMaterializer mentioned above,  I notice the the example given by Yisha
has hints propagation problem which does not conform to the current design.
The rough idea about the hint solution should be simple (only the
common operators are supported)
and easy to understand (no hints propagation).

If the hint solution is supported, a compiled plan which is from a
query with state TTL hints
 can also be further modified for the state TTL parts.

So, I prefer the hint solution to be discuss in a separate FLIP.  I
think that FLIP maybe
need a lot discussion.

Best,
Godfrey

周伊莎 <[email protected]> 于2023年3月30日周四 22:04写道：
>
> Hi Jane,
>
> Thanks for your detailed response.
>
> You mentioned that there are 10k+ SQL jobs in your production
> > environment, but only ~100 jobs' migration involves plan editing. Is 10k+
> > the number of total jobs, or the number of jobs that use stateful
> > computation and need state migration?
> >
>
> 10k is the number of SQL jobs that enable periodic checkpoint. And
> surely if users change their sql which result in changes of the plan, they
> need to do state migration.
>
> - You mentioned that "A truth that can not be ignored is that users
> > usually tend to give up editing TTL(or operator ID in our case) instead of
> > migrating this configuration between their versions of one given job." So
> > what would users prefer to do if they're reluctant to edit the operator
> > ID? Would they submit the same SQL as a new job with a higher version to
> > re-accumulating the state from the earliest offset?
>
>
> You're exactly right. People will tend to re-accumulate the state from a
> given offset by changing the namespace of their checkpoint.
> Namespace is an internal concept and restarting the sql job in a new
> namespace can be simply understood as submitting a new job.
>
> Back to your suggestions, I noticed that FLIP-190 [3] proposed the
> > following syntax to perform plan migration
>
>
> The 'plan migration'  I said in my last reply may be inaccurate.  It's more
> like 'query evolution'. In other word, if a user submitted a sql job with a
> configured compiled plan, and then
> he changes the sql,  the compiled plan changes too, how to move the
> configuration in the old plan to the new plan.
> IIUC, FLIP-190 aims to solve issues in flink version upgrades and leave out
> the 'query evolution' which is a fundamental change to the query. E.g.
> adding a filter condition, a different aggregation.
> And I'm really looking forward to a solution for query evolution.
>
> And I'm also curious about how to use the hint
> > approach to cover cases like
> >
> > - configuring TTL for operators like ChangelogNormalize,
> > SinkUpsertMaterializer, etc., these operators are derived by the planner
> > implicitly
> > - cope with two/multiple input stream operator's state TTL, like join,
> > and other operations like row_number, rank, correlate, etc.
>
>
>  Actually, in our company , we make operators in the query block where the
> hint locates all affected by that hint. For example,
>
> INSERT INTO sink
> > SELECT /*+ STATE_TTL('1D') */
> >    id,
> >    name,
> >    num
> > FROM (
> >    SELECT
> >        *,
> >        ROW_NUMBER() OVER (PARTITION BY id ORDER BY num DESC) as row_num
> >    FROM (
> >        SELECT
> >            *
> >        FROM (
> >            SELECT
> >                id,
> >                name,
> >                max(num) as num
> >            FROM source1
> >            GROUP BY
> >                id, name, TUMBLE(proc, INTERVAL '1' MINUTE)
> >        )
> >        GROUP BY
> >            id, name, num
> >    )
> > )
> > WHERE row_num = 1
> >
>
> In the SQL above, the state TTL of Rank and Agg will be all configured as 1
> day.  If users want to set different TTL for Rank and Agg, they can just
> make these two queries located in two different query blocks.
> It looks quite rough but straightforward enough.  For each side of join
> operator, one of my users proposed a syntax like below:
>
> > /*+ 
> > JOIN_TTL('tables'='left_talbe,right_table','left_ttl'='100000','right_ttl'='10000')
> >  */
> >
> > We haven't accepted this proposal now, maybe we could find some better
> design for this kind of case. Just for your information.
>
> I think if we want to utilize hints to support fine-grained configuration,
> we can open a new FLIP to discuss it.
> BTW, personally, I'm interested in how to design a graphical interface to
> help users to maintain their custom fine-grained configuration between
> their job versions.
>
> Best regards,
> Yisha

Re: [External] [DISCUSS] FLIP-292: Support configuring state TTL at operator level for Table API & SQL programs

Reply via email to