Thanks so much, Benchao! Hi, everyone, I started a vote [1], and if you have any questions, please don't hesitate to join this discussion.
[1] https://lists.apache.org/thread/837r63gdwzoqryvp3gbf67941g706s5d Best, Jane On Mon, Oct 23, 2023 at 10:39 AM Benchao Li <libenc...@apache.org> wrote: > Thanks Jane for driving this exciting feature, looking forward to the vote! > > Jane Chan <qingyue....@gmail.com> 于2023年10月23日周一 09:40写道: > > > > Hi, devs, > > > > Thanks for all the feedback. > > > > Based on the discussion [1], we seem to have a consensus, so I would like > > to start a vote. If you have any questions or concerns, please feel free > to > > follow up on this discussion. > > > > [1] https://lists.apache.org/thread/3s69dhv3rp4s0kysnslqbvyqo3qf7zq5 > > > > > > Best regards, > > Jane > > > > On Fri, Oct 13, 2023 at 5:36 PM Zakelly Lan <zakelly....@gmail.com> > wrote: > > > > > Hi Jane, > > > > > > Actually, the first example that comes to mind is a simple group > > > aggregation, such as "SELECT /*+ STATE_TTL('1d') */ userid, SUM(price) > > > FROM orders GROUP BY userid;". However, as you mentioned, omitting the > > > key can introduce a lot of ambiguity, especially in nested query > > > blocks. I agree that we should not support this syntax. > > > > > > > > > Thanks a lot for clarifying this for me. > > > > > > > > > Best, > > > Zakelly > > > > > > > > > On Fri, Oct 13, 2023 at 4:59 PM Jane Chan <qingyue....@gmail.com> > wrote: > > > > > > > > Hi Zakelly, > > > > > > > > What if an omitted key hint only applied for tables from the current > > > query > > > > > block? > > > > > > > > > > > > > Can you elaborate on some examples to illustrate? But if you're > asking > > > > about the usage mentioned in [1], like " SELECT * FROM left /*+ > > > STATE_TTL( > > > > '1h' ) */ JOIN right /*+ STATE_TTL('1d') */ ON ...; ", I'd like to > > > explain > > > > the reason why this syntax is not proposed in the following aspects: > > > > > > > > #1 According to the current hint syntax, hints that follow the > > > > SqlIdentifier are deemed as **DynamicTableOptions** [2]. However, > > > > "table.exec.state.ttl" is not a table option. If we want to support > this > > > > approach, a separate thread may be required to discuss whether hints > > > > that follow an identifier should be limited to table options only. > > > > > > > > #2 On the other hand, according to the "Options Propagation" > mentioned in > > > > FLIP-113 [3], > > > > > > > > > Let T be the table name that attaches hints: > > > > > > > > If T is a view, ignore the hints, we do not allow any query hints in > the > > > > > table hints context, which is ambiguous because the view is > actually a > > > > > query and itself can take a query hint; > > > > > > > > Since the "left" and "right" (the example above) could be either a > table > > > > name or a view name (alias to the query block). The hint key cannot > be > > > > omitted if placed on the query block. So, from this point, having a > > > unified > > > > syntax that covers all situations is challenging. > > > > > > > > [1] > > > > > > > > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#examples > > > > < > > > > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#examples > > > > > > > > [2] > > > > > > > > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#dynamic-table-options > > > > < > > > > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#dynamic-table-options > > > > > > > > [3] > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+Supports+Dynamic+Table+Options+for+Flink+SQL > > > > < > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+Supports+Dynamic+Table+Options+for+Flink+SQL > > > > > > > > > > > > Best, > > > > Jane > > > > > > > > On Thu, Oct 12, 2023 at 7:02 PM Zakelly Lan <zakelly....@gmail.com> > > > wrote: > > > > > > > > > Thanks Jane for clarifying this. I'm ok with keeping the current > hint > > > > > pattern with the table name specified by key. Just thinking if > there > > > > > is another simpler way to define the hint. What if an omitted key > hint > > > > > only applied for tables from the current query block? > > > > > > > > > > > > > > > Thanks. > > > > > > > > > > > > > > > Best, > > > > > Zakelly > > > > > > > > > > On Wed, Oct 11, 2023 at 11:25 PM Jane Chan <qingyue....@gmail.com> > > > wrote: > > > > > > > > > > > > Thank you all for the valuable feedback. > > > > > > > > > > > > There is a difference when users incorrectly use hints, and there > > > are two > > > > > > possible scenarios: > > > > > > <1> STATE_TTL hint can be applied to the current query block but > > > with an > > > > > > invalid key or value. In this case, the validation exception > will be > > > > > > thrown. The HintOptionChecker will throw exceptions if the > options > > > are > > > > > > empty or the value is an invalid duration. And JoinHintResolver > will > > > > > throw > > > > > > exceptions if the hint key(table name/alias, etc.) does not > exist. So > > > > > does > > > > > > for the group aggregate. > > > > > > > > > > > > <2> STATE_TTL hint cannot be applied to the current query block, > > > e.g., > > > > > > SELECT /*+ STATE_TTL('MyTable' = '2h') */ * a, b, c FROM > MyTable. In > > > this > > > > > > case, the hint is ignored. This is a by-design behavior > according to > > > > > > FLIP-229 [1]. > > > > > > > > > > > > I've made the modifications to the FLIP regarding exception > > > handling. I > > > > > > would appreciate it if you could review it again. > > > > > > > > > > > > @Xuyang > > > > > > > > > > > > > I notice that using STATE_TTL hints wrongly will not throw any > > > > > > > exceptions. But it seems that in the current join hint > scenario, if > > > > > user > > > > > > > uses an unknown table name as the chosen side, a validation > > > exception > > > > > will > > > > > > > be thrown. Maybe we should distinguish which exceptions need > to be > > > > > thrown > > > > > > > explicitly. > > > > > > > > > > > > > > > > > > You're right; the STATE_TTL hint semantic check should throw > > > exceptions > > > > > > like join hints. > > > > > > > > > > > > @Feng @Yun > > > > > > > > > > > > > since there is no exception thrown when the STATE hint is set > > > > > incorrectly, > > > > > > > should we somehow show that the TTL setting has taken effect? > > > > > > > > > > > > For instance, exhibiting the TTL option within the operator's > > > > > description? > > > > > > > > > > > > > > > > > > We can throw explicit exceptions for scenario #1. For scenario > #2, I > > > > > prefer > > > > > > to align the behavior for current query block hints for now (and > we > > > may > > > > > > open a separate discussion in the future). On the other hand, > from > > > the > > > > > > implementation aspect, it is not easy to do so. Taking the > example of > > > > > > deduplication mentioned earlier, the hint is lost before it > > > propagates to > > > > > > the FlinkLogicalRank node, making it challenging to capture the > > > > > exception. > > > > > > > > > > > > @Zakelly > > > > > > > > > > > > > would it be possible to provide a simple hint that allows the > > > omission > > > > > of > > > > > > > the key? For example, something like "SELECT /+ > STATE_TTL('1h')/", > > > > > which > > > > > > > would specify the TTL for all states in the 'SELECT' clause. > > > > > > > > > > > > > > > > > > I'm afraid the hint key cannot be omitted. E.g., the join > operator > > > is a > > > > > > TwoInputStreamOperator; if we want to specify different state > TTLs > > > for > > > > > the > > > > > > left and right input, we need to inform the planner of the > ordinal > > > info > > > > > > (LEFT v.s. RIGHT). On the other hand, the SELECT clause may > contain > > > > > several > > > > > > query blocks, while the hint scope is designed to apply to the > > > current > > > > > > query block [2] to prevent the hint on the outer query block from > > > > > > propagating to the inner query block. For the use case where > users > > > need > > > > > to > > > > > > specify the TTL for all states in the 'SELECT' clause, it is > > > preferable > > > > > to > > > > > > modify the compiled plan instead of using hints. > > > > > > > > > > > > @ConradJam > > > > > > I share the same opinion with @Yun that this is a nice-to-have > > > feature. > > > > > Big > > > > > > +1 for the follow-up on FLINK-33230. Users can now use the > COMPILE > > > PLAN > > > > > > statement to serialize the query to a JSON string or file and > then > > > check > > > > > > the state metadata to ensure the hint is applied. > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-229%3A+Introduces+Join+Hint+for+Flink+SQL+Batch+Job > > > > > > < > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-229%3A+Introduces+Join+Hint+for+Flink+SQL+Batch+Job > > > > > > > > > > > > [2] > > > > > > > > > > > > > > > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#query-hints > > > > > > < > > > > > > > > > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#query-hints > > > > > > > > > > > > > > > > > > Best, > > > > > > Jane > > > > > > > > > > > > > > > > > > On Wed, Oct 11, 2023 at 8:49 PM Yun Tang <myas...@live.com> > wrote: > > > > > > > > > > > > > I think showing the TTL for operators is a nice-to-have > feature, > > > not a > > > > > > > must one in this FLIP. We can still get the information from > the > > > > > operator > > > > > > > descriptions. > > > > > > > > > > > > > > And I think we can continue the TTL showing work based on > > > FLINK-33230 > > > > > [1]. > > > > > > > > > > > > > > Last but not least, I prefer to throw exceptions if the TTL > > > > > configuration > > > > > > > is mistakenly used as it will affect the data correctness. > > > > > > > > > > > > > > [1] https://issues.apache.org/jira/browse/FLINK-33230 > > > > > > > > > > > > > > Best > > > > > > > Yun Tang > > > > > > > ________________________________ > > > > > > > From: ConradJam <jam.gz...@gmail.com> > > > > > > > Sent: Wednesday, October 11, 2023 20:30 > > > > > > > To: dev@flink.apache.org <dev@flink.apache.org> > > > > > > > Subject: Re: Re: [DISCUSS] FLIP-373: Support Configuring > Different > > > > > State > > > > > > > TTLs using SQL Hint > > > > > > > > > > > > > > +1 TTL shows the state ttl for operators in Flink web ui can be > > > know > > > > > what > > > > > > > operator state > > > > > > > > > > > > > > Zakelly Lan <zakelly....@gmail.com> 于2023年10月11日周三 19:14写道: > > > > > > > > > > > > > > > Hi Jane, > > > > > > > > > > > > > > > > The fine-grained TTL management is extremely useful for > > > performance > > > > > > > > tuning, so +1 for the idea. I have a minor suggestion: would > it > > > be > > > > > > > > possible to provide a simple hint that allows the omission > of the > > > > > key? > > > > > > > > For example, something like "SELECT /+ STATE_TTL('1h')/", > which > > > would > > > > > > > > specify the TTL for all states in the 'SELECT' clause. > > > > > > > > > > > > > > > > And I also share the same concern as Feng. I am wondering if > we > > > could > > > > > > > > show the state ttl for operators in Flink UI. > > > > > > > > > > > > > > > > > > > > > > > > Best, > > > > > > > > Zakelly > > > > > > > > > > > > > > > > On Wed, Oct 11, 2023 at 1:27 PM Feng Jin < > jinfeng1...@gmail.com> > > > > > wrote: > > > > > > > > > > > > > > > > > > Hi Jane, > > > > > > > > > > > > > > > > > > Thank you for providing this explanation. > > > > > > > > > > > > > > > > > > Another small question, since there is no exception thrown > > > when the > > > > > > > STATE > > > > > > > > > hint is set incorrectly, > > > > > > > > > should we somehow show that the TTL setting has taken > effect? > > > > > > > > > For instance, exhibiting the TTL option within the > operator's > > > > > > > > description? > > > > > > > > > > > > > > > > > > Best, > > > > > > > > > Feng > > > > > > > > > > > > > > > > > > On Tue, Oct 10, 2023 at 7:51 PM Xuyang <xyzhong...@163.com > > > > > wrote: > > > > > > > > > > > > > > > > > > > Hi, Jane. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I think this syntax will be easier for users to set > operator > > > > > ttl. So > > > > > > > > big > > > > > > > > > > +1. I left some minor comments here. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I notice that using STATE_TTL hints wrongly will not > throw > > > any > > > > > > > > exceptions. > > > > > > > > > > But it seems that in the current join hint scenario, > > > > > > > > > > if user uses an unknown table name as the chosen side, a > > > > > validation > > > > > > > > > > exception will be thrown. > > > > > > > > > > Maybe we should distinguish which exceptions need to be > > > thrown > > > > > > > > explicitly. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > > > > > > > > > > Best! > > > > > > > > > > Xuyang > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > At 2023-10-10 18:23:55, "Jane Chan" < > qingyue....@gmail.com> > > > > > wrote: > > > > > > > > > > >Hi Feng, > > > > > > > > > > > > > > > > > > > > > >Thank you for your valuable comments. The reason for not > > > > > including > > > > > > > the > > > > > > > > > > >scenarios above is as follows: > > > > > > > > > > > > > > > > > > > > > >For <1>, the automatically inferred stateful operators > are > > > not > > > > > > > easily > > > > > > > > > > >expressible in SQL. This issue was discussed in > FLIP-292, > > > and > > > > > > > besides > > > > > > > > > > >ChangelogNormalize, SinkUpsertMateralizer also faces the > > > same > > > > > > > problem. > > > > > > > > > > > > > > > > > > > > > >For <2> and <3>, the challenge lies in internal > > > implementation. > > > > > > > > During the > > > > > > > > > > >default_rewrite phase, the row_number expression in > > > > > LogicalProject > > > > > > > is > > > > > > > > > > >transformed into LogicalWindow by Calcite's > > > > > > > > > > >CoreRules#PROJECT_TO_LOGICAL_PROJECT_AND_WINDOW. > However, > > > > > > > > CalcRelSplitter > > > > > > > > > > >does not pass the hints as an input argument when > creating > > > > > > > > LogicalWindow, > > > > > > > > > > >resulting in the loss of the hint at this step. To > support > > > > > this, we > > > > > > > > may > > > > > > > > > > >need to rewrite some optimization rules in Calcite, > which > > > could > > > > > be a > > > > > > > > > > >follow-up work if required. > > > > > > > > > > > > > > > > > > > > > >Best, > > > > > > > > > > >Jane > > > > > > > > > > > > > > > > > > > > > >On Tue, Oct 10, 2023 at 1:40 AM Feng Jin < > > > jinfeng1...@gmail.com > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > >> Hi Jane, > > > > > > > > > > >> > > > > > > > > > > >> Thank you for proposing this FLIP. > > > > > > > > > > >> > > > > > > > > > > >> I believe that this FLIP will greatly enhance the > > > flexibility > > > > > of > > > > > > > > setting > > > > > > > > > > >> state, and by setting different operators' TTL, it can > > > also > > > > > > > > increase job > > > > > > > > > > >> stability, especially in regular join scenarios. > > > > > > > > > > >> The parameter design is very concise, big +1 for this, > > > and it > > > > > is > > > > > > > > also > > > > > > > > > > >> relatively easy to use for users. > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > >> I have a small question: in the FLIP, it only mentions > > > join > > > > > and > > > > > > > > group. > > > > > > > > > > >> Should we also consider other scenarios? > > > > > > > > > > >> > > > > > > > > > > >> 1. the auto generated deduplicate operator[1]. > > > > > > > > > > >> 2. the deduplicate query[2]. > > > > > > > > > > >> 3. the topN query[3]. > > > > > > > > > > >> > > > > > > > > > > >> [1] > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/config/#table-exec-source-cdc-events-duplicate > > > > > > > > > > >> [2] > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/deduplication/ > > > > > > > > > > >> [3] > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/topn/ > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > >> Best, > > > > > > > > > > >> Feng > > > > > > > > > > >> > > > > > > > > > > >> On Sun, Oct 8, 2023 at 5:53 PM Jane Chan < > > > > > qingyue....@gmail.com> > > > > > > > > wrote: > > > > > > > > > > >> > > > > > > > > > > >> > Hi devs, > > > > > > > > > > >> > > > > > > > > > > > >> > I'd like to initiate a discussion on FLIP-373: > Support > > > > > > > Configuring > > > > > > > > > > >> > Different State TTLs using SQL Hint [1]. This > proposal > > > is > > > > > on top > > > > > > > > of > > > > > > > > > > the > > > > > > > > > > >> > FLIP-292 [2] to address typical scenarios with > > > unambiguous > > > > > > > > semantics > > > > > > > > > > and > > > > > > > > > > >> > hint propagation. > > > > > > > > > > >> > > > > > > > > > > > >> > I'm looking forward to your opinions! > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > [1] > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-373%3A+Support+Configuring+Different+State+TTLs+using+SQL+Hint > > > > > > > > > > >> > [2] > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-292%3A+Enhance+COMPILED+PLAN+to+support+operator-level+state+TTL+configuration > > > > > > > > > > >> > > > > > > > > > > > >> > Best, > > > > > > > > > > >> > Jane > > > > > > > > > > >> > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > Best > > > > > > > > > > > > > > ConradJam > > > > > > > > > > > > > > > > > > > -- > > Best, > Benchao Li >