Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag
Hi Xuannan, I leave my comments inline. > In the case where a user wants to > use a CDC source and also determine backlog status based on watermark > lag, we still need to define the rule when that occurs This rule should be defined by the source itself (who knows backlog best), not by the framework. In the case of CDC source, it reports isBacklog=true during snapshot stage, and report isBacklog=false during changelog stage if watermark-lag is within the threshold. I think it's not intuitive to combine it with the logical OR operation. Even for the combination logic of backlog status from different channels, FLIP-309 said it is "up to the operator to determine its output records' isBacklog value" and proposed 3 different strategies. Therefore, I think backlog status from a single source should be up to the source. IMO, a better API design is not how to resolve conflicts but not introducing conflicts. Let the source determine backlog status removes the conflicts and I don't see big disadvantages. > It should not confuse the user that > DataStream#assignTimestampsAndWatermarks doesn't work with > backlog.watermark-lag-threshold, as it is not a source. Hmm, so this configuration may confuse Flink SQL users, because all watermarks are defined on the source DDL, but it may use a separate operator to emit watermarks if the source doesn't support emitting watermarks. > I think the description in the FLIP actually means the other way > around, where the job can never switch back to batch mode once it has > switched into streaming mode. This is to align with the current state > of FLIP-327[1], where only switching from batch to stream mode is > supported. This sounds like a limitation of FLIP-327 (that execution mode depends on backlog status). But the backlog status shouldn't have this limitation, because it is not only used for execution switching. Best, Jark On Fri, 8 Sept 2023 at 19:09, Xuannan Su wrote: > Hi Jark and Leonard, > > Thanks for the comments. Please see my reply below. > > @Jark > > > I think a better API doesn't compete with itself. Therefore, I'm in > favor of > > supporting the watermark lag threshold for each source without > introducing > > any framework API and configuration. > > I don't think supporting the watermark lag threshold for each source > can avoid the competition problem. In the case where a user wants to > use a CDC source and also determine backlog status based on watermark > lag, we still need to define the rule when that occurs. With that > said, I think it is more intuitive to combine it with the logical OR > operation, as the strategies (FLIP-309, FLIP-328) only determine when > the source's backlog status should be True. What do you think? > > > Besides, this can address another concern that the watermark may be > > generated by DataStream#assignTimestampsAnd > > Watermarks which doesn't > > work with the backlog.watermark-lag-threshold job config > > The description of the configuration explicitly states that "a source > would report isProcessingBacklog=true if its watermark lag exceeds the > configured value". It should not confuse the user that > DataStream#assignTimestampsAndWatermarks doesn't work with > backlog.watermark-lag-threshold, as it is not a source. > > > Does that mean the job can never back to streaming mode once switches > into > > backlog mode? It sounds like not a complete FLIP to me. Is it possible to > > support switching back in this FLIP? > > I think the description in the FLIP actually means the other way > around, where the job can never switch back to batch mode once it has > switched into streaming mode. This is to align with the current state > of FLIP-327[1], where only switching from batch to stream mode is > supported. > > @Leonard > > > > The FLIP describe that: And it should report isProcessingBacklog=false > at the beginning of the snapshot stage. > > This should be “changelog stage” > > I think the description is in FLIP-309. Thanks for pointing out. I > updated the description. > > > I'm not sure if it's enough to support this feature only in FLIP-27 > Source. Although we are pushing the sourceFunction API to be removed, these > APIs will be survive one or two versions in flink repo before they are > actually removed. > > I agree that it is good to support the SourceFunction API. However, > given that the SourceFunction API is marked as deprecated, I think I > will prioritize supporting the FLIP-27 Source. We can support the > SourceFunction API after the > FLIP-27 source. What do you think? > > Best regards, > Xuannan > > [1] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-327%3A+Support+switching+from+batch+to+stream+mode+to+improve+throughput+when+processing+backlog+data > > > > > On Fri, Sep 8, 2023 at 1:02 AM Leonard Xu wrote: > > > > Thanks Xuannan for driving this FLIP ! > > > > The proposal generally looks good to me, but I still left some comments: > > > > > One more question about the FLIP is that the
Re: [DISCUSS] [FLINK-32873] Add a config to allow disabling Query hints
Hi, Thanks for bringing this to our attention. At the first glance, it looks reasonable to offer a new configuration to enable/disable SQL hints globally. However, IMHO, it is not the right timing to do it now, because we should not only think as platform providers but also as end users(the number of end users are much bigger than platform providers): 1. Users don't need it because users have the choice to use hints or not, just like Jark pointed out. With this configuration, there will be a fight between platform providers and users which will cause more confusions and conflicts. And users will probably win, IMHO, because they are the end customers that use Flink to create business values. 2. SQL hints could be considered as an additional feature for users to control, to optimize the execution plan without touching the internal logic, i.e. features for advanced use cases and i.e. don't use it if you don't understand it. 3. Before the system is smart enough to take over(where we are now, fortunately and unfortunately :-))), there should be a way for users to do such tuning, even if it is a temporary phase from a long-term's perspective, i.e. just because it is a temporary solution, does not mean it is not necessary for now. 4. What if users write wrong hints? Well, the code review process is recommended. Someone who truly understands hints should double check it before hints are merged to the master or submitted to the production env. Just like a common software development process. Just my two cents. Best regards, Jing On Thu, Sep 7, 2023 at 10:02 PM Bonnie Arogyam Varghese wrote: > Hi Liu, > The default will be set to enabled which is the current behavior. The > option will allow users/platform providers to disable it if they want to. > > On Wed, Sep 6, 2023 at 6:39 PM liu ron wrote: > > > Hi, Boonie > > > > I'm with Jark on why disable hint is needed if it won't affect security. > If > > users don't need to use hint, then they won't care about it and I don't > > think it's going to be a nuisance. On top of that, Lookup Join Hint is > very > > useful for streaming jobs, and disabling the hint would result in users > not > > being able to use it. > > > > Best, > > Ron > > > > Bonnie Arogyam Varghese 于2023年9月6日周三 > > 23:52写道: > > > > > Hi Liu Ron, > > > To answer your question, > > >Security might not be the main reason for disabling this option but > > > other arguments brought forward by Timo. Let me know if you have any > > > further questions or concerns. > > > > > > On Tue, Sep 5, 2023 at 9:35 PM Bonnie Arogyam Varghese < > > > bvargh...@confluent.io> wrote: > > > > > > > It looks like it will be nice to have a config to disable hints. Any > > > other > > > > thoughts/concerns before we can close this discussion? > > > > > > > > On Fri, Aug 18, 2023 at 7:43 AM Timo Walther > > wrote: > > > > > > > >> > lots of the streaming SQL syntax are extensions of SQL standard > > > >> > > > >> That is true. But hints are kind of a special case because they are > > not > > > >> even "part of Flink SQL" that's why they are written in a comment > > > syntax. > > > >> > > > >> Anyway, I feel hints could be sometimes confusing for users because > > most > > > >> of them have no effect for streaming and long-term we could also set > > > >> some hints via the CompiledPlan. And if you have multiple teams, > > > >> non-skilled users should not play around with hints and leave the > > > >> decision to the system that might become smarter over time. > > > >> > > > >> Regards, > > > >> Timo > > > >> > > > >> > > > >> On 17.08.23 18:47, liu ron wrote: > > > >> > Hi, Bonnie > > > >> > > > > >> >> Options hints could be a security concern since users can > override > > > >> > settings. > > > >> > > > > >> > I think this still doesn't answer my question > > > >> > > > > >> > Best, > > > >> > Ron > > > >> > > > > >> > Jark Wu 于2023年8月17日周四 19:51写道: > > > >> > > > > >> >> Sorry, I still don't understand why we need to disable the query > > > hint. > > > >> >> It doesn't have the security problems as options hint. Bonnie > said > > it > > > >> >> could affect performance, but that depends on users using it > > > >> explicitly. > > > >> >> If there is any performance problem, users can remove the hint. > > > >> >> > > > >> >> If we want to disable query hint just because it's an extension > to > > > SQL > > > >> >> standard. > > > >> >> I'm afraid we have to introduce a bunch of configuration, because > > > lots > > > >> of > > > >> >> the streaming SQL syntax are extensions of SQL standard. > > > >> >> > > > >> >> Best, > > > >> >> Jark > > > >> >> > > > >> >> On Thu, 17 Aug 2023 at 15:43, Timo Walther > > > wrote: > > > >> >> > > > >> >>> +1 for this proposal. > > > >> >>> > > > >> >>> Not every data team would like to enable hints. Also because > they > > > are > > > >> an > > > >> >>> extension to the SQL standard. It might also be the case that > > custom > > > >> >>> rules would be overwritten otherwise. Se
Re: [DISCUSS] [FLINK-32873] Add a config to allow disabling Query hints
I agree with Jing, My biggest concern is this makes the boundary of adding an option very unclear. It's not a strong reason to add a config just because of it doesn't affect existing users. Does this mean that in the future we might add an option to disable each feature? Flink already has a very long list of configurations [1][2] and this is very scary and not easy to use. We should try to remove the unnecessary configuration from the list in Flink 2.0. However, from my perspective, adding this option makes us far away from this direction. Best, Jark [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/config/ [2] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/ On Sat, 9 Sept 2023 at 17:33, Jing Ge wrote: > Hi, > > Thanks for bringing this to our attention. At the first glance, it looks > reasonable to offer a new configuration to enable/disable SQL hints > globally. However, IMHO, it is not the right timing to do it now, because > we should not only think as platform providers but also as end users(the > number of end users are much bigger than platform providers): > > 1. Users don't need it because users have the choice to use hints or not, > just like Jark pointed out. With this configuration, there will be a fight > between platform providers and users which will cause more confusions and > conflicts. And users will probably win, IMHO, because they are the end > customers that use Flink to create business values. > 2. SQL hints could be considered as an additional feature for users to > control, to optimize the execution plan without touching the internal > logic, i.e. features for advanced use cases and i.e. don't use it if you > don't understand it. > 3. Before the system is smart enough to take over(where we are now, > fortunately and unfortunately :-))), there should be a way for users to do > such tuning, even if it is a temporary phase from a > long-term's perspective, i.e. just because it is a temporary solution, does > not mean it is not necessary for now. > 4. What if users write wrong hints? Well, the code review process is > recommended. Someone who truly understands hints should double check it > before hints are merged to the master or submitted to the production env. > Just like a common software development process. > > Just my two cents. > > Best regards, > Jing > > On Thu, Sep 7, 2023 at 10:02 PM Bonnie Arogyam Varghese > wrote: > > > Hi Liu, > > The default will be set to enabled which is the current behavior. The > > option will allow users/platform providers to disable it if they want to. > > > > On Wed, Sep 6, 2023 at 6:39 PM liu ron wrote: > > > > > Hi, Boonie > > > > > > I'm with Jark on why disable hint is needed if it won't affect > security. > > If > > > users don't need to use hint, then they won't care about it and I don't > > > think it's going to be a nuisance. On top of that, Lookup Join Hint is > > very > > > useful for streaming jobs, and disabling the hint would result in users > > not > > > being able to use it. > > > > > > Best, > > > Ron > > > > > > Bonnie Arogyam Varghese 于2023年9月6日周三 > > > 23:52写道: > > > > > > > Hi Liu Ron, > > > > To answer your question, > > > >Security might not be the main reason for disabling this option > but > > > > other arguments brought forward by Timo. Let me know if you have any > > > > further questions or concerns. > > > > > > > > On Tue, Sep 5, 2023 at 9:35 PM Bonnie Arogyam Varghese < > > > > bvargh...@confluent.io> wrote: > > > > > > > > > It looks like it will be nice to have a config to disable hints. > Any > > > > other > > > > > thoughts/concerns before we can close this discussion? > > > > > > > > > > On Fri, Aug 18, 2023 at 7:43 AM Timo Walther > > > wrote: > > > > > > > > > >> > lots of the streaming SQL syntax are extensions of SQL standard > > > > >> > > > > >> That is true. But hints are kind of a special case because they > are > > > not > > > > >> even "part of Flink SQL" that's why they are written in a comment > > > > syntax. > > > > >> > > > > >> Anyway, I feel hints could be sometimes confusing for users > because > > > most > > > > >> of them have no effect for streaming and long-term we could also > set > > > > >> some hints via the CompiledPlan. And if you have multiple teams, > > > > >> non-skilled users should not play around with hints and leave the > > > > >> decision to the system that might become smarter over time. > > > > >> > > > > >> Regards, > > > > >> Timo > > > > >> > > > > >> > > > > >> On 17.08.23 18:47, liu ron wrote: > > > > >> > Hi, Bonnie > > > > >> > > > > > >> >> Options hints could be a security concern since users can > > override > > > > >> > settings. > > > > >> > > > > > >> > I think this still doesn't answer my question > > > > >> > > > > > >> > Best, > > > > >> > Ron > > > > >> > > > > > >> > Jark Wu 于2023年8月17日周四 19:51写道: > > > > >> > > > > > >> >> Sorry, I still don't understand why we need to disable the > query > >