[ https://issues.apache.org/jira/browse/FLINK-22737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17354265#comment-17354265 ]

Timo Walther edited comment on FLINK-22737 at 5/31/21, 7:40 AM:
----------------------------------------------------------------

I'm against returning 0 as the initial watermark for the following reasons:
- This introduces a third value with special meaning next to Long.MIN_VALUE and Long.MAX_VALUE.
- It goes against the design of all other components in the code base (from DataStream API to Web UI). The ProcessFunction returns Long.MIN_VALUE when querying the timer service, and the Web UI also checks against Long.MIN_VALUE and displays "No watermark received yet.".
- It prevents the processing of historical data. This might not be a strong argument, but it could happen that data from 1970 is processed (banks? weather data? stock data?).

The decision to use Long.MIN_VALUE in streaming operators is definitely safer to prevent unintended side effects. Also, when further combining bounded and unbounded data processing, watermarks could also describe non-real-time data in the near future.

Personally, I would go for Long.MIN_VALUE or NULL. The problem we have in SQL is that the new type system actually limits the precision of years to 4 digits in the definition of `TimestampType`. So NULL is a good alternative. It is true that it introduces a three-valued logic, but this should not be a problem for handling late data. {{FROM T WHERE rowtime < CURRENT_WATERMARK}} evaluates to false until a watermark arrives, which is correct.

> Add support for CURRENT_WATERMARK to SQL
> ----------------------------------------
>
>                 Key: FLINK-22737
>                 URL: https://issues.apache.org/jira/browse/FLINK-22737
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table SQL / API
>            Reporter: David Anderson
>            Assignee: Ingo Bürk
>            Priority: Major
>
> With a built-in function returning the current watermark, one could operate
> on late events without resorting to using the DataStream API.
> Called with zero parameters, this function returns the current watermark for
> the current row – if there is an event time attribute. Otherwise, it returns
> NULL.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
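
To illustrate the NULL proposal from the comment above, here is a minimal sketch in plain Java of how a late-data filter such as `rowtime < CURRENT_WATERMARK` behaves under three-valued logic. It is not Flink code: the class `CurrentWatermarkSketch` and the helpers `currentWatermarkOrNull`, `lessThan`, and `passesWhere` are hypothetical names introduced only for this example. The one assumption taken from the discussion is that Long.MIN_VALUE means "no watermark received yet". Strictly speaking the comparison yields NULL (unknown) rather than false, but a WHERE clause keeps only rows whose predicate is true, so the effect is the same.

```java
public class CurrentWatermarkSketch {

    // Sentinel that, per the comment, streaming operators use before any watermark arrives.
    static final long NO_WATERMARK = Long.MIN_VALUE;

    // Hypothetical mapping: expose the sentinel as SQL NULL (modeled here as a null Long).
    static Long currentWatermarkOrNull(long operatorWatermark) {
        if (operatorWatermark == NO_WATERMARK) {
            return null;
        }
        return operatorWatermark;
    }

    // SQL-style "a < b": the result is NULL (unknown) if either operand is NULL.
    static Boolean lessThan(Long a, Long b) {
        if (a == null || b == null) {
            return null;
        }
        return a < b;
    }

    // A WHERE clause keeps a row only when the predicate is TRUE; FALSE and NULL both drop it.
    static boolean passesWhere(Boolean predicate) {
        return Boolean.TRUE.equals(predicate);
    }

    public static void main(String[] args) {
        long rowtime = 0L; // an event from 1970-01-01, the "historical data" case

        // Before any watermark: the late-data filter selects nothing.
        Long before = currentWatermarkOrNull(Long.MIN_VALUE);
        System.out.println(passesWhere(lessThan(rowtime, before)));

        // After a watermark of 1000 ms has arrived: the row counts as late.
        Long after = currentWatermarkOrNull(1000L);
        System.out.println(passesWhere(lessThan(rowtime, after)));
    }
}
```

With 0 as the initial value instead of NULL, a row with a negative timestamp would be classified as late before any watermark had arrived, which is the side effect the comment warns about.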