Hi, Jark > I have a minor suggestion:
> I think we will still suggest users use TIMESTAMP even if we have > TIMESTAMP_NTZ. Then it seems > introducing TIMESTAMP_NTZ doesn't help much for users, but introduces more > learning costs. I think your suggestion makes sense, we should suggest users use TIMESTAMP for TIMESTAMP WITHOUT TIME ZONE as we did now, updated as following: original type name : shortcut type name : TIMESTAMP / TIMESTAMP WITHOUT TIME ZONE <=> TIMESTAMP TIMESTAMP WITH LOCAL TIME ZONE <=> TIMESTAMP_LTZ TIMESTAMP WITH TIME ZONE <=> TIMESTAMP_TZ (supports them in the future) Best, Leonard > > > > > > On Thu, 28 Jan 2021 at 18:52, Leonard Xu <xbjt...@gmail.com > <mailto:xbjt...@gmail.com>> wrote: > >> Thanks all for sharing your opinions. >> >> Looks like we’ve reached a consensus about the topic. >> >> @Timo: >>> 1) Are we on the same page that LOCALTIMESTAMP returns TIMESTAMP and not >> TIMESTAMP_LTZ? Maybe we should quickly list also LOCALTIME/LOCALDATE and >> LOCALTIMESTAMP for completeness. >> Yes, LOCALTIMESTAMP returns TIMESTAMP, LOCALTIME returns TIME, the >> behavior of them is clear so I just listed them in the excel[1] of this >> FLIP references. >> >>> 2) Shall we add aliases for the timestamp types as part of this FLIP? I >> see Snowflake supports TIMESTAMP_LTZ , TIMESTAMP_NTZ , TIMESTAMP_TZ [1]. I >> think the discussion was quite cumbersome with the full string of >> `TIMESTAMP WITH LOCAL TIME ZONE`. With this FLIP we are making this type >> even more prominent. And important concepts should have a short name >> because they are used frequently. According to the FLIP, we are introducing >> the abbriviation already in function names like `TO_TIMESTAMP_LTZ`. >> `TIMESTAMP_LTZ` could be treated similar to `STRING` for >> `VARCHAR(MAX_INT)`, the serializable string representation would not change. >> >> @Timo @Jark >> Nice idea, I also suffered from the long name during the discussions, the >> abbreviation will not only help us, but also makes it more convenient for >> users. I list the abbreviation name mapping to support: >> TIMESTAMP WITHOUT TIME ZONE <=> TIMESTAMP_NTZ (which synonyms >> TIMESTAMP) >> TIMESTAMP WITH LOCAL TIME ZONE <=> TIMESTAMP_LTZ >> TIMESTAMP WITH TIME ZONE <=> TIMESTAMP_TZ (supports >> them in the future) >>> 3) I'm fine with supporting all conversion classes like >> java.time.LocalDateTime, java.sql.Timestamp that TimestampType supported >> for LocalZonedTimestampType. But we agree that Instant stays the default >> conversion class right? The default extraction defined in [2] will not >> change, correct? >> Yes, Instant stays the default conversion class. The default >> >>> 4) I would remove the comment "Flink supports TIME-related types with >> precision well", because unfortunately this is still not correct. We still >> have issues with TIME(9), it would be great if someone can finally fix that >> though. Maybe the implementation of this FLIP would be a good time to fix >> this issue. >> You’re right, TIME(9) is not supported yet, I'll take account of TIME(9) >> to the scope of this FLIP. >> >> >> I’ve updated this FLIP[2] according your suggestions @Jark @Timo >> I’ll start the vote soon if there’re no objections. >> >> Best, >> Leonard >> >> [1] >> https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing >> < >> https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing >> >> <https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing> >>> >> [2] >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-162%3A+Consistent+Flink+SQL+time+function+behavior >> >> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-162%3A+Consistent+Flink+SQL+time+function+behavior> >> < >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-162:+Consistent+Flink+SQL+time+function+behavior >> >> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-162:+Consistent+Flink+SQL+time+function+behavior>> >> >> >>> >>> On 28.01.21 03:18, Jark Wu wrote: >>>> Thanks Leonard for the further investigation. >>>> I think we all agree we should correct the return value of >>>> CURRENT_TIMESTAMP. >>>> Regarding the return type of CURRENT_TIMESTAMP, I also agree >> TIMESTAMP_LTZ >>>> would be more worldwide useful. This may need more effort, but if this >> is >>>> the right direction, we should do it. >>>> Regarding the CURRENT_TIME, if CURRENT_TIMESTAMP returns >>>> TIMESTAMP_LTZ, then I think CURRENT_TIME shouldn't return TIME_TZ. >>>> Otherwise, CURRENT_TIME will be quite special and strange. >>>> Thus I think it has to return TIME type. Given that we already have >>>> CURRENT_DATE which returns >>>> DATE WITHOUT TIME ZONE, I think it's fine to return TIME WITHOUT TIME >> ZONE >>>> for CURRENT_TIME. >>>> In a word, the updated FLIP looks good to me. I especially like the >>>> proposed new function TO_TIMESTAMP_LTZ(numeric, [,scale]). >>>> This will be very convenient to define rowtime on a long value which is >> a >>>> very common case and has been complained a lot in mailing list. >>>> Best, >>>> Jark >>>> On Mon, 25 Jan 2021 at 21:12, Kurt Young <ykt...@gmail.com> wrote: >>>>> Thanks Leonard for the detailed response and also the bad case about >> option >>>>> 1, these all >>>>> make sense to me. >>>>> >>>>> Also nice catch about conversion support of LocalZonedTimestampType, I >>>>> think it actually >>>>> makes sense to support java.sql.Timestamp as well as >>>>> java.time.LocalDateTime. It also has >>>>> a slight benefit that we might have a chance to run the udf which took >> them >>>>> as input parameter >>>>> after we change the return type. >>>>> >>>>> Regarding to the return type of CURRENT_TIME, I also think timezone >>>>> information is not useful. >>>>> To not expand this FLIP further, I'm lean to keep it as it is. >>>>> >>>>> Best, >>>>> Kurt >>>>> >>>>> >>>>> On Mon, Jan 25, 2021 at 8:50 PM Leonard Xu <xbjt...@gmail.com> wrote: >>>>> >>>>>> Hi, All >>>>>> >>>>>> Thanks for your comments. I think all of the thread have agreed that: >>>>>> (1) The return values of >> CURRENT_TIME/CURRENT_TIMESTAMP/NOW()/PROCTIME() >>>>>> are wrong. >>>>>> (2) The LOCALTIME/LOCALTIMESTAMP and CURRENT_TIME/CURRENT_TIMESTAMP >>>>> should >>>>>> be different whether from SQL standard’s perspective or mature >> systems. >>>>>> (3) The semantics of three TIMESTAMP types in Flink SQL follows the >> SQL >>>>>> standard and also keeps the same with other 'good' vendors. >>>>>> TIMESTAMP => A literal in >>>>>> ‘yyyy-MM-dd HH:mm:ss’ format to describe a time, does not contain >>>>> timezone >>>>>> info, can not represent an absolute time point. >>>>>> TIMESTAMP WITH LOCAL ZONE => Records the elapsed time from >> absolute >>>>>> time point origin, can represent an absolute time point, requires >> local >>>>>> time zone when expressed with ‘yyyy-MM-dd HH:mm:ss’ format. >>>>>> TIMESTAMP WITH TIME ZONE => Consists of time zone info and a >>>>>> literal in ‘yyyy-MM-dd HH:mm:ss’ format to describe time, can >> represent >>>>> an >>>>>> absolute time point. >>>>>> >>>>>> >>>>>> Currently we've two ways to correct >>>>>> CURRENT_TIME/CURRENT_TIMESTAMP/NOW()/PROCTIME(). >>>>>> >>>>>> option (1): As the FLIP proposed, change the return value from UTC >>>>>> timezone to local timezone. >>>>>> Pros: (1) The change looks smaller to users and developers >> (2) >>>>>> There're many SQL engines adopted this way >>>>>> Cons: (1) connector devs may confuse the underlying value of >>>>>> TimestampData which needs to change according to data type (2) I >> thought >>>>>> about this weekend. Unfortunately I found a bad case: >>>>>> >>>>>> The proposal is fine if we only use it in FLINK SQL world, but we >> need to >>>>>> consider the conversion between Table/DataStream, assume a record >>>>> produced >>>>>> in UTC+0 timezone with TIMESTAMP '1970-01-01 08:00:44' and the Flink >> SQL >>>>>> processes the data with session time zone 'UTC+8', if the sql program >>>>> need >>>>>> to convert the Table to DataStream, then we need to calculate the >>>>> timestamp >>>>>> in StreamRecord with session time zone (UTC+8), then we will get 44 in >>>>>> DataStream program, but it is wrong because the expected value should >> be >>>>> (8 >>>>>> * 60 * 60 + 44). The corner case tell us that the ROWTIME/PROCTIME in >>>>> Flink >>>>>> are based on UTC+0, when correct the PROCTIME() function, the better >> way >>>>> is >>>>>> to use TIMESTAMP WITH LOCAL TIME ZONE which keeps same long value with >>>>> time >>>>>> based on UTC+0 and can be expressed with local timezone. >>>>>> >>>>>> option (2) : As we considered in the FLIP as well as @Timo suggested, >>>>>> change the return type to TIMESTAMP WITH LOCAL TIME ZONE, the >> expressed >>>>>> value depends on the local time zone. >>>>>> Pros: (1) Make Flink SQL more close to SQL standard (2) Can >> deal >>>>>> the conversion between Table/DataStream well >>>>>> Cons: (1) We need to discuss the return value/type of >>>>> CURRENT_TIME >>>>>> function (2) The change is bigger to users, we need to support >> TIMESTAMP >>>>>> WITH LOCAL TIME ZONE in connectors/formats as well as custom >> connectors. >>>>>> (3)The TIMESTAMP WITH LOCAL TIME ZONE support is >> weak >>>>>> in Flink, thus we need some improvement,but the workload does not >> matter >>>>>> as long as we are doing the right thing ^_^ >>>>>> >>>>>> Due to the above bad case for option (1). I think option 2 should be >>>>>> adopted, >>>>>> But we also need to consider some problems: >>>>>> (1) More conversion classes like LocalDateTime, sql.Timestamp should >> be >>>>>> supported for LocalZonedTimestampType to resolve the UDF compatibility >>>>> issue >>>>>> (2) The timezone offset for window size of one day should still be >>>>>> considered >>>>>> (3) All connectors/formats should supports TIMESTAMP WITH LOCAL TIME >> ZONE >>>>>> well and we also should record in document >>>>>> I’ll update these sections of FLIP-162. >>>>>> >>>>>> >>>>>> >>>>>> We also need to discuss the CURRENT_TIME function. I know the standard >>>>> way >>>>>> is using TIME WITH TIME ZONE(there's no TIME WITH LOCAL TIME ZONE), >> but >>>>> we >>>>>> don't support this type yet and I don't see strong motivation to >> support >>>>> it >>>>>> so far. >>>>>> Compared to CURRENT_TIMESTAMP, the CURRENT_TIME can not represent an >>>>>> absolute time point which should be considered as a string consisting >> of >>>>> a >>>>>> time with 'HH:mm:ss' format and time zone info. We have several >> options >>>>>> for this: >>>>>> (1) We can forbid CURRENT_TIME as @Timo proposed to make all Flink SQL >>>>>> functions follow the standard well, in this way, we need to offer >> some >>>>>> guidance for user upgrading Flink versions. >>>>>> (2) We can also support it from a user's perspective who has used >>>>>> CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP, btw,Snowflake also >> returns >>>>>> TIME type. >>>>>> (3) Returns TIMESTAMP WITH LOCAL TIME ZONE to make it equal to >>>>>> CURRENT_TIMESTAMP as Calcite did. >>>>>> >>>>>> I can image (1) which we don't want to left a bad smell in Flink SQL, >>>>> and >>>>>> I also accept (2) because I think users do not consider time zone >> issues >>>>>> when they use CURRENT_DATE/CURRENT_TIME, and the timezone info in >> time is >>>>>> not very useful. >>>>>> >>>>>> I don’t have a strong opinion for them. What do others think? >>>>>> >>>>>> >>>>>> I hope I've addressed your concerns. @Timo @Kurt >>>>>> >>>>>> Best, >>>>>> Leonard >>>>>> >>>>>> >>>>>> >>>>>>> Most of the mature systems have a clear difference between >>>>>> CURRENT_TIMESTAMP and LOCALTIMESTAMP. I wouldn't take Spark or Hive >> as a >>>>>> good example. Snowflake decided for TIMESTAMP WITH LOCAL TIME ZONE. >> As I >>>>>> mentioned in the last comment, I could also imagine this behavior for >>>>>> Flink. But in any case, there should be some time zone information >>>>>> considered in order to cast to all other types. >>>>>>> >>>>>>>>>> The function CURRENT_DATE/CURRENT_TIME is supporting in SQL >>>>>> standard, but >>>>>>>>>> LOCALDATE not, I don’t think it’s a good idea that dropping >>>>>> functions which >>>>>>>>>> SQL standard supported and introducing a replacement which SQL >>>>>> standard not >>>>>>>>>> reminded. >>>>>>> >>>>>>> We can still add those functions in the future. But since we don't >>>>> offer >>>>>> a TIME WITH TIME ZONE, it is better to not support this function at >> all >>>>> for >>>>>> now. And by the way, this is exactly the behavior that also Microsoft >> SQL >>>>>> Server does: it also just supports CURRENT_TIMESTAMP (but it returns >>>>>> TIMESTAMP without a zone which completes the confusion). >>>>>>> >>>>>>>>>> I also agree returning TIMESTAMP WITH LOCAL TIME ZONE for >> PROCTIME >>>>>> has >>>>>>>>>> more clear semantics, but I realized that user didn’t care the >> type >>>>>> but >>>>>>>>>> more about the expressed value they saw, and change the type from >>>>>> TIMESTAMP >>>>>>>>>> to TIMESTAMP WITH LOCAL TIME ZONE brings huge refactor that we >> need >>>>>>>>>> consider all places where the TIMESTAMP type used >>>>>>> >>>>>>> From a UDF perspective, I think nothing will change. The new type >>>>> system >>>>>> and type inference were designed to support all these cases. There is >> a >>>>>> reason why Java has adopted Joda time, because it is hard to come up >>>>> with a >>>>>> good time library. That's why also we and the other Hadoop ecosystem >>>>> folks >>>>>> have decided for 3 different kinds of LocalDateTime, ZonedDateTime, >> and >>>>>> Instance. It makes the library more complex, but time is a complex >> topic. >>>>>>> >>>>>>> I also doubt that many users work with only one time zone. Take the >> US >>>>>> as an example, a country with 3 different timezones. Somebody working >>>>> with >>>>>> US data cannot properly see the data points with just LOCAL TIME ZONE. >>>>> But >>>>>> on the other hand, a lot of event data is stored using a UTC >> timestamp. >>>>>>> >>>>>>> >>>>>>>>> Before jumping into technique details, let's take a step back to >>>>>> discuss >>>>>>>>> user experience. >>>>>>>>> >>>>>>>>> The first important question is what kind of date and time will >>>>> Flink >>>>>>>>> display when users call >>>>>>>>> CURRENT_TIMESTAMP and maybe also PROCTIME (if we think they are >>>>>> similar). >>>>>>>>> >>>>>>>>> Should it always display the date and time in UTC or in the user's >>>>>> time >>>>>>>>> zone? >>>>>>> >>>>>>> @Kurt: I think we all agree that the current behavior with just >> showing >>>>>> UTC is wrong. Also, we all agree that when calling CURRENT_TIMESTAMP >> or >>>>>> PROCTIME a user would like to see the time in it's current time zone. >>>>>>> >>>>>>> As you said, "my wall clock time". >>>>>>> >>>>>>> However, the question is what is the data type of what you "see". If >>>>> you >>>>>> pass this record on to a different system, operator, or different >>>>> cluster, >>>>>> should the "my" get lost or materialized into the record? >>>>>>> >>>>>>> TIMESTAMP -> completely lost and could cause confusion in a different >>>>>> system >>>>>>> >>>>>>> TIMESTAMP WITH LOCAL TIME ZONE -> at least the UTC is correct, so you >>>>>> can provide a new local time zone >>>>>>> >>>>>>> TIMESTAMP WITH TIME ZONE -> also "your" location is persisted >>>>>>> >>>>>>> Regards, >>>>>>> Timo >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 22.01.21 09:38, Kurt Young wrote: >>>>>>>> Forgot one more thing. Continue with displaying in UTC. As a user, >> if >>>>>> Flink >>>>>>>> want to display the timestamp >>>>>>>> in UTC, why don't we offer something like UTC_TIMESTAMP? >>>>>>>> Best, >>>>>>>> Kurt >>>>>>>> On Fri, Jan 22, 2021 at 4:33 PM Kurt Young <ykt...@gmail.com> >> wrote: >>>>>>>>> Before jumping into technique details, let's take a step back to >>>>>> discuss >>>>>>>>> user experience. >>>>>>>>> >>>>>>>>> The first important question is what kind of date and time will >> Flink >>>>>>>>> display when users call >>>>>>>>> CURRENT_TIMESTAMP and maybe also PROCTIME (if we think they are >>>>>> similar). >>>>>>>>> >>>>>>>>> Should it always display the date and time in UTC or in the user's >>>>> time >>>>>>>>> zone? I think this part is the >>>>>>>>> reason that surprised lots of users. If we forget about the type >> and >>>>>>>>> internal representation of these >>>>>>>>> two methods, as a user, my instinct tells me that these two methods >>>>>> should >>>>>>>>> display my wall clock time. >>>>>>>>> >>>>>>>>> Display time in UTC? I'm not sure, why I should care about UTC >> time? >>>>> I >>>>>>>>> want to get my current timestamp. >>>>>>>>> For those users who have never gone abroad, they might not even be >>>>>> able to >>>>>>>>> realize that this is affected >>>>>>>>> by the time zone. >>>>>>>>> >>>>>>>>> Best, >>>>>>>>> Kurt >>>>>>>>> >>>>>>>>> >>>>>>>>> On Fri, Jan 22, 2021 at 12:25 PM Leonard Xu <xbjt...@gmail.com> >>>>> wrote: >>>>>>>>> >>>>>>>>>> Thanks @Timo for the detailed reply, let's go on this topic on >> this >>>>>>>>>> discussion, I've merged all mails to this discussion. >>>>>>>>>> >>>>>>>>>>> LOCALDATE / LOCALTIME / LOCALTIMESTAMP >>>>>>>>>>> >>>>>>>>>>> --> uses session time zone, returns DATE/TIME/TIMESTAMP >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP >>>>>>>>>>> >>>>>>>>>>> --> uses session time zone, returns DATE/TIME/TIMESTAMP >>>>>>>>>>> >>>>>>>>>>> I'm very sceptical about this behavior. Almost all mature systems >>>>>>>>>> (Oracle, Postgres) and new high quality systems (Presto, >> Snowflake) >>>>>> use a >>>>>>>>>> data type with some degree of time zone information encoded. In a >>>>>>>>>> globalized world with businesses spanning different regions, I >> think >>>>>> we >>>>>>>>>> should do this as well. There should be a difference between >>>>>>>>>> CURRENT_TIMESTAMP and LOCALTIMESTAMP. And users should be able to >>>>>> choose >>>>>>>>>> which behavior they prefer for their pipeline. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I know that the two series should be different at first glance, >> but >>>>>>>>>> different SQL engines can have their own explanations,for example, >>>>>>>>>> CURRENT_TIMESTAMP and LOCALTIMESTAMP are synonyms in Snowflake[1] >>>>> and >>>>>> has >>>>>>>>>> no difference, and Spark only supports the later one and doesn’t >>>>>> support >>>>>>>>>> LOCALTIME/LOCALTIMESTAMP[2]. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> If we would design this from scatch, I would suggest the >> following: >>>>>>>>>>> >>>>>>>>>>> - drop CURRENT_DATE / CURRENT_TIME and let users pick LOCALDATE / >>>>>>>>>> LOCALTIME for materialized timestamp parts >>>>>>>>>> >>>>>>>>>> The function CURRENT_DATE/CURRENT_TIME is supporting in SQL >>>>> standard, >>>>>> but >>>>>>>>>> LOCALDATE not, I don’t think it’s a good idea that dropping >>>>> functions >>>>>> which >>>>>>>>>> SQL standard supported and introducing a replacement which SQL >>>>>> standard not >>>>>>>>>> reminded. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> - CURRENT_TIMESTAMP should return a TIMESTAMP WITH TIME ZONE to >>>>>>>>>> materialize all session time information into every record. It it >>>>> the >>>>>> most >>>>>>>>>> generic data type and allows to cast to all other timestamp data >>>>>> types. >>>>>>>>>> This generic ability can be used for filter predicates as well >>>>> either >>>>>>>>>> through implicit or explicit casting. >>>>>>>>>> >>>>>>>>>> TIMESTAMP WITH TIME ZONE indeed contains more information to >>>>> describe >>>>>> a >>>>>>>>>> time point, but the type TIMESTAMP can cast to all other >> timestamp >>>>>> data >>>>>>>>>> types combining with session time zone as well, and it also can be >>>>>> used for >>>>>>>>>> filter predicates. For type casting between BIGINT and TIMESTAMP, >> I >>>>>> think >>>>>>>>>> the function way using TO_TIMEMTAMP()/FROM_UNIXTIMESTAMP() is more >>>>>> clear. >>>>>>>>>> >>>>>>>>>>> PROCTIME/ROWTIME should be time functions based on a long value. >>>>> Both >>>>>>>>>> System.currentMillis() and our watermark system work on long >> values. >>>>>> Those >>>>>>>>>> should return TIMESTAMP WITH LOCAL TIME ZONE because the main >>>>>> calculation >>>>>>>>>> should always happen based on UTC. >>>>>>>>>>> We discussed it in a different thread, but we should allow >> PROCTIME >>>>>>>>>> globally. People need a way to create instances of TIMESTAMP WITH >>>>>> LOCAL >>>>>>>>>> TIME ZONE. This is not considered in the current design doc. >>>>>>>>>>> Many pipelines contain UTC timestamps and thus it should be easy >> to >>>>>>>>>> create one. >>>>>>>>>>> Also, both CURRENT_TIMESTAMP and LOCALTIMESTAMP can work with >> this >>>>>> type >>>>>>>>>> because we should remember that TIMESTAMP WITH LOCAL TIME ZONE >>>>>> accepts all >>>>>>>>>> timestamp data types as casting target [1]. We could allow >> TIMESTAMP >>>>>> WITH >>>>>>>>>> TIME ZONE in the future for ROWTIME. >>>>>>>>>>> In any case, windows should simply adapt their behavior to the >>>>> passed >>>>>>>>>> timestamp type. And with TIMESTAMP WITH LOCAL TIME ZONE a day is >>>>>> defined by >>>>>>>>>> considering the current session time zone. >>>>>>>>>> >>>>>>>>>> I also agree returning TIMESTAMP WITH LOCAL TIME ZONE for >> PROCTIME >>>>>> has >>>>>>>>>> more clear semantics, but I realized that user didn’t care the >> type >>>>>> but >>>>>>>>>> more about the expressed value they saw, and change the type from >>>>>> TIMESTAMP >>>>>>>>>> to TIMESTAMP WITH LOCAL TIME ZONE brings huge refactor that we >> need >>>>>>>>>> consider all places where the TIMESTAMP type used, and many >> builtin >>>>>>>>>> functions and UDFs doest not support TIMESTAMP WITH LOCAL TIME >> ZONE >>>>>> type. >>>>>>>>>> That means both user and Flink devs need to refactor the code(UDF, >>>>>> builtin >>>>>>>>>> functions, sql pipeline), to be honest, I didn’t see strong >>>>>> motivation that >>>>>>>>>> we have to do the pretty big refactor from user’s perspective and >>>>>>>>>> developer’s perspective. >>>>>>>>>> >>>>>>>>>> In one word, both your suggestion and my proposal can resolve >> almost >>>>>> all >>>>>>>>>> user problems,the divergence is whether we need to spend pretty >>>>>> energy just >>>>>>>>>> to get a bit more accurate semantics? I think we need a >> tradeoff. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Best, >>>>>>>>>> Leonard >>>>>>>>>> [1] >>>>>>>>>> >>>>>> >> https://trino.io/docs/current/functions/datetime.html#current_timestamp >>>>> < >>>>>>>>>> >>>>>> >> https://trino.io/docs/current/functions/datetime.html#current_timestamp> >>>>>>>>>> [2] https://issues.apache.org/jira/browse/SPARK-30374 < >>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-30374> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> 2021-01-22,00:53,Timo Walther <twal...@apache.org> : >>>>>>>>>>> >>>>>>>>>>> Hi Leonard, >>>>>>>>>>> >>>>>>>>>>> thanks for working on this topic. I agree that time handling is >> not >>>>>>>>>> easy in Flink at the moment. We added new time data types (and >> some >>>>>> are >>>>>>>>>> still not supported which even further complicates things like >>>>>> TIME(9)). We >>>>>>>>>> should definitely improve this situation for users. >>>>>>>>>>> >>>>>>>>>>> This is a pretty opinionated topic and it seems that the SQL >>>>> standard >>>>>>>>>> is not really deciding this but is at least supporting. So let me >>>>>> express >>>>>>>>>> my opinion for the most important functions: >>>>>>>>>>> >>>>>>>>>>> LOCALDATE / LOCALTIME / LOCALTIMESTAMP >>>>>>>>>>> >>>>>>>>>>> --> uses session time zone, returns DATE/TIME/TIMESTAMP >>>>>>>>>>> >>>>>>>>>>> I think those are the most obvious ones because the LOCAL >> indicates >>>>>>>>>> that the locality should be materialized into the result and any >>>>> time >>>>>> zone >>>>>>>>>> information (coming from session config or data) is not important >>>>>>>>>> afterwards. >>>>>>>>>>> >>>>>>>>>>> CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP >>>>>>>>>>> >>>>>>>>>>> --> uses session time zone, returns DATE/TIME/TIMESTAMP >>>>>>>>>>> >>>>>>>>>>> I'm very sceptical about this behavior. Almost all mature systems >>>>>>>>>> (Oracle, Postgres) and new high quality systems (Presto, >> Snowflake) >>>>>> use a >>>>>>>>>> data type with some degree of time zone information encoded. In a >>>>>>>>>> globalized world with businesses spanning different regions, I >> think >>>>>> we >>>>>>>>>> should do this as well. There should be a difference between >>>>>>>>>> CURRENT_TIMESTAMP and LOCALTIMESTAMP. And users should be able to >>>>>> choose >>>>>>>>>> which behavior they prefer for their pipeline. >>>>>>>>>>> >>>>>>>>>>> If we would design this from scatch, I would suggest the >> following: >>>>>>>>>>> >>>>>>>>>>> - drop CURRENT_DATE / CURRENT_TIME and let users pick LOCALDATE / >>>>>>>>>> LOCALTIME for materialized timestamp parts >>>>>>>>>>> >>>>>>>>>>> - CURRENT_TIMESTAMP should return a TIMESTAMP WITH TIME ZONE to >>>>>>>>>> materialize all session time information into every record. It it >>>>> the >>>>>> most >>>>>>>>>> generic data type and allows to cast to all other timestamp data >>>>>> types. >>>>>>>>>> This generic ability can be used for filter predicates as well >>>>> either >>>>>>>>>> through implicit or explicit casting. >>>>>>>>>>> >>>>>>>>>>> PROCTIME/ROWTIME should be time functions based on a long value. >>>>> Both >>>>>>>>>> System.currentMillis() and our watermark system work on long >> values. >>>>>> Those >>>>>>>>>> should return TIMESTAMP WITH LOCAL TIME ZONE because the main >>>>>> calculation >>>>>>>>>> should always happen based on UTC. We discussed it in a different >>>>>> thread, >>>>>>>>>> but we should allow PROCTIME globally. People need a way to create >>>>>>>>>> instances of TIMESTAMP WITH LOCAL TIME ZONE. This is not >> considered >>>>>> in the >>>>>>>>>> current design doc. Many pipelines contain UTC timestamps and thus >>>>> it >>>>>>>>>> should be easy to create one. Also, both CURRENT_TIMESTAMP and >>>>>>>>>> LOCALTIMESTAMP can work with this type because we should remember >>>>> that >>>>>>>>>> TIMESTAMP WITH LOCAL TIME ZONE accepts all timestamp data types as >>>>>> casting >>>>>>>>>> target [1]. We could allow TIMESTAMP WITH TIME ZONE in the future >>>>> for >>>>>>>>>> ROWTIME. >>>>>>>>>>> >>>>>>>>>>> In any case, windows should simply adapt their behavior to the >>>>> passed >>>>>>>>>> timestamp type. And with TIMESTAMP WITH LOCAL TIME ZONE a day is >>>>>> defined by >>>>>>>>>> considering the current session time zone. >>>>>>>>>>> >>>>>>>>>>> If we would like to design this with less effort required, we >> could >>>>>>>>>> think about returning TIMESTAMP WITH LOCAL TIME ZONE also for >>>>>>>>>> CURRENT_TIMESTAMP. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I will try to involve more people into this discussion. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Timo >>>>>>>>>>> >>>>>>>>>>> [1] >>>>>>>>>> >>>>>> >>>>> >> https://docs.oracle.com/en/database/oracle/oracle-database/21/sqlrf/Data-Types.html#GUID-E7CA339A-2093-4FE4-A36E-1D09593591D3 >>>>>>>>>> < >>>>>>>>>> >>>>>> >>>>> >> https://docs.oracle.com/en/database/oracle/oracle-database/21/sqlrf/Data-Types.html#GUID-E7CA339A-2093-4FE4-A36E-1D09593591D3 >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> 2021-01-21,22:32,Leonard Xu <xbjt...@gmail.com> : >>>>>>>>>>>> Before the changes, as I am writing this reply, the local time >>>>> here >>>>>> is >>>>>>>>>> 2021-01-21 12:03:35 (Beijing time, UTC+8). >>>>>>>>>>>> And I tried these 5 functions in sql client, and got: >>>>>>>>>>>> >>>>>>>>>>>> Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP, >>>>>> CURRENT_DATE, >>>>>>>>>> CURRENT_TIME; >>>>>>>>>>>> >>>>>>>>>> >>>>>> >>>>> >> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >>>>>>>>>>>> | EXPR$0 | EXPR$1 | >>>>>>>>>> CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME | >>>>>>>>>>>> >>>>>>>>>> >>>>>> >>>>> >> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >>>>>>>>>>>> | 2021-01-21T04:03:35.228 | 2021-01-21T04:03:35.228 | >>>>>>>>>> 2021-01-21T04:03:35.228 | 2021-01-21 | 04:03:35.228 | >>>>>>>>>>>> >>>>>>>>>> >>>>>> >>>>> >> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >>>>>>>>>>>> After the changes, the expected behavior will change to: >>>>>>>>>>>> >>>>>>>>>>>> Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP, >>>>>> CURRENT_DATE, >>>>>>>>>> CURRENT_TIME; >>>>>>>>>>>> >>>>>>>>>> >>>>>> >>>>> >> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >>>>>>>>>>>> | EXPR$0 | EXPR$1 | >>>>>>>>>> CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME | >>>>>>>>>>>> >>>>>>>>>> >>>>>> >>>>> >> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >>>>>>>>>>>> | 2021-01-21T12:03:35.228 | 2021-01-21T12:03:35.228 | >>>>>>>>>> 2021-01-21T12:03:35.228 | 2021-01-21 | 12:03:35.228 | >>>>>>>>>>>> >>>>>>>>>> >>>>>> >>>>> >> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >>>>>>>>>>>> The return type of now(), proctime() and CURRENT_TIMESTAMP still >>>>> be >>>>>>>>>> TIMESTAMP; >>>>>>>>>>> >>>>>>>>>>> To Kurt, thanks for the intuitive case, it really clear, you’re >>>>>> wright >>>>>>>>>> that I want to propose to change the return value of these >>>>> functions. >>>>>> It’s >>>>>>>>>> the most important part of the topic from user's perspective. >>>>>>>>>>> >>>>>>>>>>>> I think this definitely deserves a FLIP. >>>>>>>>>>> To Jark, nice suggestion, I prepared a FLIP for this topic, and >>>>> will >>>>>>>>>> start the FLIP discussion soon. >>>>>>>>>>> >>>>>>>>>>>>> If use the default Flink SQL, the window time range of >> the >>>>>>>>>>>>> statistics is incorrect, then the statistical results will >>>>>> naturally >>>>>>>>>> be >>>>>>>>>>>>> incorrect. >>>>>>>>>>> To zhisheng, sorry to hear that this problem influenced your >>>>>> production >>>>>>>>>> jobs, Could you share your SQL pattern? we can have more inputs >>>>> and >>>>>> try >>>>>>>>>> to resolve them. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Best, >>>>>>>>>>> Leonard >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> 2021-01-21,14:19,Jark Wu <imj...@gmail.com> : >>>>>>>>>>> >>>>>>>>>>> Great examples to understand the problem and the proposed >> changes, >>>>>>>>>> @Kurt! >>>>>>>>>>> >>>>>>>>>>> Thanks Leonard for investigating this problem. >>>>>>>>>>> The time-zone problems around time functions and windows have >>>>>> bothered a >>>>>>>>>>> lot of users. It's time to fix them! >>>>>>>>>>> >>>>>>>>>>> The return value changes sound reasonable to me, and keeping the >>>>>> return >>>>>>>>>>> type unchanged will minimize the surprise to the users. >>>>>>>>>>> Besides that, I think it would be better to mention how this >>>>> affects >>>>>> the >>>>>>>>>>> window behaviors, and the interoperability with DataStream. >>>>>>>>>>> >>>>>>>>>>> I think this definitely deserves a FLIP. >>>>>>>>>>> >>>>>>>>>>> ==================================================== >>>>>>>>>>> >>>>>>>>>>> Hi zhisheng, >>>>>>>>>>> >>>>>>>>>>> Do you have examples to illustrate which case will get the wrong >>>>>> window >>>>>>>>>>> boundaries? >>>>>>>>>>> That will help to verify whether the proposed changes can solve >>>>> your >>>>>>>>>>> problem. >>>>>>>>>>> >>>>>>>>>>> Best, >>>>>>>>>>> Jark >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> 2021-01-21,12:54,zhisheng <173855...@qq.com> : >>>>>>>>>>> >>>>>>>>>>> Thanks to Leonard Xu for discussing this tricky topic. At >> present, >>>>>>>>>> there are many Flink jobs in our production environment that are >>>>> used >>>>>> to >>>>>>>>>> count day-level reports (eg: count PV/UV ). >>>>>>>>>>> >>>>>>>>>>> If use the default Flink SQL, the window time range of the >>>>>>>>>> statistics is incorrect, then the statistical results will >> naturally >>>>>> be >>>>>>>>>> incorrect. >>>>>>>>>>> >>>>>>>>>>> The user needs to deal with the time zone manually in order to >>>>> solve >>>>>>>>>> the problem. >>>>>>>>>>> >>>>>>>>>>> If Flink itself can solve these time zone issues, then I think it >>>>>> will >>>>>>>>>> be user-friendly. >>>>>>>>>>> >>>>>>>>>>> Thank you >>>>>>>>>>> >>>>>>>>>>> Best!; >>>>>>>>>>> zhisheng >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> 2021-01-21,12:11,Kurt Young <ykt...@gmail.com> : >>>>>>>>>>> >>>>>>>>>>> cc this to user & user-zh mailing list because this will affect >>>>> lots >>>>>> of >>>>>>>>>> users, and also quite a lot of users >>>>>>>>>>> were asking questions around this topic. >>>>>>>>>>> >>>>>>>>>>> Let me try to understand this from user's perspective. >>>>>>>>>>> >>>>>>>>>>> Your proposal will affect five functions, which are: >>>>>>>>>>> PROCTIME() >>>>>>>>>>> NOW() >>>>>>>>>>> CURRENT_DATE >>>>>>>>>>> CURRENT_TIME >>>>>>>>>>> CURRENT_TIMESTAMP >>>>>>>>>>> Before the changes, as I am writing this reply, the local time >> here >>>>>> is >>>>>>>>>> 2021-01-21 12:03:35 (Beijing time, UTC+8). >>>>>>>>>>> And I tried these 5 functions in sql client, and got: >>>>>>>>>>> >>>>>>>>>>> Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP, >>>>> CURRENT_DATE, >>>>>>>>>> CURRENT_TIME; >>>>>>>>>>> >>>>>>>>>> >>>>>> >>>>> >> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >>>>>>>>>>> | EXPR$0 | EXPR$1 | >>>>>>>>>> CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME | >>>>>>>>>>> >>>>>>>>>> >>>>>> >>>>> >> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >>>>>>>>>>> | 2021-01-21T04:03:35.228 | 2021-01-21T04:03:35.228 | >>>>>>>>>> 2021-01-21T04:03:35.228 | 2021-01-21 | 04:03:35.228 | >>>>>>>>>>> >>>>>>>>>> >>>>>> >>>>> >> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >>>>>>>>>>> After the changes, the expected behavior will change to: >>>>>>>>>>> >>>>>>>>>>> Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP, >>>>> CURRENT_DATE, >>>>>>>>>> CURRENT_TIME; >>>>>>>>>>> >>>>>>>>>> >>>>>> >>>>> >> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >>>>>>>>>>> | EXPR$0 | EXPR$1 | >>>>>>>>>> CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME | >>>>>>>>>>> >>>>>>>>>> >>>>>> >>>>> >> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >>>>>>>>>>> | 2021-01-21T12:03:35.228 | 2021-01-21T12:03:35.228 | >>>>>>>>>> 2021-01-21T12:03:35.228 | 2021-01-21 | 12:03:35.228 | >>>>>>>>>>> >>>>>>>>>> >>>>>> >>>>> >> +-------------------------+-------------------------+-------------------------+--------------+--------------+ >>>>>>>>>>> The return type of now(), proctime() and CURRENT_TIMESTAMP still >> be >>>>>>>>>> TIMESTAMP; >>>>>>>>>>> >>>>>>>>>>> Best, >>>>>>>>>>> Kurt