Hi, Jark

> I have a minor suggestion:

> I think we will still suggest users use TIMESTAMP even if we have 
> TIMESTAMP_NTZ. Then it seems
> introducing TIMESTAMP_NTZ doesn't help much for users, but introduces more 
> learning costs.

I think your suggestion makes sense, we should suggest users use TIMESTAMP for 
TIMESTAMP WITHOUT TIME ZONE as we did now, updated as following:

        original type name :                                                    
                shortcut type name :
TIMESTAMP / TIMESTAMP WITHOUT TIME ZONE         <=> TIMESTAMP 
TIMESTAMP WITH LOCAL TIME ZONE                            <=> TIMESTAMP_LTZ  
TIMESTAMP WITH TIME ZONE                                         <=> 
TIMESTAMP_TZ     (supports them in the future)
Best,
Leonard


> 
> 
> 
> 
> 
> On Thu, 28 Jan 2021 at 18:52, Leonard Xu <xbjt...@gmail.com 
> <mailto:xbjt...@gmail.com>> wrote:
> 
>> Thanks all for sharing your opinions.
>> 
>> Looks like  we’ve reached a consensus about the topic.
>> 
>> @Timo:
>>> 1) Are we on the same page that LOCALTIMESTAMP returns TIMESTAMP and not
>> TIMESTAMP_LTZ? Maybe we should quickly list also LOCALTIME/LOCALDATE and
>> LOCALTIMESTAMP for completeness.
>> Yes, LOCALTIMESTAMP returns TIMESTAMP, LOCALTIME returns TIME, the
>> behavior of them is clear so I just listed them in the excel[1] of this
>> FLIP references.
>> 
>>> 2) Shall we add aliases for the timestamp types as part of this FLIP? I
>> see Snowflake supports TIMESTAMP_LTZ , TIMESTAMP_NTZ , TIMESTAMP_TZ [1]. I
>> think the discussion was quite cumbersome with the full string of
>> `TIMESTAMP WITH LOCAL TIME ZONE`. With this FLIP we are making this type
>> even more prominent. And important concepts should have a short name
>> because they are used frequently. According to the FLIP, we are introducing
>> the abbriviation already in function names like `TO_TIMESTAMP_LTZ`.
>> `TIMESTAMP_LTZ` could be treated similar to `STRING` for
>> `VARCHAR(MAX_INT)`, the serializable string representation would not change.
>> 
>> @Timo @Jark
>> Nice idea, I also suffered from the long name during the discussions, the
>> abbreviation will not only help us, but also makes it more convenient for
>> users. I list the abbreviation name mapping to support:
>> TIMESTAMP WITHOUT TIME ZONE         <=> TIMESTAMP_NTZ   (which synonyms
>> TIMESTAMP)
>> TIMESTAMP WITH LOCAL TIME ZONE    <=> TIMESTAMP_LTZ
>> TIMESTAMP WITH TIME ZONE                 <=> TIMESTAMP_TZ     (supports
>> them in the future)
>>> 3) I'm fine with supporting all conversion classes like
>> java.time.LocalDateTime, java.sql.Timestamp that TimestampType supported
>> for LocalZonedTimestampType. But we agree that Instant stays the default
>> conversion class right? The default extraction defined in [2] will not
>> change, correct?
>> Yes, Instant stays the default conversion class. The default
>> 
>>> 4) I would remove the comment "Flink supports TIME-related types with
>> precision well", because unfortunately this is still not correct. We still
>> have issues with TIME(9), it would be great if someone can finally fix that
>> though. Maybe the implementation of this FLIP would be a good time to fix
>> this issue.
>> You’re right, TIME(9) is not supported yet, I'll take account of TIME(9)
>> to the scope of this FLIP.
>> 
>> 
>> I’ve updated this FLIP[2] according your suggestions @Jark @Timo
>> I’ll start the vote soon if there’re no objections.
>> 
>> Best,
>> Leonard
>> 
>> [1]
>> https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing
>> <
>> https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing
>>  
>> <https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing>
>>> 
>> [2]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-162%3A+Consistent+Flink+SQL+time+function+behavior
>>  
>> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-162%3A+Consistent+Flink+SQL+time+function+behavior>
>> <
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-162:+Consistent+Flink+SQL+time+function+behavior
>>  
>> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-162:+Consistent+Flink+SQL+time+function+behavior>>
>> 
>> 
>>> 
>>> On 28.01.21 03:18, Jark Wu wrote:
>>>> Thanks Leonard for the further investigation.
>>>> I think we all agree we should correct the return value of
>>>> CURRENT_TIMESTAMP.
>>>> Regarding the return type of CURRENT_TIMESTAMP, I also agree
>> TIMESTAMP_LTZ
>>>> would be more worldwide useful. This may need more effort, but if this
>> is
>>>> the right direction, we should do it.
>>>> Regarding the CURRENT_TIME, if CURRENT_TIMESTAMP returns
>>>> TIMESTAMP_LTZ, then I think CURRENT_TIME shouldn't return TIME_TZ.
>>>> Otherwise, CURRENT_TIME will be quite special and strange.
>>>> Thus I think it has to return TIME type. Given that we already have
>>>> CURRENT_DATE which returns
>>>> DATE WITHOUT TIME ZONE, I think it's fine to return TIME WITHOUT TIME
>> ZONE
>>>> for CURRENT_TIME.
>>>> In a word, the updated FLIP looks good to me. I especially like the
>>>> proposed new function TO_TIMESTAMP_LTZ(numeric, [,scale]).
>>>> This will be very convenient to define rowtime on a long value which is
>> a
>>>> very common case and has been complained a lot in mailing list.
>>>> Best,
>>>> Jark
>>>> On Mon, 25 Jan 2021 at 21:12, Kurt Young <ykt...@gmail.com> wrote:
>>>>> Thanks Leonard for the detailed response and also the bad case about
>> option
>>>>> 1, these all
>>>>> make sense to me.
>>>>> 
>>>>> Also nice catch about conversion support of LocalZonedTimestampType, I
>>>>> think it actually
>>>>> makes sense to support java.sql.Timestamp as well as
>>>>> java.time.LocalDateTime. It also has
>>>>> a slight benefit that we might have a chance to run the udf which took
>> them
>>>>> as input parameter
>>>>> after we change the return type.
>>>>> 
>>>>> Regarding to the return type of CURRENT_TIME, I also think timezone
>>>>> information is not useful.
>>>>> To not expand this FLIP further, I'm lean to keep it as it is.
>>>>> 
>>>>> Best,
>>>>> Kurt
>>>>> 
>>>>> 
>>>>> On Mon, Jan 25, 2021 at 8:50 PM Leonard Xu <xbjt...@gmail.com> wrote:
>>>>> 
>>>>>> Hi, All
>>>>>> 
>>>>>> Thanks for your comments. I think all of the thread have agreed that:
>>>>>> (1) The return values of
>> CURRENT_TIME/CURRENT_TIMESTAMP/NOW()/PROCTIME()
>>>>>> are wrong.
>>>>>> (2) The LOCALTIME/LOCALTIMESTAMP and CURRENT_TIME/CURRENT_TIMESTAMP
>>>>> should
>>>>>> be different whether from SQL standard’s perspective or mature
>> systems.
>>>>>> (3) The semantics of three TIMESTAMP types in Flink SQL follows the
>> SQL
>>>>>> standard and also keeps the same with other 'good' vendors.
>>>>>>    TIMESTAMP                                   =>  A literal in
>>>>>> ‘yyyy-MM-dd HH:mm:ss’ format to describe a time, does not contain
>>>>> timezone
>>>>>> info, can not represent an absolute time point.
>>>>>>    TIMESTAMP WITH LOCAL ZONE =>  Records the elapsed time from
>> absolute
>>>>>> time point origin, can represent an absolute time point, requires
>> local
>>>>>> time zone when expressed with ‘yyyy-MM-dd HH:mm:ss’ format.
>>>>>>    TIMESTAMP WITH TIME ZONE    =>  Consists of time zone info and a
>>>>>> literal in ‘yyyy-MM-dd HH:mm:ss’ format to describe time, can
>> represent
>>>>> an
>>>>>> absolute time point.
>>>>>> 
>>>>>> 
>>>>>> Currently we've two ways to correct
>>>>>> CURRENT_TIME/CURRENT_TIMESTAMP/NOW()/PROCTIME().
>>>>>> 
>>>>>> option (1): As the FLIP proposed, change the return value  from UTC
>>>>>> timezone to local timezone.
>>>>>>        Pros:   (1) The change looks smaller to users and developers
>> (2)
>>>>>> There're many SQL engines adopted this way
>>>>>>        Cons:  (1) connector devs may confuse the underlying value of
>>>>>> TimestampData which needs to change according to data type  (2) I
>> thought
>>>>>> about this weekend. Unfortunately I found a bad case:
>>>>>> 
>>>>>> The proposal is fine if we only use it in FLINK SQL world, but we
>> need to
>>>>>> consider the conversion between Table/DataStream, assume a record
>>>>> produced
>>>>>> in UTC+0 timezone with TIMESTAMP '1970-01-01 08:00:44'  and the Flink
>> SQL
>>>>>> processes the data with session time zone 'UTC+8', if the sql program
>>>>> need
>>>>>> to convert the Table to DataStream, then we need to calculate the
>>>>> timestamp
>>>>>> in StreamRecord with session time zone (UTC+8), then we will get 44 in
>>>>>> DataStream program, but it is wrong because the expected value should
>> be
>>>>> (8
>>>>>> * 60 * 60 + 44). The corner case tell us that the ROWTIME/PROCTIME in
>>>>> Flink
>>>>>> are based on UTC+0, when correct the PROCTIME() function, the better
>> way
>>>>> is
>>>>>> to use TIMESTAMP WITH LOCAL TIME ZONE which keeps same long value with
>>>>> time
>>>>>> based on UTC+0 and can be expressed with  local timezone.
>>>>>> 
>>>>>> option (2) : As we considered in the FLIP as well as @Timo suggested,
>>>>>> change the return type to TIMESTAMP WITH LOCAL TIME ZONE, the
>> expressed
>>>>>> value depends on the local time zone.
>>>>>>        Pros: (1) Make Flink SQL more close to SQL standard  (2) Can
>> deal
>>>>>> the conversion between Table/DataStream well
>>>>>>        Cons: (1) We need to discuss the return value/type of
>>>>> CURRENT_TIME
>>>>>> function (2) The change is bigger to users, we need to support
>> TIMESTAMP
>>>>>> WITH LOCAL TIME ZONE in connectors/formats as well as custom
>> connectors.
>>>>>>                   (3)The TIMESTAMP WITH LOCAL TIME ZONE support is
>> weak
>>>>>> in Flink, thus we need some improvement,but the workload does not
>> matter
>>>>>> as long as we are doing the right thing ^_^
>>>>>> 
>>>>>> Due to the above bad case for option (1). I think option 2 should be
>>>>>> adopted,
>>>>>> But we also need to consider some problems:
>>>>>> (1) More conversion classes like LocalDateTime, sql.Timestamp should
>> be
>>>>>> supported for LocalZonedTimestampType to resolve the UDF compatibility
>>>>> issue
>>>>>> (2) The timezone offset for window size of one day should still be
>>>>>> considered
>>>>>> (3) All connectors/formats should supports TIMESTAMP WITH LOCAL TIME
>> ZONE
>>>>>> well and we also should record in document
>>>>>> I’ll update these sections of FLIP-162.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> We also need to discuss the CURRENT_TIME function. I know the standard
>>>>> way
>>>>>> is using TIME WITH TIME ZONE(there's no TIME WITH LOCAL TIME ZONE),
>> but
>>>>> we
>>>>>> don't support this type yet and I don't see strong motivation to
>> support
>>>>> it
>>>>>> so far.
>>>>>> Compared to CURRENT_TIMESTAMP, the CURRENT_TIME can not represent an
>>>>>> absolute time point which should be considered as a string consisting
>> of
>>>>> a
>>>>>> time with 'HH:mm:ss' format and time zone info.  We have several
>> options
>>>>>> for this:
>>>>>> (1) We can forbid CURRENT_TIME as @Timo proposed to make all Flink SQL
>>>>>> functions follow the standard well,  in this way, we need to offer
>> some
>>>>>> guidance for user upgrading Flink versions.
>>>>>> (2) We can also support it from a user's perspective who has used
>>>>>> CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP, btw,Snowflake also
>> returns
>>>>>> TIME type.
>>>>>> (3) Returns TIMESTAMP WITH LOCAL TIME ZONE to make it equal to
>>>>>> CURRENT_TIMESTAMP as Calcite did.
>>>>>> 
>>>>>> I can image (1) which we don't want to left a bad smell in Flink SQL,
>>>>> and
>>>>>> I also accept (2) because I think users do not consider time zone
>> issues
>>>>>> when they use CURRENT_DATE/CURRENT_TIME, and the timezone info in
>> time is
>>>>>> not very useful.
>>>>>> 
>>>>>> I don’t have a strong opinion  for them.  What do others think?
>>>>>> 
>>>>>> 
>>>>>> I hope I've addressed your concerns. @Timo @Kurt
>>>>>> 
>>>>>> Best,
>>>>>> Leonard
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> Most of the mature systems have a clear difference between
>>>>>> CURRENT_TIMESTAMP and LOCALTIMESTAMP. I wouldn't take Spark or Hive
>> as a
>>>>>> good example. Snowflake decided for TIMESTAMP WITH LOCAL TIME ZONE.
>> As I
>>>>>> mentioned in the last comment, I could also imagine this behavior for
>>>>>> Flink. But in any case, there should be some time zone information
>>>>>> considered in order to cast to all other types.
>>>>>>> 
>>>>>>>>>> The function CURRENT_DATE/CURRENT_TIME is supporting in SQL
>>>>>> standard, but
>>>>>>>>>> LOCALDATE not, I don’t think it’s a good idea that dropping
>>>>>> functions which
>>>>>>>>>> SQL standard supported and introducing a replacement which SQL
>>>>>> standard not
>>>>>>>>>> reminded.
>>>>>>> 
>>>>>>> We can still add those functions in the future. But since we don't
>>>>> offer
>>>>>> a TIME WITH TIME ZONE, it is better to not support this function at
>> all
>>>>> for
>>>>>> now. And by the way, this is exactly the behavior that also Microsoft
>> SQL
>>>>>> Server does: it also just supports CURRENT_TIMESTAMP (but it returns
>>>>>> TIMESTAMP without a zone which completes the confusion).
>>>>>>> 
>>>>>>>>>> I also agree returning  TIMESTAMP WITH LOCAL TIME ZONE for
>> PROCTIME
>>>>>> has
>>>>>>>>>> more clear semantics, but I realized that user didn’t care the
>> type
>>>>>> but
>>>>>>>>>> more about the expressed value they saw, and change the type from
>>>>>> TIMESTAMP
>>>>>>>>>> to TIMESTAMP WITH LOCAL TIME ZONE brings huge refactor that we
>> need
>>>>>>>>>> consider all places where the TIMESTAMP type used
>>>>>>> 
>>>>>>> From a UDF perspective, I think nothing will change. The new type
>>>>> system
>>>>>> and type inference were designed to support all these cases. There is
>> a
>>>>>> reason why Java has adopted Joda time, because it is hard to come up
>>>>> with a
>>>>>> good time library. That's why also we and the other Hadoop ecosystem
>>>>> folks
>>>>>> have decided for 3 different kinds of LocalDateTime, ZonedDateTime,
>> and
>>>>>> Instance. It makes the library more complex, but time is a complex
>> topic.
>>>>>>> 
>>>>>>> I also doubt that many users work with only one time zone. Take the
>> US
>>>>>> as an example, a country with 3 different timezones. Somebody working
>>>>> with
>>>>>> US data cannot properly see the data points with just LOCAL TIME ZONE.
>>>>> But
>>>>>> on the other hand, a lot of event data is stored using a UTC
>> timestamp.
>>>>>>> 
>>>>>>> 
>>>>>>>>> Before jumping into technique details, let's take a step back to
>>>>>> discuss
>>>>>>>>> user experience.
>>>>>>>>> 
>>>>>>>>> The first important question is what kind of date and time will
>>>>> Flink
>>>>>>>>> display when users call
>>>>>>>>>  CURRENT_TIMESTAMP and maybe also PROCTIME (if we think they are
>>>>>> similar).
>>>>>>>>> 
>>>>>>>>> Should it always display the date and time in UTC or in the user's
>>>>>> time
>>>>>>>>> zone?
>>>>>>> 
>>>>>>> @Kurt: I think we all agree that the current behavior with just
>> showing
>>>>>> UTC is wrong. Also, we all agree that when calling CURRENT_TIMESTAMP
>> or
>>>>>> PROCTIME a user would like to see the time in it's current time zone.
>>>>>>> 
>>>>>>> As you said, "my wall clock time".
>>>>>>> 
>>>>>>> However, the question is what is the data type of what you "see". If
>>>>> you
>>>>>> pass this record on to a different system, operator, or different
>>>>> cluster,
>>>>>> should the "my" get lost or materialized into the record?
>>>>>>> 
>>>>>>> TIMESTAMP -> completely lost and could cause confusion in a different
>>>>>> system
>>>>>>> 
>>>>>>> TIMESTAMP WITH LOCAL TIME ZONE -> at least the UTC is correct, so you
>>>>>> can provide a new local time zone
>>>>>>> 
>>>>>>> TIMESTAMP WITH TIME ZONE -> also "your" location is persisted
>>>>>>> 
>>>>>>> Regards,
>>>>>>> Timo
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On 22.01.21 09:38, Kurt Young wrote:
>>>>>>>> Forgot one more thing. Continue with displaying in UTC. As a user,
>> if
>>>>>> Flink
>>>>>>>> want to display the timestamp
>>>>>>>> in UTC, why don't we offer something like UTC_TIMESTAMP?
>>>>>>>> Best,
>>>>>>>> Kurt
>>>>>>>> On Fri, Jan 22, 2021 at 4:33 PM Kurt Young <ykt...@gmail.com>
>> wrote:
>>>>>>>>> Before jumping into technique details, let's take a step back to
>>>>>> discuss
>>>>>>>>> user experience.
>>>>>>>>> 
>>>>>>>>> The first important question is what kind of date and time will
>> Flink
>>>>>>>>> display when users call
>>>>>>>>> CURRENT_TIMESTAMP and maybe also PROCTIME (if we think they are
>>>>>> similar).
>>>>>>>>> 
>>>>>>>>> Should it always display the date and time in UTC or in the user's
>>>>> time
>>>>>>>>> zone? I think this part is the
>>>>>>>>> reason that surprised lots of users. If we forget about the type
>> and
>>>>>>>>> internal representation of these
>>>>>>>>> two methods, as a user, my instinct tells me that these two methods
>>>>>> should
>>>>>>>>> display my wall clock time.
>>>>>>>>> 
>>>>>>>>> Display time in UTC? I'm not sure, why I should care about UTC
>> time?
>>>>> I
>>>>>>>>> want to get my current timestamp.
>>>>>>>>> For those users who have never gone abroad, they might not even be
>>>>>> able to
>>>>>>>>> realize that this is affected
>>>>>>>>> by the time zone.
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> Kurt
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Fri, Jan 22, 2021 at 12:25 PM Leonard Xu <xbjt...@gmail.com>
>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Thanks @Timo for the detailed reply, let's go on this topic on
>> this
>>>>>>>>>> discussion,  I've merged all mails to this discussion.
>>>>>>>>>> 
>>>>>>>>>>> LOCALDATE / LOCALTIME / LOCALTIMESTAMP
>>>>>>>>>>> 
>>>>>>>>>>> --> uses session time zone, returns DATE/TIME/TIMESTAMP
>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP
>>>>>>>>>>> 
>>>>>>>>>>> --> uses session time zone, returns DATE/TIME/TIMESTAMP
>>>>>>>>>>> 
>>>>>>>>>>> I'm very sceptical about this behavior. Almost all mature systems
>>>>>>>>>> (Oracle, Postgres) and new high quality systems (Presto,
>> Snowflake)
>>>>>> use a
>>>>>>>>>> data type with some degree of time zone information encoded. In a
>>>>>>>>>> globalized world with businesses spanning different regions, I
>> think
>>>>>> we
>>>>>>>>>> should do this as well. There should be a difference between
>>>>>>>>>> CURRENT_TIMESTAMP and LOCALTIMESTAMP. And users should be able to
>>>>>> choose
>>>>>>>>>> which behavior they prefer for their pipeline.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> I know that the two series should be different at first glance,
>> but
>>>>>>>>>> different SQL engines can have their own explanations,for example,
>>>>>>>>>> CURRENT_TIMESTAMP and LOCALTIMESTAMP are synonyms in Snowflake[1]
>>>>> and
>>>>>> has
>>>>>>>>>> no difference, and Spark only supports the later one and doesn’t
>>>>>> support
>>>>>>>>>> LOCALTIME/LOCALTIMESTAMP[2].
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> If we would design this from scatch, I would suggest the
>> following:
>>>>>>>>>>> 
>>>>>>>>>>> - drop CURRENT_DATE / CURRENT_TIME and let users pick LOCALDATE /
>>>>>>>>>> LOCALTIME for materialized timestamp parts
>>>>>>>>>> 
>>>>>>>>>> The function CURRENT_DATE/CURRENT_TIME is supporting in SQL
>>>>> standard,
>>>>>> but
>>>>>>>>>> LOCALDATE not, I don’t think it’s a good idea that dropping
>>>>> functions
>>>>>> which
>>>>>>>>>> SQL standard supported and introducing a replacement which SQL
>>>>>> standard not
>>>>>>>>>> reminded.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> - CURRENT_TIMESTAMP should return a TIMESTAMP WITH TIME ZONE to
>>>>>>>>>> materialize all session time information into every record. It it
>>>>> the
>>>>>> most
>>>>>>>>>> generic data type and allows to cast to all other timestamp data
>>>>>> types.
>>>>>>>>>> This generic ability can be used for filter predicates as well
>>>>> either
>>>>>>>>>> through implicit or explicit casting.
>>>>>>>>>> 
>>>>>>>>>> TIMESTAMP WITH TIME ZONE indeed contains more information to
>>>>> describe
>>>>>> a
>>>>>>>>>> time point, but the type TIMESTAMP  can cast to all other
>> timestamp
>>>>>> data
>>>>>>>>>> types combining with session time zone as well, and it also can be
>>>>>> used for
>>>>>>>>>> filter predicates. For type casting between BIGINT and TIMESTAMP,
>> I
>>>>>> think
>>>>>>>>>> the function way using TO_TIMEMTAMP()/FROM_UNIXTIMESTAMP() is more
>>>>>> clear.
>>>>>>>>>> 
>>>>>>>>>>> PROCTIME/ROWTIME should be time functions based on a long value.
>>>>> Both
>>>>>>>>>> System.currentMillis() and our watermark system work on long
>> values.
>>>>>> Those
>>>>>>>>>> should return TIMESTAMP WITH LOCAL TIME ZONE because the main
>>>>>> calculation
>>>>>>>>>> should always happen based on UTC.
>>>>>>>>>>> We discussed it in a different thread, but we should allow
>> PROCTIME
>>>>>>>>>> globally. People need a way to create instances of TIMESTAMP WITH
>>>>>> LOCAL
>>>>>>>>>> TIME ZONE. This is not considered in the current design doc.
>>>>>>>>>>> Many pipelines contain UTC timestamps and thus it should be easy
>> to
>>>>>>>>>> create one.
>>>>>>>>>>> Also, both CURRENT_TIMESTAMP and LOCALTIMESTAMP can work with
>> this
>>>>>> type
>>>>>>>>>> because we should remember that TIMESTAMP WITH LOCAL TIME ZONE
>>>>>> accepts all
>>>>>>>>>> timestamp data types as casting target [1]. We could allow
>> TIMESTAMP
>>>>>> WITH
>>>>>>>>>> TIME ZONE in the future for ROWTIME.
>>>>>>>>>>> In any case, windows should simply adapt their behavior to the
>>>>> passed
>>>>>>>>>> timestamp type. And with TIMESTAMP WITH LOCAL TIME ZONE a day is
>>>>>> defined by
>>>>>>>>>> considering the current session time zone.
>>>>>>>>>> 
>>>>>>>>>> I also agree returning  TIMESTAMP WITH LOCAL TIME ZONE for
>> PROCTIME
>>>>>> has
>>>>>>>>>> more clear semantics, but I realized that user didn’t care the
>> type
>>>>>> but
>>>>>>>>>> more about the expressed value they saw, and change the type from
>>>>>> TIMESTAMP
>>>>>>>>>> to TIMESTAMP WITH LOCAL TIME ZONE brings huge refactor that we
>> need
>>>>>>>>>> consider all places where the TIMESTAMP type used, and many
>> builtin
>>>>>>>>>> functions and UDFs doest not support  TIMESTAMP WITH LOCAL TIME
>> ZONE
>>>>>> type.
>>>>>>>>>> That means both user and Flink devs need to refactor the code(UDF,
>>>>>> builtin
>>>>>>>>>> functions, sql pipeline), to be honest, I didn’t see strong
>>>>>> motivation that
>>>>>>>>>> we have to do the pretty big refactor from user’s perspective and
>>>>>>>>>> developer’s perspective.
>>>>>>>>>> 
>>>>>>>>>> In one word, both your suggestion and my proposal can resolve
>> almost
>>>>>> all
>>>>>>>>>> user problems,the divergence is whether we need to spend pretty
>>>>>> energy just
>>>>>>>>>> to get a bit more accurate semantics?   I think we need a
>> tradeoff.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Best,
>>>>>>>>>> Leonard
>>>>>>>>>> [1]
>>>>>>>>>> 
>>>>>> 
>> https://trino.io/docs/current/functions/datetime.html#current_timestamp
>>>>> <
>>>>>>>>>> 
>>>>>> 
>> https://trino.io/docs/current/functions/datetime.html#current_timestamp>
>>>>>>>>>> [2] https://issues.apache.org/jira/browse/SPARK-30374 <
>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-30374>
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> 2021-01-22,00:53,Timo Walther <twal...@apache.org> :
>>>>>>>>>>> 
>>>>>>>>>>> Hi Leonard,
>>>>>>>>>>> 
>>>>>>>>>>> thanks for working on this topic. I agree that time handling is
>> not
>>>>>>>>>> easy in Flink at the moment. We added new time data types (and
>> some
>>>>>> are
>>>>>>>>>> still not supported which even further complicates things like
>>>>>> TIME(9)). We
>>>>>>>>>> should definitely improve this situation for users.
>>>>>>>>>>> 
>>>>>>>>>>> This is a pretty opinionated topic and it seems that the SQL
>>>>> standard
>>>>>>>>>> is not really deciding this but is at least supporting. So let me
>>>>>> express
>>>>>>>>>> my opinion for the most important functions:
>>>>>>>>>>> 
>>>>>>>>>>> LOCALDATE / LOCALTIME / LOCALTIMESTAMP
>>>>>>>>>>> 
>>>>>>>>>>> --> uses session time zone, returns DATE/TIME/TIMESTAMP
>>>>>>>>>>> 
>>>>>>>>>>> I think those are the most obvious ones because the LOCAL
>> indicates
>>>>>>>>>> that the locality should be materialized into the result and any
>>>>> time
>>>>>> zone
>>>>>>>>>> information (coming from session config or data) is not important
>>>>>>>>>> afterwards.
>>>>>>>>>>> 
>>>>>>>>>>> CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP
>>>>>>>>>>> 
>>>>>>>>>>> --> uses session time zone, returns DATE/TIME/TIMESTAMP
>>>>>>>>>>> 
>>>>>>>>>>> I'm very sceptical about this behavior. Almost all mature systems
>>>>>>>>>> (Oracle, Postgres) and new high quality systems (Presto,
>> Snowflake)
>>>>>> use a
>>>>>>>>>> data type with some degree of time zone information encoded. In a
>>>>>>>>>> globalized world with businesses spanning different regions, I
>> think
>>>>>> we
>>>>>>>>>> should do this as well. There should be a difference between
>>>>>>>>>> CURRENT_TIMESTAMP and LOCALTIMESTAMP. And users should be able to
>>>>>> choose
>>>>>>>>>> which behavior they prefer for their pipeline.
>>>>>>>>>>> 
>>>>>>>>>>> If we would design this from scatch, I would suggest the
>> following:
>>>>>>>>>>> 
>>>>>>>>>>> - drop CURRENT_DATE / CURRENT_TIME and let users pick LOCALDATE /
>>>>>>>>>> LOCALTIME for materialized timestamp parts
>>>>>>>>>>> 
>>>>>>>>>>> - CURRENT_TIMESTAMP should return a TIMESTAMP WITH TIME ZONE to
>>>>>>>>>> materialize all session time information into every record. It it
>>>>> the
>>>>>> most
>>>>>>>>>> generic data type and allows to cast to all other timestamp data
>>>>>> types.
>>>>>>>>>> This generic ability can be used for filter predicates as well
>>>>> either
>>>>>>>>>> through implicit or explicit casting.
>>>>>>>>>>> 
>>>>>>>>>>> PROCTIME/ROWTIME should be time functions based on a long value.
>>>>> Both
>>>>>>>>>> System.currentMillis() and our watermark system work on long
>> values.
>>>>>> Those
>>>>>>>>>> should return TIMESTAMP WITH LOCAL TIME ZONE because the main
>>>>>> calculation
>>>>>>>>>> should always happen based on UTC. We discussed it in a different
>>>>>> thread,
>>>>>>>>>> but we should allow PROCTIME globally. People need a way to create
>>>>>>>>>> instances of TIMESTAMP WITH LOCAL TIME ZONE. This is not
>> considered
>>>>>> in the
>>>>>>>>>> current design doc. Many pipelines contain UTC timestamps and thus
>>>>> it
>>>>>>>>>> should be easy to create one. Also, both CURRENT_TIMESTAMP and
>>>>>>>>>> LOCALTIMESTAMP can work with this type because we should remember
>>>>> that
>>>>>>>>>> TIMESTAMP WITH LOCAL TIME ZONE accepts all timestamp data types as
>>>>>> casting
>>>>>>>>>> target [1]. We could allow TIMESTAMP WITH TIME ZONE in the future
>>>>> for
>>>>>>>>>> ROWTIME.
>>>>>>>>>>> 
>>>>>>>>>>> In any case, windows should simply adapt their behavior to the
>>>>> passed
>>>>>>>>>> timestamp type. And with TIMESTAMP WITH LOCAL TIME ZONE a day is
>>>>>> defined by
>>>>>>>>>> considering the current session time zone.
>>>>>>>>>>> 
>>>>>>>>>>> If we would like to design this with less effort required, we
>> could
>>>>>>>>>> think about returning TIMESTAMP WITH LOCAL TIME ZONE also for
>>>>>>>>>> CURRENT_TIMESTAMP.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> I will try to involve more people into this discussion.
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Timo
>>>>>>>>>>> 
>>>>>>>>>>> [1]
>>>>>>>>>> 
>>>>>> 
>>>>> 
>> https://docs.oracle.com/en/database/oracle/oracle-database/21/sqlrf/Data-Types.html#GUID-E7CA339A-2093-4FE4-A36E-1D09593591D3
>>>>>>>>>> <
>>>>>>>>>> 
>>>>>> 
>>>>> 
>> https://docs.oracle.com/en/database/oracle/oracle-database/21/sqlrf/Data-Types.html#GUID-E7CA339A-2093-4FE4-A36E-1D09593591D3
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> 2021-01-21,22:32,Leonard Xu <xbjt...@gmail.com> :
>>>>>>>>>>>> Before the changes, as I am writing this reply, the local time
>>>>> here
>>>>>> is
>>>>>>>>>> 2021-01-21 12:03:35 (Beijing time, UTC+8).
>>>>>>>>>>>> And I tried these 5 functions in sql client, and got:
>>>>>>>>>>>> 
>>>>>>>>>>>> Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP,
>>>>>> CURRENT_DATE,
>>>>>>>>>> CURRENT_TIME;
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>> 
>>>>> 
>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>>>>>>>>>>>> |                  EXPR$0 |                  EXPR$1 |
>>>>>>>>>> CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME |
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>> 
>>>>> 
>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>>>>>>>>>>>> | 2021-01-21T04:03:35.228 | 2021-01-21T04:03:35.228 |
>>>>>>>>>> 2021-01-21T04:03:35.228 |   2021-01-21 | 04:03:35.228 |
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>> 
>>>>> 
>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>>>>>>>>>>>> After the changes, the expected behavior will change to:
>>>>>>>>>>>> 
>>>>>>>>>>>> Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP,
>>>>>> CURRENT_DATE,
>>>>>>>>>> CURRENT_TIME;
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>> 
>>>>> 
>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>>>>>>>>>>>> |                  EXPR$0 |                  EXPR$1 |
>>>>>>>>>> CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME |
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>> 
>>>>> 
>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>>>>>>>>>>>> | 2021-01-21T12:03:35.228 | 2021-01-21T12:03:35.228 |
>>>>>>>>>> 2021-01-21T12:03:35.228 |   2021-01-21 | 12:03:35.228 |
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>> 
>>>>> 
>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>>>>>>>>>>>> The return type of now(), proctime() and CURRENT_TIMESTAMP still
>>>>> be
>>>>>>>>>> TIMESTAMP;
>>>>>>>>>>> 
>>>>>>>>>>> To Kurt, thanks  for the intuitive case, it really clear, you’re
>>>>>> wright
>>>>>>>>>> that I want to propose to change the return value of these
>>>>> functions.
>>>>>> It’s
>>>>>>>>>> the most important part of the topic from user's perspective.
>>>>>>>>>>> 
>>>>>>>>>>>> I think this definitely deserves a FLIP.
>>>>>>>>>>> To Jark,  nice suggestion, I prepared a FLIP for this topic, and
>>>>> will
>>>>>>>>>> start the FLIP discussion soon.
>>>>>>>>>>> 
>>>>>>>>>>>>> If use the default Flink SQL,&nbsp; the window time range of
>> the
>>>>>>>>>>>>> statistics is incorrect, then the statistical results will
>>>>>> naturally
>>>>>>>>>> be
>>>>>>>>>>>>> incorrect.
>>>>>>>>>>> To zhisheng, sorry to hear that this problem influenced your
>>>>>> production
>>>>>>>>>> jobs,  Could you share your SQL pattern?  we can have more inputs
>>>>> and
>>>>>> try
>>>>>>>>>> to resolve them.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Leonard
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> 2021-01-21,14:19,Jark Wu <imj...@gmail.com> :
>>>>>>>>>>> 
>>>>>>>>>>> Great examples to understand the problem and the proposed
>> changes,
>>>>>>>>>> @Kurt!
>>>>>>>>>>> 
>>>>>>>>>>> Thanks Leonard for investigating this problem.
>>>>>>>>>>> The time-zone problems around time functions and windows have
>>>>>> bothered a
>>>>>>>>>>> lot of users. It's time to fix them!
>>>>>>>>>>> 
>>>>>>>>>>> The return value changes sound reasonable to me, and keeping the
>>>>>> return
>>>>>>>>>>> type unchanged will minimize the surprise to the users.
>>>>>>>>>>> Besides that, I think it would be better to mention how this
>>>>> affects
>>>>>> the
>>>>>>>>>>> window behaviors, and the interoperability with DataStream.
>>>>>>>>>>> 
>>>>>>>>>>> I think this definitely deserves a FLIP.
>>>>>>>>>>> 
>>>>>>>>>>> ====================================================
>>>>>>>>>>> 
>>>>>>>>>>> Hi zhisheng,
>>>>>>>>>>> 
>>>>>>>>>>> Do you have examples to illustrate which case will get the wrong
>>>>>> window
>>>>>>>>>>> boundaries?
>>>>>>>>>>> That will help to verify whether the proposed changes can solve
>>>>> your
>>>>>>>>>>> problem.
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Jark
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> 2021-01-21,12:54,zhisheng <173855...@qq.com> :
>>>>>>>>>>> 
>>>>>>>>>>> Thanks to Leonard Xu for discussing this tricky topic. At
>> present,
>>>>>>>>>> there are many Flink jobs in our production environment that are
>>>>> used
>>>>>> to
>>>>>>>>>> count day-level reports (eg: count PV/UV ).&nbsp;
>>>>>>>>>>> 
>>>>>>>>>>> If use the default Flink SQL,&nbsp; the window time range of the
>>>>>>>>>> statistics is incorrect, then the statistical results will
>> naturally
>>>>>> be
>>>>>>>>>> incorrect.&nbsp;
>>>>>>>>>>> 
>>>>>>>>>>> The user needs to deal with the time zone manually in order to
>>>>> solve
>>>>>>>>>> the problem.&nbsp;
>>>>>>>>>>> 
>>>>>>>>>>> If Flink itself can solve these time zone issues, then I think it
>>>>>> will
>>>>>>>>>> be user-friendly.
>>>>>>>>>>> 
>>>>>>>>>>> Thank you
>>>>>>>>>>> 
>>>>>>>>>>> Best!;
>>>>>>>>>>> zhisheng
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> 2021-01-21,12:11,Kurt Young <ykt...@gmail.com> :
>>>>>>>>>>> 
>>>>>>>>>>> cc this to user & user-zh mailing list because this will affect
>>>>> lots
>>>>>> of
>>>>>>>>>> users, and also quite a lot of users
>>>>>>>>>>> were asking questions around this topic.
>>>>>>>>>>> 
>>>>>>>>>>> Let me try to understand this from user's perspective.
>>>>>>>>>>> 
>>>>>>>>>>> Your proposal will affect five functions, which are:
>>>>>>>>>>> PROCTIME()
>>>>>>>>>>> NOW()
>>>>>>>>>>> CURRENT_DATE
>>>>>>>>>>> CURRENT_TIME
>>>>>>>>>>> CURRENT_TIMESTAMP
>>>>>>>>>>> Before the changes, as I am writing this reply, the local time
>> here
>>>>>> is
>>>>>>>>>> 2021-01-21 12:03:35 (Beijing time, UTC+8).
>>>>>>>>>>> And I tried these 5 functions in sql client, and got:
>>>>>>>>>>> 
>>>>>>>>>>> Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP,
>>>>> CURRENT_DATE,
>>>>>>>>>> CURRENT_TIME;
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>> 
>>>>> 
>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>>>>>>>>>>> |                  EXPR$0 |                  EXPR$1 |
>>>>>>>>>> CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME |
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>> 
>>>>> 
>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>>>>>>>>>>> | 2021-01-21T04:03:35.228 | 2021-01-21T04:03:35.228 |
>>>>>>>>>> 2021-01-21T04:03:35.228 |   2021-01-21 | 04:03:35.228 |
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>> 
>>>>> 
>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>>>>>>>>>>> After the changes, the expected behavior will change to:
>>>>>>>>>>> 
>>>>>>>>>>> Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP,
>>>>> CURRENT_DATE,
>>>>>>>>>> CURRENT_TIME;
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>> 
>>>>> 
>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>>>>>>>>>>> |                  EXPR$0 |                  EXPR$1 |
>>>>>>>>>> CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME |
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>> 
>>>>> 
>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>>>>>>>>>>> | 2021-01-21T12:03:35.228 | 2021-01-21T12:03:35.228 |
>>>>>>>>>> 2021-01-21T12:03:35.228 |   2021-01-21 | 12:03:35.228 |
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>> 
>>>>> 
>> +-------------------------+-------------------------+-------------------------+--------------+--------------+
>>>>>>>>>>> The return type of now(), proctime() and CURRENT_TIMESTAMP still
>> be
>>>>>>>>>> TIMESTAMP;
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Kurt

Reply via email to