Now we have 2 discussion threads on 3 mailing lists. Which one should
have prioity? Should I repost my large email here again?
I think it is good to inform and invite in the user mailing lists but
let's keep the FLIP discussion on the dev@ ML only.
Regards,
Timo
On 21.01.21 16:50, Leonard Xu wrote:
Hello, everyone
I want to start the discussion of FLIP-162: Consistent Flink SQL time
function behavior[1].
We’ve some initial discussion of several problematic functions in dev
mail list[2], and I think it's the right time to resolve them by a FLIP.
Currently some time function behaviors are wired to user, user can not
get local date/time/timestamp in their local time zone for time functions:
* CURRENT_DATE
* CURRENT_TIME
* CURRENT_TIMESTAMP
* NOW()
* PROCTIME()
Assume user's clock time is '*2021-01-20 07:52:52.270'* in Beijing
time(UTC+8), currently the unexpected values are returned when user
SELECT above functions in Flink SQL client
*Flink SQL> **SELECT NOW(), PROCTIME(), CURRENT_TIMESTAMP, CURRENT_DATE,
CURRENT_TIME;*
*+-------------------------+-------------------------+-------------------------+--------------+--------------+*
*|**NOW() ** | PROCTIME() | CURRENT_TIMESTAMP |
CURRENT_DATE | CURRENT_TIME |*
*+-------------------------+-------------------------+-------------------------+--------------+--------------+*
*| 2021-01-19T23:52:52.270 | **2021-01-19T23:52:52.270** |
**2021-01-19T23:52:52.270** | 2021-01-19 | 23**:52:52.270** |*
*+-------------------------+-------------------------+-------------------------+--------------+--------------+*
Besides, the window with interval one day width based on PROCTIME() can
not collect correct data that belongs to the date '2021-01-20', because
some data was assigned to window '2021-01-19' due to the PROCTIME() does
not return local TIMESTAMP as user expected.
These problems come from these time-related functions like PROCTIME(),
NOW(), CURRENT_DATE, CURRENT_TIME and CURRENT_TIMESTAMP are returning
time values based on UTC+0 time zone, this is an incorrect behavior from
my investigation[3].
I Invested all Flink time-related functions and compared with other DB
vendors like Pg,Presto, Hive, Spark, Snowflake, this topic will lead to
a comparison of the three types, i.e.
* TIMESTAMP/TIMESTAMP WITHOUT TIME ZONE
* TIMESTAMP WITH LOCAL TIME ZONE
* TIMESTAMP WITH TIME ZONE
In order to better understand above three types, I wrote a document[4]
to help understand them better. You will found the behavior of them is
same with in Hadoop ecosystem from the document.*The document is
detailed and pretty long, it’s necessary to make the semantics clear(You
can focus on the FLIP and skip the document).*
In one word, to correct the behavior of above functions, we can change
the function return type or function return value. Both of them are
valid because SQL:2011 does not specify the function return type, and
every SQL engine vendor has its own implementation. For example the
CURRENT_TIMESTAMP function in the document[3], *Spark, Presto, Snowflake
*have different behaviors.
*I tend to only change the return value for these problematic functions
and introduce an option for compatibility consideration,* the detailed
proposal can be found in FLIP-162[1].
After corrected these function, user can get their expected return
values as following:
*Flink SQL> **SELECT NOW(), PROCTIME(), CURRENT_TIMESTAMP, CURRENT_DATE,
CURRENT_TIME;*
*+-------------------------+-------------------------+-------------------------+--------------+--------------+*
*|**NOW() ** | PROCTIME() | CURRENT_TIMESTAMP |
CURRENT_DATE | CURRENT_TIME |*
*+-------------------------+-------------------------+-------------------------+--------------+--------------+*
*| 2021-01-20T07:52:52.270 | **2021-01-20T07:52:52.270** |
**2021-01-20T07:52:52.270** | 2021-01-20 | **07:52:52.270** |*
*+-------------------------+-------------------------+-------------------------+--------------+--------------+*
Looking forward to your feedback.
Best,
Leonard
[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-162%3A+Consistent+Flink+SQL+time+function+behavior
<https://cwiki.apache.org/confluence/display/FLINK/FLIP-162%3A+Consistent+Flink+SQL+time+function+behavior>
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Correct-time-related-function-behavior-in-Flink-SQL-tc47989.html
<http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Correct-time-related-function-behavior-in-Flink-SQL-tc47989.html>
[3]
https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing
<https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing>
[4]
https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing
<https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing>