Hey Dawid,

> val sink = new TestingAppendSink
> tEnv.sqlQuery(sql).toDataStream.addSink(sink)
> env.execute()
> }
>
> and you get the misleading results for timestamp_ltz:
>
> a,2020-10-10T00:00,2020-10-10T00:00:05,2020-10-09T16:00:04.999Z,4,11.10,5.0,1.0,2,Hi|Comment#1
The reason we saw the misleading result is that TestingAppendSink stores timestamp_ltz (Instant type) data in string format, and we read it back from storage without considering the session time zone. The recommended practice is to store timestamp_ltz data as epoch milliseconds (long format) and read it back as a long; when we want to see a timestamp literal, we should translate the epoch milliseconds using the local (session) time zone. That is the semantics of the SQL type TIMESTAMP WITH LOCAL TIME ZONE.

If you dig into the test testEventTimeTumbleWindow, you will find that its session time zone is Asia/Shanghai, and with this time zone the serialized string value 2020-10-09T16:00:04.999Z of the Instant should be translated to 2020-10-10 00:00:04.999.

Another reason the result is misleading: as a test connector (in-memory storage), TestingAppendSink supports a timezone argument, but unfortunately it doesn't consider the Instant type. It serializes an Instant directly as a timestamp literal string without applying the session time zone (see TestSinkUtil#fieldToString). This is incorrect: if we want to obtain a local timestamp literal from an Instant, we should use the local session time zone rather than a default zone like UTC+0, and vice versa.
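To make the translation concrete, here is a minimal sketch (not the actual TestSinkUtil code; the class and method names are illustrative) of how an Instant could be rendered as a local timestamp literal by converting through the session time zone instead of printing the UTC form directly:

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class InstantFormatting {
    private static final DateTimeFormatter FORMATTER =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    // Render the epoch instant in the session time zone, producing the
    // local timestamp literal that TIMESTAMP_LTZ semantics call for.
    static String instantToLocalString(Instant instant, ZoneId sessionZone) {
        return instant.atZone(sessionZone).toLocalDateTime().format(FORMATTER);
    }

    public static void main(String[] args) {
        // With session time zone Asia/Shanghai (UTC+8), the stored value
        // 2020-10-09T16:00:04.999Z becomes the literal the test expects.
        Instant stored = Instant.parse("2020-10-09T16:00:04.999Z");
        System.out.println(instantToLocalString(stored, ZoneId.of("Asia/Shanghai")));
        // prints 2020-10-10 00:00:04.999
    }
}
```

Formatting with a different zone (or none at all, as the current fieldToString effectively does) is what produces the misleading 2020-10-09T16:00:04.999Z output.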
Now suppose the test storage TestingAppendSink does not support the timestamp_ltz type and only supports the timestamp type. This is a common case for various external storages (databases, lake formats, warehouses), and the Flink connector should then convert data of timestamp_ltz type to timestamp. For this scenario I added a method TestSinkUtil#instantToTimestampString(value: Instant, tz: TimeZone), and the data stored in TestingAppendSink looks like the following:

a,2020-10-10T00:00,2020-10-10T00:00:05,2020-10-10 00:00:04.999,4,11.10,5.0,1.0,2,Hi|Comment#1
a,2020-10-10T00:00:05,2020-10-10T00:00:10,2020-10-10 00:00:09.999,1,3.33,null,3.0,1,Comment#2
b,2020-10-10T00:00:05,2020-10-10T00:00:10,2020-10-10 00:00:09.999,2,6.66,6.0,3.0,2,Hello|Hi
b,2020-10-10T00:00:15,2020-10-10T00:00:20,2020-10-10 00:00:19.999,1,4.44,4.0,4.0,1,Hi
b,2020-10-10T00:00:30,2020-10-10T00:00:35,2020-10-10 00:00:34.999,1,3.33,3.0,3.0,1,Comment#3
null,2020-10-10T00:00:30,2020-10-10T00:00:35,2020-10-10 00:00:34.999,1,7.77,7.0,7.0,0,null

Having clarified the misleading result, let me explain with an example why we chose the SQL TIMESTAMP type for window_start and window_end instead of TIMESTAMP_LTZ. Since you live in a DST time zone area, I believe you know the two special days each year: one has only 23 hours and one has 25 hours. Now imagine a user scenario with a cumulate window of step 4 hours and size 24 hours (1 day). What are the expected window_start and window_end semantics in this case: a timestamp-literal span of one calendar day, or an instant interval of exactly 24 hours? The timestamp-literal range from 2021-03-14 00:00:00 to 2021-03-15 00:00:00 can represent one day that lasts only 23 hours (daylight saving time in Los_Angeles starts at 2021-03-14 02:00:00), but the 24-hour instant interval runs from 1615708800000L (2021-03-14 00:00:00) to 1615795200000L (2021-03-15 01:00:00). The window_end is incorrect here, right?
What users want is the timestamp-literal interval window, not the instant interval window. That is the window boundary semantics as I understand it, and that is why we always choose the SQL TIMESTAMP type for window_start and window_end.

Best,
Leonard