2019-07-16 01:24:49 UTC - Sijie Guo: @Jerry Peng : <https://github.com/apache/pulsar/pull/2414/files#diff-a8128de57fe9d119317f47fabb30abf7R136>
any idea why did you switch the type of publish_time from TIMESTAMP to TIMESTAMP_WITH_TIME_ZONE ? ---- 2019-07-16 01:24:58 UTC - Sijie Guo: but not for event_time ---- 2019-07-16 01:27:03 UTC - Sijie Guo: do you remember what drove that change? ---- 2019-07-16 01:29:59 UTC - Jerry Peng: @Sijie Guo I think its because publish_time is set by the cluster thus we know the time zone while event_time is user defined and could be any time zone ---- 2019-07-16 01:38:21 UTC - Jerry Peng: We could also change event_time to be TIMESTAMP_WITH_TIME_ZONE since everything presented as UTC time anyways ---- 2019-07-16 01:39:17 UTC - Jerry Peng: and timestamps are always in UTC anyways ---- 2019-07-16 01:39:57 UTC - Jerry Peng: but its user defined and technically users can be anything in that field ---- 2019-07-16 01:48:37 UTC - Jerry Peng: though both event_time and publish_time are displayed in UTC time anyways ---- 2019-07-16 01:54:43 UTC - Sijie Guo: in traditional databases, timestamp and timestamptz (timestamp with time zone). timestamp doesn’t store any tz information. and timestamptz will be stored with tz information. in pulsar’s case, the publish time is used using System.currentTimeMillis, which only contains the epoch information but not tz. We don’t store any tz information for both event time and publish time. So I don’t think we should use TIMESTAMP_WITH_TIME_ZONE. It should always be TIMESTAMP. If you want to display it to a date in a timezone, users can use any udf to do so. ---- 2019-07-16 01:55:41 UTC - Sijie Guo: I think it is more a matter whether pulsar stores tz information. if pulsar doesn’t store tz information, we should use timestamp. if we store tz information, we should use timestamp_with_timezone. ---- 2019-07-16 01:56:31 UTC - Jerry Peng: For unix timestamps, the timezone is always UTC ---- 2019-07-16 01:58:15 UTC - Jerry Peng: Thus I think we should use TIMESTAMP_WITH_TIME_ZONE set to UTC time for both publish_time and event_time ---- 2019-07-16 01:59:04 UTC - Jerry Peng: just to be explicit what time zone the those times represent ---- 2019-07-16 02:00:18 UTC - Sijie Guo: I mean pulsar only stores timestamps, it doesn’t store the tz information. ---- 2019-07-16 02:00:29 UTC - Sijie Guo: when you implement timestamp_tz in a database ---- 2019-07-16 02:00:36 UTC - Sijie Guo: you need to store the tz information ---- 2019-07-16 02:00:56 UTC - Sijie Guo: pulsar doesn’t store the tz information. so it can’t be timestamp_tz ---- 2019-07-16 02:01:14 UTC - Sijie Guo: that’s a standard in relational database world ---- 2019-07-16 02:01:41 UTC - Jerry Peng: yes but the time will be in UTC ---- 2019-07-16 02:02:11 UTC - Sijie Guo: it is a bit awkward we are treating a data (without tz information) as timestamp_with_tz ---- 2019-07-16 02:02:38 UTC - Jerry Peng: if UTC time is not added then the sql client assumes local time zone ---- 2019-07-16 02:02:46 UTC - Jerry Peng: which is going to be incorrect ---- 2019-07-16 02:03:05 UTC - Jerry Peng: as you can see in the screen shot I provided above ---- 2019-07-16 02:04:25 UTC - Sijie Guo: so when you select a timestamp, the time is always UTC. If you want to display with a local timezone, you use SELECT timezone(‘America/New_York’, __publish_time__); ---- 2019-07-16 02:05:03 UTC - Sijie Guo: that’s how database works for timestamp based types. ---- 2019-07-16 02:06:12 UTC - Jerry Peng: ok but I just tested from presto and if you don’t specify the timezone it assumes localtime zone ---- 2019-07-16 02:06:35 UTC - Jerry Peng: i.e. select * from pulsar.“public/default”.generator where __publish_time__ > timestamp ‘2019-07-16 01:47:28.901 UTC’; vs select * from pulsar.“public/default”.generator where __publish_time__ > timestamp ‘2019-07-16 01:47:28.901’; ---- 2019-07-16 02:06:46 UTC - Jerry Peng: one I get the correct results the other I don’t get anything ---- 2019-07-16 02:07:50 UTC - Jerry Peng: oh i see what you are saying ---- 2019-07-16 02:08:33 UTC - Sijie Guo: the problem in your query is the `timestamp` function. not the timestamp field. ---- 2019-07-16 02:11:43 UTC - Sijie Guo: from the field perspective, __publish_time_ should be a filed of `timestamp` not `timestamp_tz`. because __publish_time__ doesn’t contain any tz information. ---- 2019-07-16 02:15:56 UTC - Jerry Peng: you are right we don’t have use TIMESTAMP_WITH_TIME_ZONE ---- 2019-07-16 02:16:01 UTC - Jerry Peng: we can just use TIMESTAMP ---- 2019-07-16 02:20:16 UTC - Sijie Guo: cool. will fix it. ---- 2019-07-16 02:22:38 UTC - Jerry Peng: :+1: ---- 2019-07-16 07:28:31 UTC - Jerry Peng: @Sijie Guo actually after testing changing publish time from TIMESTAMP_WITH_TIME_ZONE -> TIMESTAMP, I think the reason I changed that in the first place is because I could not get predicate pushdown to work if the type is TIMESTAMP. There are some issues within presto in which the predicate returned is not correct. Not sure if updating the version of presto would fix the issue. I updated the issue with some more information +1 : Penghui Li, Karthik Ramasamy ---- 2019-07-16 08:51:05 UTC - Guillaume Braibant: @Guillaume Braibant has joined the channel ----