2019-07-16 01:24:49 UTC - Sijie Guo: @Jerry Peng :

<https://github.com/apache/pulsar/pull/2414/files#diff-a8128de57fe9d119317f47fabb30abf7R136>

any idea why did you switch the type of publish_time from TIMESTAMP to 
TIMESTAMP_WITH_TIME_ZONE ?
----
2019-07-16 01:24:58 UTC - Sijie Guo: but not for event_time
----
2019-07-16 01:27:03 UTC - Sijie Guo: do you remember what drove that change?
----
2019-07-16 01:29:59 UTC - Jerry Peng: @Sijie Guo I think its because 
publish_time is set by the cluster thus we know the time zone while event_time 
is user defined and could be any time zone
----
2019-07-16 01:38:21 UTC - Jerry Peng: We could also change event_time to be 
TIMESTAMP_WITH_TIME_ZONE since everything presented as UTC time anyways
----
2019-07-16 01:39:17 UTC - Jerry Peng: and timestamps are always in UTC anyways
----
2019-07-16 01:39:57 UTC - Jerry Peng: but its user defined and technically 
users can be anything in that field
----
2019-07-16 01:48:37 UTC - Jerry Peng: though both event_time and publish_time 
are displayed in UTC time anyways
----
2019-07-16 01:54:43 UTC - Sijie Guo: in traditional databases, timestamp and 
timestamptz (timestamp with time zone). timestamp doesn’t store any tz 
information. and timestamptz will be stored with tz information.

in pulsar’s case, the publish time is used using System.currentTimeMillis, 
which only contains the epoch information but not tz. We don’t store any tz 
information for both event time and publish time. So I don’t think we should  
use TIMESTAMP_WITH_TIME_ZONE. It should always be TIMESTAMP. If you want to 
display it to a date in a timezone, users can use any udf to do so.
----
2019-07-16 01:55:41 UTC - Sijie Guo: I think it is more a matter whether pulsar 
stores tz information. if pulsar doesn’t store tz information, we should use 
timestamp. if we store tz information, we should use timestamp_with_timezone.
----
2019-07-16 01:56:31 UTC - Jerry Peng: For unix timestamps, the timezone is 
always UTC
----
2019-07-16 01:58:15 UTC - Jerry Peng: Thus I think we should use 
TIMESTAMP_WITH_TIME_ZONE set to UTC time for both publish_time and event_time
----
2019-07-16 01:59:04 UTC - Jerry Peng: just to be explicit what time zone the 
those times represent
----
2019-07-16 02:00:18 UTC - Sijie Guo: I mean pulsar only stores timestamps, it 
doesn’t store the tz information.
----
2019-07-16 02:00:29 UTC - Sijie Guo: when you implement timestamp_tz in a 
database
----
2019-07-16 02:00:36 UTC - Sijie Guo: you need to store the tz information
----
2019-07-16 02:00:56 UTC - Sijie Guo: pulsar doesn’t store the tz information. 
so it can’t be timestamp_tz
----
2019-07-16 02:01:14 UTC - Sijie Guo: that’s a standard in relational database 
world
----
2019-07-16 02:01:41 UTC - Jerry Peng: yes but the time will be in UTC
----
2019-07-16 02:02:11 UTC - Sijie Guo: it is a bit awkward we are treating a data 
(without tz information) as timestamp_with_tz
----
2019-07-16 02:02:38 UTC - Jerry Peng: if UTC time is not added then the sql 
client assumes local time zone
----
2019-07-16 02:02:46 UTC - Jerry Peng: which is going to be incorrect
----
2019-07-16 02:03:05 UTC - Jerry Peng: as you can see in the screen shot I 
provided above
----
2019-07-16 02:04:25 UTC - Sijie Guo: so when you select a timestamp, the time 
is always UTC. If you want to display with a local timezone, you use SELECT 
timezone(‘America/New_York’, __publish_time__);
----
2019-07-16 02:05:03 UTC - Sijie Guo: that’s how database works for timestamp 
based types.
----
2019-07-16 02:06:12 UTC - Jerry Peng: ok but I just tested from presto and if 
you don’t specify the timezone it assumes localtime zone
----
2019-07-16 02:06:35 UTC - Jerry Peng: i.e. select * from 
pulsar.“public/default”.generator where __publish_time__ &gt; timestamp 
‘2019-07-16 01:47:28.901 UTC’;
vs
select * from pulsar.“public/default”.generator where __publish_time__ &gt; 
timestamp ‘2019-07-16 01:47:28.901’;
----
2019-07-16 02:06:46 UTC - Jerry Peng: one I get the correct results the other I 
don’t get anything
----
2019-07-16 02:07:50 UTC - Jerry Peng: oh i see what you are saying
----
2019-07-16 02:08:33 UTC - Sijie Guo: the problem in your query is the 
`timestamp` function. not the timestamp field.
----
2019-07-16 02:11:43 UTC - Sijie Guo: from the field perspective, 
__publish_time_ should be a filed of `timestamp` not `timestamp_tz`. because 
__publish_time__ doesn’t contain any tz information.
----
2019-07-16 02:15:56 UTC - Jerry Peng: you are right we don’t have use 
TIMESTAMP_WITH_TIME_ZONE
----
2019-07-16 02:16:01 UTC - Jerry Peng: we can just use TIMESTAMP
----
2019-07-16 02:20:16 UTC - Sijie Guo: cool. will fix it.
----
2019-07-16 02:22:38 UTC - Jerry Peng: :+1:
----
2019-07-16 07:28:31 UTC - Jerry Peng: @Sijie Guo actually after testing 
changing publish time from TIMESTAMP_WITH_TIME_ZONE -&gt; TIMESTAMP, I think 
the reason I changed that in the first place is because I could not get 
predicate pushdown to work if the type is TIMESTAMP.  There are some issues 
within presto in which the predicate returned is not correct.  Not sure if 
updating the version of presto would fix the issue.  I updated the issue with 
some more information
+1 : Penghui Li, Karthik Ramasamy
----
2019-07-16 08:51:05 UTC - Guillaume Braibant: @Guillaume Braibant has joined 
the channel
----

Reply via email to