Fra: Magnus Hagander <mag...@hagander.net>
Sendt: 15. april 2020 20:05
Til: Niels Jespersen <n...@dst.dk>
Cc: pgsql-general@lists.postgresql.org
Emne: Re: timestamp and timestamptz



On Wed, Apr 15, 2020 at 7:50 PM Niels Jespersen 
<n...@dst.dk<mailto:n...@dst.dk>> wrote:
Hello all

We have some data that have entered a timestamp column from a csv. The data in 
the csv are in utc. We want to access the data in our native timezone (CET).

I am considering a few alternatives:


1.       Early in the process, convert to timestamptz and keep this datatype.

2.       Early in the process, convert to timestamp as understood in CET.  This 
will imply by convention that the data in the timestamp column represents CET. 
Users will need to be told that data represents CET, even if data is somwhere 
in the future kept in another country in another timezone.

I probably should choose 1 over 2. But I am a bit hesitant, probably because we 
almost never have used timestamptz.

Yes, you should.


Can we agree that the below query is selecting both the original utc timestamps 
and 2 and 1 (as decribed above)?

set timezone to 'cet';
select read_time read_time_utc, (read_time at time zone 'utc')::timestamp 
read_time_cet, (read_time at time zone 'utc')::timestamptz read_time_tz from t 
limit 10;


As long as you use option 1:

SELECT read_time
will return the time in CET (as a timestamptz) after you've set timezone to 
'cet'. If you set timezone to 'utc' it will directly return utc.

SELECT read_time AT TIME ZONE 'utc'
will return the time in UTC (as a timestamp)


And just make sure you have done a "set time zone 'utc'" before you *load* the 
data, and everything should just work automatically.

--
 Magnus Hagander
 Me: https://www.hagander.net/<http://www.hagander.net/>
 Work: https://www.redpill-linpro.com/<http://www.redpill-linpro.com/>


Thank you Magnus (and others) for your replies.

The raw input data are in this, slightly strange format: 2019.05.01 00:00:00. 
No timezone indicator, just an informal guarantee from the supplier that it is 
indeed utc. And no real chance of changing the format. We know, from experience.

The data volume is a bit high, a few billion rows pr month. So, table 
partitioning is very helpful (aka really indispensable). Data will be 
aggregated in several ways for analytics. Time aggregations must be according 
to our local timezone (cet). We do not want data from one day being aggregated 
into the wrong date because of timezone issues. This means that partition 
boundaries (monthly pratitions most often, sometimes day partitions) must be on 
CET-boundaries so that partition pruning will pull data from the relevant 
cet-month not the utc-month.

Now, if I load data into a timestamptz with timezone set to utc, partition to 
cet-boundaries, query and aggredate with timezone set to cet, everything wil be 
ok, I think. My small testcase below shows that the row goes into the 
april-partition (as it should). The planner does the correct partition pruning 
according to specified filtering and set timezone. All good.

create table t (t_id bigserial, ts timestamptz) partition by range (ts);
create table t_2020_02 partition of t for values from ('2020-02-01 
00:00:00+01') to ('2020-03-01 00:00:00+01');
create table t_2020_03 partition of t for values from ('2020-03-01 
00:00:00+01') to ('2020-04-01 00:00:00+02');
create table t_2020_04 partition of t for values from ('2020-04-01 
00:00:00+02') to ('2020-05-01 00:00:00+02');

set timezone to 'utc';
insert into t (ts) values('2020-03-31 23:30:00');

Once again, thank you for invaluable feedback.

Niels Jespersen

Reply via email to