Re: Strategies for converting UTC data to local windows for arbitrary resolutions and timezones

Alban Hertroys Wed, 04 Oct 2023 14:11:05 -0700


> On 4 Oct 2023, at 21:30, Lincoln Swaine-Moore <lswainemo...@gmail.com> wrote:
> 
> > What I do in such cases is to add an extra column with the UTC timestamp to 
> > serve as a linear scale to the local timestamps. That also helps with 
> > ordering buckets in reports and such during DST changes (especially the 
> > ones where an hour repeats).
> 
> > For hours and quarter hours I found it to be fairly convenient to base a 
> > view on a join between a date calendar and an (quarter of an) hour per UTC 
> > day table, but materialising that with some indexes may perform better (at 
> > the cost of disk space). I do materialise that currently, but our database 
> > server doesn’t have a lot of memory so I’m often not hitting the cache and 
> > performance suffers a bit (infrastructure is about to change for the better 
> > though).
> 
> That's an interesting idea, but I'm not sure I fully understand. Assuming 
> you're aggregating data: what do you group by? For instance, at an hourly 
> resolution, if you group by both the UTC timestamp and the local one, you 
> might end up, say, dividing an hour-long bucket in two for time zones with 
> half-hour-based offsets, no? 
> 
> Thanks for the detailed writeup! Definitely helpful to learn more about what 
> people are using in production to handle this sort of thing.


Frankly, I haven’t had to deal with half-hour-based offsets since I got this 
idea. I’m using it with whole-hour-offsets, where it doesn’t affect bin 
boundaries.

I suppose you could enrich your data in a similar fashion by adding a (virtual) 
column with the (client) time zone offset, so you could group by local 
timestamp + offset. That’s not going to match index expressions though, I fear…

For sorting, UTC timestamps would probably still be a useful addition, but 
they’re simple to add by either converting back from the local timestamps or by 
taking the min and max of the UTC-based column on the above grouping. Both 
solutions require that offset, obviously.

Now of course there are only 2 hours a year where this happens. Our data 
scientists chose to ignore the problem for simplicity’s sake and be slightly 
off with their numbers on those dates.

Regards,

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.

Re: Strategies for converting UTC data to local windows for arbitrary resolutions and timezones

Reply via email to