Abdullahsab3 opened a new issue, #10368:
URL: https://github.com/apache/datafusion/issues/10368
### Is your feature request related to a problem or challenge?
It would be nice if Datafusion has support for better timezone
functionalities. Though painful, local time is often used in energy analytics
and data reporting (especially when the reporting is meant for official
authorities).
Specifically, I would like to have the following functionalities:
- given a UTC timestamp, I would like to have that timestamp in local time
of a given timezone
- given a UTC timestamp, data should be aggregated with local time from that
UTC timestamp by providing a timezone
- Extracting the time offset (using `date_part`) for example, might be a
nice to have
- Daylight savings hour should be handled correctly. `chrono` seems to
prefer failing/returning `None` when there is ambigous results regarding
daylight savings, which I assume is an inherited behavior for Arrow. If we can
have some kind of flag to not fail, but either do _something else_ , it would
be great.
### Describe the solution you'd like
I am not sure how this can be implemented in a generalized way. Perhaps by
adding a function `to_local_time` that takes a timestamp/column as an argument,
and a timezone as another argument. For example
```
> select to_local_time('2020-11-30T08:00:00.000Z'::timestamp,
'Europe/Brussels');
-- translates to '2020-11-30T08:00:00.000Z'::timestamp + interval '1 hour'
< '2020-11-30T09:00:00'
```
The difference between this and the `at time zone` operator is:
The `at time zone` operator seems to be only adding timezone/offset
information to the timestamp if I understood correctly. the proposed
`to_local_time` function should ideally apply that offset to the provided
timestamp. The same example above would be:
```
> select '2020-11-30T08:00:00.000Z'::timestamp at time zone
'Europe/Brussels';
+----------------------------------+
| Utf8("2020-11-30T08:00:00.000Z") |
+----------------------------------+
| 2020-11-30T08:00:00+01:00 |
+----------------------------------+
1 row in set. Query took 0.002 seconds.
```
### Describe alternatives you've considered
I have currently a hacky alternative to aggregate using local time. What I
do basically is the following:
- Get the `time` column
- Get the offset of the `time` using: `((time AT TIME ZONE 'UTC' at time
zone 'Europe/Brussels') - (time AT TIME ZONE 'Europe/Brussels' at time zone
'UTC'))::interval`
- - We need to consider the daylight savings hour here and treat it
differently, given that `(time AT TIME ZONE 'Europe/Brussels' at time zone
'UTC')` fails during daylight savings hour. This then becomes for example:
```
CASE
-- daylight savings in 2021. Dont add an offset
WHEN time >= '2021-03-28T02:00:00' AND time < '2021-03-28T03:00:00' THEN
interval '0'
-- add the time offset for the 'Europe/Brussels' timezone
ELSE ((time AT TIME ZONE 'UTC' at time zone 'Europe/Brussels') - (time
AT TIME ZONE 'Europe/Brussels' at time zone 'UTC'))::interval
END
```
The daylight savings predicates are dynamically generated in an application
layer
- Add the offset to `time`
This all becomes something like:
```
SELECT
Sum(delta) AS value,
DATE_BIN(
interval '1 day',
time
+
CASE
-- daylight savings in 2021. Dont add an offset
WHEN time >= '2021-03-28T02:00:00' AND time < '2021-03-28T03:00:00' THEN
interval '0'
-- add the time offset for the 'Europe/Brussels' timezone
ELSE ((time AT TIME ZONE 'UTC' at time zone 'Europe/Brussels') - (time
AT TIME ZONE 'Europe/Brussels' at time zone 'UTC'))::interval
END)
FROM
...
```
### Additional context
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]