Hi Spark Dev Team,

I believe I've encountered a potential bug in Spark 3.5.1 concerning the
UNIX_SECONDS function when used with TO_UTC_TIMESTAMP.

When converting a timestamp from a specific timezone (e.g.,
'Europe/Amsterdam') to UTC and then getting its Unix seconds, the result
seems incorrect. TO_UTC_TIMESTAMP produces the correct UTC timestamp, but
UNIX_SECONDS yields a value corresponding to a different UTC time.

Code:

spark.conf.set('spark.sql.session.timeZone', 'UTC')
spark.sql("""
SELECT
    TO_UTC_TIMESTAMP('2024-04-01 23:59:59', 'Europe/Amsterdam') AS tz_ts,
    UNIX_SECONDS(TO_UTC_TIMESTAMP('2024-04-01 23:59:59', 'Europe/Amsterdam')) AS unix
FROM some_table -- Replace with a valid source/dummy table
""").show()

Output:

+-------------------+----------+
|              tz_ts|      unix|
+-------------------+----------+
|2024-04-01 21:59:59|1712001599|
+-------------------+----------+

Issue:

The tz_ts column correctly shows 2024-04-01 21:59:59 (UTC). However, the
unix value 1712001599 corresponds to 2024-04-01 19:59:59 UTC when verified
externally (e.g., datetime.utcfromtimestamp(1712001599) in Python). This is
a 2-hour discrepancy.
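
For reference, here is that check written out (using a timezone-aware
datetime, equivalent to the utcfromtimestamp call above), together with the
epoch value I would expect for the displayed 2024-04-01 21:59:59 UTC:

from datetime import datetime, timezone

# The value UNIX_SECONDS returned above
print(datetime.fromtimestamp(1712001599, tz=timezone.utc))
# -> 2024-04-01 19:59:59+00:00

# Epoch seconds for the displayed tz_ts value, 2024-04-01 21:59:59 UTC
print(int(datetime(2024, 4, 1, 21, 59, 59, tzinfo=timezone.utc).timestamp()))
# -> 1712008799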

It appears UNIX_SECONDS might be incorrectly calculating the epoch seconds
from the timestamp provided by TO_UTC_TIMESTAMP in this scenario.
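
If it helps narrow things down, a minimal comparison would be UNIX_SECONDS
on a plain timestamp literal with the same wall-clock value (same session,
so still UTC); I'd expect this to return 1712008799:

spark.sql("""
SELECT UNIX_SECONDS(TIMESTAMP '2024-04-01 21:59:59') AS unix_expected
""").show()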

I'd love to hear your thoughts. Am I missing something here?

Thanks,
---
Miguel Leite
Machine Learning Scientist
