Hey Sean,

Thanks for the reply. Indeed, I could confirm that the values are consistent with Java. So the issue is related to Java and not Spark. Thanks!
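For the archive, here is a minimal way to reproduce those offsets straight from the JVM's timezone rules, with no Spark involved. This is a sketch; it assumes "IST" resolves to Asia/Kolkata via java.time's short-ID map, which is also how Spark parses the zone ID:

import java.time.{LocalDateTime, ZoneId, ZoneOffset}

// "IST" is a short ID that the JVM resolves to Asia/Kolkata.
val zone = ZoneId.of("IST", ZoneId.SHORT_IDS)

// Ask the tz database for the UTC offset in effect at each instant.
Seq("0001-01-01T00:00:00", "1900-01-01T00:00:00", "1970-01-01T00:00:00")
  .map(s => LocalDateTime.parse(s).toInstant(ZoneOffset.UTC))
  .foreach(i => println(s"$i -> ${zone.getRules.getOffset(i)}"))

// Expected output with a standard tzdata:
// 0001-01-01T00:00:00Z -> +05:53:28   (local mean time for Kolkata)
// 1900-01-01T00:00:00Z -> +05:21:10   (Madras Mean Time)
// 1970-01-01T00:00:00Z -> +05:30      (modern IST)

These are exactly the offsets from_utc_timestamp produced below, so Spark is simply following java.time.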
Regards,
Ankit Prakash Gupta

On Fri, Sep 6, 2024 at 8:37 AM Sean Owen <sro...@gmail.com> wrote:

> Are you sure those are incorrect? Or at least, are they not consistent
> with Java? For dates really far in the past, the exact mapping gets
> really complex due to calendar changes over time. Specifically, the
> calendar we all use today didn't even exist 2000 years ago.
>
> On Thu, Sep 5, 2024, 9:55 PM Ankit Gupta <info.ank...@gmail.com> wrote:
>
>> Hi Dev Community,
>>
>> I came across a weird bug in the Spark SQL function `from_utc_timestamp`:
>> the values are not consistent. When converting from UTC to any other
>> time zone, such as IST below, the values are erratic. I have already
>> created a JIRA ticket for this.
>>
>> Any thoughts on how we can avoid this?
>>
>> For example:
>>
>> java.util.TimeZone.setDefault(java.util.TimeZone.getTimeZone("UTC"))
>> val df = Seq(
>>   java.sql.Timestamp.valueOf("0001-01-01 00:00:00"),
>>   java.sql.Timestamp.valueOf("1900-01-01 00:00:00"),
>>   java.sql.Timestamp.valueOf("1799-12-31 00:00:00"),
>>   java.sql.Timestamp.valueOf("1850-12-31 00:00:00"),
>>   new java.sql.Timestamp(0)
>> ).toDF("ts")
>> df.withColumn("ts_trans", from_utc_timestamp($"ts", "IST")).show
>>
>> // Exiting paste mode, now interpreting.
>>
>> +-------------------+-------------------+
>> |                 ts|           ts_trans|
>> +-------------------+-------------------+
>> |0001-01-01 00:00:00|0001-01-01 05:53:28|
>> |1900-01-01 00:00:00|1900-01-01 05:21:10|
>> |1799-12-31 00:00:00|1799-12-31 05:53:28|
>> |1850-12-31 00:00:00|1850-12-31 05:53:28|
>> |1970-01-01 00:00:00|1970-01-01 05:30:00|
>> +-------------------+-------------------+
>>
>> Thanks and Regards,
>>
>> Ankit Prakash Gupta