Hello,
First of all, thanks for maintaining and improving Spark.

We just updated to Spark 3.0.1 and are facing some issues with the new 
Proleptic Gregorian calendar.

We have data from different sources on our platform, and we noticed some 
date/timestamp columns that go back to years before 1500.

According to this post 
(https://www.waitingforcode.com/apache-spark-sql/whats-new-apache-spark-3-proleptic-calendar-date-time-management/read), 
data written with Spark 2.4 and read with 3.0 should show some differences in 
dates/timestamps, but we have not been able to replicate this. We only hit an 
exception that suggests setting the 
spark.sql.legacy.parquet.datetimeRebaseModeInRead/Write config options to make 
it work.
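For what it's worth, our understanding is that the shift the post describes comes from Spark 2.4 storing days in a hybrid Julian/Gregorian calendar while Spark 3.0 uses the proleptic Gregorian calendar throughout. A minimal pure-Python sketch (no Spark involved; standard Julian Day Number formulas, and `julian_to_gregorian` is just our own helper name) of how large that shift is for old dates:

```python
from datetime import date

def julian_to_gregorian(y, m, d):
    """Convert a Julian-calendar date to the proleptic Gregorian date
    that denotes the same physical day (standard JDN formulas)."""
    a = (14 - m) // 12
    yy = y + 4800 - a
    mm = m + 12 * a - 3
    # Julian Day Number of the given Julian-calendar date
    jdn = d + (153 * mm + 2) // 5 + 365 * yy + yy // 4 - 32083
    # Python's date type is proleptic Gregorian; its ordinal 1 is JDN 1721426
    return date.fromordinal(jdn - 1721425)

# The last Julian-calendar day before the 1582 reform maps 10 days forward:
print(julian_to_gregorian(1582, 10, 4))   # 1582-10-14
# A year-1500 date also shifts by 10 days:
print(julian_to_gregorian(1500, 3, 1))    # 1500-03-11
```

So if we understand correctly, a pre-1582 date written by 2.4 and read by 3.0 without rebasing could appear shifted by several days, which is the behavior we were trying (and failing) to reproduce.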

So, our main concerns are:

  *   How can we test/replicate this behavior? Since the change isn't very 
clear to us and we can't find any docs for it, we can't decide with certainty 
which parameters to set and why.
  *   What config options should we set, given that we
     *   will always read old data written by Spark 2.4 using Spark 3.0, and
     *   will always write newer data with Spark 3.0?

We couldn't make a deterministic/informed choice ourselves, so we thought it 
better to ask the community which scenarios will be impacted and which will 
still work fine.
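For reference, this is our current best guess (spark-defaults.conf style; we believe the accepted values are EXCEPTION, LEGACY, and CORRECTED), and we'd appreciate confirmation or correction:

```
# Guess: rebase old Spark 2.4 parquet files onto the new calendar on read,
# and write new Spark 3.0 data as-is, without rebasing.
spark.sql.legacy.parquet.datetimeRebaseModeInRead   LEGACY
spark.sql.legacy.parquet.datetimeRebaseModeInWrite  CORRECTED
```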

Thanks
Saurabh

