aldenlau-db commented on code in PR #50315:
URL: https://github.com/apache/spark/pull/50315#discussion_r2004449691


##########
connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala:
##########
@@ -963,6 +964,28 @@ abstract class AvroSuite
     }
   }
 
+  test("SPARK-49082: Widening date to timestampNTZ in AvroDeserializer") {
+    withTempPath { tempPath =>
+      val datePath = s"$tempPath/date_data"
+      val dateDf =
+        Seq(LocalDate.of(2024, 1, 1),
+          LocalDate.of(2024, 1, 2),
+          LocalDate.of(1312, 2, 27),
+          LocalDate.of(-5877641, 6, 23),
+          LocalDate.of(5881580, 7, 11))
+        .toDF("col")
+      dateDf.write.format("avro").save(datePath)
+      checkAnswer(
+        spark.read.schema("col TIMESTAMP_NTZ").format("avro").load(datePath),
+        Seq(Row(LocalDateTime.of(2024, 1, 1, 0, 0)),
+          Row(LocalDateTime.of(2024, 1, 2, 0, 0)),
+          Row(LocalDateTime.of(1312, 2, 27, 0, 0)),
+          Row(LocalDateTime.of(-5877641, 6, 23, 0, 0)),
+          Row(LocalDateTime.of(5881580, 7, 11, 0, 0)))

Review Comment:
   @johanl-db These test cases [fail with an `ArithmeticException`](https://github.com/aldenlau-db/spark/actions/runs/13936923971/job/39006544565).
However, this implementation is based on the Parquet implementation, and I
noticed that the Parquet reader also fails with the same `ArithmeticException`
(due to `long` overflow) when attempting to upcast any date earlier than
`-290308-12-22 BCE` or later than `+294247-01-10 CE`.
   
   I think this is because the [Date Spark type supports dates](https://docs.databricks.com/aws/en/sql/language-manual/data-types/date-type)
from `June 23 -5877641 CE to July 11 +5881580 CE`, but [TimestampNTZ supports](https://docs.databricks.com/aws/en/sql/language-manual/data-types/timestamp-ntz-type)
`-290308-12-21 BCE 19:59:06 to +294247-01-10 CE 04:00:54`, which is a smaller
range. In the context of type widening, shouldn't this widening be unsupported
by all readers, since Date stores a larger range of values than TimestampNTZ
(even though it has less precision)?
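
   To illustrate the range mismatch, here is a minimal standalone sketch. It assumes the conversion multiplies days-since-epoch by microseconds-per-day with an overflow check, which would explain the `ArithmeticException`. The names `DateToTimestampNtzRange`, `daysToMicros` and `widen` are hypothetical and are not Spark's actual API.

```scala
import java.time.LocalDate
import scala.util.Try

// Hypothetical helper object for illustration only; not part of Spark.
object DateToTimestampNtzRange {
  // Microseconds in one day. Assumption: the timestamp is held as a Long
  // count of microseconds since 1970-01-01, the date as days since then.
  val MicrosPerDay: Long = 24L * 60 * 60 * 1000 * 1000

  // Overflow-checked conversion: throws ArithmeticException when the
  // product does not fit in a Long.
  def daysToMicros(days: Long): Long = Math.multiplyExact(days, MicrosPerDay)

  // Wraps the conversion so callers can inspect success or failure.
  def widen(date: LocalDate): Try[Long] = Try(daysToMicros(date.toEpochDay))
}

object Demo extends App {
  import DateToTimestampNtzRange._

  Seq(
    LocalDate.of(2024, 1, 1),
    LocalDate.of(1312, 2, 27),
    LocalDate.of(-5877641, 6, 23),
    LocalDate.of(5881580, 7, 11)
  ).foreach { d =>
    // Prints either Success(micros) or Failure(ArithmeticException)
    println(s"$d -> ${widen(d)}")
  }
}
```

   Under these assumptions the two extreme dates from the test cannot be represented in a `Long` of microseconds, so a reader has to either reject the widening or fail at read time for out-of-range values.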



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

