aldenlau-db commented on code in PR #50315: URL: https://github.com/apache/spark/pull/50315#discussion_r2004449691
##########
connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala:
##########
@@ -963,6 +964,28 @@ abstract class AvroSuite
     }
   }

+  test("SPARK-49082: Widening date to timestampNTZ in AvroDeserializer") {
+    withTempPath { tempPath =>
+      val datePath = s"$tempPath/date_data"
+      val dateDf =
+        Seq(LocalDate.of(2024, 1, 1),
+          LocalDate.of(2024, 1, 2),
+          LocalDate.of(1312, 2, 27),
+          LocalDate.of(-5877641, 6, 23),
+          LocalDate.of(5881580, 7, 11))
+          .toDF("col")
+      dateDf.write.format("avro").save(datePath)
+      checkAnswer(
+        spark.read.schema("col TIMESTAMP_NTZ").format("avro").load(datePath),
+        Seq(Row(LocalDateTime.of(2024, 1, 1, 0, 0)),
+          Row(LocalDateTime.of(2024, 1, 2, 0, 0)),
+          Row(LocalDateTime.of(1312, 2, 27, 0, 0)),
+          Row(LocalDateTime.of(-5877641, 6, 23, 0, 0)),
+          Row(LocalDateTime.of(5881580, 7, 11, 0, 0)))

Review Comment:
@johanl-db These test cases [fail with an](https://github.com/aldenlau-db/spark/actions/runs/13936923971/job/39006544565) `ArithmeticException`. However, this implementation is based on the Parquet implementation. I noticed 2 issues:

1. The Parquet reader will also fail when attempting to upcast dates such as these with the same `ArithmeticException` due to `long` overflow.
2. The [Date Spark type supports dates](https://docs.databricks.com/aws/en/sql/language-manual/data-types/date-type) from `June 23 -5877641 CE to July 11 +5881580 CE`, but [TimestampNTZ supports](https://docs.databricks.com/aws/en/sql/language-manual/data-types/timestamp-ntz-type) `-290308-12-21 BCE 19:59:06 to +294247-01-10 CE 04:00:54`, which is a smaller range.

In the context of type widening, shouldn't this widening be unsupported by all readers since Date stores a larger range of values than TimestampNTZ (even though it has less precision)?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
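
The `long` overflow raised in the review comment can be reproduced outside Spark. The following is a minimal sketch, not the reader's actual code: it assumes the widening amounts to multiplying the date's epoch-day count by the number of microseconds per day with an overflow-checked multiply. The object `DateToTimestampNtzRange` and its helpers are hypothetical names introduced only for illustration.

```scala
import java.time.LocalDate
import scala.util.Try

// Hypothetical helper object, for illustration only; not part of Spark.
object DateToTimestampNtzRange {
  // Microseconds in one day. A TIMESTAMP_NTZ value is assumed here to be a
  // signed 64-bit count of microseconds since 1970-01-01T00:00:00.
  val MicrosPerDay: Long = 24L * 60 * 60 * 1000 * 1000

  // Assumed conversion: days since the epoch times microseconds per day.
  // Math.multiplyExact throws ArithmeticException when the product does not
  // fit in a Long.
  def daysToMicrosExact(days: Long): Long =
    Math.multiplyExact(days, MicrosPerDay)

  // True when the date at midnight is representable as a Long microsecond count.
  def fitsInTimestampNtz(date: LocalDate): Boolean =
    Try(daysToMicrosExact(date.toEpochDay)).isSuccess
}
```

Under these assumptions, `fitsInTimestampNtz` holds for the 2024 and 1312 dates in the test but not for the two extreme dates, which is consistent with Date covering a wider range than TimestampNTZ.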
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org