lam1051999 opened a new pull request, #50354: URL: https://github.com/apache/spark/pull/50354
### What changes were proposed in this pull request?

Spark's JSON reader uses `DefaultTimestampFormatter` to infer timestamps from strings when the user does not specify a timestamp pattern, which can cause regular strings to be confused with timestamps when a string contains only a Year or a Year + Month segment. This change removes the string-to-timestamp conversion when the JSON property value is in one of these formats:

- `[+-]yyyy*`
- `[+-]yyyy*-[m]m`

### Why are the changes needed?

To avoid confusion between regular strings and strings that contain only Year or Year + Month segments, as reported in this issue: https://issues.apache.org/jira/browse/SPARK-49858

### Does this PR introduce _any_ user-facing change?

Yes.

- Previous behavior: the string "23456" is inferred as a Timestamp; below is captured from Spark Scala
  <img width="783" alt="image" src="https://github.com/user-attachments/assets/9f9b82ef-1004-4761-bc22-2dcfd15affcc" />
- The issue is even worse in PySpark when pulling the result from the JVM into a Python datetime, because Python's datetime cannot handle a Year part greater than 9999
  <img width="870" alt="image" src="https://github.com/user-attachments/assets/e4f1b513-4edc-4db6-b9f7-2a01ad95b668" />

### How was this patch tested?

Unit tests are provided for the above cases.

### Was this patch authored or co-authored using generative AI tooling?

No

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
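The two formats named above can be approximated with a short sketch. Note this is a hypothetical illustration (helper name and regexes are assumptions, not the actual patterns inside Spark's JSON reader):

```python
import re

# Rough approximations of the two formats named above (assumptions):
#   [+-]yyyy*      -> optional sign, four or more year digits
#   [+-]yyyy*-[m]m -> the same year segment, then a 1- or 2-digit month
YEAR_ONLY = re.compile(r"^[+-]?\d{4,}$")
YEAR_MONTH = re.compile(r"^[+-]?\d{4,}-\d{1,2}$")

def is_ambiguous_timestamp_string(value: str) -> bool:
    """Return True for strings that, under the proposed change, would
    no longer be converted to timestamps during JSON schema inference."""
    return bool(YEAR_ONLY.match(value) or YEAR_MONTH.match(value))

print(is_ambiguous_timestamp_string("23456"))       # year-only string from the example above -> True
print(is_ambiguous_timestamp_string("2024-07"))     # year + month -> True
print(is_ambiguous_timestamp_string("2024-07-15"))  # full date, not affected -> False
```

Strings matching either pattern would stay typed as plain strings, while fuller timestamp-like values remain eligible for inference.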