Hi Spark community,

I recently found a difference in the behavior of the now() function in Spark SQL 3.5.1 when running under different JVM versions (Java 8 vs. Java 21).

Background:

In Spark 3.5.1, the implementation of now() eventually calls Instant.now() in Java. However, the precision of Instant.now() differs by Java version (a quick check is sketched below the list):

- Java 8: millisecond precision
- Java 21: microsecond precision
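
For context, this can be reproduced with a few lines of plain java.time code, no Spark involved (the printed values are illustrative):

    import java.time.Instant

    // Observe the clock precision of the running JVM. On Java 8, Instant.now()
    // is backed by a millisecond-resolution clock, so the nano-of-second field
    // is always a multiple of 1,000,000. On Java 9 and later (including 21),
    // the system clock typically resolves to microseconds.
    val now = Instant.now()
    println(s"instant        = $now")
    println(s"nano-of-second = ${now.getNano}")
    // Java 8  (illustrative): 2025-10-25T13:04:05.123Z    -> 123000000
    // Java 21 (illustrative): 2025-10-25T13:04:05.123456Z -> 123456000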

When Spark converts this value into a timestamp and then formats it (via TimestampFormatter), the trailing zeros are trimmed, meaning (as shown below):

- On Java 8, the result shows up to 3 fractional digits (for example, 2025-10-25 13:04:05.123)
- On Java 21, it may show up to 6 fractional digits (for example, 2025-10-25 13:04:05.123456)
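
Concretely, this is roughly how the difference surfaces in spark-shell (output values are illustrative):

    spark.sql("SELECT CAST(now() AS STRING) AS ts").show(truncate = false)
    // Java 8  -> 2025-10-25 13:04:05.123     (at most 3 fractional digits)
    // Java 21 -> 2025-10-25 13:04:05.123456  (up to 6 fractional digits)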

From the source code, I noticed that DefaultTimestampFormatter extends Iso8601TimestampFormatter. It seems that the display behavior is consistent with the ISO-8601 standard, which does not constrain the number of fractional digits to a fixed length. I’m not entirely sure if my understanding is correct, so I would appreciate any clarification from the community.

This seems consistent with both Java and Spark behavior, as neither explicitly guarantees a fixed number of fractional digits. However, this difference introduces a backward-compatibility issue in our environment.

Problem:

In our platform, some business tables store timestamp values as string columns with a fixed length. I’m aware that storing timestamps as strings is not ideal, but due to historical reasons and existing dependencies, it’s not something we can easily change at the platform level.

Upgrading Spark’s JVM from Java 8 to Java 21 will therefore cause these values to exceed the column length limit (for example, 2025-10-25 13:04:05.123456 is 26 characters, which no longer fits a 23-character column sized for millisecond precision). Since we maintain a shared Spark platform used by many business teams, it’s very difficult to coordinate changes across all downstream systems.

My Questions and Proposal:

1. Is my understanding correct regarding how Spark 3.5.1 derives its now() precision from the underlying Instant.now() and Iso8601TimestampFormatter?

2. Does the Spark community have any plan or discussion to stabilize or unify the precision of now() output across Java versions?

I’m considering adding a new optional configuration to address this issue, for example:

    spark.sql.timestamp.now.compat.java8 = false

When set to false (the default), Spark will keep the current behavior (using the JVM’s native Instant.now() precision). When set to true, Spark will apply a controlled formatter that limits the fractional part of seconds to at most three digits, effectively simulating Java 8’s millisecond precision for display and string conversion. This approach would not affect computation precision internally, only the formatted string representation, and could help users maintain backward compatibility with older data models.
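
To make the intent concrete, here is a minimal sketch of the compat formatting using plain java.time (the formatter name is hypothetical, and the real change would live inside Spark’s TimestampFormatter):

    import java.time.{Instant, ZoneId}
    import java.time.format.DateTimeFormatterBuilder
    import java.time.temporal.ChronoField

    // Prints at most three fractional digits and trims trailing zeros,
    // regardless of how precise the underlying Instant is.
    val java8CompatFormatter = new DateTimeFormatterBuilder()
      .appendPattern("yyyy-MM-dd HH:mm:ss")
      .appendFraction(ChronoField.NANO_OF_SECOND, 0, 3, true)
      .toFormatter()
      .withZone(ZoneId.systemDefault())

    println(java8CompatFormatter.format(Instant.now()))
    // Java 8 and Java 21 would both print e.g. 2025-10-25 13:04:05.123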

If the community thinks this direction makes sense, I would be happy to open a JIRA and submit a PR for review. I’d also like to hear whether there are existing conventions or related discussions about timestamp precision compatibility that I should take into account before implementing this.

Thanks for your time and insights.

Best regards,

Yu Hong
