Hi Spark community,
I recently noticed a behavioral difference in Spark SQL's now() function on Spark 3.5.1 when running under different JVM versions (Java 8 vs Java 21).

Background:

In Spark 3.5.1, now() is ultimately backed by Java's Instant.now(), whose precision depends on the JVM:

- Java 8: millisecond precision
- Java 21: microsecond precision

When Spark converts this value into a timestamp and then formats it (via TimestampFormatter), the trailing zeros are trimmed, meaning:

- On Java 8, the result shows at most 3 fractional digits (for example, 2025-10-25 13:04:05.123).
- On Java 21, it may show up to 6 fractional digits (for example, 2025-10-25 13:04:05.123456).

From the source code, I noticed that DefaultTimestampFormatter extends Iso8601TimestampFormatter. The display behavior appears consistent with the ISO-8601 standard, which does not constrain the fractional part to a fixed length, and neither Java nor Spark explicitly guarantees a fixed number of fractional digits. I'm not entirely sure my understanding is correct, so I would appreciate any clarification from the community.
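The difference is easy to observe from spark-shell; the snippet below is what I used (the printed values are illustrative):

    // Run in spark-shell on each JVM and compare.
    import java.time.Instant

    // On Java 8, Instant.now() is backed by a millisecond clock, so the
    // nano-of-second is always a multiple of 1,000,000; on newer JVMs the
    // system clock typically has microsecond resolution.
    println(Instant.now().getNano)

    // The formatted string reflects that precision (trailing zeros trimmed):
    spark.sql("SELECT CAST(now() AS STRING)").show(false)
    // Java 8:  2025-10-25 13:04:05.123
    // Java 21: 2025-10-25 13:04:05.123456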
However, this difference introduces a backward-compatibility issue in our environment.

Problem:

On our platform, some business tables store timestamp values in fixed-length string columns. I'm aware that storing timestamps as strings is not ideal, but for historical reasons and existing dependencies it is not something we can easily change at the platform level. Upgrading the JVM under Spark from Java 8 to Java 21 therefore causes these formatted values to exceed the column length limit. Since we maintain a shared Spark platform used by many business teams, coordinating changes across all downstream systems is very difficult.
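Per-query workarounds do exist. For example, assuming Spark 3.x datetime patterns, a downstream job could pin the fractional width explicitly, but that means touching every affected query, which is exactly what we cannot coordinate at scale:

    // Per-query workaround: force exactly three fractional digits
    // regardless of the JVM clock precision.
    spark.sql("SELECT date_format(now(), 'yyyy-MM-dd HH:mm:ss.SSS')").show(false)
    // => 2025-10-25 13:04:05.123 on any JVM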
My questions and proposal:

1. Is my understanding correct that Spark 3.5.1 derives the precision of now() from the underlying Instant.now(), and that the variable-width output comes from Iso8601TimestampFormatter trimming trailing zeros?

2. Does the Spark community have any plan or ongoing discussion to stabilize or unify the precision of now() output across Java versions?

I am considering adding a new optional configuration to address this issue, for example:

    spark.sql.timestamp.now.compat.java8 = false

- When set to false (the default), Spark keeps the current behavior (using the JVM's native Instant.now() precision).
- When set to true, Spark applies a controlled formatter that limits the fractional part of seconds to at most three digits, effectively simulating Java 8's millisecond precision for display and string conversion.

This approach would not affect internal computation precision, only the formatted string representation, and could help users maintain backward compatibility with older data models.
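To make the idea concrete, here is a rough sketch of the kind of formatter the compat path could use. This is plain java.time code, not actual Spark internals; the real change would wire into Spark's existing TimestampFormatter hierarchy:

    import java.time.ZoneId
    import java.time.format.DateTimeFormatterBuilder
    import java.time.temporal.ChronoField

    // Sketch only: cap the fraction at three digits while still trimming
    // trailing zeros (minWidth = 0); the underlying value is untouched.
    val compatFormatter = new DateTimeFormatterBuilder()
      .appendPattern("yyyy-MM-dd HH:mm:ss")
      .appendFraction(ChronoField.NANO_OF_SECOND, 0, 3, true)
      .toFormatter()
      .withZone(ZoneId.systemDefault())

    // compatFormatter.format(java.time.Instant.now())
    // => at most "2025-10-25 13:04:05.123", even on Java 21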
If the community thinks this direction makes sense, I would be happy to open a JIRA and submit a PR for review. I would also like to hear whether there are existing conventions or related discussions about timestamp precision compatibility that I should take into account before implementing this.

Thanks for your time and insights.

Best regards,
Yu Hong
