> I think the key issue is the format. The proposed 10-byte format doesn't
> seem like a standard and the one in Iceberg/Parquet does not support the
> required range by ANSI SQL: year 0001 to year 9999. We should address this
> issue first. Note that Parquet has an INT96 timestamp that supports
> nanosecond precision, but it's deprecated. Shall we work with the Parquet
> community to revive it?

It would be great to discuss a plan for this in Parquet. This has come up in
passing in some of the recent Parquet syncs.

I don't think resurrecting INT96 is necessarily a great idea, since it is
defined in terms of Julian days [1] and most systems these days are
standardizing on the proleptic Gregorian calendar. A fair number of the OSS
implementations I've seen that interact with INT96 do the conversion assuming
that all timestamps fall after the Unix epoch, and therefore have
errors/idiosyncrasies when translating dates prior to the Gregorian cutover.

Cheers,
Micah

[1] https://github.com/apache/parquet-format/pull/49

On Thu, Mar 27, 2025 at 7:02 PM Wenchen Fan <cloud0...@gmail.com> wrote:

> Maybe we should discuss the key issues on the dev list as it's easy to
> lose track of Google Doc comments.
>
> I think all the proposals for adding new data types need to prove that the
> new data type is common/standard in the ecosystem. This means 3 things:
> - it has common/standard semantics. TIMESTAMP with nanosecond precision is
> definitely a standard data type, in both ANSI SQL and mainstream databases.
> - it has a common/standard storage format. Parquet/Iceberg supports
> nanosecond timestamps using int64, which is different from what is proposed
> here.
> - it has common/standard processing methods. The Java datetime library
> Spark is using now already supports nanoseconds, so we are fine here.
>
> I think the key issue is the format. The proposed 10-byte format doesn't
> seem like a standard and the one in Iceberg/Parquet does not support the
> required range by ANSI SQL: year 0001 to year 9999. We should address this
> issue first. Note that Parquet has an INT96 timestamp that supports
> nanosecond precision, but it's deprecated. Shall we work with the Parquet
> community to revive it?
>
> On Fri, Mar 28, 2025 at 7:03 AM DB Tsai <dbt...@dbtsai.com> wrote:
>
>> Thanks!!!
>>
>> DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1
>>
>> On Mar 27, 2025, at 3:56 PM, Qi Tan <qi.tan.j...@gmail.com> wrote:
>>
>> Thanks DB,
>>
>> I just noticed a few more comments came in after I initiated the vote.
>> I'm going to postpone the voting process and address those outstanding
>> comments.
>>
>> Qi Tan
>>
>> On Thu, Mar 27, 2025 at 15:12, DB Tsai <dbt...@dbtsai.com> wrote:
>>
>>> Hello Qi,
>>>
>>> I'm supportive of the NanoSecond Timestamps proposal; however, before we
>>> initiate the vote, there are a few outstanding comments in the SPIP
>>> document that haven't been addressed yet. Since the vote is on the document
>>> itself, could we resolve these items beforehand?
>>>
>>> For example:
>>>
>>> - The default precision of TimestampNsNTZType is set to 6, which
>>>   overlaps with the existing TimestampNTZ.
>>> - The specified range exceeds the capacity of an int64, but the
>>>   document doesn't clarify how this type will be represented in memory or
>>>   serialized in data sources.
>>> - Schema inference details for data sources are missing.
>>>
>>> These points still need discussion.
>>>
>>> I appreciate your efforts in putting the doc together and look forward
>>> to your contribution!
>>>
>>> Thanks,
>>> DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1
>>>
>>> On Mar 27, 2025, at 1:24 PM, huaxin gao <huaxin.ga...@gmail.com> wrote:
>>>
>>> +1
>>>
>>> On Thu, Mar 27, 2025 at 1:22 PM Qi Tan <qi.tan.j...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I would like to start a vote on adding support for nanosecond
>>>> timestamps.
>>>>
>>>> *Discussion thread:*
>>>> https://lists.apache.org/thread/y2vzrjl1499j5dvbpg3m81jxdhf4b6of
>>>> *SPIP:*
>>>> https://docs.google.com/document/d/1wjFsBdlV2YK75x7UOk2HhDOqWVA0yC7iEiqOMnNnxlA/edit?usp=sharing
>>>> *JIRA:* https://issues.apache.org/jira/browse/SPARK-50532
>>>>
>>>> Please vote on the SPIP for the next 72 hours:
>>>>
>>>> [ ] +1: Accept the proposal as an official SPIP
>>>> [ ] +0
>>>> [ ] -1: I don’t think this is a good idea because
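
To make the two format points in this thread concrete, here is a minimal
Python sketch (not taken from the SPIP or from any Parquet implementation).
It checks which instants fit in an int64 count of nanoseconds since the Unix
epoch, and it decodes an INT96 value under the commonly described layout: an
8-byte little-endian nanoseconds-of-day field followed by a 4-byte
little-endian Julian day number, with day 2440588 taken as 1970-01-01. That
layout and all function names below are assumptions made for illustration.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1
NANOS_PER_DAY = 86_400 * 10**9
JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588  # assumed INT96 convention for 1970-01-01


def to_epoch_nanos(dt):
    """Nanoseconds since the Unix epoch (datetime only carries microseconds)."""
    return ((dt - EPOCH) // timedelta(microseconds=1)) * 1000


def from_epoch_nanos(nanos):
    """Return (datetime floored to microseconds, leftover nanoseconds)."""
    micros, rem = divmod(nanos, 1000)
    return EPOCH + timedelta(microseconds=micros), rem


def fits_in_int64(nanos):
    return INT64_MIN <= nanos <= INT64_MAX


def int96_to_epoch_nanos(raw):
    """Decode 12 bytes: int64 nanos-of-day, then int32 Julian day number."""
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    return (julian_day - JULIAN_DAY_OF_UNIX_EPOCH) * NANOS_PER_DAY + nanos_of_day


if __name__ == "__main__":
    # int64 nanoseconds only span roughly 1677-09-21 to 2262-04-11.
    print(from_epoch_nanos(INT64_MIN)[0], from_epoch_nanos(INT64_MAX)[0])

    # The ANSI SQL bounds mentioned in the thread fall outside that span.
    for dt in (datetime(1, 1, 1, tzinfo=timezone.utc),
               datetime(9999, 12, 31, 23, 59, 59, 999999, tzinfo=timezone.utc)):
        print(dt.year, fits_in_int64(to_epoch_nanos(dt)))

    # Julian day 2440588 with zero nanos-of-day decodes to the epoch itself.
    print(int96_to_epoch_nanos(struct.pack("<qi", 0, JULIAN_DAY_OF_UNIX_EPOCH)))
```

Note that the decoder simply subtracts a fixed day offset and does not decide
which calendar the resulting day count should be rendered in. That rendering
step is where implementations can disagree for dates before the Gregorian
cutover.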