Maybe we should discuss the key issues on the dev list as it's easy to lose track of Google Doc comments.
I think every proposal for adding a new data type needs to prove that the new type is common/standard in the ecosystem. That means three things:

- It has common/standard semantics. TIMESTAMP with nanosecond precision is definitely a standard data type, both in ANSI SQL and in mainstream databases.
- It has a common/standard storage format. Parquet and Iceberg support nanosecond timestamps using int64, which is different from what is proposed here.
- It has common/standard processing methods. The Java datetime library that Spark currently uses already supports nanosecond precision, so we are fine here.

I think the key issue is the format. The proposed 10-byte format doesn't appear to be a standard, and the int64 format used by Iceberg/Parquet does not support the range required by ANSI SQL, year 0001 to year 9999 (an int64 count of nanoseconds since the Unix epoch only spans roughly the years 1677 to 2262). We should address this issue first.

Note that Parquet has an INT96 timestamp that supports nanosecond precision, but it's deprecated. Shall we work with the Parquet community to revive it?

On Fri, Mar 28, 2025 at 7:03 AM DB Tsai <dbt...@dbtsai.com> wrote:

> Thanks!!!
>
> DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1
>
> On Mar 27, 2025, at 3:56 PM, Qi Tan <qi.tan.j...@gmail.com> wrote:
>
> Thanks DB,
>
> I just noticed a few more comments came in after I initiated the vote. I'm
> going to postpone the voting process and address those outstanding
> comments.
>
> Qi Tan
>
> DB Tsai <dbt...@dbtsai.com> wrote on Thu, Mar 27, 2025 at 15:12:
>
>> Hello Qi,
>>
>> I'm supportive of the NanoSecond Timestamps proposal; however, before we
>> initiate the vote, there are a few outstanding comments in the SPIP
>> document that haven't been addressed yet. Since the vote is on the document
>> itself, could we resolve these items beforehand?
>>
>> For example:
>>
>> - The default precision of TimestampNsNTZType is set to 6, which
>>   overlaps with the existing TimestampNTZ.
>> - The specified range exceeds the capacity of an int64, but the
>>   document doesn't clarify how this type will be represented in memory or
>>   serialized in data sources.
>> - Schema inference details for data sources are missing.
>>
>> These points still need discussion.
>>
>> I appreciate your efforts in putting the doc together and look forward to
>> your contribution!
>>
>> Thanks,
>> DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1
>>
>> On Mar 27, 2025, at 1:24 PM, huaxin gao <huaxin.ga...@gmail.com> wrote:
>>
>> +1
>>
>> On Thu, Mar 27, 2025 at 1:22 PM Qi Tan <qi.tan.j...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I would like to start a vote on adding support for nanoseconds
>>> timestamps.
>>>
>>> *Discussion thread:*
>>> https://lists.apache.org/thread/y2vzrjl1499j5dvbpg3m81jxdhf4b6of
>>> *SPIP:*
>>> https://docs.google.com/document/d/1wjFsBdlV2YK75x7UOk2HhDOqWVA0yC7iEiqOMnNnxlA/edit?usp=sharing
>>> *JIRA:* https://issues.apache.org/jira/browse/SPARK-50532
>>>
>>> Please vote on the SPIP for the next 72 hours:
>>>
>>> [ ] +1: Accept the proposal as an official SPIP
>>> [ ] +0
>>> [ ] -1: I don’t think this is a good idea because
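To make the range argument in this thread concrete, here is an illustrative Java sketch (not taken from the SPIP). It computes the span an int64 count of nanoseconds since the Unix epoch can cover, checks whether the ANSI SQL endpoints 0001-01-01 and 9999-12-31 fit in it, and round-trips those endpoints through a 12-byte layout modeled on Parquet's legacy INT96 timestamp (8 bytes of nanoseconds-of-day followed by a 4-byte Julian day number, little-endian). Assumptions: the class and method names are hypothetical, dates are treated as proleptic Gregorian and as UTC, and this is not the 10-byte format proposed in the SPIP, whose layout isn't described here.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.ZoneOffset;

// Hypothetical illustration class, not part of Spark or the SPIP.
public class NanoTimestampRange {
    // ANSI SQL range: 0001-01-01 00:00:00 to 9999-12-31 23:59:59.999999999
    static final LocalDateTime ANSI_MIN = LocalDateTime.of(1, 1, 1, 0, 0, 0);
    static final LocalDateTime ANSI_MAX = LocalDateTime.of(9999, 12, 31, 23, 59, 59, 999_999_999);

    // Julian day number of 1970-01-01, the offset used by the legacy INT96 encoding
    static final long JULIAN_DAY_OF_EPOCH = 2_440_588L;

    // Earliest and latest instants expressible as int64 nanoseconds since the epoch
    static Instant int64NanosMin() { return Instant.EPOCH.plusNanos(Long.MIN_VALUE); }
    static Instant int64NanosMax() { return Instant.EPOCH.plusNanos(Long.MAX_VALUE); }

    // Exact nanoseconds since the epoch (local time treated as UTC), computed without overflow
    static BigInteger toEpochNanos(LocalDateTime ts) {
        return BigInteger.valueOf(ts.toEpochSecond(ZoneOffset.UTC))
                .multiply(BigInteger.valueOf(1_000_000_000L))
                .add(BigInteger.valueOf(ts.getNano()));
    }

    // A value fits in a signed 64-bit long if it needs at most 63 bits besides the sign bit
    static boolean fitsInInt64(BigInteger v) { return v.bitLength() <= 63; }

    // INT96-style layout: 8 bytes nanos-of-day, then 4 bytes Julian day, little-endian
    static byte[] toInt96Style(LocalDateTime ts) {
        long julianDay = ts.toLocalDate().toEpochDay() + JULIAN_DAY_OF_EPOCH;
        return ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN)
                .putLong(ts.toLocalTime().toNanoOfDay())
                .putInt(Math.toIntExact(julianDay))
                .array();
    }

    // Decode the 12-byte layout back into a LocalDateTime
    static LocalDateTime fromInt96Style(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong();
        long julianDay = buf.getInt();
        return LocalDate.ofEpochDay(julianDay - JULIAN_DAY_OF_EPOCH)
                .atTime(LocalTime.ofNanoOfDay(nanosOfDay));
    }

    public static void main(String[] args) {
        System.out.println("int64 nanos since epoch: " + int64NanosMin() + " .. " + int64NanosMax());
        for (LocalDateTime ts : new LocalDateTime[] {ANSI_MIN, ANSI_MAX}) {
            System.out.println(ts + " fits in int64 nanos: " + fitsInInt64(toEpochNanos(ts))
                    + ", 12-byte round trip ok: " + fromInt96Style(toInt96Style(ts)).equals(ts));
        }
    }
}
```

Running it should show that int64 nanoseconds stop in 1677 and 2262, so neither ANSI endpoint fits, while splitting the value into a day number plus nanoseconds-of-day covers the whole ANSI range at the cost of a wider (12-byte) value.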