Maybe we should discuss the key issues on the dev list as it's easy to lose
track of Google Doc comments.

I think every proposal to add a new data type needs to show that the new
data type is common/standard in the ecosystem. This means 3 things:
- it has common/standard semantics. TIMESTAMP with nanosecond precision is
definitely a standard data type, in both ANSI SQL and mainstream databases.
- it has a common/standard storage format. Parquet/Iceberg support
nanosecond timestamps using int64, which is different from what is proposed
here.
- it has common/standard processing methods. The Java datetime library
Spark uses now already supports nanoseconds, so we are fine here.

I think the key issue is the format. The proposed 10-byte format doesn't
seem to be a standard, and the int64 one in Iceberg/Parquet does not support
the range required by ANSI SQL: year 0001 to year 9999. We should address
this issue first. Note that Parquet has an INT96 timestamp that supports
nanosecond precision, but it's deprecated. Shall we work with the Parquet
community to revive it?
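
To make the range problem concrete, here is a rough Python sketch (my own
illustration, not taken from the SPIP). It assumes the int64 value counts
nanoseconds since 1970-01-01 UTC, and the function names are made up for
this example. It computes which years an int64 nanosecond counter can reach
and how many bits a plain nanosecond count over year 0001 to year 9999
would need.

```python
from datetime import date, datetime, timedelta, timezone

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
NANOS_PER_DAY = 86_400 * 1_000_000_000
# Assumption: the int64 counts nanoseconds since the Unix epoch in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def int64_nanos_range():
    """Earliest and latest instants reachable with int64 nanoseconds since the epoch."""
    # datetime only has microsecond resolution, so drop the last three digits.
    lo = EPOCH + timedelta(microseconds=INT64_MIN // 1000)
    hi = EPOCH + timedelta(microseconds=INT64_MAX // 1000)
    return lo, hi


def bits_for_ansi_range():
    """Bits needed to count every nanosecond from 0001-01-01 through 9999-12-31."""
    days = date(9999, 12, 31).toordinal() - date(1, 1, 1).toordinal() + 1
    return (days * NANOS_PER_DAY).bit_length()


if __name__ == "__main__":
    lo, hi = int64_nanos_range()
    print(lo.year, hi.year)       # years covered by int64 nanoseconds
    print(bits_for_ansi_range())  # bits needed for the full ANSI range
```

This should print 1677 and 2262 for the int64 bounds, and 69 bits for the
full ANSI range, which is why 8 bytes are not enough without changing the
encoding.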

On Fri, Mar 28, 2025 at 7:03 AM DB Tsai <dbt...@dbtsai.com> wrote:

> Thanks!!!
>
> DB Tsai  |  https://www.dbtsai.com/  |  PGP 42E5B25A8F7A82C1
>
> On Mar 27, 2025, at 3:56 PM, Qi Tan <qi.tan.j...@gmail.com> wrote:
>
> Thanks DB,
>
> I just noticed a few more comments came in after I initiated the vote. I'm
> going to postpone the voting process and address those outstanding
> comments.
>
> Qi Tan
>
> On Thu, Mar 27, 2025 at 3:12 PM DB Tsai <dbt...@dbtsai.com> wrote:
>
>> Hello Qi,
>>
>> I'm supportive of the NanoSecond Timestamps proposal; however, before we
>> initiate the vote, there are a few outstanding comments in the SPIP
>> document that haven't been addressed yet. Since the vote is on the document
>> itself, could we resolve these items beforehand?
>>
>> For example:
>>
>>    - The default precision of TimestampNsNTZType is set to 6, which
>>      overlaps with the existing TimestampNTZ.
>>    - The specified range exceeds the capacity of an int64, but the
>>      document doesn't clarify how this type will be represented in memory or
>>      serialized in data sources.
>>    - Schema inference details for data sources are missing.
>>
>> These points still need discussion.
>>
>> I appreciate your efforts in putting the doc together and look forward to
>> your contribution!
>>
>> Thanks,
>> DB Tsai  |  https://www.dbtsai.com/  |  PGP 42E5B25A8F7A82C1
>>
>> On Mar 27, 2025, at 1:24 PM, huaxin gao <huaxin.ga...@gmail.com> wrote:
>>
>> +1
>>
>> On Thu, Mar 27, 2025 at 1:22 PM Qi Tan <qi.tan.j...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I would like to start a vote on adding support for nanoseconds
>>> timestamps.
>>>
>>> *Discussion thread: *
>>> https://lists.apache.org/thread/y2vzrjl1499j5dvbpg3m81jxdhf4b6of
>>> *SPIP:*
>>> https://docs.google.com/document/d/1wjFsBdlV2YK75x7UOk2HhDOqWVA0yC7iEiqOMnNnxlA/edit?usp=sharing
>>> *JIRA:*  https://issues.apache.org/jira/browse/SPARK-50532
>>>
>>> Please vote on the SPIP for the next 72 hours:
>>>
>>> [ ] +1: Accept the proposal as an official SPIP
>>> [ ] +0
>>> [ ] -1: I don’t think this is a good idea because
>>>
>>
>>
>
