>
> I think the key issue is the format. The proposed 10-byte format doesn't
> seem to be a standard, and the one in Iceberg/Parquet does not support the
> range required by ANSI SQL: year 0001 to year 9999. We should address this
> issue first. Note that Parquet has an INT96 timestamp that supports
> nanosecond precision, but it's deprecated. Shall we work with the Parquet
> community to revive it?


It would be great to discuss a plan for this in Parquet. This has come up
in passing in some of the recent Parquet syncs. I don't think resurrecting
INT96 is necessarily a great idea, since it is defined in terms of Julian
days [1], and most systems these days are standardizing on the proleptic
Gregorian calendar.
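For readers less familiar with INT96: as it is commonly written (e.g. by Impala and Spark), the 12 bytes hold a little-endian int64 of nanoseconds within the day followed by a little-endian int32 Julian day number. The following is a minimal Python sketch of converting that to and from nanoseconds since the Unix epoch, assuming UTC and a proleptic-Gregorian reading of the day number; `decode_int96`/`encode_int96` are hypothetical names, not an existing API.

```python
import struct

# Julian day number of 1970-01-01 (the Unix epoch).
JULIAN_DAY_OF_EPOCH = 2_440_588
NANOS_PER_DAY = 86_400 * 1_000_000_000


def decode_int96(raw: bytes) -> int:
    """Convert a 12-byte INT96 value to nanoseconds since the Unix epoch.

    Assumed layout: little-endian int64 nanos-of-day, then little-endian
    int32 Julian day number ('<' means no padding, so 8 + 4 = 12 bytes).
    """
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    return (julian_day - JULIAN_DAY_OF_EPOCH) * NANOS_PER_DAY + nanos_of_day


def encode_int96(epoch_nanos: int) -> bytes:
    """Inverse of decode_int96.

    divmod floors, so nanos-of-day stays in [0, NANOS_PER_DAY) even for
    instants before the epoch.
    """
    days, nanos_of_day = divmod(epoch_nanos, NANOS_PER_DAY)
    return struct.pack("<qi", nanos_of_day, days + JULIAN_DAY_OF_EPOCH)
```

Because the day count is a 32-bit integer, this layout easily covers years 0001 to 9999 at nanosecond precision; the concern raised here is which calendar that day number is interpreted in.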

A fair number of the OSS implementations I've seen that interact with INT96
convert values assuming all timestamps fall after the Unix epoch, and
therefore have errors or idiosyncrasies when translating dates before the
Gregorian cutover.
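To make the cutover problem concrete: the same Julian day number names different calendar dates depending on whether the reader uses the Julian calendar (as hybrid calendars such as Java's `java.util.GregorianCalendar` do before 1582-10-15) or the proleptic Gregorian calendar. The sketch below uses the standard integer day-number formulas; all helper names are hypothetical and only illustrate the arithmetic.

```python
from datetime import date

# Python ordinals count proleptic-Gregorian days with 0001-01-01 == 1,
# which is Julian day number 1721426, hence this fixed offset.
ORDINAL_TO_JDN = 1_721_425


def jdn_from_gregorian(y: int, m: int, d: int) -> int:
    """Julian day number of a proleptic-Gregorian date."""
    return date(y, m, d).toordinal() + ORDINAL_TO_JDN


def jdn_from_julian(y: int, m: int, d: int) -> int:
    """Julian day number of a Julian-calendar date (integer formula)."""
    a = (14 - m) // 12          # 1 for Jan/Feb, else 0
    yy = y + 4800 - a           # shift to a March-based year
    mm = m + 12 * a - 3         # March == 0 ... February == 11
    return d + (153 * mm + 2) // 5 + 365 * yy + yy // 4 - 32083


def gregorian_from_jdn(jdn: int) -> date:
    """Read a Julian day number as a proleptic-Gregorian date."""
    return date.fromordinal(jdn - ORDINAL_TO_JDN)


def julian_from_jdn(jdn: int) -> tuple[int, int, int]:
    """Read a Julian day number as a Julian-calendar (year, month, day)."""
    c = jdn + 32082
    d = (4 * c + 3) // 1461
    e = c - (1461 * d) // 4
    m = (5 * e + 2) // 153
    day = e - (153 * m + 2) // 5 + 1
    month = m + 3 - 12 * (m // 10)
    year = d - 4800 + m // 10
    return year, month, day


if __name__ == "__main__":
    # The last Julian day and the first Gregorian day are consecutive.
    print(jdn_from_julian(1582, 10, 4), jdn_from_gregorian(1582, 10, 15))
    # -> 2299160 2299161
    # One stored day number, two readings five days apart.
    jdn = jdn_from_julian(1000, 1, 1)
    print(julian_from_jdn(jdn))     # (1000, 1, 1)
    print(gregorian_from_jdn(jdn))  # 1000-01-06
```

A writer and reader that disagree on the calendar silently shift pre-cutover dates by several days, which is the class of idiosyncrasy described above.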

Cheers,
Micah

[1] https://github.com/apache/parquet-format/pull/49

On Thu, Mar 27, 2025 at 7:02 PM Wenchen Fan <cloud0...@gmail.com> wrote:

> Maybe we should discuss the key issues on the dev list as it's easy to
> lose track of Google Doc comments.
>
> I think all the proposals for adding new data types need to prove that the
> new data type is common/standard in the ecosystem. This means 3 things:
> - it has common/standard semantics. TIMESTAMP with nanosecond precision is
> definitely a standard data type, in both ANSI SQL and mainstream databases.
> - it has a common/standard storage format. Parquet/Iceberg supports
> nanosecond timestamps using int64, which is different from what is proposed
> here.
> - it has common/standard processing methods. The Java datetime library
> Spark currently uses already supports nanoseconds, so we are fine here.
>
> I think the key issue is the format. The proposed 10-byte format doesn't
> seem to be a standard, and the one in Iceberg/Parquet does not support the
> range required by ANSI SQL: year 0001 to year 9999. We should address this
> issue first. Note that Parquet has an INT96 timestamp that supports
> nanosecond precision, but it's deprecated. Shall we work with the Parquet
> community to revive it?
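The int64 limitation is easy to check: a signed 64-bit count of nanoseconds since 1970-01-01 (how Parquet's TIMESTAMP logical type with NANOS unit stores values) only reaches from 1677 to 2262, while 0001-01-01 through 9999-12-31 needs about 69 bits. A minimal Python sketch, assuming UTC and ignoring leap seconds:

```python
from datetime import date, datetime, timedelta

EPOCH = datetime(1970, 1, 1)
NANOS_PER_DAY = 86_400 * 1_000_000_000

# Bounds of a signed 64-bit count of nanoseconds since the epoch.
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1

# Python datetimes stop at microseconds, so floor-divide ns to us.
earliest = EPOCH + timedelta(microseconds=INT64_MIN // 1000)
latest = EPOCH + timedelta(microseconds=INT64_MAX // 1000)

# Nanoseconds needed to cover the ANSI SQL range 0001-01-01 .. 9999-12-31
# (both ends inclusive, hence the + 1 day).
days = date(9999, 12, 31).toordinal() - date(1, 1, 1).toordinal() + 1
span_nanos = days * NANOS_PER_DAY

print(earliest)                 # 1677-09-21 00:12:43.145224
print(latest)                   # 2262-04-11 23:47:16.854775
print(span_nanos.bit_length())  # 69, versus the 64 bits of an int64
```

Since the span itself needs more than 64 bits, no choice of epoch makes a plain int64 nanosecond count cover the full ANSI range.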
>
> On Fri, Mar 28, 2025 at 7:03 AM DB Tsai <dbt...@dbtsai.com> wrote:
>
>> Thanks!!!
>>
>> DB Tsai  |  https://www.dbtsai.com/  |  PGP 42E5B25A8F7A82C1
>>
>> On Mar 27, 2025, at 3:56 PM, Qi Tan <qi.tan.j...@gmail.com> wrote:
>>
>> Thanks DB,
>>
>> I just noticed a few more comments came in after I initiated the vote.
>> I'm going to postpone the voting process and address those outstanding
>> comments.
>>
>> Qi Tan
>>
>> On Thu, Mar 27, 2025 at 15:12, DB Tsai <dbt...@dbtsai.com> wrote:
>>
>>> Hello Qi,
>>>
>>> I'm supportive of the nanosecond timestamps proposal; however, before we
>>> initiate the vote, there are a few outstanding comments in the SPIP
>>> document that haven't been addressed yet. Since the vote is on the document
>>> itself, could we resolve these items beforehand?
>>>
>>> For example:
>>>
>>>    -
>>>
>>>    The default precision of TimestampNsNTZType is set to 6, which
>>>    overlaps with the existing TimestampNTZ.
>>>    -
>>>
>>>    The specified range exceeds the capacity of an int64, but the
>>>    document doesn't clarify how this type will be represented in memory or
>>>    serialized in data sources.
>>>    -
>>>
>>>    Schema inference details for data sources are missing.
>>>
>>> These points still need discussion.
>>>
>>> I appreciate your efforts in putting the doc together and look forward
>>> to your contribution!
>>>
>>> Thanks,
>>> DB Tsai
>>>
>>> On Mar 27, 2025, at 1:24 PM, huaxin gao <huaxin.ga...@gmail.com> wrote:
>>>
>>> +1
>>>
>>> On Thu, Mar 27, 2025 at 1:22 PM Qi Tan <qi.tan.j...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I would like to start a vote on adding support for nanosecond
>>>> timestamps.
>>>>
>>>> *Discussion thread:*
>>>> https://lists.apache.org/thread/y2vzrjl1499j5dvbpg3m81jxdhf4b6of
>>>> *SPIP:*
>>>> https://docs.google.com/document/d/1wjFsBdlV2YK75x7UOk2HhDOqWVA0yC7iEiqOMnNnxlA/edit?usp=sharing
>>>> *JIRA:*  https://issues.apache.org/jira/browse/SPARK-50532
>>>>
>>>> Please vote on the SPIP for the next 72 hours:
>>>>
>>>> [ ] +1: Accept the proposal as an official SPIP
>>>> [ ] +0
>>>> [ ] -1: I don’t think this is a good idea because
>>>>
>>>
>>>
>>
