Catching up on this: Serge's suggestion in the doc seems like the best
way forward,
https://docs.google.com/document/d/1wjFsBdlV2YK75x7UOk2HhDOqWVA0yC7iEiqOMnNnxlA/edit?disco=AAABe5AUnWU.
Spark would support the full ANSI SQL timestamp range, and Iceberg,
Parquet, and other data sources would throw a runtime error when asked to
write a value outside their supported range, until we get a wider timestamp
type in Parquet (Iceberg's V3 timestamp_ns type is just built on top of
Parquet's type).
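
To make the range gap concrete, here is a rough Python sketch of what "throw
at write time" could look like for an int64 nanoseconds-since-epoch encoding.
This is only an illustration under my own assumptions: it is not Spark,
Iceberg, or Parquet code, and the helper names int64_nanos_bounds and
to_int64_nanos are made up for this example.

```python
from datetime import datetime, timedelta, timezone

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def int64_nanos_bounds():
    """Earliest and latest instants an int64 nanosecond count can hold.

    Python datetimes stop at microseconds, so both bounds are rounded
    toward the epoch to stay inside the representable range.
    """
    lo = EPOCH - timedelta(microseconds=(-INT64_MIN) // 1000)
    hi = EPOCH + timedelta(microseconds=INT64_MAX // 1000)
    return lo, hi


def to_int64_nanos(ts):
    """Convert a timezone-aware datetime to nanoseconds since the epoch.

    Raises OverflowError when the value does not fit in int64, which is
    the hypothetical "fail at write time" behaviour (an assumption here).
    """
    delta = ts - EPOCH
    nanos = (delta.days * 86_400 + delta.seconds) * 1_000_000_000 \
        + delta.microseconds * 1_000
    if not INT64_MIN <= nanos <= INT64_MAX:
        raise OverflowError(f"{ts.isoformat()} does not fit in int64 nanoseconds")
    return nanos


lo, hi = int64_nanos_bounds()
print(lo.year, hi.year)  # the int64 window spans roughly 1677 to 2262
```

With this sketch, a 2025 timestamp converts fine, while year 0001 or year
9999 (both inside the ANSI SQL range) would raise at conversion time.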

Thanks,
Szehon

On Thu, Mar 27, 2025 at 9:45 PM Micah Kornfield <emkornfi...@gmail.com>
wrote:

>> I think the key issue is the format. The proposed 10-byte format doesn't
>> seem like a standard and the one in Iceberg/Parquet does not support the
>> range required by ANSI SQL: year 0001 to year 9999. We should address this
>> issue first. Note that Parquet has an INT96 timestamp that supports
>> nanosecond precision, but it's deprecated. Shall we work with the Parquet
>> community to revive it?
>
>
> It would be great to discuss a plan for this in Parquet.  This has come up
> in passing in some of the recent Parquet syncs.  I don't think resurrecting
> int96 is necessarily a great idea since it is defined in terms of Julian
> days [1], and most systems these days are standardizing on
> proleptic-Gregorian.
>
> A fair number of the OSS implementations I've seen that interact with int96
> convert values on the assumption that all timestamps fall after the Unix
> epoch, and therefore have errors/idiosyncrasies when translating dates prior
> to the Gregorian cutover.
>
> Cheers,
> Micah
>
> [1] https://github.com/apache/parquet-format/pull/49
>
> On Thu, Mar 27, 2025 at 7:02 PM Wenchen Fan <cloud0...@gmail.com> wrote:
>
>> Maybe we should discuss the key issues on the dev list as it's easy to
>> lose track of Google Doc comments.
>>
>> I think all the proposals for adding new data types need to prove that
>> the new data type is common/standard in the ecosystem. This means 3 things:
>> - it has common/standard semantics. TIMESTAMP with nanosecond precision is
>> definitely a standard data type, in both ANSI SQL and mainstream databases.
>> - it has a common/standard storage format. Parquet/Iceberg supports
>> nanosecond timestamps using int64, which is different from what is proposed
>> here.
>> - it has common/standard processing methods. The Java datetime library
>> Spark is using now already supports nanoseconds, so we are fine here.
>>
>> I think the key issue is the format. The proposed 10-byte format doesn't
>> seem like a standard and the one in Iceberg/Parquet does not support the
>> range required by ANSI SQL: year 0001 to year 9999. We should address this
>> issue first. Note that Parquet has an INT96 timestamp that supports
>> nanosecond precision, but it's deprecated. Shall we work with the Parquet
>> community to revive it?
>>
>> On Fri, Mar 28, 2025 at 7:03 AM DB Tsai <dbt...@dbtsai.com> wrote:
>>
>>> Thanks!!!
>>>
>>> DB Tsai  |  https://www.dbtsai.com/  |  PGP 42E5B25A8F7A82C1
>>>
>>> On Mar 27, 2025, at 3:56 PM, Qi Tan <qi.tan.j...@gmail.com> wrote:
>>>
>>> Thanks DB,
>>>
>>> I just noticed a few more comments came in after I initiated the vote.
>>> I'm going to postpone the voting process and address those outstanding
>>> comments.
>>>
>>> Qi Tan
>>>
>>> On Thu, Mar 27, 2025 at 15:12, DB Tsai <dbt...@dbtsai.com> wrote:
>>>
>>>> Hello Qi,
>>>>
>>>> I'm supportive of the NanoSecond Timestamps proposal; however, before
>>>> we initiate the vote, there are a few outstanding comments in the SPIP
>>>> document that haven't been addressed yet. Since the vote is on the document
>>>> itself, could we resolve these items beforehand?
>>>>
>>>> For example:
>>>>
>>>>    - The default precision of TimestampNsNTZType is set to 6, which
>>>>      overlaps with the existing TimestampNTZ.
>>>>    - The specified range exceeds the capacity of an int64, but the
>>>>      document doesn't clarify how this type will be represented in memory or
>>>>      serialized in data sources.
>>>>    - Schema inference details for data sources are missing.
>>>>
>>>> These points still need discussion.
>>>>
>>>> I appreciate your efforts in putting the doc together and look forward
>>>> to your contribution!
>>>>
>>>> Thanks,
>>>> DB Tsai  |  https://www.dbtsai.com/  |  PGP 42E5B25A8F7A82C1
>>>>
>>>> On Mar 27, 2025, at 1:24 PM, huaxin gao <huaxin.ga...@gmail.com> wrote:
>>>>
>>>> +1
>>>>
>>>> On Thu, Mar 27, 2025 at 1:22 PM Qi Tan <qi.tan.j...@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I would like to start a vote on adding support for nanosecond
>>>>> timestamps.
>>>>>
>>>>> *Discussion thread: *
>>>>> https://lists.apache.org/thread/y2vzrjl1499j5dvbpg3m81jxdhf4b6of
>>>>> *SPIP:*
>>>>> https://docs.google.com/document/d/1wjFsBdlV2YK75x7UOk2HhDOqWVA0yC7iEiqOMnNnxlA/edit?usp=sharing
>>>>> *JIRA:*  https://issues.apache.org/jira/browse/SPARK-50532
>>>>>
>>>>> Please vote on the SPIP for the next 72 hours:
>>>>>
>>>>> [ ] +1: Accept the proposal as an official SPIP
>>>>> [ ] +0
>>>>> [ ] -1: I don’t think this is a good idea because
>>>>>
>>>>
>>>>
>>>