As an update/TL;DR: the current proposal in Parquet is as follows (there are
still some active discussions):

Two new logical types.
1.  YearMonth interval annotates an int32 (interval is stored as the number
of months)
2.  DurationNanos annotates an int64 (interval is stored as the number of
nanoseconds)
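
As a rough illustration of the two encodings (a sketch only; the helper
names below are hypothetical and are not part of the Parquet proposal or
any library API):

```python
# Hypothetical helpers illustrating the two proposed encodings.
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
NANOS_PER_SECOND = 1_000_000_000


def encode_year_month(years: int, months: int) -> int:
    """YearMonth interval: total number of months, stored in an int32."""
    total_months = years * 12 + months
    if not INT32_MIN <= total_months <= INT32_MAX:
        raise OverflowError("interval does not fit in int32 months")
    return total_months


def encode_duration_nanos(days=0, hours=0, minutes=0, seconds=0, nanos=0) -> int:
    """DurationNanos: total nanoseconds, stored in an int64.

    Assumes a day is always exactly 24 hours (no calendar logic).
    """
    total_seconds = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    total_nanos = total_seconds * NANOS_PER_SECOND + nanos
    if not INT64_MIN <= total_nanos <= INT64_MAX:
        raise OverflowError("duration does not fit in int64 nanoseconds")
    return total_nanos


print(encode_year_month(1, 6))                    # 18
print(encode_duration_nanos(days=1, seconds=30))  # 86430000000000
```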

Two key decisions here are:
1.  Do not block progress waiting for a larger-width physical type to be
defined (either FLBA[10 or 16] or int128).  This topic is being discussed
separately in Parquet. This means that the storable value will not fulfill
ANSI SQL's +/- 10000 year requirement but will still provide a reasonable
range.
2.  The name DurationNanos better reflects that the semantics, at least
as proposed, do not involve any "calendar logic" (a day is always 24
hours).  While the ANSI SQL standard uses these definitions, other engines
record the number of days separately (and days can be more than or less
than 24 hours).
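
To put rough numbers on the range trade-off in the first decision, here is
a back-of-the-envelope sketch. It assumes fixed 24-hour days and a
365.25-day average year purely for estimation; neither constant comes from
the proposal itself.

```python
# Back-of-the-envelope range check (assumptions: 24-hour days,
# 365.25-day average year; used only for estimation).
NANOS_PER_DAY = 24 * 60 * 60 * 1_000_000_000
INT64_MAX = 2**63 - 1
INT32_MAX = 2**31 - 1

# Largest whole number of 24-hour days an int64 nanosecond count can hold.
max_days = INT64_MAX // NANOS_PER_DAY
max_years = max_days / 365.25
print(max_days, round(max_years))  # 106751 292

# Largest whole number of years an int32 month count can hold.
print(INT32_MAX // 12)  # 178956970

# Nanoseconds in roughly 10000 years, and the signed width needed to hold it.
nanos_10k_years = 3_652_500 * NANOS_PER_DAY
bits_needed = nanos_10k_years.bit_length() + 1  # +1 for the sign bit
print(bits_needed > 64, bits_needed <= 80)  # True True
```

So an int64 of nanoseconds covers roughly +/- 292 years, while +/- 10000
years needs more than 64 bits but fits in a 10-byte value.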

Iceberg is not bound by the naming convention chosen in Parquet, but if
there are strong opinions on either of these decisions, please chime in on
the Parquet discussion (
https://lists.apache.org/thread/n8jdft4mltdcf91v7t8qf1hz5cg8nbnz).
Hopefully that will help avoid rehashing the conversations on this mailing
list.

Thanks,
Micah



On Thu, Jul 3, 2025 at 6:33 PM yun zou <yunzou.colost...@gmail.com> wrote:

> Hi Laurent,
>
> Thank you for raising the Parquet and Arrow compatibility topic. The
> discussion is currently ongoing in the Parquet community.
> You can follow the thread here:
> https://lists.apache.org/thread/n8jdft4mltdcf91v7t8qf1hz5cg8nbnz
>
> Best Regards,
> Yun
>
> On Thu, Jul 3, 2025 at 8:42 AM Laurent Goujon <laur...@dremio.com.invalid>
> wrote:
>
>> Like Russell, I think the addition of new types which are widely used in
>> analytics seems like a good thing.
>>
>> The document still has various open comments regarding the
>> representation, and so I wonder if things have been settled or not. I'm
>> also curious if this proposal will also be joined by proposals on Parquet
>> and Arrow projects to align the types and representations, similar to what
>> happened with the variant type?
>>
>> Laurent
>>
>>
>> On Wed, Jun 18, 2025 at 3:46 PM yun zou <yunzou.colost...@gmail.com>
>> wrote:
>>
>>> Dear Community,
>>>
>>> I would like to bump this thread for the discussion of adding Interval
>>> Type support.
>>>
>>> How does everyone feel about moving forward with the support of
>>> Year-Month and Day-Time Intervals, especially for the part about having
>>> 16-byte signed values to represent nanoseconds?
>>>
>>> The change will first be made in the Parquet community, and here is the
>>> PR:
>>> https://github.com/apache/parquet-format/pull/496/files
>>>
>>> Please feel free to provide any feedback or suggestions!
>>>
>>> Best Regards,
>>> Yun
>>>
>>>
>>>
>>> On Mon, Apr 21, 2025 at 10:29 AM Russell Spitzer <
>>> russell.spit...@gmail.com> wrote:
>>>
>>>> I think this is a pretty good idea for us to adopt in terms of
>>>> compatibility with other systems
>>>> and I really appreciate that Naren made sure to use a broad enough
>>>> definition to support all
>>>> available engines. I'm really interested to know how other folks feel
>>>> about this proposal and
>>>> I hope we can reach some common ground here.
>>>>
>>>> On Mon, Apr 21, 2025 at 12:24 PM Naren Krishna
>>>> <naren.kris...@snowflake.com.invalid> wrote:
>>>>
>>>>> Dear Community,
>>>>>
>>>>> I want to propose the addition of the Interval types to the Iceberg
>>>>> spec. A value of an Interval type represents a duration of time, and can
>>>>> be calculated by the difference between two dates or times. Intervals are
>>>>> supported across a variety of different engines (e.g. Parquet, Spark,
>>>>> Arrow, Oracle, Snowflake) and are widely used in time-series analysis for
>>>>> calculations and comparisons of time spans and date arithmetic.
>>>>>
>>>>> For more information, see this high-level proposal
>>>>> <https://docs.google.com/document/d/12ghQxWxyAhSQeZyy0IWiwJ02gTqFOgfYm8x851HZFLk/edit?usp=sharing>
>>>>> providing a recommendation to build Interval types in Iceberg following
>>>>> the ANSI SQL standard specification. Per ANSI SQL standard, this proposal
>>>>> recommends the creation of two types of Intervals: Year-Month and Day-Time
>>>>> Intervals. The linked document also details the implementations of
>>>>> Interval types in various engines and is intended to spur discussion in the
>>>>> Iceberg community.
>>>>>
>>>>> Thanks,
>>>>> Naren Krishna
>>>>>
>>>>
