Thanks for the summary, @Micah, and sorry I couldn't be in the meeting yesterday. I do hope we can get the wider physical type size for both Duration and Timestamp Nanos in the near future as well.
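For anyone skimming the thread, here is a rough sketch of the range arithmetic behind the int64 versus wider-type question in the summary quoted below. It assumes an average Gregorian year of 365.2425 days and fixed 24-hour days (matching the "no calendar logic" semantics described for DurationNanos). The helper names are made up for illustration, and none of this is taken from the Parquet or Iceberg specs.

```python
# Back-of-the-envelope range check for nanosecond durations.
# Assumption: one year = 365.2425 days of exactly 24 hours each.
NANOS_PER_SECOND = 1_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60
# Integer arithmetic avoids float rounding: 365.2425 days = 3652425 / 10000 days.
SECONDS_PER_YEAR = 3_652_425 * SECONDS_PER_DAY // 10_000
NANOS_PER_YEAR = SECONDS_PER_YEAR * NANOS_PER_SECOND


def max_years(bits: int) -> float:
    """Largest magnitude, in years, of a signed nanosecond count of this width."""
    return (2 ** (bits - 1) - 1) / NANOS_PER_YEAR


def bits_needed(years: int) -> int:
    """Smallest signed integer width (in bits) that can hold +/- `years` of nanoseconds."""
    return (years * NANOS_PER_YEAR).bit_length() + 1  # +1 for the sign bit


int64_years = max_years(64)          # range of an int64 nanosecond duration
ansi_bits = bits_needed(10_000)      # width needed for +/- 10000 years
int32_month_years = (2**31 - 1) // 12  # whole years an int32 month count can hold

print(f"int64 nanoseconds cover about +/- {int64_years:.1f} years")
print(f"+/- 10000 years of nanoseconds needs {ansi_bits} bits")
print(f"int32 months cover about +/- {int32_month_years} years")
```

Under these assumptions, an int64 nanosecond count covers roughly +/- 292 years, while +/- 10000 years needs about 70 bits. That is too wide for int64 but fits in either a 10-byte or a 16-byte fixed-length value. The int32 month count for YearMonth is nowhere near its limit.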
On Thu, Jul 10, 2025 at 11:49 AM Micah Kornfield <emkornfi...@gmail.com> wrote:

> As an update/TL;DR: the current proposal in Parquet is as follows (there
> are still some active discussions):
>
> Two new logical types.
> 1. YearMonth interval annotates an int32 (interval is stored as the
> number of months)
> 2. DurationNanos annotates an int64 (interval is stored as the number of
> nanoseconds)
>
> Two key decisions here are:
> 1. Do not block progress on defining a larger width physical type (either
> FLBA[10 or 16] or int128). This topic is being discussed separately in
> Parquet. This means that the storable value will not fulfill ANSI SQL's +/-
> 10000 year requirement but will still provide a reasonable range.
> 2. The naming of DurationNanos better reflects that the semantics, at
> least as proposed, do not involve any "calendar logic" (a day is always 24
> hours). While the ANSI SQL standard uses these definitions, other engines
> record the number of days separately (and days can be more than or less
> than 24 hours).
>
> Iceberg is not bound by the naming convention chosen in Parquet, but if
> there are strong opinions on either of these decisions, please chime in on
> the Parquet discussion (
> https://lists.apache.org/thread/n8jdft4mltdcf91v7t8qf1hz5cg8nbnz).
> Hopefully that will help avoid rehashing the conversations on this mailing
> list.
>
> Thanks,
> Micah
>
> On Thu, Jul 3, 2025 at 6:33 PM yun zou <yunzou.colost...@gmail.com> wrote:
>
>> Hi Laurent,
>>
>> Thank you for raising the Parquet and Arrow compatibility topic. The
>> discussion is currently ongoing in the Parquet community.
>> You can follow the thread here:
>> https://lists.apache.org/thread/n8jdft4mltdcf91v7t8qf1hz5cg8nbnz
>>
>> Best Regards,
>> Yun
>>
>> On Thu, Jul 3, 2025 at 8:42 AM Laurent Goujon <laur...@dremio.com.invalid>
>> wrote:
>>
>>> Like Russell, addition of new types which are widely used in analytics
>>> seems like a good thing.
>>>
>>> The document still has various open comments regarding the
>>> representation, and so I wonder if things have been settled or not. I'm
>>> also curious if this proposal will also be joined by proposals on Parquet
>>> and Arrow projects to align the types and representations, similar to what
>>> happened with the variant type?
>>>
>>> Laurent
>>>
>>> On Wed, Jun 18, 2025 at 3:46 PM yun zou <yunzou.colost...@gmail.com>
>>> wrote:
>>>
>>>> Dear Community,
>>>>
>>>> I would like to bump this thread for the discussion of adding Interval
>>>> Type support.
>>>>
>>>> How does everyone feel about moving forward with the support of
>>>> Year-Month and Day-Time Intervals, especially the part about having
>>>> 16-byte signed values to represent nanoseconds?
>>>>
>>>> The change will first be made in the Parquet community, and here is the
>>>> PR:
>>>> https://github.com/apache/parquet-format/pull/496/files
>>>>
>>>> Please feel free to provide any feedback or suggestions!
>>>>
>>>> Best Regards,
>>>> Yun
>>>>
>>>> On Mon, Apr 21, 2025 at 10:29 AM Russell Spitzer <
>>>> russell.spit...@gmail.com> wrote:
>>>>
>>>>> I think this is a pretty good idea for us to adopt in terms of
>>>>> compatibility with other systems
>>>>> and I really appreciate that Naren made sure to use a broad enough
>>>>> definition to support all
>>>>> available engines. I'm really interested to know how other folks feel
>>>>> about this proposal and
>>>>> I hope we can reach some common ground here.
>>>>>
>>>>> On Mon, Apr 21, 2025 at 12:24 PM Naren Krishna
>>>>> <naren.kris...@snowflake.com.invalid> wrote:
>>>>>
>>>>>> Dear Community,
>>>>>>
>>>>>> I want to propose the addition of the Interval types to the Iceberg
>>>>>> spec. A value of an Interval type represents a duration of time, and can
>>>>>> be calculated by the difference between two dates or times. Intervals are
>>>>>> supported across a variety of different engines (e.g. Parquet, Spark,
>>>>>> Arrow, Oracle, Snowflake) and are widely used in time-series analysis for
>>>>>> calculations and comparisons of time spans and date arithmetic.
>>>>>>
>>>>>> For more information, see this high-level proposal
>>>>>> <https://docs.google.com/document/d/12ghQxWxyAhSQeZyy0IWiwJ02gTqFOgfYm8x851HZFLk/edit?usp=sharing>
>>>>>> providing a recommendation to build Interval types in Iceberg following
>>>>>> the ANSI SQL standard specification. Per ANSI SQL standard, this proposal
>>>>>> recommends the creation of two types of Intervals: Year-Month and
>>>>>> Day-Time Intervals. The linked document also details the implementations
>>>>>> of Interval types in various engines and is intended to spur discussion
>>>>>> in the Iceberg community.
>>>>>>
>>>>>> Thanks,
>>>>>> Naren Krishna
>>>>>>
>>>>>