emkornfield commented on code in PR #496:
URL: https://github.com/apache/parquet-format/pull/496#discussion_r2156105806
##########
src/main/thrift/parquet.thrift:
##########
@@ -461,6 +461,29 @@ struct GeographyType {
2: optional EdgeInterpolationAlgorithm algorithm;
}
+/**
+ * Year-Month Interval logical type annotation
+ *
+ * The data is stored as an 4 byte signed integer which represents the number
+ * of months associated with the time interval. The value can be negative to
+ * indicate a backward duration.
+ *
+ * Allowed for physical type: INT32
+ */
+struct IntervalYearMonthType {
+}
+
+/**
+ * Month-Day Interval logical type annotation
+ *
+ * The data is stored as a 16-byte signed value, which represents the number
Review Comment:
Short answer: MonthDayNanos more closely mimics Postgres's notion of
interval, where each field is separate (so a day is not guaranteed to be 24
hours). Apparently, according to the SQL spec (which is not publicly
available and which I don't currently have access to), a day is always
24 hours/86400 seconds, as documented, and at least some [other DB
providers](https://stackoverflow.com/questions/61505068/is-interval-1-day-always-equal-to-interval-24-hours)
also follow that convention.
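To make the "a day is not guaranteed to be 24 hours" point concrete, here is a minimal sketch in Python. It is only an illustration, not anything from the PR: the two fixed-offset zones stand in for US Eastern time on either side of the 2024-03-10 DST change, and all variable names are made up for this example.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets standing in for US Eastern time before (EST)
# and after (EDT) the spring-forward transition on 2024-03-10.
EST = timezone(timedelta(hours=-5))
EDT = timezone(timedelta(hours=-4))

# "Noon plus one calendar day" lands on noon of the next local day...
before = datetime(2024, 3, 9, 12, 0, tzinfo=EST)
after = datetime(2024, 3, 10, 12, 0, tzinfo=EDT)

# ...but the elapsed time between the two instants is only 23 hours.
elapsed = after - before
elapsed_seconds = int(elapsed.total_seconds())

# Under the fixed convention, a day is always 86400 seconds.
fixed_day_seconds = 86_400
difference = fixed_day_seconds - elapsed_seconds
```

With separate fields, "1 day" follows the calendar and covers 82800 seconds here; with the fixed convention it is always 86400 seconds, so the two interpretations differ by an hour across this transition.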
The main differences that arise from the representations are:
1. MonthDayNanos in Arrow allows for days that are not 24 hours, and you
can have more than 86400 seconds in the nanos field. One shortcoming of the
Arrow representation is that it can't represent +/- 10000 years at nanosecond
precision, which is typical in SQL.
2. This representation makes it so everything can be normalized. But I
think a fair question is why not encode it as days + nanoseconds and do the
multiplication instead of division to get back (in this case I think it just
means more math than applying nanosecond addition directly to nanosecond
timestamps).
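A rough sketch of the arithmetic behind both points, in Python. The assumptions are mine: a signed 64-bit nanos field (as in Arrow's MonthDayNano), fixed 86400-second days, and a 365-day year used only for an order-of-magnitude check. The function names are hypothetical.

```python
NANOS_PER_SECOND = 1_000_000_000
SECONDS_PER_DAY = 86_400  # assumption: a day is always 24 hours
NANOS_PER_DAY = SECONDS_PER_DAY * NANOS_PER_SECOND

INT64_MAX = 2**63 - 1
INT128_MAX = 2**127 - 1

# Point 1: how much time fits in a signed 64-bit nanosecond count?
max_days_in_int64 = INT64_MAX // NANOS_PER_DAY
ten_thousand_years_nanos = 10_000 * 365 * NANOS_PER_DAY

def normalize_day_nanos(days, nanos):
    """Division direction: fold whole 24-hour days out of the nanos field."""
    extra_days, remainder = divmod(nanos, NANOS_PER_DAY)
    return days + extra_days, remainder

def to_total_nanos(days, nanos):
    """Multiplication direction: collapse days + nanos into one count."""
    return days * NANOS_PER_DAY + nanos

# 1 day plus 90000 seconds of nanos normalizes to 2 days plus 1 hour.
normalized = normalize_day_nanos(1, 90_000 * NANOS_PER_SECOND)
total = to_total_nanos(*normalized)
```

Under these assumptions a 64-bit nanos field covers about 106751 days (roughly 292 years), so 10000 years does not fit, while a 128-bit count holds it easily. Normalization is only valid because a day is fixed at 86400 seconds; with separate-field semantics the `divmod` step would change the meaning of the interval.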
That being said, I think there might be a place for all three of these
representations, just like we have all three in Arrow, if someone wants to
add the third.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]