gh-yzou commented on code in PR #496:
URL: https://github.com/apache/parquet-format/pull/496#discussion_r2162697458
##########
LogicalTypes.md:
##########
@@ -521,7 +521,12 @@ as shown below.
</tr>
</table>
-### INTERVAL
+### Interval types
+
+#### INTERVAL
+
+`INTERVAL` is *deprecated*. Please use `INTERVAL_YEAR_MONTH` and
`INTERVAL_DAY_TIME`
Review Comment:
```
The new type is less trivially convertible from Arrow's month_day_nano,
though.
```
I think the problem exists regardless whether have two types
"YearMonthInterval" and "DayTimeInterval", or just have MonthDayNano type.
Since the number of days in a month changes, converting from MonthDayNano to
YearMonthInterval or DayTimeInterval is not possible unless we force that
either the Month component is empty or the day nano components are empty.
If we want to provide full interoperability across different engines with
different types, we might eventually need to support all three types.
##########
LogicalTypes.md:
##########
@@ -539,6 +544,26 @@ The sort order used for `INTERVAL` is undefined. When
writing data, no min/max
statistics should be saved for this type and if such non-compliant statistics
are found during reading, they must be ignored.
+#### INTERVAL_YEAR_MONTH
+
+`INTERVAL_YEAR_MONTH` is used to represent a year-month time interval, such as
+`4 years and 6 months`. It must annotate an `int32` that stores the total
number
+of months as a signed integer, which represents the interval and can be
negative.
+The time duration is independent of any timezone.
+
+#### INTERVAL_DAY_TIME
+
+`INTERVAL_DAY_TIME` is used to represent a day-time time interval, such as
+`5 days, 10 hours and 30 minutes`. It must annotate a 16-byte
`FIXED_LEN_BYTE_ARRAY`
+that stores the total number of nanoseconds representing the interval. The
value is
Review Comment:
The name is mainly choose based on the SQL Standard, and which is also close
to the type name used by engines like spark. I think the type name doesn't have
to be exactly the same as the underlying storage used. Similarly the
YearMonthInterval type stores the total number of months.
##########
src/main/thrift/parquet.thrift:
##########
@@ -461,6 +461,29 @@ struct GeographyType {
2: optional EdgeInterpolationAlgorithm algorithm;
}
+/**
+ * Year-Month Interval logical type annotation
+ *
+ * The data is stored as an 4 byte signed integer which represents the number
+ * of months associated with the time interval. The value can be negative to
+ * indicate a backward duration.
+ *
+ * Allowed for physical type: INT32
+ */
+struct IntervalYearMonthType {
+}
+
+/**
+ * Month-Day Interval logical type annotation
+ *
+ * The data is stored as a 16-byte signed value, which represents the number
Review Comment:
@lidavidm Regarding to the point about "can we unify DayTimeInterval and
MonthDayNano", Is the suggestion that we can just have MonthDayNano type, and
DayTimeIntervalType will only not have month component?
If that is the case, it does sounds more complicated to me, because that two
type is only interoperable when the month component of MonthDayNano is empty.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]