gh-yzou commented on code in PR #496:
URL: https://github.com/apache/parquet-format/pull/496#discussion_r2162697458


##########
LogicalTypes.md:
##########
@@ -521,7 +521,12 @@ as shown below.
     </tr>
 </table>
 
-### INTERVAL
+### Interval types
+
+#### INTERVAL
+
+`INTERVAL` is *deprecated*. Please use `INTERVAL_YEAR_MONTH` and 
`INTERVAL_DAY_TIME`

Review Comment:
   ```
   The new type is less trivially convertible from Arrow's month_day_nano, 
though.
   ```
   I think the problem exists regardless whether have  two types 
"YearMonthInterval" and "DayTimeInterval", or just have MonthDayNano type.  
Since the number of days in a month changes, converting from MonthDayNano to 
YearMonthInterval or DayTimeInterval is not possible unless we force that 
either the Month component is empty or the day nano components are empty. 
   If we want to provide full interoperability across different engines with 
different types, we might eventually need to support all three types.



##########
LogicalTypes.md:
##########
@@ -539,6 +544,26 @@ The sort order used for `INTERVAL` is undefined. When 
writing data, no min/max
 statistics should be saved for this type and if such non-compliant statistics
 are found during reading, they must be ignored.
 
+#### INTERVAL_YEAR_MONTH
+
+`INTERVAL_YEAR_MONTH` is used to represent a year-month time interval, such as
+`4 years and 6 months`. It must annotate an `int32` that stores the total 
number
+of months as a signed integer, which represents the interval and can be 
negative.
+The time duration is independent of any timezone.
+
+#### INTERVAL_DAY_TIME
+
+`INTERVAL_DAY_TIME` is used to represent a day-time time interval, such as
+`5 days, 10 hours and 30 minutes`. It must annotate a 16-byte 
`FIXED_LEN_BYTE_ARRAY`
+that stores the total number of nanoseconds representing the interval. The 
value is

Review Comment:
   The name is mainly choose based on the SQL Standard, and which is also close 
to the type name used by engines like spark. I think the type name doesn't have 
to be exactly the same as the underlying storage used. Similarly the 
YearMonthInterval type stores the total number of months.



##########
src/main/thrift/parquet.thrift:
##########
@@ -461,6 +461,29 @@ struct GeographyType {
   2: optional EdgeInterpolationAlgorithm algorithm;
 }
 
+/**
+ * Year-Month Interval logical type annotation
+ *
+ * The data is stored as an 4 byte signed integer which represents the number
+ * of months associated with the time interval. The value can be negative to
+ * indicate a backward duration.
+ *
+ * Allowed for physical type: INT32
+ */
+struct IntervalYearMonthType {
+}
+
+/**
+ * Month-Day Interval logical type annotation
+ *
+ * The data is stored as a 16-byte signed value, which represents the number

Review Comment:
   @lidavidm Regarding to the point about "can we unify DayTimeInterval and 
MonthDayNano", Is the suggestion that we can just have MonthDayNano type, and 
DayTimeIntervalType will only not have month component? 
   If that is the case, it does sounds more complicated to me, because that two 
type is only interoperable when the month component of MonthDayNano is empty.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to