On Mon, Aug 10, 2020 at 6:19 PM Eric Erhardt <eric.erha...@microsoft.com.invalid> wrote:
>
> I don't understand what the value of the Date64 type is over using Date32:
>
> From https://github.com/apache/arrow/blob/master/format/Schema.fbs#L193-L206
>
> enum DateUnit: short {
>   DAY,
>   MILLISECOND
> }
>
> /// Date is either a 32-bit or 64-bit type representing elapsed time since UNIX
> /// epoch (1970-01-01), stored in either of two units:
> ///
> /// * Milliseconds (64 bits) indicating UNIX time elapsed since the epoch (no
> ///   leap seconds), where the values are evenly divisible by 86400000
> /// * Days (32 bits) since the UNIX epoch
> table Date {
>   unit: DateUnit = MILLISECOND;
> }
>
> If the spec specifies that Date64 must be evenly divisible by 86400000, I
> don't see the point in using millisecond units. I can't represent any
> different information in my data. So why would I take up double the space to
> represent the same information?
>
> Can someone explain when Date64 is useful?

As I recall, the motivation for the date64 type is to allow zero-copy use of dates stored as milliseconds, which some other libraries and platforms use. For example, Joda in Java uses a millisecond-based "instant". I'm not sure offhand which others do.

That said, it would be perfectly reasonable for a data processing system to use date32 throughout and convert any date64 data to date32 if desired.

> Eric
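
To make the date64-to-date32 conversion concrete, here is a minimal sketch in plain Python. It does not use the Arrow libraries; the function names (`date64_to_date32`, `date32_to_date64`, `date32_to_date`) are hypothetical helpers invented for this illustration. The only fact taken from the spec is that a date64 value is a count of milliseconds since the UNIX epoch that is evenly divisible by 86400000.

```python
from datetime import date, timedelta

MS_PER_DAY = 86_400_000          # milliseconds in one day, as in the spec comment
EPOCH = date(1970, 1, 1)         # UNIX epoch


def date64_to_date32(ms: int) -> int:
    """Convert milliseconds since the epoch (date64) to days since the epoch (date32).

    Hypothetical helper: rejects values that are not whole days, since the
    spec says date64 values are evenly divisible by 86400000.
    """
    if ms % MS_PER_DAY != 0:
        raise ValueError(f"{ms} is not a whole number of days")
    # Floor division is exact here and also handles dates before 1970.
    return ms // MS_PER_DAY


def date32_to_date64(days: int) -> int:
    """Convert days since the epoch (date32) back to milliseconds (date64)."""
    return days * MS_PER_DAY


def date32_to_date(days: int) -> date:
    """Interpret a date32 value as a calendar date."""
    return EPOCH + timedelta(days=days)


if __name__ == "__main__":
    ms = 1_597_017_600_000       # midnight UTC on the day of the original mail
    days = date64_to_date32(ms)
    print(days, date32_to_date(days))
```

The round trip loses nothing, which is the point of the question above: for valid date64 data, both encodings carry the same information, and date64 only saves a conversion when the producer already holds millisecond values.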