I think we should validate this optionally in ValidateFull in C++. Validating unconditionally would be too computationally expensive.
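
As a minimal sketch of that split (plain Python, not the Arrow C++ API; `validate_date64` and its `full` flag are hypothetical names used only for illustration), the per-value divisibility scan can be gated behind an explicit request for full validation:

```python
MILLIS_PER_DAY = 86_400_000  # the divisor named in the Date64 spec text


def validate_date64(values, full=False):
    """Return a list of error strings; an empty list means no problems found.

    `full` is a hypothetical flag: the O(n) divisibility scan only runs
    when the caller asks for it, so the default path stays cheap.
    """
    errors = []
    if not full:
        return errors  # cheap path: skip the per-value scan entirely
    for i, v in enumerate(values):
        if v is None:
            continue  # nulls carry no date value to check
        if v % MILLIS_PER_DAY != 0:
            errors.append(
                f"index {i}: {v} is not a multiple of {MILLIS_PER_DAY}"
            )
    return errors
```

With this shape, callers that trust their input pay nothing, and callers that want the spec constraint checked opt in with `full=True`.
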
https://issues.apache.org/jira/browse/ARROW-9705

On Tue, Aug 11, 2020 at 1:34 PM Eric Erhardt
<eric.erha...@microsoft.com.invalid> wrote:
>
> Thanks for the info, Wes.
>
> Looking through the Java implementation, I don't see any validation that
> "where the values are evenly divisible by 86400000" is enforced in
> DateMilliVector. We are having a conversation on the C# implementation
> whether we should allow values that are not evenly divisible by 86400000.
>
> https://github.com/apache/arrow/pull/7654#discussion_r463886892
>
> I'm wondering if C# should allow any values in Date64, or if it should
> force/coerce the values to be divisible by 86400000.
>
> It doesn't look to me that C++ or Java have these enforcements. How do other
> languages handle this?
>
> Eric
>
> -----Original Message-----
> From: Wes McKinney <wesmck...@gmail.com>
> Sent: Tuesday, August 11, 2020 12:18 PM
> To: dev <dev@arrow.apache.org>
> Subject: [EXTERNAL] Re: Value of Date64 type over Date32
>
> On Mon, Aug 10, 2020 at 6:19 PM Eric Erhardt
> <eric.erha...@microsoft.com.invalid> wrote:
> >
> > I don't understand what the value of the Date64 type is over using Date32:
> >
> > From
> > https://github.com/apache/arrow/blob/master/format/Schema.fbs#L193-L206
> >
> > enum DateUnit: short {
> >   DAY,
> >   MILLISECOND
> > }
> >
> > /// Date is either a 32-bit or 64-bit type representing elapsed time since UNIX
> > /// epoch (1970-01-01), stored in either of two units:
> > ///
> > /// * Milliseconds (64 bits) indicating UNIX time elapsed since the epoch (no
> > ///   leap seconds), where the values are evenly divisible by 86400000
> > /// * Days (32 bits) since the UNIX epoch
> > table Date {
> >   unit: DateUnit = MILLISECOND;
> > }
> >
> > If the spec specifies that Date64 must be evenly divisible by 86400000, I
> > don't see the point in using millisecond units. I can't represent any
> > different information in my data. So why would I take up double the space
> > to represent the same information?
> >
> > Can someone explain when Date64 is useful?
>
> As I recall the motivation of the date64 type is to allow for zero-copy of
> dates-as-milliseconds, which are used in some other libraries / platforms.
> For example Joda uses a millisecond-based "instant". I'm not sure which
> others do off hand.
>
> That said, it would be perfectly reasonable for a data processing system to
> use date32 throughout and convert any date64 data to date32 if desired.
>
> >
> > Eric
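
The date64-to-date32 conversion mentioned above amounts to dividing by the number of milliseconds in a day. A minimal sketch in plain Python follows; `date64_to_date32` is a hypothetical helper, not an Arrow API, and the use of floor division for values that are not exact multiples is an assumption made here, since the spec says such values should not occur:

```python
MILLIS_PER_DAY = 86_400_000  # milliseconds in one day


def date64_to_date32(values):
    """Convert milliseconds-since-epoch to days-since-epoch.

    Nulls (None) pass through unchanged. Floor division is an assumption:
    a value that is not a whole number of days is mapped to the day that
    contains it, including for dates before 1970.
    """
    return [None if v is None else v // MILLIS_PER_DAY for v in values]
```

For spec-conforming data the conversion is lossless, which is the point raised in the question: both types then carry the same information, and date32 uses half the space. This sketch does not check that results fit in 32 bits.
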