Thank you for starting this discussion. Clearly the Hive semantics for timestamps are very messed up, but they have been moving in the right direction, toward SQL standard compliance. I'm pulling this discussion back to the list rather than the personal GoogleDoc, which isn't very collaborative.
I like your breakdown of the semantics:
- Instant - a point in time that will appear different depending on the reader's time zone
- LocalDateTime - consistent hour and minute regardless of the reader's time zone
- OffsetDateTime - consistent hour and minute with the offset of the writer's time zone

The SQL standard has:
- Timestamp & Timestamp without time zone = LocalDateTime
- Timestamp with time zone = OffsetDateTime

Hive 2 had very confused semantics for timestamp:
- When storage was ORC, text, or RCFile with a text serde it was LocalDateTime
- When storage was Avro, Parquet, or RCFile with a binary serde it was Instant

Hive 3.1 has moved toward the SQL standard extended with Oracle's timestamp with local time zone:
- Timestamp = LocalDateTime
- Timestamp with local time zone = Instant

This leaves us with a few problems:
- The Hive bindings to Parquet and Avro don't handle timestamps correctly.
- ORC doesn't support timestamps with local time zone. I started working on it in ORC-189.
- We don't have timestamp with time zone support.

.. Owen

On Thu, Dec 6, 2018 at 7:55 AM Marta Kuczora <kuczo...@cloudera.com.invalid> wrote:

> Hi Hive Community,
>
> I would like to share the following document on our "Consistent Timestamp
> types in Hadoop" plans for review.
>
> https://docs.google.com/document/d/1gNRww9mZJcHvUDCXklzjFEQGpefsuR_akCDfWsdE35Q/edit
>
> With this plan we would like to get an agreement on consistent timestamp
> behavior on Hive, Spark and Impala and in order to achieve this, we are
> sharing this document with all three communities.
>
> Please review and comment, any feedback is much appreciated!
>
> Regards,
> Marta
>
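
P.S. For anyone less familiar with the three semantics (Instant, LocalDateTime, OffsetDateTime), here is a minimal Java sketch using the java.time classes of the same names. It is only an illustration of the definitions in my reply, not Hive code: the class name TimestampSemantics, the three read* helpers, and the choice of writer and reader zones are all assumptions made up for the example.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneId;

// Hypothetical illustration class; none of these names exist in Hive.
public class TimestampSemantics {

    // Instant semantics: the stored value is a point in time, so the
    // hour and minute the reader sees depend on the reader's zone.
    public static LocalDateTime readInstant(Instant stored, ZoneId readerZone) {
        return LocalDateTime.ofInstant(stored, readerZone);
    }

    // LocalDateTime semantics: the stored value has no zone, so every
    // reader sees the same hour and minute.
    public static LocalDateTime readLocal(LocalDateTime stored, ZoneId readerZone) {
        return stored;
    }

    // OffsetDateTime semantics: the writer's hour and minute are kept
    // together with the writer's offset, whatever the reader's zone is.
    public static OffsetDateTime readOffset(OffsetDateTime stored, ZoneId readerZone) {
        return stored;
    }

    public static void main(String[] args) {
        // Assumed zones, chosen only for the example.
        ZoneId writerZone = ZoneId.of("America/Los_Angeles");
        ZoneId readerZone = ZoneId.of("Europe/Budapest");

        // The writer records 07:55 on its own wall clock.
        LocalDateTime wallClock = LocalDateTime.of(2018, 12, 6, 7, 55);
        Instant asInstant = wallClock.atZone(writerZone).toInstant();
        OffsetDateTime asOffset = wallClock.atZone(writerZone).toOffsetDateTime();

        System.out.println("Instant read:        " + readInstant(asInstant, readerZone));
        System.out.println("LocalDateTime read:  " + readLocal(wallClock, readerZone));
        System.out.println("OffsetDateTime read: " + readOffset(asOffset, readerZone));
    }
}
```

With these assumed zones, the Instant reading shifts to 16:55 for the reader, while the other two keep 07:55 (the OffsetDateTime also carrying the writer's -08:00 offset).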