Thank you for starting this discussion. Hive's timestamp semantics have
clearly been very messed up, but they have been moving in the right
direction, toward SQL standard compliance. I'm pulling this discussion back
to the list rather than keeping it in the personal Google Doc, which isn't
very collaborative.

I like your breakdown of the semantics:

   - Instant - point in time that will appear different depending on the
   reader time zone
   - LocalDateTime - consistent hour and minute regardless of the reader
   time zone.
   - OffsetDateTime - consistent hour and minute with the offset of the
   writer time zone
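
As a rough illustration of the three behaviors, here is a minimal sketch
using the java.time classes of the same names. The class name, the helper
methods, and the two zones are made up for the example and are not Hive
code; it only shows what a reader in a different zone would see for a value
written as 07:55 in the writer's zone.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneId;

// Hypothetical illustration only; none of these names come from Hive.
public class TimestampSemantics {

    // Instant semantics: the stored point in time is rendered in the
    // reader's zone, so the wall clock differs from reader to reader.
    public static LocalDateTime readInstant(Instant stored, ZoneId readerZone) {
        return LocalDateTime.ofInstant(stored, readerZone);
    }

    // LocalDateTime semantics: the stored wall clock is returned as-is;
    // the reader's zone is deliberately ignored.
    public static LocalDateTime readLocal(LocalDateTime stored, ZoneId readerZone) {
        return stored;
    }

    // OffsetDateTime semantics: the stored wall clock and the writer's
    // offset travel together; the reader's zone is again ignored.
    public static OffsetDateTime readOffset(OffsetDateTime stored, ZoneId readerZone) {
        return stored;
    }

    public static void main(String[] args) {
        ZoneId writer = ZoneId.of("America/Los_Angeles"); // assumed writer zone
        ZoneId reader = ZoneId.of("Europe/Budapest");     // assumed reader zone

        // The writer sees 07:55 on its own wall clock.
        LocalDateTime wallClock = LocalDateTime.of(2018, 12, 6, 7, 55);
        Instant asInstant = wallClock.atZone(writer).toInstant();
        OffsetDateTime asOffset = wallClock.atZone(writer).toOffsetDateTime();

        System.out.println(readInstant(asInstant, reader)); // 2018-12-06T16:55
        System.out.println(readLocal(wallClock, reader));   // 2018-12-06T07:55
        System.out.println(readOffset(asOffset, reader));   // 2018-12-06T07:55-08:00
    }
}
```

Only the Instant case changes with the reader: the same stored value shows
up as 16:55 in Budapest, while the other two keep the writer's 07:55.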

The SQL standard has:

   - Timestamp & Timestamp without time zone = LocalDateTime
   - Timestamp with time zone = OffsetDateTime

Hive 2 had very confused semantics for timestamp:

   - When storage was ORC, text, or RCFile with a text serde it was
   LocalDateTime
   - When storage was Avro, Parquet, or RCFile with a binary serde it was
   Instant

Hive 3.1 has moved toward the SQL standard, extended with Oracle's timestamp
with local time zone:

   - Timestamp = LocalDateTime
   - Timestamp with local time zone = Instant

This leaves us with a few problems:

   - The Hive bindings to Parquet and Avro don't handle timestamps
   correctly.
   - ORC doesn't support timestamps with local time zone. I started working
   on it in ORC-189.
   - We don't have timestamp with time zone support.

.. Owen

On Thu, Dec 6, 2018 at 7:55 AM Marta Kuczora <kuczo...@cloudera.com.invalid>
wrote:

> Hi Hive Community,
>
> I would like to share the following document on our "Consistent Timestamp
> types in Hadoop" plans for review.
>
> https://docs.google.com/document/d/1gNRww9mZJcHvUDCXklzjFEQGpefsuR_akCDfWsdE35Q/edit
>
> With this plan we would like to get an agreement on consistent timestamp
> behavior on Hive, Spark and Impala and in order to achieve this, we are
> sharing this document with all three communities.
>
> Please review and comment, any feedback is much appreciated!
>
> Regards,
> Marta
>
