I think this is the right direction to go, but I'm wondering how Spark can
support these new types if the underlying data sources (like Parquet files)
do not support them yet.

I took a quick look at the new doc for file formats, but I'm not sure what
the proposal is. Are we going to implement these new types in Parquet/ORC
first? Or are we going to use the low-level physical types directly and add
Spark-specific metadata to Parquet/ORC files?
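
For what it's worth, Parquet's format spec already annotates the INT64
physical type with a TIMESTAMP logical type whose isAdjustedToUTC flag
distinguishes two of the three proposed semantics (true roughly matches
Instant, false matches LocalDateTime); OffsetDateTime has no direct
equivalent today. As a minimal sketch of what the "physical types plus
metadata" option could look like, assuming parquet-mr 1.11+ for the
LogicalTypeAnnotation API (the field names here are made up):

    import org.apache.parquet.schema.LogicalTypeAnnotation;
    import org.apache.parquet.schema.MessageType;
    import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName;
    import org.apache.parquet.schema.Types;

    class TimestampSchemaSketch {
        public static void main(String[] args) {
            MessageType schema = Types.buildMessage()
                // Instant-like semantics: values normalized to UTC on write.
                .required(PrimitiveTypeName.INT64)
                    .as(LogicalTypeAnnotation.timestampType(
                        true, LogicalTypeAnnotation.TimeUnit.MICROS))
                    .named("instant_ts")
                // LocalDateTime-like semantics: not adjusted to UTC.
                .required(PrimitiveTypeName.INT64)
                    .as(LogicalTypeAnnotation.timestampType(
                        false, LogicalTypeAnnotation.TimeUnit.MICROS))
                    .named("local_ts")
                .named("events");
            System.out.println(schema);
        }
    }

Whether Spark should rely on these annotations or carry its own metadata is
exactly the open question above.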

On Wed, Feb 20, 2019 at 10:57 PM Zoltan Ivanfi <z...@cloudera.com.invalid>
wrote:

> Hi,
>
> Last December we shared a timestamp harmonization proposal
> <https://goo.gl/VV88c5> with the Hive, Spark and Impala communities. This
> was followed by an extensive discussion in January that led to various
> updates and improvements to the proposal, as well as the creation of a new
> document for file format components. February has been quiet regarding this
> topic, and the latest revision of the proposal has been stable in recent
> weeks.
>
> In short, the following is being proposed (please see the document for
> details; a short java.time sketch follows the list):
>
>    - The TIMESTAMP WITHOUT TIME ZONE type should have LocalDateTime
>    semantics.
>    - The TIMESTAMP WITH LOCAL TIME ZONE type should have Instant
>    semantics.
>    - The TIMESTAMP WITH TIME ZONE type should have OffsetDateTime
>    semantics.
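>
> To make these concrete, here is a minimal java.time sketch of the three
> semantics (the timestamp value is arbitrary and only illustrates the
> mapping):
>
>     import java.time.Instant;
>     import java.time.LocalDateTime;
>     import java.time.OffsetDateTime;
>     import java.time.ZoneOffset;
>
>     class TimestampSemantics {
>         public static void main(String[] args) {
>             // TIMESTAMP WITHOUT TIME ZONE: a wall-clock reading with no
>             // time zone attached; the field values mean the same everywhere.
>             LocalDateTime local = LocalDateTime.of(2019, 2, 20, 10, 57);
>
>             // TIMESTAMP WITH LOCAL TIME ZONE: a point on the global
>             // timeline, rendered in the session time zone for display.
>             Instant instant = local.toInstant(ZoneOffset.UTC);
>
>             // TIMESTAMP WITH TIME ZONE: a point on the timeline that also
>             // remembers the offset it was recorded in.
>             OffsetDateTime offset = local.atOffset(ZoneOffset.ofHours(1));
>
>             System.out.println(local);   // 2019-02-20T10:57
>             System.out.println(instant); // 2019-02-20T10:57:00Z
>             System.out.println(offset);  // 2019-02-20T10:57+01:00
>         }
>     }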
>
> This proposal is in accordance with the SQL standard and many major DB
> engines.
>
> Based on the feedback we got, I believe that the latest revision of the
> proposal addresses the needs of all affected components. I would therefore
> like to move forward and create JIRAs and/or roadmap documentation pages
> for the desired semantics of the different SQL types according to the
> proposal.
>
> Please let me know if you have any remaining concerns about the proposal or
> about the course of action outlined above.
>
> Thanks,
>
> Zoltan
>
