Imran,

Thanks for sharing this. When working on interop between Spark and
Pandas/Arrow in the past, we also ran into issues caused by their
different definitions of timestamp: the Spark timestamp has Instant
semantics, while the Pandas/Arrow timestamp has either LocalDateTime or
OffsetDateTime semantics. (Detailed discussion is in this PR:
https://github.com/apache/spark/pull/18664#issuecomment-316554156.)
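
For anyone less familiar with the distinction, here is a minimal sketch
in plain Python (no Spark required) of how the same wall-clock value can
denote different instants under the two interpretations; the fixed
offsets below are just stand-ins for whatever session timezone happens
to be in effect:

    from datetime import datetime, timezone, timedelta

    # LocalDateTime semantics: a naive timestamp, i.e. a wall-clock value
    # with no timezone attached. Pandas/Arrow can store it exactly as-is.
    local = datetime(2018, 12, 6, 11, 0, 0)

    # Instant semantics: the same wall-clock value is interpreted in some
    # session timezone and normalized to a point on the UTC timeline.
    # (Fixed offsets here are illustrative; real sessions use named zones.)
    session_a = timezone(timedelta(hours=-8))  # e.g. a session in UTC-8
    session_b = timezone(timedelta(hours=-5))  # e.g. a session in UTC-5

    print(local.replace(tzinfo=session_a).astimezone(timezone.utc))
    # 2018-12-06 19:00:00+00:00
    print(local.replace(tzinfo=session_b).astimezone(timezone.utc))
    # 2018-12-06 16:00:00+00:00

The same local value maps to two instants three hours apart, which is
exactly the kind of shift that shows up when timestamps cross the
Spark/Pandas boundary between sessions with different timezones.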

I am excited to see this effort moving forward, and I would also love to
see Python interop included in the picture. I don't think it adds much
to what has already been proposed, because Python timestamps are
basically LocalDateTime or OffsetDateTime.
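
As a small illustration of that last point (hypothetical values, using
pandas), a timestamp is either naive or carries its own timezone:

    import pandas as pd

    # Naive timestamp: LocalDateTime semantics, no timezone information.
    naive = pd.Timestamp("2018-12-06 11:00:00")

    # Timezone-aware timestamp: OffsetDateTime-style semantics, the value
    # carries its own zone/offset.
    aware = pd.Timestamp("2018-12-06 11:00:00", tz="UTC")

    print(naive.tz)  # None
    print(aware.tz)  # UTC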

Li



On Thu, Dec 6, 2018 at 11:03 AM Imran Rashid <iras...@cloudera.com.invalid>
wrote:

> Hi,
>
> I'd like to discuss the future of timestamp support in Spark, in
> particular with respect to handling timezones in different SQL types.  In
> a nutshell:
>
> * There are at least 3 different ways of handling the timestamp type
> across timezone changes
> * We'd like Spark to clearly distinguish the 3 types (it currently
> implements 1 of them), in a way that is backwards compatible, and also
> compliant with the SQL standard.
> * We'll get agreement across Spark, Hive, and Impala.
>
> Zoltan Ivanfi (Parquet PMC, also my coworker) has written up a detailed
> doc, describing the problem in more detail, the state of various SQL
> engines, and how we can get to a better state without breaking any current
> use cases.  The proposal is good for Spark by itself.  We're also going to
> the Hive & Impala communities with this proposal, as it's better for
> everyone if everything is compatible.
>
> Note that this isn't proposing a specific implementation in Spark as yet,
> just a description of the overall problem and our end goal.  We're going to
> each community to get agreement on the overall direction.  Then each
> community can figure out specifics as they see fit.  (I don't think there
> are any technical hurdles with this approach, e.g. questions about whether
> this would even be possible in Spark.)
>
> Here's a link to the doc Zoltan has put together.  It is a bit long, but
> it explains how such a seemingly simple concept has become such a mess and
> how we can get to a better state.
>
>
> https://docs.google.com/document/d/1gNRww9mZJcHvUDCXklzjFEQGpefsuR_akCDfWsdE35Q/edit#heading=h.dq3b1mwkrfky
>
> Please review the proposal and let us know your opinions, concerns and
> suggestions.
>
> thanks,
> Imran
>