Imran,

Thanks for sharing this. When working on interop between Spark and Pandas/Arrow in the past, we also ran into issues caused by the different definitions of timestamp in Spark and Pandas/Arrow: a Spark timestamp has Instant semantics, while a Pandas/Arrow timestamp has either LocalDateTime or OffsetDateTime semantics. (Detailed discussion is in the PR: https://github.com/apache/spark/pull/18664#issuecomment-316554156.)
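To make the distinction concrete, here is a rough illustration using Python's stdlib datetime (not Spark's types): a naive datetime behaves like LocalDateTime, while an aware datetime behaves like OffsetDateTime and identifies a unique instant. The specific date and offset below are just made-up examples.

```python
from datetime import datetime, timezone, timedelta

# A naive datetime has LocalDateTime semantics: a wall-clock reading
# with no attached zone, so it does not identify a unique instant.
local = datetime(2018, 12, 6, 11, 3)
assert local.tzinfo is None

# An aware datetime has OffsetDateTime semantics: wall clock + offset,
# which pins down one point on the global timeline (an instant).
aware = datetime(2018, 12, 6, 11, 3, tzinfo=timezone(timedelta(hours=-8)))
as_utc = aware.astimezone(timezone.utc)

# Rendering in another zone changes the wall-clock reading (11:03 -> 19:03)
# but not the instant: aware datetimes compare by instant.
assert as_utc.hour == 19
assert as_utc == aware
```

Spark's Instant semantics correspond to the aware case normalized to a single timeline point, which is why round-tripping through naive Pandas timestamps can silently shift values across session-timezone changes.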
I am excited to see this effort going, and I would also love to see Python interop included/considered in the picture. I don't think it adds much to what has already been proposed, because Python timestamps are basically LocalDateTime or OffsetDateTime.

Li

On Thu, Dec 6, 2018 at 11:03 AM Imran Rashid <iras...@cloudera.com.invalid> wrote:

> Hi,
>
> I'd like to discuss the future of timestamp support in Spark, in
> particular with respect to handling timezones in different SQL types. In
> a nutshell:
>
> * There are at least 3 different ways of handling the timestamp type
>   across timezone changes.
> * We'd like Spark to clearly distinguish the 3 types (it currently
>   implements 1 of them), in a way that is backwards compatible and also
>   compliant with the SQL standard.
> * We'll get agreement across Spark, Hive, and Impala.
>
> Zoltan Ivanfi (Parquet PMC, also my coworker) has written up a detailed
> doc describing the problem in more detail, the state of various SQL
> engines, and how we can get to a better state without breaking any current
> use cases. The proposal is good for Spark by itself. We're also going to
> the Hive & Impala communities with this proposal, as it's better for
> everyone if everything is compatible.
>
> Note that this isn't proposing a specific implementation in Spark as yet,
> just a description of the overall problem and our end goal. We're going to
> each community to get agreement on the overall direction; then each
> community can figure out specifics as they see fit. (I don't think there
> are any technical hurdles with this approach, e.g. any question of whether
> this would even be possible in Spark.)
>
> Here's a link to the doc Zoltan has put together. It is a bit long, but
> it explains how such a seemingly simple concept has become such a mess and
> how we can get to a better state.
>
> https://docs.google.com/document/d/1gNRww9mZJcHvUDCXklzjFEQGpefsuR_akCDfWsdE35Q/edit#heading=h.dq3b1mwkrfky
>
> Please review the proposal and let us know your opinions, concerns and
> suggestions.
>
> thanks,
> Imran