Re: Inconsistent schema on Encoders.bean (reported issues from user@)

2020-06-26 Thread Jungtaek Lim
I just revisited the issue and realized it is resolved in Spark 3.0.0. ExpressionEncoder was refactored in Spark 3.0.0 and the schema was removed as part of that refactor. The schema seems to have been the root cause, since the schema and the data types of the serializer don't match in such a case. ExpressionEncoder in

Re: [Spark SQL] Question about support for TimeType columns in Apache Parquet files

2020-06-26 Thread Bart Samwel
On Fri, Jun 26, 2020 at 3:38 PM Maxim Gekk wrote:
> Hi Bart,
>
> > Isn't it a best practice to have those event timestamps be recorded in UTC?
>
> The triple (date, time, time zone) from my examples can be mapped to a timestamp in UTC without ambiguity (almost). Having separate types for dates and

Re: [Spark SQL] Question about support for TimeType columns in Apache Parquet files

2020-06-26 Thread Maxim Gekk
Hi Bart,

> Isn't it a best practice to have those event timestamps be recorded in UTC?

The triple (date, time, time zone) from my examples can be mapped to a timestamp in UTC without ambiguity (almost). Having separate types for dates and times allows us to de-normalize your data logically, and simplify
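The mapping of a (date, time, time zone) triple to a UTC timestamp can be sketched as follows. This is a minimal illustration using Python's standard library, not Spark code. The helper name `to_utc` and its `fold` parameter are hypothetical, and reading the "(almost)" as the repeated hour at the end of daylight saving time is an assumption, not something the message states.

```python
from datetime import date, datetime, time, timezone
from zoneinfo import ZoneInfo


def to_utc(d: date, t: time, zone: str, fold: int = 0) -> datetime:
    """Map a (date, time, time zone) triple to a UTC timestamp.

    `fold` picks between the two occurrences of a local time that repeats
    when clocks are set back (0 = first occurrence, 1 = second).
    """
    local = datetime.combine(d, t, tzinfo=ZoneInfo(zone)).replace(fold=fold)
    return local.astimezone(timezone.utc)


# An event recorded at 08:10 local time in Amsterdam (UTC+2 in summer).
event = to_utc(date(2020, 6, 25), time(8, 10), "Europe/Amsterdam")
print(event.isoformat())  # 2020-06-25T06:10:00+00:00

# Assumed ambiguous case: 02:30 occurs twice on 2020-10-25 in Amsterdam,
# so the same triple corresponds to two different UTC instants.
first = to_utc(date(2020, 10, 25), time(2, 30), "Europe/Amsterdam", fold=0)
second = to_utc(date(2020, 10, 25), time(2, 30), "Europe/Amsterdam", fold=1)
print(first.isoformat(), second.isoformat())
```

Outside the repeated hour the triple determines exactly one UTC instant, so keeping the date, time, and zone as separate values loses no information relative to a stored UTC timestamp.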

Re: [Spark SQL] Question about support for TimeType columns in Apache Parquet files

2020-06-26 Thread Bart Samwel
On Fri, Jun 26, 2020 at 12:24 PM Maxim Gekk wrote:
> Hi Bart,
>
> > But is it useful by itself? Not that much.
>
> I see at least the following use cases. Let's say we need to analyze some events from devices installed in different places or time zones like
>
> At Europe/Amsterdam:
> (2020-06-2

Re: [Spark SQL] Question about support for TimeType columns in Apache Parquet files

2020-06-26 Thread Maxim Gekk
Hi Bart,

> But is it useful by itself? Not that much.

I see at least the following use cases. Let's say we need to analyze some events from devices installed in different places or time zones like

At Europe/Amsterdam:
(2020-06-25, 08:10, event1)
(2020-06-26, 12:10, event1)
(2020-06-26, 18:30, ev

Re: [Spark SQL] Question about support for TimeType columns in Apache Parquet files

2020-06-26 Thread Bart Samwel
I can't comment on that myself; I haven't been part of the community, so I don't know what is customary for this kind of thing. W.r.t. "compatibility with Parquet's TimeType", I'd like to argue that that isn't a use case by itself. The use case is "what people do with it". All in all, TIME is just c