Thanks Enrico, Magnus

On Thu, Apr 2, 2020 at 11:49 AM Enrico Minack <m...@enrico.minack.dev>
wrote:

> Once parsed into a Timestamp, the value is stored internally as UTC and
> printed in your local timezone (e.g. as defined by
> spark.sql.session.timeZone). Spark is good at hiding timezone information
> from you.
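>
> For example, a minimal sketch (run in spark-shell; the rendered values
> assume exactly these session timezones):
>
> spark.conf.set("spark.sql.session.timeZone", "UTC")
> Seq("2020-04-11T20:40:00-05:00").toDF("value")
>   .select($"value".cast("timestamp")).show(false)
> // shows 2020-04-12 01:40:00 -- the same instant, rendered in UTC
>
> spark.conf.set("spark.sql.session.timeZone", "America/Chicago")
> // re-running the query above now shows 2020-04-11 20:40:00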
>
> You can get the timezone information via date_format(column, format):
>
> import org.apache.spark.sql.types.TimestampType
> import org.apache.spark.sql.functions._
>
> val sampleDF = Seq("2020-04-11T20:40:00-05:00").toDF("value")
> val timestampDF = sampleDF.select($"value".cast(TimestampType))
> timestampDF.select(date_format($"value", "yyyy-MM-dd'T'HH:mm:ssZZZZ")).show(false)
> +---------------------------------------------+
> |date_format(value, yyyy-MM-dd'T'HH:mm:ssZZZZ)|
> +---------------------------------------------+
> |2020-04-12T03:40:00+0200                     |
> +---------------------------------------------+
>
> If you want the timezone only, use
> timestampDF.select(date_format($"value", "ZZZZ")).show.
> +------------------------+
> |date_format(value, ZZZZ)|
> +------------------------+
> |                   +0200|
> +------------------------+
>
> It all depends on how you get the data "downstream". If you go through
> parquet or csv files, they will retain the timezone information. If you go
> through strings, you should format them as above. If you use Dataset.map
> you can access the timestamps as java.sql.Timestamp objects (but that might
> not be necessary):
>
> import java.sql.Timestamp
> case class Times(value: Timestamp)
> timestampDF.as[Times].map(t => t.value.getTimezoneOffset).show
> +-----+
> |value|
> +-----+
> | -120|
> +-----+
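>
> And if the downstream side needs the ISO 8601 extended offset (with the
> colon), a sketch along these lines should work ("/tmp/out" is just a
> placeholder path, and the exact pattern letters can differ between the
> Spark 2.x and 3.x formatters):
>
> timestampDF
>   .select(date_format($"value", "yyyy-MM-dd'T'HH:mm:ssXXX").as("value"))
>   .write.mode("overwrite").csv("/tmp/out")
> // writes strings like 2020-04-12T03:40:00+02:00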
>
>
> Enrico
>
>
> On 31.03.20 at 21:40, Chetan Khatri wrote:
>
> Sorry, I misrepresented the question. Thanks for your great help.
>
> What I want is to keep the timezone information, as in
> 2020-04-11T20:40:00-05:00, in the timestamp datatype, so I can write it to
> the downstream application as it is. I can correct the lacking colon in
> the UTC offset myself.
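>
> One possible workaround, since Spark's TimestampType does not keep the
> original offset: carry the offset along in a separate column (a sketch;
> the column names are just examples):
>
> import org.apache.spark.sql.functions._
> val df = Seq("2020-04-11T20:40:00-05:00").toDF("value")
> df.select(
>     $"value".cast("timestamp").as("ts"),
>     regexp_extract($"value", "([+-]\\d{2}:\\d{2})$", 1).as("offset")
>   ).show(false)
> // ts is the UTC instant, offset keeps the literal "-05:00"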
>
>
> On Tue, Mar 31, 2020 at 1:15 PM Magnus Nilsson <ma...@kth.se> wrote:
>
>> And to answer your question (sorry, I read too fast): the string is not
>> proper ISO 8601. The extended form must be used throughout, i.e.
>> 2020-04-11T20:40:00-05:00; there is a colon (:) missing in the UTC offset
>> info.
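>>
>> If you cannot fix the producer, one option is to patch the string before
>> casting (a sketch, reusing your sampleDF with the colon-less offset):
>>
>> val fixed = sampleDF.select(
>>   regexp_replace($"value", "([+-]\\d{2})(\\d{2})$", "$1:$2").as("value"))
>> fixed.select($"value".cast("timestamp")).show(false)
>>
>> Alternatively, a to_timestamp pattern that includes the offset (e.g.
>> "yyyy-MM-dd'T'HH:mm:ssZ") should accept -0500 directly, though pattern
>> behaviour differs between Spark 2.x and 3.x.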
>>
>> br,
>>
>> Magnus
>>
>> On Tue, Mar 31, 2020 at 7:11 PM Magnus Nilsson <ma...@kth.se> wrote:
>>
>>> Timestamps aren't timezoned. If you parse ISO 8601 strings, they will be
>>> converted to UTC automatically.
>>>
>>> If you parse timestamps without a timezone, they will be interpreted in
>>> the timezone of the server Spark is running on. You can change the
>>> timezone Spark uses with spark.conf.set("spark.sql.session.timeZone",
>>> "UTC"). Timestamps represent a point in time; the clock representation of
>>> that instant depends on Spark's timezone settings, both for parsing
>>> (non-ISO 8601) strings and for showing timestamps.
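>>>
>>> A quick way to see this (a sketch; the epoch values depend on the zones
>>> you pick):
>>>
>>> spark.conf.set("spark.sql.session.timeZone", "America/New_York")
>>> Seq("2020-04-11 20:40:00").toDF("value")
>>>   .select($"value".cast("timestamp").cast("long")).show()
>>> // repeat with "UTC": the epoch seconds differ, because the zone-less
>>> // string is interpreted in the session timezone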
>>>
>>> br,
>>>
>>> Magnus
>>>
>>> On Tue, Mar 31, 2020 at 6:14 PM Chetan Khatri <
>>> chetan.opensou...@gmail.com> wrote:
>>>
>>>> Hi Spark Users,
>>>>
>>>> I am losing the timezone value with the format below. I tried a couple
>>>> of formats but was not able to make it work. Can someone shed some light?
>>>>
>>>> scala> val sampleDF = Seq("2020-04-11T20:40:00-0500").toDF("value")
>>>> sampleDF: org.apache.spark.sql.DataFrame = [value: string]
>>>>
>>>> scala> sampleDF.select('value, to_timestamp('value, "yyyy-MM-dd\'T\'HH:mm:ss")).show(false)
>>>> +------------------------+------------------------------------------------+
>>>> |value                   |to_timestamp(`value`, 'yyyy-MM-dd\'T\'HH:mm:ss')|
>>>> +------------------------+------------------------------------------------+
>>>> |2020-04-11T20:40:00-0500|2020-04-11 20:40:00                             |
>>>> +------------------------+------------------------------------------------+
>>>>
>>>> Thanks
>>>>
>>>
>
