I've had pretty good success using Joda-Time <http://www.joda.org/joda-time/index.html> for date/time manipulations within Spark applications. You may be able to use the *DateTime* constructor below if you are starting with milliseconds.

DateTime

public DateTime(long instant)

Constructs an instance set to the milliseconds from 1970-01-01T00:00:00Z using ISOChronology in the default time zone.

Parameters:
instant - the milliseconds from 1970-01-01T00:00:00Z
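Since your examples below are PySpark, the same semantics are easy to sanity-check from plain Python. This is just a quick standard-library sketch (not Joda-Time itself), using the sample value from your email:

from datetime import datetime

# Joda-Time's DateTime(long) takes *milliseconds* since the epoch;
# datetime.utcfromtimestamp() takes *seconds*, hence the / 1000.0
millis = 1456050620000
print(datetime.utcfromtimestamp(millis / 1000.0))
# 2016-02-21 10:30:20 -- matches the GMT value from epochconverter.com below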
On Mon, Feb 1, 2016 at 5:51 PM, Andy Davidson <a...@santacruzintegration.com> wrote:

> What little I know about working with timestamps is based on
> https://databricks.com/blog/2015/09/16/spark-1-5-dataframe-api-highlights-datetimestring-handling-time-intervals-and-udafs.html
>
> Using the example of dates formatted as human-friendly strings ->
> timestamps, I was able to figure out how to convert epoch times to
> timestamps. The same trick did not work for millisecond times.
>
> Any suggestions would be greatly appreciated.
>
> Andy
>
> Working with epoch times
> <http://localhost:8888/notebooks/sparkTimeSeriesExperiments.ipynb#Working-with-epock-times>
>
> ref: http://www.epochconverter.com/
>
> Epoch timestamp: 1456050620
> Timestamp in milliseconds: 1456050620000
> Human time (GMT): Sun, 21 Feb 2016 10:30:20 GMT
> Human time (your time zone): 2/21/2016, 2:30:20 AM
>
> # Epoch time stamp example
> data = [
>     ("1456050620", "1456050621", 1),
>     ("1456050622", "14560506203", 2),
>     ("14560506204", "14560506205", 3)]
> df = sqlContext.createDataFrame(data, ["start_time", "end_time", "id"])
>
> # convert epoch time strings to longs
> df = df.select(
>     df.start_time.cast("long").alias("start_time"),
>     df.end_time.cast("long").alias("end_time"),
>     df.id)
> df.printSchema()
> df.show(truncate=False)
>
> # convert longs to spark timestamps
> df = df.select(
>     df.start_time.cast("timestamp").alias("start_time"),
>     df.end_time.cast("timestamp").alias("end_time"),
>     df.id)
> df.printSchema()
> df.show(truncate=False)
>
> root
>  |-- start_time: long (nullable = true)
>  |-- end_time: long (nullable = true)
>  |-- id: long (nullable = true)
>
> +-----------+-----------+---+
> |start_time |end_time   |id |
> +-----------+-----------+---+
> |1456050620 |1456050621 |1  |
> |1456050622 |14560506203|2  |
> |14560506204|14560506205|3  |
> +-----------+-----------+---+
>
> root
>  |-- start_time: timestamp (nullable = true)
>  |-- end_time: timestamp (nullable = true)
>  |-- id: long (nullable = true)
>
> +---------------------+---------------------+---+
> |start_time           |end_time             |id |
> +---------------------+---------------------+---+
> |2016-02-21 02:30:20.0|2016-02-21 02:30:21.0|1  |
> |2016-02-21 02:30:22.0|2431-05-28 02:03:23.0|2  |
> |2431-05-28 02:03:24.0|2431-05-28 02:03:25.0|3  |
> +---------------------+---------------------+---+
>
> # working with millisecond times
> data = [
>     ("1456050620000", "1456050620000", 1)]
> df = sqlContext.createDataFrame(data, ["start_time", "end_time", "id"])
>
> # convert epoch time strings to longs
> df = df.select(
>     df.start_time.cast("long").alias("start_time"),
>     df.end_time.cast("long").alias("end_time"),
>     df.id)
> df.printSchema()
> df.show(truncate=False)
>
> # convert longs to spark timestamps
> df = df.select(
>     df.start_time.cast("timestamp").alias("start_time"),
>     df.end_time.cast("timestamp").alias("end_time"),
>     df.id)
> df.printSchema()
> df.show(truncate=False)
>
> root
>  |-- start_time: long (nullable = true)
>  |-- end_time: long (nullable = true)
>  |-- id: long (nullable = true)
>
> +-------------+-------------+---+
> |start_time   |end_time     |id |
> +-------------+-------------+---+
> |1456050620000|1456050620000|1  |
> +-------------+-------------+---+
>
> root
>  |-- start_time: timestamp (nullable = true)
>  |-- end_time: timestamp (nullable = true)
>  |-- id: long (nullable = true)
>
> +----------------------+----------------------+---+
> |start_time            |end_time              |id |
> +----------------------+----------------------+---+
> |48110-05-29 10:33:20.0|48110-05-29 10:33:20.0|1  |
> +----------------------+----------------------+---+
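If you want to stay in DataFrames rather than dropping into Joda-Time: cast("timestamp") interprets a numeric value as *seconds* since the epoch, which is why your millisecond values land in the year 48110. Dividing by 1000 before the cast should give the expected result. A quick, untested sketch against your millisecond example above (same column names as your notebook):

# convert millisecond epoch strings to spark timestamps
# divide by 1000 first: cast("timestamp") treats numbers as seconds,
# and the division yields a double, so any sub-second part is kept
# as fractional seconds
df = df.select(
    (df.start_time.cast("long") / 1000).cast("timestamp").alias("start_time"),
    (df.end_time.cast("long") / 1000).cast("timestamp").alias("end_time"),
    df.id)
df.printSchema()
df.show(truncate=False)

For 1456050620000 that should show 2016-02-21 02:30:20.0 in your time zone, matching your seconds-based example.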