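Are you sure it isn't just being displayed in your local time zone? 2023-05-08T06:30:00Z minus 7 hours is 2023-05-07 23:30:00, which is exactly Pacific Daylight Time. Check the actual value as a long, for example. It is likely the same instant.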
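To make that check concrete, here is a minimal sketch in plain Python (no Spark needed) that takes the epoch value from the sample row and renders it in both zones. The fixed UTC-7 offset is an assumption standing in for US Pacific time in May (PDT); the variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# The `timeslot` value from the sample row, in seconds since the Unix epoch.
EPOCH_SECONDS = 1683527400

# Assumption: a fixed UTC-7 offset stands in for US Pacific time in May (PDT).
PDT = timezone(timedelta(hours=-7), "PDT")

# Interpret the epoch value as an absolute instant, then view it in each zone.
as_utc = datetime.fromtimestamp(EPOCH_SECONDS, tz=timezone.utc)
as_pdt = as_utc.astimezone(PDT)

print(as_utc.isoformat())                      # 2023-05-08T06:30:00+00:00
print(as_pdt.strftime("%Y-%m-%d %H:%M:%S"))    # 2023-05-07 23:30:00

# Both renderings refer to the same instant, so the underlying long is unchanged.
print(int(as_pdt.timestamp()) == EPOCH_SECONDS)  # True
```

In Spark itself, the equivalent check would probably be to cast the timestamp column to long and compare it with `timeslot`. Setting `spark.sql.session.timeZone` to `UTC` should make the displayed value match what Mongo shows, though I have not verified that against this particular connector setup.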

On Thu, Jun 8, 2023, 5:50 PM karan alang <karan.al...@gmail.com> wrote:

> ref :
> https://stackoverflow.com/questions/76436159/apache-spark-not-reading-utc-timestamp-from-mongodb-correctly
>
> Hello All,
> I have data stored in a MongoDB collection, and the timestamp column is not
> being read by Apache Spark correctly. I'm running Apache Spark on GCP
> Dataproc.
>
> Here is sample data :
>
> -----
>
> In Mongo :
>
> +----------+----------------------+
> |timeslot  |timeslot_date         |
> +----------+----------------------+
> |1683527400|{2023-05-08T06:30:00Z}|
> +----------+----------------------+
>
> When I use pyspark to read this :
>
> +----------+-------------------+
> |timeslot  |timeslot_date      |
> +----------+-------------------+
> |1683527400|2023-05-07 23:30:00|
> +----------+-------------------+
>
> -----
>
> My understanding is, data in Mongo is in UTC format i.e. 2023-05-08T06:30:00Z
> is in UTC format. I'm in the PST timezone. I'm not clear why Spark is reading
> it in a different timezone format (neither PST nor UTC). Note - it is not
> reading it as PST timezone, if it was doing that it would advance the time by
> 7 hours, instead it is doing the opposite.
>
> Where is the default timezone format taken from, when Spark is reading data
> from MongoDB ?
>
> Any ideas on this ?
>
> tia!