Are you sure it isn't just being displayed in your local time zone? Check
the actual value as a long, for example. It is most likely the same instant.
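
As a rough sketch of that check in plain Python (not Spark itself): take the
epoch value from the timeslot column in the sample below and render it both in
UTC and in US Pacific time. The fixed UTC-7 offset is an assumption on my part
(Pacific Daylight Time, which applies in May); the variable names are only
illustrative.

```python
from datetime import datetime, timedelta, timezone

# Epoch seconds taken from the `timeslot` column in the quoted sample data.
epoch_seconds = 1683527400

# The instant rendered in UTC.
as_utc = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

# Assumption: the session runs in US Pacific time, which is UTC-7 (PDT) in May.
pacific_daylight = timezone(timedelta(hours=-7), name="PDT")
as_local = as_utc.astimezone(pacific_daylight)

print(as_utc.strftime("%Y-%m-%d %H:%M:%S"))    # 2023-05-08 06:30:00
print(as_local.strftime("%Y-%m-%d %H:%M:%S"))  # 2023-05-07 23:30:00

# Both renderings describe the same instant; only the display zone differs.
print(as_utc == as_local)                      # True
```

If that matches what you see, the stored value is fine and only the rendering
differs. In PySpark the equivalent check would be something like
df.select(col("timeslot_date").cast("long")), and as far as I know the zone
used for display comes from spark.sql.session.timeZone, which falls back to
the JVM's local zone when it is not set.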

On Thu, Jun 8, 2023, 5:50 PM karan alang <karan.al...@gmail.com> wrote:

> ref :
> https://stackoverflow.com/questions/76436159/apache-spark-not-reading-utc-timestamp-from-mongodb-correctly
>
> Hello All,
> I have data stored in a MongoDB collection, and the timestamp column is
> not being read by Apache Spark correctly. I'm running Apache Spark on GCP
> Dataproc.
>
> Here is sample data :
>
> -----
>
> In Mongo :
>
> +----------+----------------------+
> |timeslot  |timeslot_date         |
> +----------+----------------------+
> |1683527400|{2023-05-08T06:30:00Z}|
> +----------+----------------------+
>
>
> When I use PySpark to read this:
>
> +----------+-------------------+
> |timeslot  |timeslot_date      |
> +----------+-------------------+
> |1683527400|2023-05-07 23:30:00|
> +----------+-------------------+
>
> -----
>
> My understanding is that the data in Mongo is in UTC, i.e.
> 2023-05-08T06:30:00Z is a UTC timestamp. I'm in the PST timezone. I'm not
> clear why Spark is reading it in a different timezone (neither PST nor
> UTC). Note: it is not reading it as PST; if it were, it would advance the
> time by 7 hours, but instead it is doing the opposite.
>
> Where is the default timezone taken from when Spark reads data from
> MongoDB?
>
> Any ideas on this ?
>
> tia!
