Hello All,
I am using PySpark Structured Streaming, and my timestamp fields arrive as
plain longs (epoch milliseconds), so I have to convert them to a timestamp
type. A sample JSON object:
{
    "id": {
        "value": "f40b2e22-4003-4d90-afd3-557bc013b05e",
        "type": "UUID",
        "system": "Test"
    },
    "status": "Active",
    "timingPeriod": {
        "startDateTime": 1611859271516,
        "endDateTime": null
    },
    "eventDateTime": 1611859272122,
    "isPrimary": true
}
Here I want to convert "eventDateTime", "startDateTime" and "endDateTime"
to timestamp types.
So I have done the following:
    from pyspark.sql import functions as f

    def transform_date_col(date_col):
        # Epoch milliseconds -> epoch seconds; null input stays null.
        return f.when(f.col(date_col).isNotNull(), f.col(date_col) / 1000)

    df = (
        df.withColumn(
            "eventDateTime",
            transform_date_col("eventDateTime").cast("timestamp"))
        .withColumn(
            "timingPeriod.startDateTime",
            transform_date_col("timingPeriod.startDateTime").cast("timestamp"))
        .withColumn(
            "timingPeriod.endDateTime",
            transform_date_col("timingPeriod.endDateTime").cast("timestamp"))
    )

After this, the timingPeriod fields are no longer part of a struct; instead
they become two separate top-level columns literally named
"timingPeriod.startDateTime" and "timingPeriod.endDateTime".
How can I keep them inside the struct as before?
Is there a generic way to modify one or more properties of nested structs?
I have hundreds of entities where longs need to be converted to timestamps,
so a generic implementation would help my data ingestion pipeline a lot.
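
To make the question concrete, here is a rough sketch of the kind of generic
helper being asked about. It does not call Spark itself: it only builds Spark
SQL expression strings that rebuild each struct with named_struct, so nested
fields stay nested. Everything in it is an assumption for illustration: the
SCHEMA dict (a hand-written stand-in for df.schema), the helper names, and the
convention that any long field whose name ends in "DateTime" holds epoch
milliseconds. Arrays and maps are not handled.

```python
# Hypothetical schema description: nested dicts are structs, strings are
# leaf types. In a real job this would be derived from df.schema.
SCHEMA = {
    "id": {"value": "string", "type": "string", "system": "string"},
    "status": "string",
    "timingPeriod": {"startDateTime": "long", "endDateTime": "long"},
    "eventDateTime": "long",
    "isPrimary": "boolean",
}


def quote(path):
    """Backtick-quote every segment of a field path given as a list of names."""
    return ".".join(f"`{part}`" for part in path)


def build_expr(path, node, ts_suffix="DateTime"):
    """Return a Spark SQL expression string for the field at `path`."""
    if isinstance(node, dict):
        # Rebuild the struct field by field so nested names stay nested.
        parts = []
        for name, child in node.items():
            child_expr = build_expr(path + [name], child, ts_suffix)
            parts.append(f"'{name}', {child_expr}")
        return "named_struct(" + ", ".join(parts) + ")"
    if node == "long" and path[-1].endswith(ts_suffix):
        # Assumed convention: long + name suffix means epoch milliseconds.
        # Milliseconds -> seconds -> timestamp; NULL input stays NULL.
        return f"CAST({quote(path)} / 1000 AS TIMESTAMP)"
    # Every other leaf is passed through unchanged.
    return quote(path)


def build_select_exprs(schema, ts_suffix="DateTime"):
    """Build one 'expr AS name' string per top-level column."""
    return [
        f"{build_expr([name], node, ts_suffix)} AS `{name}`"
        for name, node in schema.items()
    ]


exprs = build_select_exprs(SCHEMA)
for e in exprs:
    print(e)
```

The resulting list could then be passed to df.selectExpr(*exprs). One caveat
of this sketch: a struct that is null in the input comes back as a struct
whose fields are all null, so a null check around named_struct would be needed
if that distinction matters.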
Regards,
Felix K Jose