I think they're voting on the next release candidate starting sometime next week. So hopefully, barring any other major hurdles, it will be out within the next few weeks.
On Fri, Jan 29, 2021, 1:01 PM Felix Kizhakkel Jose <felixkizhakkelj...@gmail.com> wrote:

> Wow, that's really great to know. Thank you so much, Adam. Do you know
> when the 3.1 release is scheduled?
>
> Regards,
> Felix K Jose
>
> On Fri, Jan 29, 2021 at 12:35 PM Adam Binford <adam...@gmail.com> wrote:
>
>> As of 3.0, the only way to do it is something that recreates the
>> whole struct:
>>
>> df.withColumn('timingPeriod',
>>     f.struct(
>>         f.col('timingPeriod.start').cast('timestamp').alias('start'),
>>         f.col('timingPeriod.end').cast('timestamp').alias('end')))
>>
>> There's a new method coming in 3.1 on the Column class called withField,
>> which was designed for this purpose. I backported it to my personal 3.0
>> build because of how useful it is. It works something like:
>>
>> df.withColumn('timingPeriod',
>>     f.col('timingPeriod')
>>         .withField('start', f.col('timingPeriod.start').cast('timestamp'))
>>         .withField('end', f.col('timingPeriod.end').cast('timestamp')))
>>
>> And it works on multiple levels of nesting, which is nice.
>>
>> On Fri, Jan 29, 2021 at 11:32 AM Felix Kizhakkel Jose <felixkizhakkelj...@gmail.com> wrote:
>>
>>> Hello All,
>>>
>>> I am using PySpark Structured Streaming, and I am receiving timestamp
>>> fields as plain longs (milliseconds), so I have to convert these fields
>>> into a timestamp type.
>>>
>>> A sample JSON object:
>>>
>>> {
>>>   "id": {
>>>     "value": "f40b2e22-4003-4d90-afd3-557bc013b05e",
>>>     "type": "UUID",
>>>     "system": "Test"
>>>   },
>>>   "status": "Active",
>>>   "timingPeriod": {
>>>     "startDateTime": 1611859271516,
>>>     "endDateTime": null
>>>   },
>>>   "eventDateTime": 1611859272122,
>>>   "isPrimary": true
>>> }
>>>
>>> Here I want to convert "eventDateTime", "startDateTime", and
>>> "endDateTime" to timestamp types.
>>>
>>> So I have done the following:
>>>
>>> def transform_date_col(date_col):
>>>     return f.when(f.col(date_col).isNotNull(), f.col(date_col) / 1000)
>>>
>>> df.withColumn(
>>>     "eventDateTime",
>>>     transform_date_col("eventDateTime").cast("timestamp")
>>> ).withColumn(
>>>     "timingPeriod.start",
>>>     transform_date_col("timingPeriod.start").cast("timestamp")
>>> ).withColumn(
>>>     "timingPeriod.end",
>>>     transform_date_col("timingPeriod.end").cast("timestamp")
>>> )
>>>
>>> But the timingPeriod fields are no longer a struct; instead they become
>>> two separate top-level columns named "timingPeriod.start" and
>>> "timingPeriod.end".
>>>
>>> How can I keep them in a struct as before?
>>> Is there a generic way to modify single or multiple properties of
>>> nested structs?
>>>
>>> I have hundreds of entities where longs need to be converted to
>>> timestamps, so a generic implementation will help my data ingestion
>>> pipeline a lot.
>>>
>>> Regards,
>>> Felix K Jose
>>
>> --
>> Adam Binford
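[A side note on the unit conversion in `transform_date_col` above: Spark's cast of a numeric to timestamp interprets the value as seconds since the Unix epoch, which is why the code divides the millisecond values by 1000 first. The same conversion can be checked in plain Python with the sample values from the JSON; `millis_to_datetime` is a hypothetical helper name, not part of the thread.]

```python
from datetime import datetime, timezone

def millis_to_datetime(ms):
    """Convert epoch milliseconds to an aware UTC datetime (None stays None)."""
    if ms is None:
        return None
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

# Sample values from the JSON in the question.
start = millis_to_datetime(1611859271516)   # "startDateTime"
event = millis_to_datetime(1611859272122)   # "eventDateTime"
end = millis_to_datetime(None)              # "endDateTime" is null
```

Dividing by 1000 keeps sub-second precision because the result is a float; casting the raw millisecond long directly to timestamp would instead land tens of thousands of years in the future.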