I think they're voting on the next release candidate starting sometime next
week. So hopefully, barring any other major hurdles, it should be out within
the next few weeks.

On Fri, Jan 29, 2021, 1:01 PM Felix Kizhakkel Jose <
felixkizhakkelj...@gmail.com> wrote:

> Wow, that's really great to know. Thank you so much Adam. Do you know when
> the 3.1 release is scheduled?
>
> Regards,
> Felix K Jose
>
> On Fri, Jan 29, 2021 at 12:35 PM Adam Binford <adam...@gmail.com> wrote:
>
>> As of 3.0, the only way to do it is something that will recreate the
>> whole struct:
>>
>> df.withColumn('timingPeriod', f.struct(
>>     f.col('timingPeriod.start').cast('timestamp').alias('start'),
>>     f.col('timingPeriod.end').cast('timestamp').alias('end')))
>>
>> There's a new method coming in 3.1 on the Column class called withField,
>> which was designed for this purpose. I backported it to my personal 3.0
>> build because of how useful it is. It works something like:
>>
>> df.withColumn('timingPeriod', f.col('timingPeriod')
>>     .withField('start', f.col('timingPeriod.start').cast('timestamp'))
>>     .withField('end', f.col('timingPeriod.end').cast('timestamp')))
>>
>> And it works on multiple levels of nesting which is nice.
>>
>> On Fri, Jan 29, 2021 at 11:32 AM Felix Kizhakkel Jose <
>> felixkizhakkelj...@gmail.com> wrote:
>>
>>> Hello All,
>>>
>>> I am using PySpark Structured Streaming, and I am getting the timestamp
>>> fields as plain longs (epoch milliseconds), so I have to convert these
>>> fields to a timestamp type.
>>>
>>> A sample JSON object:
>>>
>>> {
>>>   "id":{
>>>       "value": "f40b2e22-4003-4d90-afd3-557bc013b05e",
>>>       "type": "UUID",
>>>       "system": "Test"
>>>     },
>>>   "status": "Active",
>>>   "timingPeriod": {
>>>     "startDateTime": 1611859271516,
>>>     "endDateTime": null
>>>   },
>>>   "eventDateTime": 1611859272122,
>>>   "isPrimary": true
>>> }
>>>
>>> Here I want to convert "eventDateTime", "startDateTime", and
>>> "endDateTime" to timestamp types.
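As a quick sanity check outside Spark (stdlib only; the value below is the sample object's startDateTime), the longs are epoch milliseconds, so dividing by 1000 gives seconds the standard library can interpret:

```python
from datetime import datetime, timezone

ms = 1611859271516  # "startDateTime" from the sample JSON object
dt = datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
print(dt.isoformat())  # 2021-01-28T18:41:11.516000+00:00
```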
>>>
>>> So I have done the following:
>>>
>>> def transform_date_col(date_col):
>>>     return f.when(f.col(date_col).isNotNull(), f.col(date_col) / 1000)
>>>
>>> df.withColumn(
>>>     "eventDateTime",
>>>     transform_date_col("eventDateTime").cast("timestamp")
>>> ).withColumn(
>>>     "timingPeriod.start",
>>>     transform_date_col("timingPeriod.start").cast("timestamp")
>>> ).withColumn(
>>>     "timingPeriod.end",
>>>     transform_date_col("timingPeriod.end").cast("timestamp")
>>> )
>>>
>>> But the timingPeriod fields are no longer a struct; instead they become
>>> two separate top-level columns named "timingPeriod.start" and
>>> "timingPeriod.end".
>>>
>>> How can I get them as a struct as before?
>>> Is there a generic way I can modify single or multiple properties of
>>> nested structs?
>>>
>>> I have hundreds of entities where longs need to be converted to
>>> timestamps, so a generic implementation would help my data ingestion
>>> pipeline a lot.
>>>
>>> Regards,
>>> Felix K Jose
>>>
>>>
>>
>> --
>> Adam Binford
>>
>
