Hi Team,

I am facing a major issue while transforming a dataframe that contains
complex datatype columns. I need to update the inner fields of a complex
datatype, for example converting one inner field to UPPERCASE letters, and
get back the same dataframe with the new upper-case values in it. My issue
is described below. Kindly suggest a way forward.

*My suggestion:* Can we have a new version of
*dataframe.withColumn(<innerfieldreference>,
udf($innerfieldreference), <reference or colname indicator argument>)*, so
that when this method is executed, I get the same dataframe with the
transformed values in place?


*Issue Description:*
Using dataframe.withColumn(<colname>, udf($colname)) on an inner field of a
struct/complex datatype results in a new dataframe with a new column
appended to it, rather than the inner field being updated. Here "colname"
is the full name of the inner field, given in dot notation to access the
struct/complex fields.

For example, a Hive table has the columns: (id int, address struct<line1:
struct<buildname:string, stname:string>, line2:string>)

I need to update the inner field 'buildname'. I can select the inner field
from the dataframe with df.select($"address.line1.buildname"). However, when
I use df.withColumn("address.line1.buildname",
toUpperCaseUDF($"address.line1.buildname")), the result is a new
dataframe with a new top-level column named "address.line1.buildname"
appended, holding the toUpperCaseUDF output, while the inner field
buildname itself is left unchanged.

How can I update the inner fields of complex data types in place? Kindly
suggest.

Thanks in anticipation.

Best Regards,
Naveen Kumar.
