Hi Team, I am facing a major issue while transforming a DataFrame that contains complex datatype columns. I need to update the inner fields of a complex datatype (for example, convert one inner field to uppercase) and get back the same DataFrame with the new uppercase values in it. My issue description is below. Kindly suggest a way forward.
*My suggestion:* Can we have a new version of *dataframe.withColumn(<innerfieldreference>, udf($innerfieldreference), <reference or colname indicator argument>)*, so that when this method is executed I get the same DataFrame back with the transformed values in place?

*Issue Description:* Using dataframe.withColumn(<colname>, udf($colname)) on an inner field of a struct/complex datatype results in a new DataFrame with a new column appended to it, rather than the inner field being updated. Here "colname" is the full name, in dot notation, used to access the struct/complex field.

For example, a Hive table has the columns: (id int, address struct<line1: struct<buildname:string, stname:string>, line2:string>). I need to update the inner field 'buildname'. I can select the inner field through the DataFrame as df.select($"address.line1.buildname"). However, when I use df.withColumn("address.line1.buildname", toUpperCaseUDF($"address.line1.buildname")), the result is a new DataFrame with a new top-level column named "address.line1.buildname" appended, holding the toUpperCaseUDF values of the inner field buildname. The original inner field is left unchanged.

How can I update the inner fields of complex datatypes? Kindly suggest.

Thanks in anticipation.

Best Regards,
Naveen Kumar.
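
P.S. To make the issue concrete, here is a minimal sketch of what I am seeing. Several things in it are assumptions for illustration only and are not from my real job: the case classes Line1/Address/Record that stand in for the Hive table, the single sample row, the local SparkSession (so this assumes Spark 2.x or later), and the body of toUpperCaseUDF. The last part shows one possible workaround, which is to rebuild the whole "address" struct by hand with struct(...). I am not sure it is the recommended way, and it becomes verbose for deeply nested schemas.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{struct, udf}

object NestedFieldRepro {
  // Hypothetical case classes mirroring the Hive schema from the post:
  // (id int, address struct<line1: struct<buildname, stname>, line2>)
  case class Line1(buildname: String, stname: String)
  case class Address(line1: Line1, line2: String)
  case class Record(id: Int, address: Address)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, only for reproducing the behaviour.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("nested-field-repro")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample row.
    val df = Seq(Record(1, Address(Line1("tower a", "main st"), "apt 4"))).toDF()

    // Assumed implementation of the UDF named in the post (null-safe).
    val toUpperCaseUDF = udf((s: String) => if (s == null) null else s.toUpperCase)

    // The problem: the dotted name is treated as a new top-level column name,
    // so a column literally called "address.line1.buildname" is appended.
    val appended = df.withColumn(
      "address.line1.buildname",
      toUpperCaseUDF($"address.line1.buildname"))
    appended.printSchema()

    // Possible workaround: replace the top-level "address" column with a
    // struct rebuilt field by field, transforming only buildname.
    val rebuilt = df.withColumn("address", struct(
      struct(
        toUpperCaseUDF($"address.line1.buildname").as("buildname"),
        $"address.line1.stname".as("stname")
      ).as("line1"),
      $"address.line2".as("line2")
    ))
    rebuilt.printSchema()
    rebuilt.show(false)

    spark.stop()
  }
}
```

With this workaround the column list stays (id, address), but every sibling field has to be listed explicitly, which is why a withColumn variant that understands inner field references would be helpful.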