Hello,

SPARK-21513 <https://issues.apache.org/jira/browse/SPARK-21513> proposes supporting the to_json <https://spark.apache.org/docs/2.3.0/api/java/org/apache/spark/sql/functions.html#to_json-org.apache.spark.sql.Column-> function on any column type; however, it currently fails with the following error when operating on ArrayType columns of strings, ints, or other non-struct data types:

org.apache.spark.sql.AnalysisException: cannot resolve 'structstojson(`item`.`messages`)' due to data type mismatch: Input type array<string> must be a struct, array of structs or a map or array of map.;;

Would it be possible for someone with access to raise an issue to include this in a future release?

Details are outlined in this StackOverflow post: https://stackoverflow.com/questions/50195796/convert-array-of-values-column-to-string-column-containing-serialised-json, and included below.

Thank you,
Kyle

*Example datasets/schemas:*

Given a dataset of JSON string records such as:

{
  "item": {
    "messages": [
      "test",
      "test2",
      "test3"
    ]
  }
}

Which, when loaded with read().json(dataSetOfJsonStrings), produces a schema like:

root
 |-- item: struct (nullable = true)
 |    |-- messages: array (nullable = true)
 |    |    |-- element: string (containsNull = true)

How might ArrayType columns be transformed to serialised JSON? E.g. to this schema:

root
 |-- item: struct (nullable = true)
 |    |-- messages: string (nullable = true)

Which might be written out in JSON format like:

{
  "item": {
    "messages": "[\"test\",\"test2\",\"test3\"]"
  }
}

Note: example output is not flattened, just illustrating to_json() usage.
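For anyone wanting the same result before this is supported natively, the transformation itself is just JSON-encoding the array value in place (e.g. inside a user-defined function). A minimal plain-Python sketch of the encoding step, using the example record above (the variable names are illustrative, not part of any Spark API):

```python
import json

# The example record from above, as a plain Python dict.
record = {"item": {"messages": ["test", "test2", "test3"]}}

# Replace the array with its serialised JSON form, leaving the
# surrounding structure intact -- the desired per-column behaviour.
# separators=(",", ":") gives the compact output shown in the example.
record["item"]["messages"] = json.dumps(
    record["item"]["messages"], separators=(",", ":")
)

print(record)  # {'item': {'messages': '["test","test2","test3"]'}}
```

In Spark this same encoding could be wrapped in a UDF as a stopgap, at the cost of losing the built-in optimisation that a native to_json() on arrays would provide.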
-- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/