Hello,

SPARK-21513 <https://issues.apache.org/jira/browse/SPARK-21513>
proposes to support using the to_json
<https://spark.apache.org/docs/2.3.0/api/java/org/apache/spark/sql/functions.html#to_json-org.apache.spark.sql.Column->
UDF on any column type. However, it currently fails with the following
error when operating on ArrayType columns of strings, ints, or other
non-struct data types:

org.apache.spark.sql.AnalysisException: cannot resolve
'structstojson(`item`.`messages`)' due to data type mismatch: Input type
array must be a struct, array of structs or a map or array of map.;;

Would it be possible for someone with access to raise an issue to include
this in a future release?

Details are outlined on this StackOverflow post:
https://stackoverflow.com/questions/50195796/convert-array-of-values-column-to-string-column-containing-serialised-json,
and included below.

Thank you,
Kyle

*Example datasets/schemas:*

Given a dataset of JSON string records such as:

{
  "item": {
    "messages": [
      "test",
      "test2",
      "test3"
    ]
  }
}

which, when loaded with read().json(dataSetOfJsonStrings), produces a
schema like:

root
 |-- item: struct (nullable = true)
 |    |-- messages: array (nullable = true)
 |    |    |-- element: string (containsNull = true)

How might ArrayType columns be transformed to serialised JSON? E.g., to
this schema:

root
 |-- item: struct (nullable = true)
 |    |-- messages: string (nullable = true)

which might be written out in JSON format like:

{
  "item": {
    "messages": "[\"test\",\"test2\",\"test3\"]"
  }
}

Note: example output is not flattened, just illustrating to_json() usage.
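Until to_json supports ArrayType directly, one common workaround (not part of the Spark API, just a sketch) is a UDF that serialises the array with a JSON library. The helper name serialise_array below is hypothetical; this plain-Python sketch shows the transformation the UDF would perform on one array value, assuming it arrives as a Python list:

```python
import json

def serialise_array(values):
    """Serialise an array value to a compact JSON string (None stays None)."""
    if values is None:
        return None
    # separators=(",", ":") drops whitespace, matching the compact
    # "[\"test\",\"test2\",\"test3\"]" output shown above.
    return json.dumps(values, separators=(",", ":"))

print(serialise_array(["test", "test2", "test3"]))
# -> ["test","test2","test3"]
```

In PySpark this function could presumably be wrapped with pyspark.sql.functions.udf (returning StringType()) and applied to the item.messages column, at the cost of leaving Catalyst-optimised execution for that expression.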



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
