Hi, I have a JSON file with the following row format:

```json
{"cty":"United Kingdom","gzip":"H4sIAAAAAAAAAKtWystVslJQcs4rLVHSUUouqQTxQvMyS1JTFLwz89JT8nOB4hnFqSBxj/zS4lSF/DQFl9S83MSibKBMZVExSMbQwNBM19DA2FSpFgDvJUGVUwAAAA==","nm":"Edmund Ironside","yrs":"1016"}
```
The gzip field is itself compressed JSON. I want to read the file and build the full nested JSON as a row:

```json
{"cty":"United Kingdom","hse":{"nm":"Cnut","cty":"United Kingdom","hse":"House of Denmark","yrs":"1016-1035"},"nm":"Edmund Ironside","yrs":"1016"}
```

I already have a function that uncompresses the gzip field to a string.

My questions:

If I use the following code to build the RDD:

```scala
val jsonData = sqlContext.read.json(sourceFilesPath)

// loop through the DataFrame and manipulate the gzip field
val jsonUnGzip = jsonData.map(r => Row(r.getString(0), GZipHelper.unCompress(r.getString(1)).get, r.getString(2), r.getString(3)))
```

I get a row with 4 columns (String, String, String, String):

```
org.apache.spark.sql.Row = [United Kingdom,{"nm": "Cnut","cty": "United Kingdom","hse": "House of Denmark","yrs": "1016-1035"},Edmund Ironside,1016]
```

Now, I can't tell Spark to "re-parse" Col(1) as JSON, right?

I've seen some posts about using case classes or explode, but I don't understand how they would help here.

Eran
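
P.S. In case it helps clarify what I'm after, here is a rough sketch of the kind of re-parse step I had in mind (assuming `GZipHelper.unCompress` returns an Option/Try, as the `.get` in my snippet suggests, and that the field values don't contain characters that would need JSON escaping). I'm not sure this is the idiomatic way, which is really what I'm asking:

```scala
val jsonData = sqlContext.read.json(sourceFilesPath)

// Rebuild each record as a full JSON string, embedding the uncompressed object under "hse".
// Naive string interpolation; this is only a sketch, not a robust JSON writer.
val fullJsonStrings = jsonData.map { r =>
  val hse = GZipHelper.unCompress(r.getAs[String]("gzip")).get // uncompressed JSON object as a string
  s"""{"cty":"${r.getAs[String]("cty")}","hse":$hse,"nm":"${r.getAs[String]("nm")}","yrs":"${r.getAs[String]("yrs")}"}"""
}

// read.json accepts an RDD[String], so Spark re-infers the schema, including the nested object.
val nested = sqlContext.read.json(fullJsonStrings)
nested.printSchema()
```

Hand-building the JSON string feels fragile, so if case classes or explode give a cleaner way to turn that column back into a nested structure, that's exactly the part I'd like to understand.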