I've been trying to figure this one out for some time now. I have JSON documents representing Products that arrive (physically) partitioned by Brand, and I would like to create a DataFrame from the JSON while also keeping the partitioning information (Brand):
```
case class Product(brand: String, value: String)

val df = spark.createDataFrame(Seq(Product("something", """{"a": "b", "c": "d"}""")))
df.write.partitionBy("brand").mode("overwrite").json("/tmp/products5/")

val df2 = spark.read.json("/tmp/products5/")
df2.show
/*
+--------------------+---------+
|               value|    brand|
+--------------------+---------+
|{"a": "b", "c": "d"}|something|
+--------------------+---------+
*/

// This is simple and effective but it gets rid of the brand!
spark.read.json(df2.select("value").as[String]).show
/*
+---+---+
|  a|  c|
+---+---+
|  b|  d|
+---+---+
*/
```

Ideally I'd like something similar to spark.read.json that keeps the partitioning values and merges them with the rest of the DataFrame.

The end result I would like:

```
/*
+---+---+---------+
|  a|  c|    brand|
+---+---+---------+
|  b|  d|something|
+---+---+---------+
*/
```

Best regards,
Daniel Mateus Pires
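
For reference, one possible way to get that shape is sketched below. It is not a confirmed answer from the list, only a sketch under some assumptions: the schema of the `value` strings is inferred with the same `spark.read.json` call as above, and `from_json` is then applied to the column so that `brand` stays on the same row. The session setup, the app name `keep-partition-column`, and the names `valueSchema`, `parsed` and `result` are made up for illustration, and the code assumes a Spark version where `spark.read.json` accepts a `Dataset[String]`.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}

// Assumed local session; in spark-shell, getOrCreate() returns the existing one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("keep-partition-column")
  .getOrCreate()
import spark.implicits._

// Recreate the example data: one JSON string per row, partitioned by brand on disk.
Seq(("something", """{"a": "b", "c": "d"}"""))
  .toDF("brand", "value")
  .write.partitionBy("brand").mode("overwrite").json("/tmp/products5/")

// Partition discovery brings brand back as a regular column.
val df2 = spark.read.json("/tmp/products5/")

// Infer the schema of the embedded JSON strings (this costs an extra pass over the data).
val valueSchema = spark.read.json(df2.select("value").as[String]).schema

// Parse value in place, then flatten the struct and keep brand alongside it.
val result = df2
  .withColumn("parsed", from_json(col("value"), valueSchema))
  .select(col("parsed.*"), col("brand"))

result.show()
```

The trade-off in this sketch is that the data is read twice, once to infer the schema and once to parse. If the schema of `value` is known up front, it could be passed to `from_json` directly and the inference step dropped.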