Something like:

    dstream.foreachRDD { rdd =>
      val df = sqlContext.read.json(rdd)
      df.select(...)
    }
https://spark.apache.org/docs/latest/streaming-programming-guide.html#output-operations-on-dstreams might be the place to start. foreachRDD will hand you each batch of the DStream as an RDD, which you can then work with as if it were a standard RDD dataset.

Ewan

From: Daniel Haviv [mailto:daniel.ha...@veracity-group.com]
Sent: 29 September 2015 15:03
To: user <user@spark.apache.org>
Subject: Converting a DStream to schemaRDD

Hi,
I have a DStream which is a stream of RDD[String].
How can I pass a DStream to sqlContext.jsonRDD and work with it as a DataFrame?

Thank you,
Daniel
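For what it's worth, a fuller sketch of the approach, assuming Spark 1.5-era APIs (matching the date of this thread). The socket source, host/port, and the "name" field are placeholders for illustration, not part of the original question:

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SQLContext
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object JsonStreamToDF {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("JsonStreamToDF").setMaster("local[2]")
        val ssc = new StreamingContext(conf, Seconds(5))

        // Hypothetical source: each line is one JSON record,
        // e.g. {"name":"daniel","age":30}
        val dstream = ssc.socketTextStream("localhost", 9999)

        dstream.foreachRDD { rdd =>
          if (!rdd.isEmpty()) {
            // Reuse a single SQLContext across batches (available since Spark 1.5)
            val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
            // Infer a schema from the JSON strings and get a DataFrame
            val df = sqlContext.read.json(rdd)
            // "name" is an assumed field name for illustration
            df.select("name").show()
          }
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }

The isEmpty() guard avoids schema inference failing on empty batches, and SQLContext.getOrCreate keeps you from constructing a new context on every micro-batch.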