I understand the confusion. The "json" format is for JSON-encoded files being written to a directory. For Kafka, use the "kafka" format. Then, to decode the binary data as JSON, you can use the function "from_json" (Spark 2.1 and above). Here is our blog post on this.
https://databricks.com/blog/2017/04/26/processing-data-in-apache-kafka-with-structured-streaming-in-apache-spark-2-2.html

And my talk also explains this.
https://spark-summit.org/east-2017/events/making-structured-streaming-ready-for-production-updates-and-future-directions/

On Sat, May 13, 2017 at 3:42 AM, kant kodali <kanth...@gmail.com> wrote:

> HI All,
>
> What is the difference between sparkSession.readStream.format("kafka") vs
> sparkSession.readStream.format("json") ?
> I am sending json encoded messages in Kafka and I am not sure which one of
> the above I should use?
>
> Thanks!
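
For reference, the approach described in the reply (read with the "kafka" format, then decode the binary value with from_json) might look roughly like the sketch below. It is only an illustration under assumptions: the broker address, the topic name "events", and the two-field schema are all hypothetical placeholders, and it presumes the spark-sql-kafka-0-10 connector is on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{LongType, StringType, StructType}

object KafkaJsonSketch {
  // Hypothetical schema for the JSON payload; replace with the real message fields.
  val messageSchema: StructType = new StructType()
    .add("id", LongType)
    .add("name", StringType)

  // The Kafka source exposes `value` as binary, so cast it to a string
  // and then parse that string with from_json (Spark 2.1 and above).
  def decodeJson(kafkaDf: DataFrame): DataFrame =
    kafkaDf
      .select(from_json(col("value").cast("string"), messageSchema).as("data"))
      .select("data.*")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("KafkaJsonSketch").getOrCreate()

    // Use the "kafka" format for messages in Kafka, not the "json" format.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker address
      .option("subscribe", "events")                        // assumed topic name
      .load()

    // Print the decoded rows to the console as they arrive.
    val query = decodeJson(raw).writeStream
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```

The "json" format, by contrast, would be pointed at a directory of JSON files rather than at a Kafka topic.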