Hi, I am trying to do Structured Streaming with Kafka as the source, but I am unable to get past this code.
val df = spark
  .readStream
  .format("org.apache.spark.sql.kafka010.KafkaSourceProvider")
  .option("kafka.bootstrap.servers", "localhost:8082")
  .option("subscribe", "jsontest")
  .load()

The error I am getting is:

Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: org.apache.spark.sql.kafka010.KafkaSourceProvider. Please find packages at http://spark.apache.org/third-party-projects.html
  at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:635)

I have the dependencies in SBT:

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.0"

// https://mvnrepository.com/artifact/org.apache.spark/spark-sql-kafka-0-10
libraryDependencies += "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.3.0" % "provided"

I have tried in the spark-shell too, but I get the same error:

scala> :require /Users/pulkit/Downloads/spark-sql-kafka-0-10_2.11-2.3.0.jar
The path '/Users/pulkit/Downloads/spark-sql-kafka-0-10_2.11-2.3.0.jar' cannot be loaded, because existing classpath entries conflict.

scala> val df = spark.readStream.format("kafka").option("kafka.bootstrap.servers", "localhost:9092").option("subscribe", "topicName").load()
java.lang.ClassNotFoundException: Failed to find data source: kafka. Please find packages at http://spark.apache.org/third-party-projects.html
  at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:635)
  at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:159)
  ... 49 elided

I looked on Google but did not find any useful information. Has anyone faced the same issue and found a solution?

Thanks,
Pulkit
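P.S. For reference, this is a minimal, self-contained sketch of the application I am ultimately trying to get running once the data source can be found on the classpath. The object name, the local[*] master, and the console sink are just from my local test setup (not part of the error above); the topic and bootstrap server match my spark-shell attempt.

import org.apache.spark.sql.SparkSession

object KafkaStreamTest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("KafkaStreamTest")
      .master("local[*]")   // local test setup only
      .getOrCreate()

    // Read from the Kafka topic; "kafka" is the short name that should resolve
    // to the spark-sql-kafka-0-10 data source once the connector jar is on the classpath.
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "jsontest")
      .load()

    // Kafka records arrive as binary key/value columns, so cast them to strings
    // before writing to the console sink for inspection.
    val query = df
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .writeStream
      .format("console")
      .outputMode("append")
      .start()

    query.awaitTermination()
  }
}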