Hi,

I am trying to do Structured Streaming with Kafka as the source.
I am unable to get past this code.

val df = spark
  .readStream
  .format("org.apache.spark.sql.kafka010.KafkaSourceProvider")
  .option("kafka.bootstrap.servers", "localhost:8082")
  .option("subscribe", "jsontest")
  .load()
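
For context, once load() works, what I plan to do with the stream is roughly this (the CAST projection and the console sink are just placeholders while I debug the source problem):

// Sketch only: what I plan to do downstream of load().
// The projection and the console sink are placeholders while I debug.
val messages = df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

val query = messages.writeStream
  .format("console")
  .outputMode("append")
  .start()

query.awaitTermination()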

The error I am getting is:

Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: org.apache.spark.sql.kafka010.KafkaSourceProvider. Please find packages at http://spark.apache.org/third-party-projects.html
  at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:635)

I have the dependencies in SBT:

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.0"
// https://mvnrepository.com/artifact/org.apache.spark/spark-sql-kafka-0-10
libraryDependencies += "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.3.0" % "provided"


I have tried this in spark-shell too, but I get the same error:


scala> :require /Users/pulkit/Downloads/spark-sql-kafka-0-10_2.11-2.3.0.jar

The path '/Users/pulkit/Downloads/spark-sql-kafka-0-10_2.11-2.3.0.jar' cannot be loaded, because existing classpath entries conflict.


scala> val df = spark.readStream.format("kafka").option("kafka.bootstrap.servers", "localhost:9092").option("subscribe", "topicName").load()

java.lang.ClassNotFoundException: Failed to find data source: kafka. Please find packages at http://spark.apache.org/third-party-projects.html
  at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:635)
  at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:159)
  ... 49 elided
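
I was also planning to try launching the shell with --packages so the connector gets resolved automatically, something like the following (the artifact coordinates assume Spark 2.3.0 built for Scala 2.11):

# Sketch: start spark-shell with the Kafka connector pulled in via --packages instead of :require.
# The coordinates assume Spark 2.3.0 built for Scala 2.11.
./bin/spark-shell --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.0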


I looked on Google but didn't find any useful info.
Has anyone faced the same issue and found a solution?

Thanks
Pulkit
