I have the following code to read data from a Kafka topic using Structured Streaming. The topic has 3 partitions:
import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder
  .appName("TestPartition")
  .master("local[*]")
  .getOrCreate()

import spark.implicits._

val dataFrame = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "1.2.3.184:9092,1.2.3.185:9092,1.2.3.186:9092")
  .option("subscribe", "partition_test")
  .option("failOnDataLoss", "false")
  .load()
  .selectExpr("CAST(value AS STRING)")

My understanding is that Spark will launch 3 Kafka consumers (one per partition) and that these 3 consumers will run on the worker nodes. Is my understanding right?
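For reference, here is a minimal sketch of how the per-partition reads could be observed at runtime. The console sink and the progress-polling loop are illustrative additions for testing, not part of the job above:

// Start the query with a console sink (for testing only) so micro-batches run.
val query = dataFrame.writeStream
  .format("console")
  .option("truncate", "false")
  .start()

// StreamingQuery.lastProgress describes the most recent micro-batch, including
// the Kafka offset ranges read from each of the topic's partitions.
while (query.isActive) {
  Thread.sleep(10000)
  Option(query.lastProgress).foreach(p => println(p.prettyJson))
}

The stage for each micro-batch in the Spark UI also shows how many tasks were scheduled, which is another way to see how the reads map to the topic's partitions.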