I am using the latest streaming Kafka connector:

    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka_2.11</artifactId>
    <version>1.6.2</version>
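(For comparison, the Kafka 0.10 integration ships as a separate Maven artifact; something along these lines would be needed instead. The version shown is an example and should match the Spark build in use.)

```xml
<!-- Hypothetical dependency for the Kafka 0.10 connector; version 2.0.0
     is an assumption and must match the deployed Spark version. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
  <version>2.0.0</version>
</dependency>
```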
I am facing a problem where a message is delivered twice to my consumers. The two deliveries are 10+ seconds apart, and it looks like this is caused by my lengthy message processing (it takes about 60 seconds). I have tried to solve this, but I am still stuck.

1. It looks like this Kafka streaming connector supports Kafka v0.8 and maybe v0.9, but not v0.10:

    JavaPairInputDStream<String, String> ds = KafkaUtils.createDirectStream(
        jsc, String.class, String.class, StringDecoder.class, StringDecoder.class,
        kafkaParams, topicsSet);

2. After I get a message from the stream, how can I commit it without finishing the whole processing (which might take minutes)? It looks like I can't get at the underlying consumer from KafkaUtils to call the Kafka commit API.

3. If I can't commit manually, then I need to tell the Kafka consumer to allow a longer session, or to auto-commit. For v0.8 or v0.9 I have tried passing the following properties to KafkaUtils:

    kafkaParams.put("auto.commit.enable", "true");
    kafkaParams.put("auto.commit.interval.ms", "1000");
    kafkaParams.put("zookeeper.session.timeout.ms", "60000");
    kafkaParams.put("zookeeper.connection.timeout.ms", "60000");

Still not working. Help is greatly appreciated!

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-streaming-kafka-connector-questions-tp27681.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
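P.S. For reference, the newer spark-streaming-kafka-0-10 connector exposes the offset ranges of each batch and lets the application commit them itself, which is the kind of manual commit asked about in point 2. A rough sketch of that pattern (topic name, broker address, and group id are placeholders, and `jsc` is assumed to be an existing JavaStreamingContext):

```java
// Sketch only: requires the spark-streaming-kafka-0-10 artifact on the classpath.
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.kafka010.CanCommitOffsets;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.HasOffsetRanges;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;
import org.apache.spark.streaming.kafka010.OffsetRange;

Map<String, Object> kafkaParams = new HashMap<>();
kafkaParams.put("bootstrap.servers", "localhost:9092");   // placeholder
kafkaParams.put("key.deserializer", StringDeserializer.class);
kafkaParams.put("value.deserializer", StringDeserializer.class);
kafkaParams.put("group.id", "my-group");                  // placeholder
kafkaParams.put("enable.auto.commit", false);             // commit manually instead

Collection<String> topics = Arrays.asList("mytopic");     // placeholder

JavaInputDStream<ConsumerRecord<String, String>> stream =
    KafkaUtils.createDirectStream(
        jsc,
        LocationStrategies.PreferConsistent(),
        ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

stream.foreachRDD(rdd -> {
  // Capture the offsets of this batch before (or instead of after) the
  // lengthy processing, then commit them asynchronously.
  OffsetRange[] offsetRanges = ((HasOffsetRanges) rdd.rdd()).offsetRanges();
  // ... lengthy per-record processing would go here ...
  ((CanCommitOffsets) stream.inputDStream()).commitAsync(offsetRanges);
});
```

Committing the offsets up front (before the minutes-long processing) would avoid the session timing out, at the cost of at-most-once rather than at-least-once delivery for that batch.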