I am using the latest streaming Kafka connector:
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka_2.11</artifactId>
<version>1.6.2</version>

I am facing a problem where a message is delivered twice to my
consumers. The two deliveries are 10+ seconds apart; it looks like this is
caused by my lengthy message processing (about 60 seconds per message). I
have tried to solve this, but I am still stuck.

1. It looks like this Kafka streaming connector supports Kafka v0.8 and
maybe v0.9, but not v0.10.

JavaPairInputDStream<String, String> ds = KafkaUtils.createDirectStream(
        jsc,
        String.class, String.class,
        StringDecoder.class, StringDecoder.class,
        kafkaParams, topicsSet);
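For what it's worth, if upgrading to Spark 2.x is an option, the Kafka 0.10 consumer API has its own artifact (the version below assumes a Spark 2.0.x build; the 0.10 integration is still marked experimental in those releases):

```xml
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
    <version>2.0.0</version>
</dependency>
```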

2. After I get a message from the Kafka stream in a consumer, how can I
commit its offset before the whole processing finishes (the whole processing
might take minutes)? It looks like I can't get hold of the underlying
consumer from KafkaUtils to call the Kafka commit API.
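As far as I can tell, the 0.8 direct stream never commits offsets for you at all; instead, each batch RDD carries its offset ranges via HasOffsetRanges, so one workaround is to persist those ranges yourself as soon as a batch arrives, before the lengthy processing runs. A rough sketch against the stream above (saveOffset is a hypothetical helper for writing to ZooKeeper or an external store; error handling omitted):

```java
// Sketch, assuming the 0.8 direct stream "ds" declared above.
// The direct stream tracks offsets itself; nothing is committed anywhere
// unless you store the offset ranges on your own.
ds.foreachRDD(rdd -> {
    OffsetRange[] ranges = ((HasOffsetRanges) rdd.rdd()).offsetRanges();
    for (OffsetRange r : ranges) {
        // hypothetical helper: persist topic/partition/untilOffset yourself
        saveOffset(r.topic(), r.partition(), r.untilOffset());
    }
    // ... lengthy per-batch processing of rdd goes here ...
});
```

Note that committing before processing completes changes the semantics from at-least-once toward at-most-once: if processing fails after the offsets are stored, those messages will not be redelivered. (The 0-10 connector additionally exposes CanCommitOffsets.commitAsync for committing to Kafka itself.)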

3. If I can't do a manual commit, then I need to tell the Kafka consumer to
allow a longer session, or to auto-commit. For v0.8 or v0.9 I have tried
passing the following properties to KafkaUtils:

kafkaParams.put("auto.commit.enable", "true");
kafkaParams.put("auto.commit.interval.ms", "1000");
kafkaParams.put("zookeeper.session.timeout.ms", "60000");
kafkaParams.put("zookeeper.connection.timeout.ms", "60000");
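From what I have read, the direct (receiver-less) 0.8 stream does not go through ZooKeeper for offsets, so those zookeeper.* and auto.commit.* settings are simply ignored by it; that would explain why they have no effect. With the 0-10 connector the equivalent settings use the new-consumer property names instead. A small sketch of what those params might look like (the group id and tuning values here are illustrative assumptions, not recommendations):

```java
import java.util.HashMap;
import java.util.Map;

public class KafkaParamsExample {

    // Builds new-consumer properties as used by the 0-10 connector.
    static Map<String, Object> newConsumerParams(String bootstrap) {
        Map<String, Object> p = new HashMap<>();
        p.put("bootstrap.servers", bootstrap);
        p.put("key.deserializer",
              "org.apache.kafka.common.serialization.StringDeserializer");
        p.put("value.deserializer",
              "org.apache.kafka.common.serialization.StringDeserializer");
        p.put("group.id", "my-streaming-group");      // assumed group id
        p.put("enable.auto.commit", "false");          // note: not "auto.commit.enable"
        p.put("session.timeout.ms", "60000");
        // In Kafka clients/brokers >= 0.10.1, long per-batch processing is
        // bounded by max.poll.interval.ms rather than the session timeout.
        p.put("max.poll.interval.ms", "300000");
        return p;
    }

    public static void main(String[] args) {
        Map<String, Object> p = newConsumerParams("localhost:9092");
        System.out.println(p.get("enable.auto.commit"));
    }
}
```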

Still not working.
Help is greatly appreciated!




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/spark-streaming-kafka-connector-questions-tp27681.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
