heartbeat.interval.ms default
group.max.session.timeout.ms default
session.timeout.ms 6
Default values as of Kafka 0.10.
Complete Kafka params:
val kafkaParams = Map[String, String](
  "bootstrap.servers" -> kafkaBrokers,
  "auto.offset.reset" -> "latest",
  "enable.auto.commit" -> "false",
For you or anyone else having issues with consumer rebalance, what are
your settings for
heartbeat.interval.ms
session.timeout.ms
group.max.session.timeout.ms
relative to your batch time?
On Tue, Oct 11, 2016 at 10:19 AM, static-max wrote:
> Hi,
>
> I run into the same exception
> (org.apache.k
The new underlying Kafka consumer prefetches data and is generally heavier
weight, so it is cached on executors. Group id is part of the cache key. I
assumed Kafka users would use different group ids for consumers they wanted
to be distinct, since otherwise it would cause problems even with the normal
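(A sketch of using a distinct group id per stream, as suggested; not the poster's actual code. Topic names, group names and the streaming context are assumptions, and kafkaParams is assumed to carry the bootstrap servers and deserializers:)

import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

// Only group.id differs per stream, so the cached consumers on the executors
// get distinct cache keys.
val streamA = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent,
  Subscribe[String, String](Seq("topic-a"), kafkaParams + ("group.id" -> "my-job-topic-a")))

val streamB = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent,
  Subscribe[String, String](Seq("topic-b"), kafkaParams + ("group.id" -> "my-job-topic-b")))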
Good point, I changed the group id to be unique for the separate streams
and now it works. Thanks!
Although changing this is OK for us, I am interested in the why :-) With
the old connector this was not a problem, nor is it, AFAIK, with the pure
Kafka consumer API.
2016-10-11 14:30 GMT+02:00 Cody Koe
Just out of curiosity, have you tried using separate group ids for the
separate streams?
On Oct 11, 2016 4:46 AM, "Matthias Niehoff"
wrote:
> I stripped down the job to just consume the stream and print it, without
> avro deserialization. When I only consume one topic, everything is fine. As
> s
I re-ran the job with DEBUG log level for org.apache.spark, kafka.consumer
and org.apache.kafka. Please find the output here:
http://pastebin.com/VgtRUQcB
Most of the delay is introduced by *16/10/11 13:20:12 DEBUG RecurringTimer:
Callback for JobGenerator called at time x*, which repeats multiple
I stripped down the job to just consume the stream and print it, without
Avro deserialization. When I only consume one topic, everything is fine. As
soon as I add a second stream, it gets stuck after about 5 minutes. So it
basically boils down to:
val kafkaParams = Map[String, String](
  "bootstrap
This job will fail after about 5 minutes:

object SparkJobMinimal {
  // read Avro schemas
  var stream = getClass.getResourceAsStream("/avro/AdRequestLog.avsc")
  val avroSchemaAdRequest =
    scala.io.Source.fromInputStream(stream).getLines.mkString
  stream.close
  stream = getClass.getResourceAsSt
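(The original SparkJobMinimal is cut off by the archive. A rough sketch of a job of the same shape, two direct streams that are only printed, with topic names, broker address and batch interval all assumed:)

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

// Minimal sketch (not the original SparkJobMinimal): two streams, print only.
object TwoStreamSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("two-stream-sketch"), Seconds(10))

    val kafkaParams = Map[String, String](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> "org.apache.kafka.common.serialization.StringDeserializer",
      "value.deserializer" -> "org.apache.kafka.common.serialization.StringDeserializer",
      "group.id" -> "spark-job-minimal-sketch",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> "false")

    val streamA = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("topic-a"), kafkaParams))
    val streamB = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("topic-b"), kafkaParams))

    // Map to the record value before printing; ConsumerRecord itself is not serializable.
    streamA.map(_.value).print()
    streamB.map(_.value).print()

    ssc.start()
    ssc.awaitTermination()
  }
}

Note that this sketch deliberately shares one group.id between both streams, which is the setup the group-id discussion above is about.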
Yes, without committing the data the consumer rebalances.
The job consumes 3 streams and processes them. When consuming only one
stream it runs fine. But when consuming three streams, even without joining
them, just deserializing the payload and triggering an output action, it
fails.
I will prepare a code sample.
OK, so at this point, even without involving commitAsync, you're
seeing consumer rebalances after a particular batch takes longer than
the session timeout?
Do you have a minimal code example you can share?
On Tue, Oct 4, 2016 at 2:18 AM, Matthias Niehoff
wrote:
> Hi,
> sry for the late reply. A
Hi,
Sorry for the late reply. A public holiday in Germany.
Yes, it's using a unique group id which no other job or consumer group is
using. I have increased the session.timeout to 1 minute and set the
max.poll.rate to 1000. The processing takes ~1 second.
2016-09-29 4:41 GMT+02:00 Cody Koeninger :
Well, I'd start at the first thing suggested by the error, namely that
the group has rebalanced.
Is that stream using a unique group id?
On Wed, Sep 28, 2016 at 5:17 AM, Matthias Niehoff
wrote:
> Hi,
>
> the stacktrace:
>
> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot b
Hi,
the stacktrace:
org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be
completed since the group has already rebalanced and assigned the
partitions to another member. This means that the time between subsequent
calls to poll() was longer than the configured session.timeout.ms.
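(For context, a sketch of the 0-10 integration's commit pattern that this exception presumably relates to; "stream" stands for the direct stream and is an assumption here:)

import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges}

stream.foreachRDD { rdd =>
  // Collect the offset ranges of this batch before any transformation.
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  // ... output action on rdd ...
  // Commit the offsets back to Kafka; a commit that arrives after the group
  // has already rebalanced (e.g. a batch exceeded the session timeout) fails
  // with the exception shown above.
  stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
}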
What's the actual stacktrace / exception you're getting related to
commit failure?
On Tue, Sep 27, 2016 at 9:37 AM, Matthias Niehoff
wrote:
> Hi everybody,
>
> I am using the new Kafka receiver for Spark Streaming for my job. When
> running with the old consumer it runs fine.
>
> The Job consumes 3 T