[ https://issues.apache.org/jira/browse/KAFKA-5154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16011331#comment-16011331 ]
Matthias J. Sax edited comment on KAFKA-5154 at 5/15/17 9:10 PM:
-----------------------------------------------------------------
Thanks for sharing the logs. We'll circle back if we need more input. We do see "Ignoring fetched records" right before the error; it seems related, but we don't know how yet.

{noformat}
2017-05-08T22:45:40,224 DEBUG StreamThread-1 org.apache.kafka.clients.consumer.internals.Fetcher.drainRecords() @526 - Ignoring fetched records for poseidonIncidentFeed-38 at offset 21353 since the current position is 21354
2017-05-08T22:45:40,224 DEBUG StreamThread-1 org.apache.kafka.clients.consumer.internals.Fetcher.sendFetches() @180 - Sending fetch for partitions [poseidonIncidentFeed-38] to broker 10.210.200.144:9092 (id: 3 rack: null)
2017-05-08T22:45:40,227 ERROR StreamThread-1 org.apache.kafka.streams.processor.internals.StreamThread.runLoop() @620 - Unexpected error: fetched partition poseidonIncidentFeed-38 does not belong to the active task partitions. tasksByPartition: {} assignedPartitions: [poseidonIncidentFeed-21, poseidonIncidentFeed-6, poseidonIncidentFeed-38, poseidonIncidentFeed-12]
{noformat}

To reason about the logs better, one more question: could it be that partition 38 of topic {{poseidonIncidentFeed}} receives no data to process for some time? It looks as if the partition is idle, the error hits when new data is written to it, and after Streams somehow "progresses" past the burst of data, the error disappears again (as no data is fetched anymore). Could this be the case? Or do you write new data to partition 38 constantly, so that Streams processes data continuously but suddenly fails?

Another follow-up question: in KAFKA-5242 you mention that you run with a single thread. Does this imply that your whole Streams application is single threaded (i.e., you use only one JVM), or do you start multiple JVMs and scale your app that way?

Last question: do you use pattern subscription by any chance?
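For readers not familiar with the two setups the last questions refer to, here is a minimal sketch against the 0.10.2 Streams API. The class name is hypothetical, and the topic name, group id, and broker address are taken from the logs above purely for illustration; this is not the reporter's actual code.

{code:java}
import java.util.Properties;
import java.util.regex.Pattern;

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStreamBuilder;

public class SubscriptionSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "hades");                  // group name seen in the logs
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "10.210.200.144:9092");
        // "Single threaded" can mean one thread in one JVM (as below), or one thread
        // per JVM with multiple JVMs sharing the same application.id to scale out.
        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 1);

        KStreamBuilder builder = new KStreamBuilder();
        // Fixed-topic subscription: the set of input topics is known up front.
        builder.stream("poseidonIncidentFeed");
        // Pattern subscription would instead match topics by regex and also pick up
        // newly created matching topics over time, e.g.:
        //   builder.stream(Pattern.compile("poseidonIncidentFeed.*"));

        new KafkaStreams(builder, props).start();
    }
}
{code}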
> Kafka Streams throws NPE during rebalance
> -----------------------------------------
>
> Key: KAFKA-5154
> URL: https://issues.apache.org/jira/browse/KAFKA-5154
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 0.10.2.0
> Reporter: Lukas Gemela
> Assignee: Matthias J. Sax
> Attachments: clio_reduced.gz, clio.txt.gz
>
> Please see the attached log: Kafka Streams throws a NullPointerException during rebalance, which is caught by our custom exception handler.
> {noformat}
> 2017-04-30T17:44:17,675 INFO kafka-coordinator-heartbeat-thread | hades org.apache.kafka.clients.consumer.internals.AbstractCoordinator.coordinatorDead() @618 - Marking the coordinator 10.210.200.144:9092 (id: 2147483644 rack: null) dead for group hades
> 2017-04-30T17:44:27,395 INFO StreamThread-1 org.apache.kafka.clients.consumer.internals.AbstractCoordinator.onSuccess() @573 - Discovered coordinator 10.210.200.144:9092 (id: 2147483644 rack: null) for group hades.
> 2017-04-30T17:44:27,941 INFO StreamThread-1 org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinPrepare() @393 - Revoking previously assigned partitions [poseidonIncidentFeed-27, poseidonIncidentFeed-29, poseidonIncidentFeed-30, poseidonIncidentFeed-18] for group hades
> 2017-04-30T17:44:27,947 INFO StreamThread-1 org.apache.kafka.clients.consumer.internals.AbstractCoordinator.sendJoinGroupRequest() @407 - (Re-)joining group hades
> 2017-04-30T17:44:48,468 INFO StreamThread-1 org.apache.kafka.clients.consumer.internals.AbstractCoordinator.sendJoinGroupRequest() @407 - (Re-)joining group hades
> 2017-04-30T17:44:53,628 INFO StreamThread-1 org.apache.kafka.clients.consumer.internals.AbstractCoordinator.sendJoinGroupRequest() @407 - (Re-)joining group hades
> 2017-04-30T17:45:09,587 INFO StreamThread-1 org.apache.kafka.clients.consumer.internals.AbstractCoordinator.sendJoinGroupRequest() @407 - (Re-)joining group hades
> 2017-04-30T17:45:11,961 INFO StreamThread-1 org.apache.kafka.clients.consumer.internals.AbstractCoordinator.onSuccess() @375 - Successfully joined group hades with generation 99
> 2017-04-30T17:45:13,126 INFO StreamThread-1 org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete() @252 - Setting newly assigned partitions [poseidonIncidentFeed-11, poseidonIncidentFeed-27, poseidonIncidentFeed-25, poseidonIncidentFeed-29, poseidonIncidentFeed-19, poseidonIncidentFeed-18] for group hades
> 2017-04-30T17:46:37,254 INFO kafka-coordinator-heartbeat-thread | hades org.apache.kafka.clients.consumer.internals.AbstractCoordinator.coordinatorDead() @618 - Marking the coordinator 10.210.200.144:9092 (id: 2147483644 rack: null) dead for group hades
> 2017-04-30T18:04:25,993 INFO StreamThread-1 org.apache.kafka.clients.consumer.internals.AbstractCoordinator.onSuccess() @573 - Discovered coordinator 10.210.200.144:9092 (id: 2147483644 rack: null) for group hades.
> 2017-04-30T18:04:29,401 INFO StreamThread-1 org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinPrepare() @393 - Revoking previously assigned partitions [poseidonIncidentFeed-11, poseidonIncidentFeed-27, poseidonIncidentFeed-25, poseidonIncidentFeed-29, poseidonIncidentFeed-19, poseidonIncidentFeed-18] for group hades
> 2017-04-30T18:05:10,877 INFO StreamThread-1 org.apache.kafka.clients.consumer.internals.AbstractCoordinator.sendJoinGroupRequest() @407 - (Re-)joining group hades
> 2017-05-01T00:01:55,707 INFO StreamThread-1 org.apache.kafka.clients.consumer.internals.AbstractCoordinator.coordinatorDead() @618 - Marking the coordinator 10.210.200.144:9092 (id: 2147483644 rack: null) dead for group hades
> 2017-05-01T00:01:59,027 INFO StreamThread-1 org.apache.kafka.clients.consumer.internals.AbstractCoordinator.onSuccess() @573 - Discovered coordinator 10.210.200.144:9092 (id: 2147483644 rack: null) for group hades.
> 2017-05-01T00:01:59,031 ERROR StreamThread-1 org.apache.kafka.streams.processor.internals.StreamThread.run() @376 - stream-thread [StreamThread-1] Streams application error during processing:
> java.lang.NullPointerException
>     at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:619) ~[kafka-streams-0.10.2.0.jar!/:?]
>     at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:368) [kafka-streams-0.10.2.0.jar!/:?]
> 2017-05-01T00:02:00,038 INFO StreamThread-1 org.apache.kafka.clients.producer.KafkaProducer.close() @689 - Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms.
> 2017-05-01T00:02:00,949 WARN StreamThread-1 org.apache.kafka.streams.processor.internals.StreamThread.setState() @160 - Unexpected state transition from PARTITIONS_REVOKED to NOT_RUNNING
> 2017-05-01T00:02:00,951 ERROR StreamThread-1 com.williamhill.trading.platform.hades.kafka.KafkaStreamManager.uncaughtException() @104 - UncaughtException in thread StreamThread-1, stopping kafka streams
> java.lang.NullPointerException
>     at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:619) ~[kafka-streams-0.10.2.0.jar!/:?]
>     at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:368) ~[kafka-streams-0.10.2.0.jar!/:?]
> 2017-05-01T00:02:01,076 WARN kafka-streams-close-thread org.apache.kafka.streams.processor.internals.StreamThread.setState() @160 - Unexpected state transition from NOT_RUNNING to PENDING_SHUTDOWN
> {noformat}
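For reference: the {{KafkaStreamManager.uncaughtException()}} line above suggests the "custom exception handler" from the description is registered via {{KafkaStreams#setUncaughtExceptionHandler}}. A minimal sketch of that wiring, assuming the 0.10.2 API; the class name and handler body are hypothetical, not the reporter's code:

{code:java}
import java.util.Properties;

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStreamBuilder;

public class UncaughtExceptionSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "hades");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "10.210.200.144:9092");

        KStreamBuilder builder = new KStreamBuilder();
        builder.stream("poseidonIncidentFeed");

        final KafkaStreams streams = new KafkaStreams(builder, props);
        // Called when a StreamThread dies, e.g. on the NPE from StreamThread.runLoop() above.
        streams.setUncaughtExceptionHandler((thread, throwable) -> {
            System.err.println("UncaughtException in thread " + thread.getName()
                    + ", stopping kafka streams: " + throwable);
            // Shut down from a separate thread (cf. "kafka-streams-close-thread" in the log);
            // calling close() directly from the dying StreamThread could block.
            new Thread(streams::close, "kafka-streams-close-thread").start();
        });
        streams.start();
    }
}
{code}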