[ 
https://issues.apache.org/jira/browse/KAFKA-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15096726#comment-15096726
 ] 

ASF GitHub Bot commented on KAFKA-2886:
---------------------------------------

GitHub user hachikuji opened a pull request:

    https://github.com/apache/kafka/pull/767

    KAFKA-2886: handle sink task rebalance failures by stopping worker task

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hachikuji/kafka KAFKA-2886

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/767.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #767
    
----
commit eddfb3afc46ca9b0747ed57aca8dd5b50e4d4519
Author: Jason Gustafson <ja...@confluent.io>
Date:   2016-01-13T18:09:10Z

    KAFKA-2886: handle sink task rebalance failures by stopping worker task

----


> WorkerSinkTask doesn't catch exceptions from rebalance callbacks
> ----------------------------------------------------------------
>
>                 Key: KAFKA-2886
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2886
>             Project: Kafka
>          Issue Type: Bug
>          Components: copycat
>            Reporter: Ewen Cheslack-Postava
>            Assignee: Jason Gustafson
>
> WorkerSinkTask exposes rebalance callbacks to tasks by invoking 
> onPartitionsRevoked and onPartitionsAssigned on the task. However, these 
> aren't guarded by try/catch blocks, so they can propagate the errors up to 
> the consumer:
> {quote}
> [2015-11-24 15:52:24,071] ERROR User provided listener 
> org.apache.kafka.connect.runtime.WorkerSinkTask$HandleRebalance failed on 
> partition assignment:  
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> java.lang.UnsupportedOperationException
>       at 
> java.util.Collections$UnmodifiableCollection.clear(Collections.java:1094)
>       at 
> io.confluent.connect.hdfs.DataWriter.onPartitionsAssigned(DataWriter.java:207)
>       at 
> io.confluent.connect.hdfs.HdfsSinkTask.onPartitionsAssigned(HdfsSinkTask.java:103)
>       at 
> org.apache.kafka.connect.runtime.WorkerSinkTask$HandleRebalance.onPartitionsAssigned(WorkerSinkTask.java:369)
>       at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:189)
>       at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:227)
>       at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.ensurePartitionAssignment(ConsumerCoordinator.java:306)
>       at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:861)
>       at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:829)
>       at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:171)
>       at 
> org.apache.kafka.connect.runtime.WorkerSinkTaskThread.iteration(WorkerSinkTaskThread.java:90)
>       at 
> org.apache.kafka.connect.runtime.WorkerSinkTaskThread.execute(WorkerSinkTaskThread.java:58)
>       at 
> org.apache.kafka.connect.util.ShutdownableThread.run(ShutdownableThread.java:82)
> [2015-11-24 15:52:24,477] INFO Cannot acquire lease on WAL 
> hdfs://worker4:9000/logs/test/0/log (io.confluent.connect.hdfs.wal.FSWAL)
> {quote}
> This actually currently works ok for onPartitionsAssigned because the 
> callback is the last thing invoked. For onPartitionsRevoked, it causes 
> offsets to not be committed and the current message batch being processed to 
> not be cleared. Additionally, we may need to do something more to clean up, 
> e.g. the task may need to stop processing data entirely since the task may 
> now be in a bad state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to