[ https://issues.apache.org/jira/browse/KAFKA-3093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15138093#comment-15138093 ]
Jason Gustafson commented on KAFKA-3093:
----------------------------------------

[~jinxing6...@126.com] I'm going to go ahead and pick this up. Let me know if you've already gotten started or if you want to help out.

Maybe in this ticket, we can focus only on exposing status information. This is already a fairly big piece, so it might make more sense to do pause/restart commands in another JIRA (there is already KAFKA-2482 for pause). It'd be great if you want to help out with those issues.

Here's a quick sketch of the proposed implementation:

First, we differentiate the connector's target state, as set by the user, from the runtime state of the connector and its tasks. The target state is the persistent state of the connector that will be resumed after every rebalance or cluster restart. Initially, the only target state will be "started," but we will add "paused" in KAFKA-2482.

The runtime states, on the other hand, represent the actual current states of the connector and its tasks. They will include the following: rebalancing, running, and failed (we'll also add paused later). In the failed state, we'll include the exception trace so that users don't need to inspect the logs to find the actual problem.

Connector target states will be persisted in the config topic. This works nicely since there is already a synchronization protocol on this topic which ensures that all workers have read up to the same offset. This guarantees that the workers will see the same target state of each connector after every rebalance/restart.

Connector and task runtime states will be persisted in a new topic, configured with "status.storage.topic", which is consumed by all workers. We could alternatively have only the leader consume this topic, but then the leader would have to handle all status requests. It would also delay leader failover since the new leader would have to read the entire log to catch up.
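
To make the split between target state and runtime state concrete, here is a minimal Java sketch of what the two could look like. Every type and field name below is hypothetical and only for illustration, not the actual Connect classes or the final record format; the one thing it tries to capture is that a failed status carries its exception trace.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// All names here are hypothetical; this is not the Connect codebase.

/** State requested by the user; would be persisted in the config topic. */
enum TargetState { STARTED } // "paused" would be added under KAFKA-2482

/** What a connector or task is actually doing right now. */
enum RuntimeState { REBALANCING, RUNNING, FAILED }

/** One status update, as the owning worker might write it to the status topic. */
final class StatusRecord {
    final String id;           // connector or task identifier (format is an assumption)
    final RuntimeState state;
    final String trace;        // exception trace; non-null only for FAILED

    private StatusRecord(String id, RuntimeState state, String trace) {
        this.id = id;
        this.state = state;
        this.trace = trace;
    }

    /** Status without a trace, e.g. RUNNING or REBALANCING. */
    static StatusRecord of(String id, RuntimeState state) {
        return new StatusRecord(id, state, null);
    }

    /** FAILED status that captures the stack trace so users need not dig through logs. */
    static StatusRecord failed(String id, Throwable cause) {
        StringWriter out = new StringWriter();
        PrintWriter writer = new PrintWriter(out);
        cause.printStackTrace(writer);
        writer.flush();
        return new StatusRecord(id, RuntimeState.FAILED, out.toString());
    }
}
```

Keeping the two enums separate mirrors the storage split described above: the target state changes only when the user asks for it, while status records are rewritten by whichever worker currently owns the connector or task.
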
The basic idea is to have the owner of each connector/task write status updates to this topic as they occur. For example, if the task raises an exception, the worker will catch it and immediately write the failed state to the topic (note that we won't attempt to implement restarting or any handling in this ticket).

We will add two APIs: one to get the full status of the connector (including all of its tasks), and one task-level status API.

> Keep track of connector and task status info, expose it via the REST API
> ------------------------------------------------------------------------
>
>                 Key: KAFKA-3093
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3093
>             Project: Kafka
>          Issue Type: Improvement
>          Components: copycat
>            Reporter: jin xing
>            Assignee: jin xing
>
> Relate to KAFKA-3054;
> We should keep track of the status of connector and task during their
> startup, execution, and handle exceptions thrown by connector and task;
> Users should be able to fetch these informations by REST API and send some
> necessary commands(reconfiguring, restarting, pausing, unpausing) to
> connectors and tasks by REST API;

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)