[ 
https://issues.apache.org/jira/browse/KAFKA-3093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15138093#comment-15138093
 ] 

Jason Gustafson commented on KAFKA-3093:
----------------------------------------

[~jinxing6...@126.com] I'm going to go ahead and pick this up. Let me know if 
you've already gotten started or if you want to help out. Maybe in this ticket, 
we can focus only on exposing status information. This is already a fairly big 
piece, so it might make more sense to do pause/restart commands in another JIRA 
(there is already KAFKA-2482 for pause). It'd be great if you want to help out 
with those issues.
 
Here's a quick sketch of the proposed implementation:

First, we distinguish the connector's target state, as set by the user, from 
the runtime state of the connector and its tasks. The target state is the 
persistent state of the connector that will be resumed after every rebalance or 
cluster restart. Initially, the only target state will be "started," but we 
will add "paused" in KAFKA-2482. On the other hand, the runtime states 
represent the actual current states of the connector and its tasks. This will 
include the following states: rebalancing, running, and failed (we'll also add 
paused later). In the failed state, we'll add exception trace information so 
that users don't need to inspect the logs to find the actual problem.

Connector target states will be persisted in the config topic. This works 
nicely since there is already a synchronization protocol on this topic which 
ensures that all workers have read up to the same offset. This guarantees that 
the workers will see the same target state of each connector after every 
rebalance/restart.

Connector and task runtime states will be persisted in a new topic, configured 
with "status.storage.topic", which is consumed by all workers. We could 
alternatively have only the leader consume this topic, but then the leader 
would have to handle all status requests. It would also delay leader failover 
since the new leader would have to read the entire log to catch up. The basic 
idea is to have the owner of each connector/task write status updates to this 
topic as they occur. For example, if the task raises an exception, the worker 
will catch it and immediately write the failed state to the topic (note that we 
won't attempt to implement restarting or any handling in this ticket).
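
The write path described above could look roughly like the following Java 
sketch. It is only an illustration under stated assumptions: StatusLog is an 
in-memory stand-in for the status topic (no Kafka client is involved), and 
TaskRunner, StatusView, and the key format are hypothetical names, not 
existing classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for the status topic: an append-only list of (key, value) records.
final class StatusLog {
    final List<String[]> records = new ArrayList<>();

    void append(String key, String value) {
        records.add(new String[] {key, value});
    }
}

// The owner of a task writes that task's status updates as they occur.
final class TaskRunner {
    private final String taskKey;
    private final StatusLog log;

    TaskRunner(String taskKey, StatusLog log) {
        this.taskKey = taskKey;
        this.log = log;
    }

    void run(Runnable task) {
        log.append(taskKey, "RUNNING");
        try {
            task.run();
        } catch (RuntimeException e) {
            // Record the failure immediately; no restart or recovery is attempted here.
            log.append(taskKey, "FAILED: " + e);
        }
    }
}

// Every worker replays the log and keeps the latest value per key.
final class StatusView {
    static Map<String, String> latest(StatusLog log) {
        Map<String, String> view = new HashMap<>();
        for (String[] record : log.records) {
            view.put(record[0], record[1]);
        }
        return view;
    }
}
```

Because every worker builds the same view from the same log, any worker could 
answer a status request without forwarding it to the leader.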

We will add two APIs: one to get the full status of the connector (including 
all of its tasks), and one task-level status API.
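
As a rough illustration of the two lookups, here is a small Java sketch. The 
class and method names (StatusService, connectorStatus, taskStatus) and the 
string output format are assumptions made for this example; the actual REST 
paths and response shapes are not specified here.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory view backing the two status lookups.
final class StatusService {
    private final Map<String, String> connectorStates = new TreeMap<>();
    private final Map<String, Map<Integer, String>> taskStates = new TreeMap<>();

    void putConnector(String connector, String state) {
        connectorStates.put(connector, state);
    }

    void putTask(String connector, int task, String state) {
        taskStates.computeIfAbsent(connector, k -> new TreeMap<>()).put(task, state);
    }

    // Task-level lookup: state of a single task, or null if it is unknown.
    String taskStatus(String connector, int task) {
        Map<Integer, String> tasks =
                taskStates.getOrDefault(connector, Collections.emptyMap());
        return tasks.get(task);
    }

    // Connector-level lookup: the connector's own state plus the state of each task.
    String connectorStatus(String connector) {
        Map<Integer, String> tasks =
                taskStates.getOrDefault(connector, Collections.emptyMap());
        return connector + "=" + connectorStates.get(connector) + " tasks=" + tasks;
    }
}
```

The connector-level lookup simply aggregates what the task-level lookup would 
return for each task, so both can be served from the same view.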

> Keep track of connector and task status info, expose it via the REST API
> ------------------------------------------------------------------------
>
>                 Key: KAFKA-3093
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3093
>             Project: Kafka
>          Issue Type: Improvement
>          Components: copycat
>            Reporter: jin xing
>            Assignee: jin xing
>
> Related to KAFKA-3054.
> We should keep track of the status of connectors and tasks during their 
> startup and execution, and handle exceptions thrown by connectors and tasks.
> Users should be able to fetch this information via the REST API and send 
> necessary commands (reconfiguring, restarting, pausing, unpausing) to 
> connectors and tasks via the REST API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
