Ewen Cheslack-Postava created KAFKA-2479:
--------------------------------------------

             Summary: Add CopycatExceptions to indicate transient and permanent 
errors in a connector/task
                 Key: KAFKA-2479
                 URL: https://issues.apache.org/jira/browse/KAFKA-2479
             Project: Kafka
          Issue Type: Sub-task
          Components: copycat
            Reporter: Ewen Cheslack-Postava
            Assignee: Ewen Cheslack-Postava


Sometimes a connector will need to tell the framework that an error occurred, 
and the framework could reasonably respond to that error in more than one way.

For source connectors, there's not much they need to indicate since they can 
block indefinitely. They probably only need to indicate permanent errors for 
correctness, though we may want them to indicate transient errors so we can 
report health of the task in a metric.

For sink connectors, there are at least three scenarios:
1. A task encounters some error while processing a {{put(records)}} call and 
was unable to fully process it, but thinks it could be resolved in the future. 
The task doesn't want to see any new records until the issue is resolved, but 
will need to see the same set of records again. (It would be nice if the task 
doesn't have to deal with saving these to a buffer itself.)
2. A task encounters some error while processing data, but it has 
enqueued/handled the data passed into the {{put(records)}} call. For example, 
it may have passed it to some library which buffers it, but then the library 
indicated that it is having some connection issues. The connector might be able 
to accept more data, but the task is not in a healthy state.
3. The task encounters some error that it decides is unrecoverable. This might 
just be transient errors that repeat for long enough that the task thinks it's 
time to give up. Unclear what to do here, but one option is relocating the task 
to another worker, hoping that the issue is specific to the worker.
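
As a rough illustration of the three sink scenarios above, one exception type 
per scenario would let the framework pick a response from the type alone. This 
is only a sketch: every class, enum, and method name below is hypothetical and 
not something this issue has settled on, and treating unknown exceptions as 
permanent is an assumption.

```java
// Hypothetical sketch: every class, enum, and method name here is an
// assumption made for illustration, not an API proposed in this issue.
class CopycatErrorSketch {

    /** Base type so the framework can recognise connector-signalled errors. */
    static class CopycatException extends RuntimeException {
        CopycatException(String message) { super(message); }
    }

    /** Scenario 1: put(records) did not complete; the same records are needed again. */
    static class RedeliverRecordsException extends CopycatException {
        RedeliverRecordsException(String message) { super(message); }
    }

    /** Scenario 2: the records were accepted, but the task is not healthy. */
    static class TaskUnhealthyException extends CopycatException {
        TaskUnhealthyException(String message) { super(message); }
    }

    /** Scenario 3: the task has decided the error is unrecoverable. */
    static class UnrecoverableTaskException extends CopycatException {
        UnrecoverableTaskException(String message) { super(message); }
    }

    /** The possible framework reactions, one per scenario. */
    enum FrameworkResponse {
        REDELIVER_SAME_BATCH,
        KEEP_DELIVERING_AND_MARK_UNHEALTHY,
        STOP_OR_RELOCATE_TASK
    }

    /** Maps an exception thrown by a task to the framework's reaction. */
    static FrameworkResponse responseFor(RuntimeException e) {
        if (e instanceof RedeliverRecordsException) {
            return FrameworkResponse.REDELIVER_SAME_BATCH;
        }
        if (e instanceof TaskUnhealthyException) {
            return FrameworkResponse.KEEP_DELIVERING_AND_MARK_UNHEALTHY;
        }
        // Assumption: explicit permanent errors and any exception the
        // framework does not recognise are both treated as permanent.
        return FrameworkResponse.STOP_OR_RELOCATE_TASK;
    }

    public static void main(String[] args) {
        System.out.println(responseFor(new RedeliverRecordsException("sink unavailable")));
        System.out.println(responseFor(new TaskUnhealthyException("connection flapping")));
        System.out.println(responseFor(new UnrecoverableTaskException("gave up")));
    }
}
```

With something like this, a task never has to describe the desired response in 
detail; it only picks the exception that matches its situation.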

Note that it is generally not safe for sink tasks to do their own backoff, 
because a task that blocks could starve the consumer, which needs to poll() in 
order to heartbeat. So we need to make sure whatever mechanism we implement 
encourages the user to throw an exception and pass control back to us instead.
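
To make the control flow concrete, here is a minimal sketch of a framework-side 
loop that keeps calling poll() on every iteration while it holds a failed batch 
and re-offers that batch to the task. All names are hypothetical: 
{{FakeConsumer}} is an in-memory stand-in for the real consumer, its 
{{paused}} flag stands in for whatever mechanism would stop new records from 
arriving, and {{TransientPutException}} is an assumed exception type. The sketch 
has no backoff delay and no retry limit, both of which a real implementation 
would presumably need.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch: none of these types exist in the framework.
class RetryLoopSketch {

    /** Assumed signal: put(records) failed, please hand me the same records again. */
    static class TransientPutException extends RuntimeException {
        TransientPutException(String message) { super(message); }
    }

    /** Minimal stand-in for a sink task. */
    interface SinkTask {
        void put(List<String> records);
    }

    /** In-memory stand-in for the consumer; each poll() counts as a heartbeat. */
    static class FakeConsumer {
        private final Queue<List<String>> batches = new ArrayDeque<>();
        int polls = 0;
        boolean paused = false;

        FakeConsumer(List<List<String>> input) { batches.addAll(input); }

        List<String> poll() {
            polls++;
            if (paused || batches.isEmpty()) {
                return Collections.emptyList(); // heartbeat only, no new records
            }
            return batches.remove();
        }

        boolean exhausted() { return batches.isEmpty(); }
    }

    /** Delivers all batches, retrying failed ones; returns the number of retries. */
    static int run(FakeConsumer consumer, SinkTask task) {
        List<String> pending = null; // batch the task asked to see again
        int retries = 0;
        while (pending != null || !consumer.exhausted()) {
            // While a batch is pending, take no new records, but still poll.
            consumer.paused = (pending != null);
            List<String> polled = consumer.poll();
            List<String> batch = (pending != null) ? pending : polled;
            if (batch.isEmpty()) {
                continue;
            }
            try {
                task.put(batch);
                pending = null;  // delivered; resume normal consumption
            } catch (TransientPutException e) {
                pending = batch; // the framework buffers the batch, not the task
                retries++;
            }
        }
        return retries;
    }

    public static void main(String[] args) {
        FakeConsumer consumer = new FakeConsumer(
                Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c")));
        int[] failuresLeft = {2}; // fail the first two put() calls
        int retries = run(consumer, records -> {
            if (failuresLeft[0] > 0) {
                failuresLeft[0]--;
                throw new TransientPutException("sink temporarily unavailable");
            }
            System.out.println("delivered " + records);
        });
        System.out.println("retries=" + retries + " polls=" + consumer.polls);
    }
}
```

The point of the design is that the task returns immediately by throwing, so 
the loop, not the task, decides when to try again, and poll() is never skipped.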



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
