[ 
https://issues.apache.org/jira/browse/KAFKA-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16794652#comment-16794652
 ] 

Ewen Cheslack-Postava commented on KAFKA-2480:
----------------------------------------------

[~olkuznsmith] I see, you probably got confused because the PR for KAFKA-2481 
accidentally pointed here.

The difficulty is that both semantics might be desired. The intent of the
timeout as implemented is to handle *buffering* connectors. In that case, they 
may *accept* data without necessarily committing it synchronously. If they 
encounter a failure that keeps them from even getting the data sent on the 
network (e.g. downstream system has availability issue), they want to express 
to the framework that they still have work to do, but are having some problem 
accomplishing it, so they need to back off; but if *no more data* shows up, they 
don't want to wait indefinitely – they want to express the *maximum* amount of 
time the framework should wait before passing control back to the task to retry 
whatever operation was failing, even if there isn't new data available. But if 
new data becomes available, the connector may want to accept it immediately. It 
may be destined for some location that doesn't have the same issue, or can be 
buffered, etc. For example, this is how the HDFS connector uses this timeout 
functionality.
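
To make that concrete, here is a minimal sketch of a buffering sink task using 
{{SinkTaskContext.timeout()}}. This is not the actual HDFS connector code; the 
class name, the back-off value, and the {{flush()}} helper are invented for 
illustration:

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

public abstract class BufferingSinkTask extends SinkTask {

    // Assumed back-off value; not from the original comment.
    private static final long RETRY_BACKOFF_MS = 5000L;
    private final List<SinkRecord> buffer = new ArrayList<>();

    @Override
    public void put(Collection<SinkRecord> records) {
        // Always *accept* new data; it is only buffered, not yet committed.
        buffer.addAll(records);
        try {
            flush(buffer);   // hypothetical write of buffered data downstream
            buffer.clear();
        } catch (IOException e) {
            // Downstream is unavailable, but the records are safely buffered,
            // so don't fail the task. Ask the framework to hand control back
            // within RETRY_BACKOFF_MS even if no new data arrives, while still
            // delivering new records immediately if they do show up.
            context.timeout(RETRY_BACKOFF_MS);
        }
    }

    // Hypothetical downstream write; throws IOException on availability issues.
    protected abstract void flush(List<SinkRecord> records) throws IOException;
}
{code}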

On the other hand, a connector that, e.g., deals with a rate-limited API may 
know exactly how long it needs to wait before it's worth passing control back 
*at all* (or any other case where you know the issue won't be resolved until 
*at least* some amount of time has passed). This has come up and been discussed 
as a possible improvement to {{RetriableException}} (since you should be throwing 
that if you can't even buffer the data that's being included in the {{put()}} 
call). I don't think there's a Jira (at least I'm not finding one), but it was 
probably discussed on the mailing list. There's also KAFKA-3819 on the source 
side, which is another variant of "time management" convenience utilities.
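
For comparison, a sketch of the {{RetriableException}} path as it exists today; 
the class name, the {{send()}} helper, and {{RateLimitException}} are invented 
for illustration:

{code:java}
import java.util.Collection;
import org.apache.kafka.connect.errors.RetriableException;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

public abstract class RateLimitedSinkTask extends SinkTask {

    /** Hypothetical exception the downstream client throws when throttled. */
    public static class RateLimitException extends Exception {
        public RateLimitException(String message) { super(message); }
    }

    @Override
    public void put(Collection<SinkRecord> records) {
        try {
            send(records);   // hypothetical call into the rate-limited API
        } catch (RateLimitException e) {
            // There is no buffer to fall back on, so signal the framework to
            // redeliver this same batch later. Note the gap discussed above:
            // RetriableException gives no way to say "wait *at least* N ms
            // before retrying".
            throw new RetriableException("rate limited; retry this batch", e);
        }
    }

    // Hypothetical write to the rate-limited downstream API.
    protected abstract void send(Collection<SinkRecord> records)
            throws RateLimitException;
}
{code}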

> Handle non-CopycatExceptions from SinkTasks
> -------------------------------------------
>
>                 Key: KAFKA-2480
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2480
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: KafkaConnect
>            Reporter: Ewen Cheslack-Postava
>            Assignee: Ewen Cheslack-Postava
>            Priority: Major
>             Fix For: 0.9.0.0
>
>
> Currently we catch Throwable in WorkerSinkTask, but we just log the 
> exception. This can lead to data loss because it indicates the messages in 
> the {{put(records)}} call probably were not handled properly. We need to 
> decide what the policy for handling these types of exceptions should be -- 
> try repeating the same records again, risking duplication? or skip them, 
> risking loss? or kill the task immediately and require intervention since 
> it's unclear what happened?
> SourceTasks don't have the same concern -- they can throw other exceptions 
> and as long as we catch them, it is up to the connector to ensure that it 
> does not lose data as a result.


