[ https://issues.apache.org/jira/browse/KAFKA-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16794652#comment-16794652 ]
Ewen Cheslack-Postava commented on KAFKA-2480:
----------------------------------------------

[~olkuznsmith] I see; you probably got confused because the PR for KAFKA-2481 accidentally pointed here. The difficulty is that both semantics might be desired.

The intent of the timeout as implemented is to handle *buffering* connectors. Such connectors may *accept* data without necessarily committing it synchronously. If they encounter a failure that keeps them from even getting the data sent on the network (e.g. the downstream system has an availability issue), they want to tell the framework that they still have work to do but are having some problem accomplishing it, so they need to back off. But if *no more data* shows up, they don't want to wait indefinitely: they want to express the *maximum* amount of time the framework should wait before passing control back to the task to retry the failing operation, even if no new data is available. If new data does become available, the connector may want to accept it immediately; it may be destined for some location that doesn't have the same issue, it can be buffered, etc. This is how the HDFS connector uses the timeout functionality, for example.

On the other hand, a connector that deals with, e.g., a rate-limited API may know exactly how long it needs to wait before it's worth passing control back *at all* (or any other case where you know the issue won't be resolved until *at least* some amount of time has passed). This has come up and been discussed as a possible improvement to `RetriableException` (which is what you should be throwing if you can't even buffer the data included in the `put()` call). I don't think there's a Jira for it (at least I'm not finding one), but it was probably discussed on the mailing list. There's also KAFKA-3819 on the source side, which is another variant of "time management" convenience utilities. (A minimal sketch contrasting the two styles is included after the quoted issue below.)

> Handle non-CopycatExceptions from SinkTasks
> -------------------------------------------
>
>                 Key: KAFKA-2480
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2480
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: KafkaConnect
>            Reporter: Ewen Cheslack-Postava
>            Assignee: Ewen Cheslack-Postava
>            Priority: Major
>             Fix For: 0.9.0.0
>
>
> Currently we catch Throwable in WorkerSinkTask, but we just log the exception. This can lead to data loss because it indicates the messages in the {{put(records)}} call probably were not handled properly. We need to decide what the policy for handling these types of exceptions should be -- try repeating the same records again, risking duplication? or skip them, risking loss? or kill the task immediately and require intervention since it's unclear what happened?
> SourceTasks don't have the same concern -- they can throw other exceptions and as long as we catch them, it is up to the connector to ensure that it does not lose data as a result.
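To make the distinction above concrete, here is a minimal sketch of a sink task using both styles. {{SinkTask}}, {{SinkRecord}}, {{SinkTaskContext.timeout()}}, and {{RetriableException}} are the real Connect APIs; everything else ({{BufferingSinkTask}}, {{DownstreamUnavailableException}}, {{MAX_BUFFERED}}, the 5-second backoff) is hypothetical and only for illustration:

{code:java}
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Map;
import java.util.Queue;

import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.errors.RetriableException;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

public class BufferingSinkTask extends SinkTask {
    private static final int MAX_BUFFERED = 10_000; // hypothetical capacity
    private final Queue<SinkRecord> buffer = new ArrayDeque<>();

    @Override
    public void start(Map<String, String> props) {
        // Hypothetical: initialize the downstream client from the task config.
    }

    @Override
    public void put(Collection<SinkRecord> records) {
        if (buffer.size() + records.size() > MAX_BUFFERED) {
            // Non-buffering style: the task cannot even hold on to these
            // records, so it throws RetriableException and the framework
            // redelivers the same records on a later put() call.
            throw new RetriableException("buffer full, cannot accept records");
        }

        // Buffering style (the HDFS connector behaves this way): accept the
        // data unconditionally, then try to push it downstream.
        buffer.addAll(records);
        try {
            flushBuffer();
        } catch (DownstreamUnavailableException e) {
            // Ask the framework to hand control back within at most 5s even
            // if no new data arrives, so the failed flush can be retried;
            // new data, if any, is still delivered immediately.
            context.timeout(5_000L);
        }
    }

    // Hypothetical: write buffered records downstream, removing each record
    // from the buffer only after its write succeeds.
    private void flushBuffer() throws DownstreamUnavailableException {
    }

    @Override
    public void flush(Map<TopicPartition, OffsetAndMetadata> offsets) {
        // Nothing extra to do in this sketch.
    }

    @Override
    public void stop() {
        // Hypothetical: close the downstream client.
    }

    @Override
    public String version() {
        return "0.0.1";
    }

    // Hypothetical exception thrown by the imaginary downstream client.
    private static class DownstreamUnavailableException extends Exception {
    }
}
{code}

The key difference: {{context.timeout()}} lets the task keep accepting new records while it backs off on the failing flush, whereas {{RetriableException}} rejects the whole {{put()}} call and relies on the framework to redeliver the same records later.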