[ 
https://issues.apache.org/jira/browse/KAFKA-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831658#comment-13831658
 ] 

Imran Rashid commented on KAFKA-1144:
-------------------------------------

Thanks for the quick response, Jun.

(1) is unfortunate, though it doesn't technically break this.  It just 
increases the number of messages that would get processed multiple times on a 
rebalance + crash (I'm pretty sure that's the only way you could end up with a 
lasting backwards commit without the conditional update).  I thought the move 
off of ZK wasn't until 0.9, sorry -- I think this patch can go forward without 
the conditional update.  (Should I update the patches to remove it?  Or just 
leave it in, and then it will go away when there is no more ZK?)
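
For anyone following along, the idea of protecting against backward commits 
can be sketched roughly as below.  This is not the patch itself: the class and 
method names are made up for illustration, and an in-memory map stands in for 
the offsets stored in ZK.  It only shows the rule "apply a commit only if it 
does not move the stored offset backwards".

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical illustration only; not code from the KAFKA-1144 patches.
// An in-memory map stands in for the offsets that would live in ZK.
final class ForwardOnlyOffsetStore {
    private final ConcurrentMap<String, Long> committed = new ConcurrentHashMap<>();

    /**
     * Applies the commit only if it does not move the stored offset backwards.
     * Returns true if the offset was written, false if it was rejected.
     */
    public boolean commitIfForward(String topicPartition, long offset) {
        boolean[] applied = {false};
        // compute() runs atomically per key, so the check and the write
        // cannot be interleaved with another commit for the same partition.
        committed.compute(topicPartition, (key, current) -> {
            if (current == null || offset >= current) {
                applied[0] = true;
                return offset;
            }
            return current; // stale commit: keep the newer stored offset
        });
        return applied[0];
    }

    /** Returns the stored offset, or null if nothing has been committed yet. */
    public Long committedOffset(String topicPartition) {
        return committed.get(topicPartition);
    }
}
```

With this rule, a slow consumer that lost a partition in a rebalance and then 
commits an old offset gets rejected instead of rewinding the new owner.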

(2) Notification on rebalances does not eliminate the desire for this patch.  
(It would, however, eliminate the need for conditional updates!)  Even without 
rebalances, with the current api, you really need to stop all worker threads 
before doing a commit if you want to guarantee that your app has seen all the 
messages.  This is especially true w/ batch processing.
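
To illustrate what passing the offsets to commitOffsets makes possible: each 
worker records how far it has actually finished processing, and the committer 
takes a snapshot of those positions and commits only that, without pausing 
anyone.  A minimal sketch, assuming one worker per partition that processes in 
order, and assuming the committed value is the next offset to read. 
ProcessedOffsetTracker and its methods are hypothetical names, not part of the 
patch or of Kafka.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical helper; not part of Kafka or of the KAFKA-1144 patches.
// Assumes each partition is handled by a single worker, in offset order.
final class ProcessedOffsetTracker {
    // partition -> next offset to read (last fully processed offset + 1)
    private final ConcurrentMap<Integer, Long> nextToCommit = new ConcurrentHashMap<>();

    /** Called by a worker AFTER it has completely finished a message. */
    public void markProcessed(int partition, long offset) {
        // Keep the maximum so a late or repeated call never moves the position back.
        nextToCommit.merge(partition, offset + 1, Long::max);
    }

    /** Point-in-time copy of the positions that are safe to commit. */
    public Map<Integer, Long> snapshot() {
        return new HashMap<>(nextToCommit);
    }
}
```

The map returned by snapshot() is what would be handed to the new 
commitOffsets variant, in whatever form it accepts.  Messages that were 
fetched but not yet finished are simply not included, so they get redelivered 
after a crash instead of being lost.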

Again, the patch isn't necessary, but it's a small change that makes it so 
much easier to get user code right, not to mention more efficient.

Maybe other changes in the 0.9 API will make this unnecessary, I dunno.  But I 
think this is useful for 0.8 in the meantime.  And I'd hope the client rewrite 
would also make it easy to write batch consumers, like the API I put together 
in the other repo.  (I'd happily submit that directly to Kafka, if it was 
desired, though it's very Scala-y, and I guess the user API is going to be 
Java-only?)

> commitOffsets can be passed the offsets to commit
> -------------------------------------------------
>
>                 Key: KAFKA-1144
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1144
>             Project: Kafka
>          Issue Type: Improvement
>          Components: consumer
>    Affects Versions: 0.8
>            Reporter: Imran Rashid
>            Assignee: Neha Narkhede
>         Attachments: 
> 0001-allow-committing-of-arbitrary-offsets-to-facilitate-.patch, 
> 0002-add-protection-against-backward-commits.patch
>
>
> This adds another version of commitOffsets that takes the offsets to commit 
> as a parameter.
> Without this change, getting correct user code is very hard. Despite kafka's 
> at-least-once guarantees, most user code doesn't actually have that 
> guarantee, and is almost certainly wrong if doing batch processing. Getting 
> it right requires some very careful synchronization between all consumer 
> threads, which is both:
> 1) painful to get right
> 2) slow b/c of the need to stop all workers during a commit.
> This small change simplifies a lot of this. This was discussed extensively on 
> the user mailing list, on the thread "are kafka consumer apps guaranteed to 
> see msgs at least once?"
> You can also see an example implementation of a user api which makes use of 
> this, to get proper at-least-once guarantees by user code, even for batches:
> https://github.com/quantifind/kafka-utils/pull/1
> I'm open to any suggestions on how to add unit tests for this.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
