Re: Review Request 29379: Patch for KAFKA-1788

Ewen Cheslack-Postava Tue, 06 Jan 2015 15:13:11 -0800

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29379/#review66942
-----------------------------------------------------------

Thinking about this more, I think this patch only covers one very specific case 
of the bug. Since we only check for timeouts when the leader lookup returns 
null, this only handles the case where we're unable to update metadata (which 
is the only reason we would consistently be missing leader info). It doesn't 
handle other important cases, e.g. we know the leader but can't send for a long 
time because we got disconnected and can't reconnect or we're still connected 
but requests are getting through too slowly so the batch sits in the queue for 
a long time.

I think it might make sense to restructure this a bit to address those cases 
and handle the issue with the missing stats/completeBatch call. How about 
always checking expiration as we're iterating through all these items, 
including it in the ReadyCheckResult return value as a collection like we do 
with readyNodes, and then the caller, Sender, can use completeBatch() to clean 
up? This avoids some duplicate code, gets the error stats right, and makes sure 
batches are always considered for expiration so it should completely solve the 
problem. You might need to add a while() loop to pull off all the expired 
batches and then use the existing code to process the next remaining batch (if 
there is one left).

Another issue: the current patch relies on ready() being called frequently for 
the batches to be removed promptly after they expire. However, there are 
conditions where poll() will be called with large timeouts, potentially up to 5 
minutes using the default settings. If we followed the approach described 
above, we'd probably need to track another value similar to 
nextReadyCheckDelayMs which would indicate when we would next need to wake up 
to expire the batch with the earliest expiration time.

The fixes you made did address the problems. It looks like this no longer 
applies to trunk due to 50b734690, so it'll need rebasing.

- Ewen Cheslack-Postava

On Jan. 6, 2015, 6:44 p.m., Parth Brahmbhatt wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/29379/
> -----------------------------------------------------------
> 
> (Updated Jan. 6, 2015, 6:44 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1788
>     https://issues.apache.org/jira/browse/KAFKA-1788
> 
> 
> Repository: kafka
> 
> 
> Description
> -------
> 
> Merge remote-tracking branch 'origin/trunk' into KAFKA-1788
> 
> 
> KAFKA-1788: addressed Ewen's comments.
> 
> 
> Diffs
> -----
> 
>   clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java 
> f61efb35db7e0de590556e6a94a7b5cb850cdae9 
>   clients/src/main/java/org/apache/kafka/clients/producer/ProducerConfig.java 
> a893d88c2f4e21509b6c70d6817b4b2cdd0fd657 
>   
> clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java
>  c15485d1af304ef53691d478f113f332fe67af77 
>   
> clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordBatch.java
>  dd0af8aee98abed5d4a0dc50989e37888bb353fe 
>   
> clients/src/test/java/org/apache/kafka/clients/producer/RecordAccumulatorTest.java
>  2c9932401d573549c40f16fda8c4e3e11309cb85 
>   clients/src/test/java/org/apache/kafka/clients/producer/SenderTest.java 
> ef2ca65cabe97b909f17b62027a1bb06827e88fe 
> 
> Diff: https://reviews.apache.org/r/29379/diff/
> 
> 
> Testing
> -------
> 
> Unit test added. 
> 
> 
> Thanks,
> 
> Parth Brahmbhatt
> 
>

Re: Review Request 29379: Patch for KAFKA-1788

Reply via email to