[ 
https://issues.apache.org/jira/browse/KAFKA-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949651#comment-14949651
 ] 

ASF GitHub Bot commented on KAFKA-2459:
---------------------------------------

GitHub user enothereska opened a pull request:

    https://github.com/apache/kafka/pull/290

    KAFKA-2459: connection backoff, timeouts and retries

    This fix applies to three JIRAs, since they are all connected.
    
    KAFKA-2459: Connection backoff/blackout period should start when a connection
    is disconnected, not when the connection attempt was initiated
    The backoff period now starts when the connection is disconnected.
    
    KAFKA-2615: Poll() method is broken wrt time
    Added Time through the NetworkClient API. Minimal change.
    
    KAFKA-1843: Metadata fetch/refresh in new producer should handle all node
    connection states gracefully
    I’ve partially addressed this for a specific failure case in the JIRA.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/enothereska/kafka trunk

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/290.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #290
    
----
commit 90c0085a76374fafe6fa62c18e3d24504852e687
Author: Eno Thereska <eno.there...@gmail.com>
Date:   2015-10-07T00:06:49Z

    Commits to fix timing issues in three JIRAs

commit ee66491fb36d55527d156afda90c3addc3eb3175
Author: Eno Thereska <eno.there...@gmail.com>
Date:   2015-10-07T00:07:21Z

    Merge remote-tracking branch 'apache-kafka/trunk' into trunk

commit 17a373733e414456475217248cbc7b0bc98fda40
Author: Eno Thereska <eno.there...@gmail.com>
Date:   2015-10-07T15:15:19Z

    Merge remote-tracking branch 'apache-kafka/trunk' into trunk

commit eb5fbf458a5b455ae8b3c8b3ebf32524f5a3ab3e
Author: Eno Thereska <eno.there...@gmail.com>
Date:   2015-10-07T16:20:45Z

    Removed debug messages

commit 041baae45012cf8f99afd2c8b5d9a8099a8a928b
Author: Eno Thereska <eno.there...@gmail.com>
Date:   2015-10-07T17:35:12Z

    Pick a node, but not one that is blacked out

commit 69679d7e61d36f76d2ea1dd1fcc0a1192c9b50d6
Author: Eno Thereska <eno.there...@gmail.com>
Date:   2015-10-08T17:18:02Z

    Removed unneeded checks

commit 3ce5e151396575f45d1f022720f454ac36653d0d
Author: Eno Thereska <eno.there...@gmail.com>
Date:   2015-10-08T17:18:18Z

    Merge remote-tracking branch 'apache-kafka/trunk' into trunk

commit 76e6a0d8ab3fe847b28edde2e0072e7fe06484ff
Author: Eno Thereska <eno.there...@gmail.com>
Date:   2015-10-08T23:35:41Z

    More efficient implementation of nodesEverSeen

----


> Connection backoff/blackout period should start when a connection is 
> disconnected, not when the connection attempt was initiated
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-2459
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2459
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, consumer, producer 
>    Affects Versions: 0.8.2.1
>            Reporter: Ewen Cheslack-Postava
>            Assignee: Eno Thereska
>
> Currently the connection code for new clients marks the time when a 
> connection was initiated (NodeConnectionState.lastConnectMs) and then uses 
> this to compute blackout periods for nodes, during which connections will not 
> be attempted and the node is not considered a candidate for leastLoadedNode.
> However, in cases where the connection attempt takes longer than the 
> blackout/backoff period (default 10ms), this results in incorrect behavior. 
> If a broker is not available and, for example, the broker does not explicitly 
> reject the connection, instead waiting for a connection timeout (e.g. due to 
> firewall settings), then the backoff period will have already elapsed and the 
> node will immediately be considered ready for a new connection attempt and a 
> candidate for leastLoadedNode to select for metadata updates. I think it 
> should be easy to reproduce and verify this problem manually by using tc to 
> introduce enough latency to make connection failures take > 10ms.
> The correct behavior would use the disconnection event to mark the end of the 
> last connection attempt and then wait for the backoff period to elapse after 
> that.
> See 
> http://mail-archives.apache.org/mod_mbox/kafka-users/201508.mbox/%3CCAJY8EofpeU4%2BAJ%3Dw91HDUx2RabjkWoU00Z%3DcQ2wHcQSrbPT4HA%40mail.gmail.com%3E
>  for the original description of the problem.
> This is related to KAFKA-1843 because leastLoadedNode currently will 
> consistently choose the same node if this blackout period is not handled 
> correctly, but is a much smaller issue.
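
The difference between the two blackout rules described in the issue can be
sketched as follows. This is a minimal illustration only, not Kafka's actual
connection-state code: the class BackoffSketch and all of its method names are
hypothetical, and the 10 ms backoff is the default quoted in the issue.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the two blackout rules; not Kafka's real classes. */
final class BackoffSketch {
    private final long reconnectBackoffMs;
    // Time at which the last connection attempt to each node was initiated.
    private final Map<String, Long> lastConnectAttemptMs = new HashMap<>();
    // Time at which each node was last observed as disconnected.
    private final Map<String, Long> lastDisconnectMs = new HashMap<>();

    BackoffSketch(long reconnectBackoffMs) {
        this.reconnectBackoffMs = reconnectBackoffMs;
    }

    void connecting(String nodeId, long nowMs) {
        lastConnectAttemptMs.put(nodeId, nowMs);
    }

    void disconnected(String nodeId, long nowMs) {
        lastDisconnectMs.put(nodeId, nowMs);
    }

    /** Rule the issue describes as current: backoff counted from the attempt. */
    boolean isBlackedOutFromAttempt(String nodeId, long nowMs) {
        Long t = lastConnectAttemptMs.get(nodeId);
        return t != null && nowMs - t < reconnectBackoffMs;
    }

    /** Rule the issue asks for: backoff counted from the disconnection. */
    boolean isBlackedOutFromDisconnect(String nodeId, long nowMs) {
        Long t = lastDisconnectMs.get(nodeId);
        return t != null && nowMs - t < reconnectBackoffMs;
    }
}
```

With a 10 ms backoff, an attempt started at t=0 that times out at t=50 is no
longer blacked out at t=51 under the attempt-based rule, so the node is retried
immediately. Under the disconnect-based rule it stays blacked out until t=60.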



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
