[jira] [Commented] (KAFKA-2168) New consumer poll() can block other calls like position(), commit(), and close() indefinitely

Jason Gustafson (JIRA) Wed, 20 May 2015 14:33:19 -0700

    [ 
https://issues.apache.org/jira/browse/KAFKA-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553167#comment-14553167
 ]


Jason Gustafson commented on KAFKA-2168:
----------------------------------------

I've started taking a look at this issue. Here are a couple options:

1. Finer-grained synchronization: In KafkaConsumer, it looks like 
synchronization is needed primarily around the SubscriptionState instance and 
the NetworkClient. It may be possible to push the synchronization into these 
classes and avoid the direct synchronization in KafkaConsumer. It may work, but 
will make reasoning about the correctness of the consumer more difficult. 

2. Coarse-grained synchronization with wakeup: As Ewen suggests, the selector 
can be woken up before the operation that needs to be done. A second lock (or a 
queue) could be introduced to solve the starvation problem mentioned above. 
Basically each thread would have to acquire the second lock (or traverse the 
queue) before being able to acquire the critical lock. This logic could be 
encapsulated in a separate class to avoid polluting the consumer too much.

Either approach will add some new complexity to the consumer. There is also the 
option of doing nothing. In that case, interactions with the consumer must come 
from the polling thread (in-between polls). This puts the burden on the user to 
implement the hooks in the polling thread to commit offsets, do seeks, or 
whatever.


> New consumer poll() can block other calls like position(), commit(), and 
> close() indefinitely
> ---------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-2168
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2168
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, consumer
>            Reporter: Ewen Cheslack-Postava
>            Assignee: Ewen Cheslack-Postava
>
> The new consumer is currently using very coarse-grained synchronization. For 
> most methods this isn't a problem since they finish quickly once the lock is 
> acquired, but poll() might run for a long time (and commonly will since 
> polling with long timeouts is a normal use case). This means any operations 
> invoked from another thread may block until the poll() call completes.
> Some example use cases where this can be a problem:
> * A shutdown hook is registered to trigger shutdown and invokes close(). It 
> gets invoked from another thread and blocks indefinitely.
> * User wants to manage offset commit themselves in a background thread. If 
> the commit policy is not purely time based, it's not currently possibly to 
> make sure the call to commit() will be processed promptly.
> Two possible solutions to this:
> 1. Make sure a lock is not held during the actual select call. Since we have 
> multiple layers (KafkaConsumer -> NetworkClient -> Selector -> nio Selector) 
> this is probably hard to make work cleanly since locking is currently only 
> performed at the KafkaConsumer level and we'd want it unlocked around a 
> single line of code in Selector.
> 2. Wake up the selector before synchronizing for certain operations. This 
> would require some additional coordination to make sure the caller of 
> wakeup() is able to acquire the lock promptly (instead of, e.g., the poll() 
> thread being woken up and then promptly reacquiring the lock with a 
> subsequent long poll() call).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-2168) New consumer poll() can block other calls like position(), commit(), and close() indefinitely

Reply via email to