Hi Eno,
Thanks for the JIRA info. The change looks worth trying. Will let you know
after I try it out.
Regards
Sai

On Wed, Nov 2, 2016 at 1:33 PM, Eno Thereska <eno.there...@gmail.com> wrote:

> Hi Sai,
>
> For your second note on rebalancing taking a long time, we have just
> improved the situation in trunk after fixing this JIRA:
> https://issues.apache.org/jira/browse/KAFKA-3559
> Feel free to give it a go if rebalancing time continues to be a problem.
>
> Thanks
> Eno
>
>
> On 31 Oct 2016, at 19:44, saiprasad mishra <saiprasadmis...@gmail.com> wrote:
> >
> > Hey Guys
> >
> > I have noticed similar issues when the network goes down while a kafka
> > streams app is starting, especially when the store has initialized but
> > the task initialization is not complete; when the network comes back,
> > the rebalance fails with the above error and I had to restart. I run
> > many partitions and have many tasks being initialized.
> >
> > Otherwise, if the kafka streams app has started successfully, it does
> > recover from network issues, at least as far as I have seen so far, and
> > the stores remain available.
> >
> > This means some of these initialization exceptions can be categorized
> > as recoverable and should always be retried.
> >
> > I think task 0_0 in your case was not initialized properly in the first
> > place, and then a rebalance happened because of the network
> > connectivity issue, which resulted in the above exception.
> >
> > On a separate note, rebalancing takes a long time as I have some
> > intermediary topics, and I suspect it gets worse when the network is
> > slow. I was thinking of something like making the store available for
> > querying quickly, without waiting for the full initialization of tasks.
> >
> > Regards
> > Sai
> >
> > On Mon, Oct 31, 2016 at 3:51 AM, Damian Guy <damian....@gmail.com> wrote:
> >
> >> Hi Frank,
> >>
> >> This usually means that another StreamThread has the lock for the state
> >> directory. So it would seem that one of the StreamThreads hasn't shut
> >> down cleanly. If it happens again, can you please take a thread dump so
> >> we can see what is happening?
> >>
> >> Thanks,
> >> Damian
> >>
> >> On Sun, 30 Oct 2016 at 10:52 Frank Lyaruu <flya...@gmail.com> wrote:
> >>
> >>> I have a remote Kafka cluster, to which I connect using a VPN and a
> >>> not-so-great WiFi network. That means that the Kafka client sometimes
> >>> briefly loses connectivity. When it regains a connection after a
> >>> while, I see:
> >>>
> >>> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot
> >>> be completed since the group has already rebalanced and assigned the
> >>> partitions to another member. This means that the time between
> >>> subsequent calls to poll() was longer than the configured
> >>> max.poll.interval.ms, which typically implies that the poll loop is
> >>> spending too much time message processing. You can address this either
> >>> by increasing the session timeout or by reducing the maximum size of
> >>> batches returned in poll() with max.poll.records.
> >>>
> >>> ...
> >>>
> >>> Which makes sense I suppose, but this shouldn't be fatal.
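> >>>
> >>> For reference, these are the knobs that message points at. A rough,
> >>> untested sketch of how they might be set on the streams configuration
> >>> (assuming the usual StreamsConfig/ConsumerConfig imports; the
> >>> application id, bootstrap servers and all values here are just
> >>> placeholder examples, not recommendations):
> >>>
> >>> Properties props = new Properties();
> >>> props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
> >>> props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
> >>> // Allow more time between successive poll() calls before the consumer
> >>> // is considered failed (streams passes consumer settings through).
> >>> props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 300000);
> >>> // Fetch fewer records per poll() so each loop iteration stays short.
> >>> props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 100);
> >>> // Or give the group session itself more slack.
> >>> props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 30000);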
> >>>
> >>> But then I see:
> >>>
> >>> [StreamThread-1] ERROR
> >>> org.apache.kafka.streams.processor.internals.StreamThread -
> >>> stream-thread [StreamThread-1] Failed to create an active task %s:
> >>> org.apache.kafka.streams.errors.ProcessorStateException: task [0_0]
> >>> Error while creating the state manager
> >>> at org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:72)
> >>> at org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:89)
> >>> at org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:633)
> >>> at org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:660)
> >>> at org.apache.kafka.streams.processor.internals.StreamThread.access$100(StreamThread.java:69)
> >>> at org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:124)
> >>> at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:228)
> >>> at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:313)
> >>> at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:277)
> >>> at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:259)
> >>> at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1013)
> >>> at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:979)
> >>> at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:407)
> >>> at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:242)
> >>> Caused by: java.io.IOException: task [0_0] Failed to lock the state directory:
> >>> /Users/frank/git/dexels.repository/com.dexels.kafka.streams/kafka-streams/develop3-person/0_0
> >>> at org.apache.kafka.streams.processor.internals.ProcessorStateManager.<init>(ProcessorStateManager.java:101)
> >>> at org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:69)
> >>> ... 13 more
> >>>
> >>> And my stream application is dead.
> >>>
> >>> So I'm guessing that either the store wasn't closed properly or some
> >>> things happen out of order.
> >>>
> >>> Any ideas?
> >>>
> >>> I'm using the trunk of Kafka 0.10.2.0-SNAPSHOT, Java 1.8.0_66 on MacOS
> >>> 10.11.6
> >>>
> >>> regards, Frank
> >>
> >
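
On Damian's point above about a StreamThread not shutting down cleanly:
making sure the KafkaStreams instance is always closed on shutdown should at
least release the state directory lock. A minimal, untested sketch (builder
and props stand for whatever topology and configuration the app already
builds):

KafkaStreams streams = new KafkaStreams(builder, props);
streams.start();
// Close the instance on JVM exit so the state directory lock is released
// before another thread or instance tries to grab it.
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));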