Hey guys, I have noticed similar issues when the network goes down during startup of Kafka Streams apps, especially when the store has initialized but task initialization is not yet complete: when the network comes back, the rebalance fails with the above error and I have had to restart, as I run many partitions and have many tasks being initialized.
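To avoid that manual restart I have been thinking about wiring up an uncaught exception handler that tears the app down and builds a fresh instance. Here is a rough, untested sketch against the 0.10.x API as I understand it; the application id, broker address, topology, and backoff below are all made up:

    import java.util.Properties;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStreamBuilder;

    public class RestartOnFailure {

        private static final ScheduledExecutorService RESTARTER =
                Executors.newSingleThreadScheduledExecutor();

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");          // made up
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");  // made up
            start(props);
        }

        private static void start(Properties props) {
            KStreamBuilder builder = new KStreamBuilder();
            builder.stream("input-topic").to("output-topic");                  // made-up topology

            KafkaStreams streams = new KafkaStreams(builder, props);
            streams.setUncaughtExceptionHandler((thread, error) ->
                    // A KafkaStreams instance cannot be restarted once a thread
                    // has died, so close it and build a fresh instance after a
                    // short backoff. close() joins the stream threads, so it is
                    // scheduled on a separate executor, not the dying thread.
                    RESTARTER.schedule(() -> {
                        streams.close();
                        start(props);
                    }, 10, TimeUnit.SECONDS));
            streams.start();
        }
    }

It is naive (if several threads die it schedules several restarts), but it shows the idea: retry the recoverable initialization failures from outside rather than dying.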
Otherwise, if the Kafka Streams app has started successfully, it does recover from network issues, as far as I have seen so far, and the stores do remain available. That means some of these initialization exceptions can be categorized as recoverable and should always be retried. I think task 0_0 in your case was not initialized properly in the first place; then a rebalance happened because of the network connectivity and resulted in the above exception.

On a separate note, the rebalance takes a long time as I have some intermediary topics, and I suspect it might be worse if the network is slow. I was thinking of something like making the store available for querying quickly, without waiting for the full initialization of the tasks.

Regards,
Sai

On Mon, Oct 31, 2016 at 3:51 AM, Damian Guy <damian....@gmail.com> wrote:

> Hi Frank,
>
> This usually means that another StreamThread has the lock for the state
> directory. So it would seem that one of the StreamThreads hasn't shut down
> cleanly. If it happens again, can you please take a thread dump so we can
> see what is happening?
>
> Thanks,
> Damian
>
> On Sun, 30 Oct 2016 at 10:52 Frank Lyaruu <flya...@gmail.com> wrote:
>
> > I have a remote Kafka cluster, to which I connect using a VPN and a
> > not-so-great WiFi network. That means that the Kafka client sometimes
> > briefly loses connectivity. When it regains a connection after a while,
> > I see:
> >
> > org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be
> > completed since the group has already rebalanced and assigned the
> > partitions to another member. This means that the time between subsequent
> > calls to poll() was longer than the configured max.poll.interval.ms, which
> > typically implies that the poll loop is spending too much time message
> > processing. You can address this either by increasing the session timeout
> > or by reducing the maximum size of batches returned in poll() with
> > max.poll.records.
> >
> > ...
> >
> > Which makes sense I suppose, but this shouldn't be fatal.
> >
> > But then I see:
> >
> > [StreamThread-1] ERROR org.apache.kafka.streams.processor.internals.StreamThread -
> > stream-thread [StreamThread-1] Failed to create an active task %s:
> > org.apache.kafka.streams.errors.ProcessorStateException: task [0_0] Error
> > while creating the state manager
> >   at org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:72)
> >   at org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:89)
> >   at org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:633)
> >   at org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:660)
> >   at org.apache.kafka.streams.processor.internals.StreamThread.access$100(StreamThread.java:69)
> >   at org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:124)
> >   at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:228)
> >   at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:313)
> >   at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:277)
> >   at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:259)
> >   at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1013)
> >   at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:979)
> >   at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:407)
> >   at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:242)
> > Caused by: java.io.IOException: task [0_0] Failed to lock the state directory:
> > /Users/frank/git/dexels.repository/com.dexels.kafka.streams/kafka-streams/develop3-person/0_0
> >   at org.apache.kafka.streams.processor.internals.ProcessorStateManager.<init>(ProcessorStateManager.java:101)
> >   at org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:69)
> >   ... 13 more
> >
> > And my stream application is dead.
> >
> > So I'm guessing that either the store wasn't closed properly or some
> > things happen out of order.
> >
> > Any ideas?
> >
> > I'm using the trunk of Kafka 0.10.2.0-SNAPSHOT, Java 1.8.0_66, on MacOS
> > 10.11.6.
> >
> > regards, Frank
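
PS: on the CommitFailedException part, the two knobs that error message mentions can, as far as I know, be passed straight through the streams properties (Streams forwards consumer settings to the consumers it creates; worth double-checking on trunk). A sketch with made-up values:

    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.streams.StreamsConfig;

    public class PollTuning {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");          // made up
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");  // made up
            // Allow more time between poll() calls before the group evicts
            // the member ...
            props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "600000");   // made-up value
            // ... and return smaller batches so each loop iteration finishes
            // sooner.
            props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "100");          // made-up value
            // pass props to new KafkaStreams(builder, props) as usual
        }
    }

That only helps with slow poll loops over a flaky link, of course; it won't fix the failure to lock the state directory itself.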