Hey guys, I have noticed similar issues when the network goes down during startup of Kafka Streams apps, especially when the store has initialized but task initialization is not yet complete: when the network comes back, the rebalance fails with the above error and I have had to restart, as I run many partitions and have many tasks being initialized.
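To avoid that manual restart I have been thinking about wiring up an uncaught exception handler that tears the app down and builds a fresh instance. Here is a rough, untested sketch against the 0.10.x API as I understand it; the application id, broker address, topology, and backoff below are all made up:

    import java.util.Properties;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStreamBuilder;

    public class RestartOnFailure {

        private static final ScheduledExecutorService RESTARTER =
                Executors.newSingleThreadScheduledExecutor();

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");          // made up
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");  // made up
            start(props);
        }

        private static void start(Properties props) {
            KStreamBuilder builder = new KStreamBuilder();
            builder.stream("input-topic").to("output-topic");                  // made-up topology

            KafkaStreams streams = new KafkaStreams(builder, props);
            streams.setUncaughtExceptionHandler((thread, error) ->
                    // A KafkaStreams instance cannot be restarted once a thread
                    // has died, so close it and build a fresh instance after a
                    // short backoff. close() joins the stream threads, so it is
                    // scheduled on a separate executor, not the dying thread.
                    RESTARTER.schedule(() -> {
                        streams.close();
                        start(props);
                    }, 10, TimeUnit.SECONDS));
            streams.start();
        }
    }

It is naive (if several threads die it schedules several restarts), but it shows the idea: retry the recoverable initialization failures from outside rather than dying.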
Otherwise, if the Kafka Streams app has started successfully, it does recover from network issues, as far as I have seen so far, and the stores do remain available. That means some of these initialization exceptions can be categorized as recoverable and should always be retried. I think task 0_0 in your case was not initialized properly in the first place; then a rebalance happened because of the network connectivity and resulted in the above exception.

On a separate note, the rebalance takes a long time as I have some intermediary topics, and I suspect it might be worse if the network is slow. I was thinking of something like making the store available for querying quickly, without waiting for the full initialization of the tasks.

Regards,
Sai

On Mon, Oct 31, 2016 at 3:51 AM, Damian Guy <damian....@gmail.com> wrote:

> Hi Frank,
>
> This usually means that another StreamThread has the lock for the state
> directory. So it would seem that one of the StreamThreads hasn't shut down
> cleanly. If it happens again, can you please take a thread dump so we can
> see what is happening?
>
> Thanks,
> Damian
>
> On Sun, 30 Oct 2016 at 10:52 Frank Lyaruu <flya...@gmail.com> wrote:
>
> > I have a remote Kafka cluster, to which I connect using a VPN and a
> > not-so-great WiFi network. That means that the Kafka client sometimes
> > briefly loses connectivity. When it regains a connection after a while,
> > I see:
> >
> > org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be
> > completed since the group has already rebalanced and assigned the
> > partitions to another member. This means that the time between subsequent
> > calls to poll() was longer than the configured max.poll.interval.ms, which
> > typically implies that the poll loop is spending too much time message
> > processing. You can address this either by increasing the session timeout
> > or by reducing the maximum size of batches returned in poll() with
> > max.poll.records.
> >
> > ...
> >
> > Which makes sense I suppose, but this shouldn't be fatal.
> >
> > But then I see:
> >
> > [StreamThread-1] ERROR org.apache.kafka.streams.processor.internals.StreamThread -
> > stream-thread [StreamThread-1] Failed to create an active task %s:
> > org.apache.kafka.streams.errors.ProcessorStateException: task [0_0] Error
> > while creating the state manager
> >   at org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:72)
> >   at org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:89)
> >   at org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:633)
> >   at org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:660)
> >   at org.apache.kafka.streams.processor.internals.StreamThread.access$100(StreamThread.java:69)
> >   at org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:124)
> >   at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:228)
> >   at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:313)
> >   at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:277)
> >   at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:259)
> >   at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1013)
> >   at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:979)
> >   at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:407)
> >   at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:242)
> > Caused by: java.io.IOException: task [0_0] Failed to lock the state directory:
> > /Users/frank/git/dexels.repository/com.dexels.kafka.streams/kafka-streams/develop3-person/0_0
> >   at org.apache.kafka.streams.processor.internals.ProcessorStateManager.<init>(ProcessorStateManager.java:101)
> >   at org.apache.kafka.streams.processor.internals.AbstractTask.<init>(AbstractTask.java:69)
> >   ... 13 more
> >
> > And my stream application is dead.
> >
> > So I'm guessing that either the store wasn't closed properly or some
> > things happen out of order.
> >
> > Any ideas?
> >
> > I'm using the trunk of Kafka 0.10.2.0-SNAPSHOT, Java 1.8.0_66, on MacOS
> > 10.11.6.
> >
> > regards, Frank
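
PS: on the CommitFailedException part, the two knobs that error message mentions can, as far as I know, be passed straight through the streams properties (Streams forwards consumer settings to the consumers it creates; worth double-checking on trunk). A sketch with made-up values:

    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.streams.StreamsConfig;

    public class PollTuning {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");          // made up
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");  // made up
            // Allow more time between poll() calls before the group evicts
            // the member ...
            props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "600000");   // made-up value
            // ... and return smaller batches so each loop iteration finishes
            // sooner.
            props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "100");          // made-up value
            // pass props to new KafkaStreams(builder, props) as usual
        }
    }

That only helps with slow poll loops over a flaky link, of course; it won't fix the failure to lock the state directory itself.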