The message

> Wasn't unable to resume work after last rebalance

means that in your previous iteration of the rebalance this worker was somehow behind/out of sync with other members of the group, i.e. it had not read up to the same point in the config topic, so it wouldn't be safe for this worker (or possibly the entire cluster, if this worker was the leader) to resume work. (I think there's a typo in the log message; it should say "wasn't *able* to resume work".)

This message indicates the problem:

> Catching up to assignment's config offset.

The leader was using configs that were newer than this member's, so it's not safe for the member to start its assigned work, since it might be using outdated configuration. When it tries to catch up, it keeps reading until it reaches the end of the config topic, which should be at least as far as the position the leader indicated. (Another gap in logging: that message should really include the offset it is trying to catch up to, although you can also check that manually, since it'll always be trying to read to the end of the topic.)

This catch-up has a timeout which defaults to 3s (which is pretty substantial given the rate at which configs tend to be written and their size). The fact that your worker isn't able to catch up probably indicates a connectivity issue, or possibly even some misconfiguration where one worker is looking at one cluster/config topic and another worker, in the same group in the same cluster, is looking at a different cluster/config topic when reading configs.

-Ewen

On Fri, Dec 16, 2016 at 3:16 AM, Frank Lyaruu <flya...@gmail.com> wrote:

> Hi people,
>
> I've just deployed my Kafka Streams / Connect (I only use a connect sink to
> mongodb) application on a cluster of four instances (4 containers on 2
> machines) and now it seems to get into a sort of rebalancing loop, and I
> don't get much in mongodb, I've got a little bit of data at the beginning,
> but no new data appears.
>
> The rest of the streams application seems to behave.
>
> This is what I get in my log, but at a pretty high speed (about 100 per
> second):
>
> Current config state offset 3 is behind group assignment 5, reading to end of config log
> Joined group and got assignment: Assignment{error=0, leader='connect-2-8fb3bfc4-93f2-4d08-82df-8e7c4b99ec13', leaderUrl='', offset=5, connectorIds=[KNVB-production-generation-99-person-mongosink], taskIds=[]}
> Successfully joined group NHV-production-generation-99-person-mongosink with generation 6
> Successfully joined group KNVB-production-generation-99-person-mongosink with generation 6
> Wasn't unable to resume work after last rebalance, can skip stopping connectors and tasks
> Rebalance started
> Wasn't unable to resume work after last rebalance, can skip stopping connectors and tasks
> (Re-)joining group KNVB-production-generation-99-person-mongosink
> Current config state offset 3 does not match group assignment 5. Forcing rebalance.
> Finished reading to end of log and updated config snapshot, new config log offset: 3
> Finished reading to end of log and updated config snapshot, new config log offset: 3
> Current config state offset 3 does not match group assignment 5. Forcing rebalance.
> Joined group and got assignment: Assignment{error=0, leader='connect-1-1893fd59-3ce8-4061-8131-ae36e58f5524', leaderUrl='', offset=5, connectorIds=[], taskIds=[]}
> Current config state offset 3 is behind group assignment 5, reading to end of config log
> Successfully joined group KNVB-production-generation-99-person-mongosink with generation 6
> (Re-)joining group KNVB-production-generation-99-person-mongosink
> Current config state offset 3 does not match group assignment 5. Forcing rebalance.
> Rebalance started
> Current config state offset 3 is behind group assignment 5, reading to end of config log
> Catching up to assignment's config offset.
> Successfully joined group NHV-production-generation-99-person-mongosink with generation 6
> Joined group and got assignment: Assignment{error=0, leader='connect-2-8fb3bfc4-93f2-4d08-82df-8e7c4b99ec13', leaderUrl='', offset=5, connectorIds=[], taskIds=[]}
> Catching up to assignment's config offset.
> Joined group and got assignment: Assignment{error=0, leader='connect-2-8fb3bfc4-93f2-4d08-82df-8e7c4b99ec13', leaderUrl='', offset=5, connectorIds=[], taskIds=[]}
> (Re-)joining group NHV-production-generation-99-person-mongosink
> Wasn't unable to resume work after last rebalance, can skip stopping connectors and tasks
> Successfully joined group NHV-production-generation-99-person-mongosink with generation 6
> Current config state offset 3 does not match group assignment 5. Forcing rebalance.
> Finished reading to end of log and updated config snapshot, new config log offset: 3
> Current config state offset 3 does not match group assignment 5. Forcing rebalance.
> Rebalance started
>
> ... and so on..
>
> Any ideas?
>
> regards, Frank
>
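
The cycle in the quoted log (local config offset 3, assignment offset 5, read to end, still 3, forced rebalance) can be modelled with a small sketch. This is only an illustrative model, not Kafka Connect's actual implementation: the `Worker` class, `try_resume`, and `rebalance_rounds` are hypothetical names, the "visible end offset" stands in for whatever config topic the worker is really reading, and the catch-up timeout is ignored.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    """Hypothetical stand-in for a Connect worker (not the real class)."""
    name: str
    config_offset: int       # last config-topic offset this worker has applied
    visible_end_offset: int  # end of the config topic as this worker sees it

    def read_to_end(self) -> int:
        """Catch up to the end of the config topic this worker can see."""
        self.config_offset = max(self.config_offset, self.visible_end_offset)
        return self.config_offset


def try_resume(worker: Worker, assignment_offset: int) -> bool:
    """Return True if the worker has caught up to the leader's offset."""
    if worker.config_offset < assignment_offset:
        worker.read_to_end()
    return worker.config_offset >= assignment_offset


def rebalance_rounds(worker: Worker, assignment_offset: int,
                     max_rounds: int = 5) -> int:
    """Rounds needed before the worker can resume, or -1 if it never can."""
    for round_no in range(1, max_rounds + 1):
        if try_resume(worker, assignment_offset):
            return round_no
        # Still behind after reading to end: the group rebalances and retries.
    return -1


if __name__ == "__main__":
    # Sees the same topic as the leader: catches up on the first attempt.
    print(rebalance_rounds(Worker("healthy", 3, 5), assignment_offset=5))
    # Sees a topic that ends at offset 3: can never reach offset 5.
    print(rebalance_rounds(Worker("stuck", 3, 3), assignment_offset=5))
```

In this model the second worker never reaches the leader's offset however often the group rebalances, which is the behaviour you would expect if it cannot reach the brokers or is reading a different config topic than the leader.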