On Aug 29, 2013, at 11:12 PM, Neha Narkhede wrote:
>> How do you automate waiting for the broker to come up? Just keep
>> monitoring the process and keep trying to connect to the port?
>
> Every leader in a Kafka cluster exposes the UnderReplicatedPartitionCount
> metric. The safest way to issue controlled shutdown is to wait until that
> metric reports 0 on every broker before shutting down the next one.
IIUC it is a pseudo-automation in that you set the retry interval for
controlled shutdown (controlled.shutdown.retry.backoff.ms) and the number
of retries (controlled.shutdown.max.retries) high enough that, during a
rolling bounce, the likelihood of a controlled shutdown being unsuccessful
is low.
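For concreteness, a sketch of those settings in server.properties (values
are illustrative, not recommendations):

    # Keep retrying controlled shutdown so a rolling bounce almost never
    # falls back to an unclean shutdown.
    controlled.shutdown.enable=true
    controlled.shutdown.max.retries=10
    controlled.shutdown.retry.backoff.ms=10000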
Thinking about it some more, I guess you are really talking about monitoring
UnderReplicatedPartitionCount during a restart?
/Sam
Can't he get this automatically though with Sriram's controlled shutdown
stuff?
-Jay
>> How do you automate waiting for the broker to come up? Just keep
>> monitoring the process and keep trying to connect to the port?
Every leader in a Kafka cluster exposes the UnderReplicatedPartitionCount
metric. The safest way to issue controlled shutdown is to wait until that
metric reports 0 on every broker before shutting down the next one.
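As a rough sketch of automating that check over JMX (assumptions: the broker
exposes JMX on port 9999, and the metric is registered under the 0.8
metrics-core name UnderReplicatedPartitions; verify the exact MBean name
with jconsole for your version):

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class UnderReplicatedCheck {
        public static void main(String[] args) throws Exception {
            // Assumes the broker was started with
            // -Dcom.sun.management.jmxremote.port=9999 (hypothetical setup).
            JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://broker1:9999/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection conn =
                    connector.getMBeanServerConnection();
                // Name as registered by the 0.8 metrics library; the quoting
                // changed in later versions, so check with jconsole.
                ObjectName urp = new ObjectName(
                    "\"kafka.server\":type=\"ReplicaManager\",name=\"UnderReplicatedPartitions\"");
                // Block until this broker reports no under-replicated
                // partitions before bouncing the next one.
                while (((Number) conn.getAttribute(urp, "Value")).intValue() > 0) {
                    Thread.sleep(5000);
                }
            } finally {
                connector.close();
            }
        }
    }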
Ok, I spent some more time staring at our logs and figured out that it was our
fault. We were not waiting for the Kafka broker to fully initialize before
moving on to the next broker, and loading the data logs can take quite some
time (~7 minutes in one case), so we ended up with no replicas in sync.
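For what it's worth, a minimal way to detect "fully initialized" is to wait
for the broker's ZooKeeper registration, since (at least in 0.8) that
happens at the end of startup, after log recovery. A sketch, assuming broker
id 1 and a hypothetical ZK address:

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class BrokerUpCheck {
        public static void main(String[] args) throws Exception {
            // Hypothetical ZooKeeper connect string.
            ZooKeeper zk = new ZooKeeper("zk1:2181", 30000, new Watcher() {
                public void process(WatchedEvent event) { /* ignore */ }
            });
            try {
                // The broker creates this ephemeral node once startup
                // (including log recovery) is done, so it is a better
                // readiness signal than an open port.
                while (zk.exists("/brokers/ids/1", false) == null) {
                    Thread.sleep(5000);
                }
            } finally {
                zk.close();
            }
        }
    }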
On Aug 29, 2013, at 5:50 PM, Sriram Subramanian wrote:
> Do you know why you timed out on a regular shutdown?
No, though I think it may just have been that the timeout we put in was too
short.
> If the replica had fallen off of the ISR and shutdown was forced on the
> leader, this could happen.
>> Right, although I was under the impression that committed meant
>> replicated, not necessarily synced to disk. That's not the case?
That's correct.
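(If you want a tighter bound on unsynced data, the broker does have flush
knobs; a sketch with illustrative values, not recommendations:

    # Force an fsync after at most this many messages or this interval (ms);
    # by default flushing is largely left to the OS page cache.
    log.flush.interval.messages=10000
    log.flush.interval.ms=1000

but the durability story is meant to come from replication, not fsync.)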
>> But what happens when a node goes down, has its log truncated to less than
>> in sync, and becomes the leader again? You're saying there should be some
>> code to handle that?
On Aug 29, 2013, at 5:44 PM, Jay Kreps wrote:
> This should not happen. We have a notion of a "committed" message, which is
> a message present on all "in sync" nodes.
Right, although I was under the impression that committed meant replicated,
not necessarily synced to disk. That's not the case?
Do you know why you timed out on a regular shutdown? If the replica had
fallen off of the ISR and shutdown was forced on the leader this could
happen. With ack = -1, we guarantee that all the replicas in the in sync
set have received the message before exposing the message to the consumer.
This should not happen. We have a notion of a "committed" message, which is
a message present on all "in sync" nodes. We never hand out a message to
any consumer until it is committed, and we guarantee that only "in sync"
nodes are electable as leaders. Setting acks=-1 means wait until the message
is committed before acknowledging the produce request.
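As a concrete sketch with the 0.8-era Java producer (the broker list and
topic name are hypothetical):

    import java.util.Properties;
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class CommittedAcksExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("metadata.broker.list", "broker1:9092,broker2:9092");
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            // -1: the leader acks only once every replica in the ISR has
            // the message, i.e. once it is "committed" in the sense above.
            props.put("request.required.acks", "-1");
            Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));
            producer.send(new KeyedMessage<String, String>("my-topic", "hello"));
            producer.close();
        }
    }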