A replica is dropped out of ISR if (1) it hasn't issued a fetch request for
some time, or (2) it's behind the leader by more than a configured number of
messages. The replica will be added back to ISR once neither condition is true.
The actual values depend on the application. For example, if there is a
traffic spike and the follower temporarily falls behind, it will drop out of
ISR and be re-added once it catches up.
Jun,
Can you explain the difference between "failed" and "slow"? In either
case, the follower drops out of the ISR, and can come back later if it
catches up, no?
In the configuration doc, it seems to describe them both with the same
language: "if …, the leader will remove the follower from …"
replica.lag.time.max.ms is used to detect a failed broker.
replica.lag.max.messages is used to detect a slow broker.
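As a sketch (property names from the 0.8 broker configuration; the values shown are only illustrative), the two thresholds live in the broker's server.properties:

```properties
# A follower that has not sent a fetch request for this long is
# considered failed and is dropped from the ISR.
replica.lag.time.max.ms=10000

# A follower whose log trails the leader by more than this many
# messages is considered slow and is dropped from the ISR.
replica.lag.max.messages=4000
```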
Thanks,
Jun
On Fri, Nov 1, 2013 at 10:36 PM, Jason Rosenberg wrote:
In response to Joel's point, I think I do understand that messages can be
lost, if in fact we have dropped down to only 1 member in the ISR at the
time the message is written, and then that 1 node goes down.
What I'm not clear on is the conditions under which a node can drop out of
the ISR.
For supporting more durability at the expense of availability, we have a
JIRA that we will fix on trunk. This will allow you to configure
durability-vs-availability behavior both as a cluster default and per topic -
https://issues.apache.org/jira/browse/KAFKA-1028
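As a sketch of what KAFKA-1028 proposes (the property name below is an assumption based on the JIRA, not a config that had shipped at the time of this thread), a broker-level default might look like:

```properties
# Prefer durability over availability: never elect a replica that has
# fallen out of ISR as the new leader, even if it is the only one left.
unclean.leader.election.enable=false
```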
Thanks,
Neha
On Fri, Nov 1, 2013 at 1
Unclean shutdown could result in data loss - since you are moving
leadership to a replica that has fallen out of ISR, i.e., its log end
offset is behind the last committed message to this partition.
>>> But if data is written with 'request.required.acks=-1', no data should be lost?
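For context, the 0.8 producer setting being discussed (illustrative snippet): -1 makes the leader wait for all replicas currently in the ISR, so if the ISR has shrunk to just the leader, a single failure can still lose the message.

```properties
# 0.8 producer config: -1 means the leader acknowledges a write only
# after all replicas currently in the ISR have received it. Note this
# is "all in-sync replicas", not "all assigned replicas".
request.required.acks=-1
```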
I've filed: https://issues.apache.org/jira/browse/KAFKA-1108
On Tue, Oct 29, 2013 at 4:29 PM, Jason Rosenberg wrote:
Here's another exception I see during controlled shutdown (this time there
was not an unclean shutdown problem). Should I be concerned about this
exception? Is any data loss possible with this? This one happened after
the first "Retrying controlled shutdown after the previous attempt
failed..." message.
On Fri, Oct 25, 2013 at 9:16 PM, Joel Koshy wrote:
But if data is written with 'request.required.acks=-1', no data should be lost?
On Fri, Oct 25, 2013 at 3:22 PM, Jason Rosenberg wrote:
It looks like when the controlled shutdown fails with an IOException, the
exception is swallowed, and we see nothing in the logs:
catch {
  case ioe: java.io.IOException =>
    channel.disconnect()
    channel = null
    // ignore and try again
}
Neha,
It looks like the StateChangeLogMergerTool takes state change logs as
input. I'm not sure I know where those live? (Maybe update the doc on
that wiki page to describe this!)
Thanks,
Jason
On Fri, Oct 25, 2013 at 12:38 PM, Neha Narkhede wrote:
Jason,
The state change log tool is described here -
https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-7.StateChangeLogMergerTool
I'm curious what the IOException is and if we can improve error reporting.
Can you send around the stack trace ?
Thanks,
Neha
Ok,
Looking at the controlled shutdown code, it appears that it can fail with
an IOException too, in which case it won't report the remaining partitions
to replicate, etc. (I think that might be what I'm seeing, since I never
saw the log line for "controlled shutdown failed, X remaining partitions").
Controlled shutdown can fail if the cluster has a non-zero under-replicated
partition count, since that means the leaders may not be able to move off of
the broker being shut down, causing controlled shutdown to fail. The backoff
might help if the under-replication is just temporary, due to a spike in
traffic.
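The retry behavior being described is governed by broker settings along these lines (names from the 0.8 broker configuration; the values shown are illustrative defaults):

```properties
# Attempt a controlled shutdown (move leaders off this broker) first.
controlled.shutdown.enable=true
# How many times to retry before falling back to an unclean shutdown.
controlled.shutdown.max.retries=3
# Backoff between retries, e.g. to ride out a temporary traffic spike.
controlled.shutdown.retry.backoff.ms=5000
```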
On Fri, Oct 25, 2013 at 1:18 AM, Jason Rosenberg wrote:
> I'm running into an issue where sometimes, the controlled shutdown fails to
> complete after the default 3 retry attempts. This ended up in one case,
> with a broker undergoing an unclean shutdown, and then it was in a rather
> bad state