Ok, I've reproduced this again and made sure to grab the broker logs before
the instances are terminated. I posted a writeup with what seemed like the
relevant bits of the logs here:
https://gist.github.com/lukesteensen/793a467a058af51a7047
To summarize, it looks like Gwen was correct and the broke
I don't have broker logs at the moment, but I'll work on getting some I can
share. We are running 0.9.0.0 for both the brokers and producer in this
case. I've pasted some bits from the producer log below in case that's
helpful. Of particular note is how long it takes for the second disconnect
to occur.
Do you happen to have broker-logs and state-change logs from the controlled
shutdown attempt?
In theory, the producer should not really see a disconnect - it should get
a NotALeader exception (because leaders are re-assigned before the shutdown)
that will cause it to refresh the metadata. I am guessing
Yes, that was my intention and we have both of those configs turned on. For
some reason, however, the controlled shutdown wasn't transferring
leadership of all partitions, which caused the issues I described in my
initial email.
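For reference, the relevant bits of our broker config look roughly like this
(just the two settings in question, with example values):
controlled.shutdown.enable=true
auto.leader.rebalance.enable=true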
On Wed, Jan 13, 2016 at 12:05 AM, Ján Koščo <3k.stan...@gmail.com> wrote:
Not sure, but should the combination of auto.leader.rebalance.enable=true
and controlled.shutdown.enable=true sort this out for you?
2016-01-13 1:13 GMT+01:00 Scott Reynolds :
> we use 0.9.0.0 and it is working fine. Not all the features work and a few
> things make a few assumptions about how zookee
we use 0.9.0.0 and it is working fine. Not all the features work and a few
things make a few assumptions about how zookeeper is used. But as a tool
for provisioning, expanding and failure recovery it is working fine so far.
*knocks on wood*
On Tue, Jan 12, 2016 at 4:08 PM, Luke Steensen <
luke.st
Ah, that's a good idea. Do you know if kafka-manager works with kafka 0.9
by chance? That would be a nice improvement over the CLI tools.
Thanks,
Luke
On Tue, Jan 12, 2016 at 4:53 PM, Scott Reynolds
wrote:
> Luke,
>
> We practice the same immutable pattern on AWS. To decommission a broker, we
>
Luke,
We practice the same immutable pattern on AWS. To decommission a broker, we
use partition reassignment first to move the partitions off of the node,
followed by a preferred leadership election. To do this through a web UI, so
that you can handle it on lizard brain at 3 a.m., we use the Yahoo Kafka Manager.
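Under the hood that maps to the stock replication tools; without the UI it
looks roughly like this (the ZooKeeper string and JSON file name are
placeholders, and the exact flags can differ between versions):
bin/kafka-reassign-partitions.sh --zookeeper zk1:2181 \
  --reassignment-json-file move-off-broker.json --execute
bin/kafka-reassign-partitions.sh --zookeeper zk1:2181 \
  --reassignment-json-file move-off-broker.json --verify
bin/kafka-preferred-replica-election.sh --zookeeper zk1:2181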
Thanks, Prabhjot
I know that running out of disk space can cause a Kafka shutdown, but that
is not the case here; there is a lot of free space.
On Thu, Nov 5, 2015 at 6:08 AM, Prabhjot Bharaj
wrote:
> Hi Vadim,
>
> Did you see your hard disk partition getting full where kafka data
> directory i
Hi Gleb,
No, no ZooKeeper-related errors. The only suspicious line I see immediately
preceding the shutdown is this:
[2015-11-03 01:53:20,810] INFO Reconnect due to socket error:
java.nio.channels.ClosedChannelException (kafka.consumer.SimpleConsumer)
which makes me think it could be some network issue.
Hi Vadim,
Did you see the hard disk partition where the Kafka data directory is
present getting full?
It could be because you have set log retention to a large value while your
input data is taking up the full disk space. In that case, move some data
off that disk partition, set log retention
Hi, Vadim. Do you see something like this: "zookeeper state changed
(Expired)" in kafka's logs?
On Wed, Nov 4, 2015 at 6:33 PM, Vadim Bobrov wrote:
> Hi,
>
> does anyone know in what cases Kafka will take itself down? I have a
> cluster of 2 nodes that went down (not crashed) this night in a con
Btw. a regular UNIX kill will do the same - SIGTERM -
http://linux.die.net/man/1/kill
Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/
On Mon, Jul 27, 2015 at 3:57 PM, Andrew Otto wrote:
> Ah, thank you, SIGTERM
Ah, thank you, SIGTERM is what I was looking for. The docs are unclear on
that; it would be useful to fix them. Thanks!
> On Jul 27, 2015, at 14:59, Binh Nguyen Van wrote:
>
> You can initiate controlled shutdown by run bin/kafka-server-stop.sh. This
> will send a SIGTERM to broker to tell
You can initiate a controlled shutdown by running bin/kafka-server-stop.sh.
This will send a SIGTERM to the broker to tell it to do the controlled shutdown.
I also got confused before and had to look at the code to figure that out.
I think it would be better if we added this to the documentation.
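In other words, roughly (the manual kill is just an illustration of what the
script boils down to; the key point is that it must be SIGTERM, not SIGKILL):
bin/kafka-server-stop.sh
# or, by hand:
kill -SIGTERM <broker-pid>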
-Binh
On Mon, Jul
Controlled shutdown is built into the broker. When this config is set to true,
the broker makes a request to the controller to initiate the controlled
shutdown, waits until the request succeeds, and in case of failure retries the
shutdown up to controlled.shutdown.max.retries times.
https://github.com/apache/kafka/blob/0.8.
Thanks!
But how do I initiate a controlled shutdown on a running broker? Editing
server.properties is not going to cause this to happen. Don’t I have to tell
the broker to shut down nicely? All I really want to do is tell the controller
to move leadership to other replicas, so I can shut down
You can set controlled.shutdown.enable to true in Kafka's server.properties;
this is enabled by default from 0.8.2 onwards. You can also set the max retries
using controlled.shutdown.max.retries, which defaults to 3.
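So in server.properties it is just (the retry/backoff values shown here are
the defaults as far as I remember, so double-check for your version):
controlled.shutdown.enable=true
controlled.shutdown.max.retries=3
controlled.shutdown.retry.backoff.ms=5000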
Thanks,
Harsha
On July 27, 2015 at 11:42:32 AM, Andrew Otto (ao...@wikimedia.org) wrote:
I think I've figured it out, and it still happens in the 0.8.1 branch. The
code that is responsible for deleting the key from ZooKeeper is broken and
will never be called when using the command line tool, so it will fail
after the first use. I've created
https://issues.apache.org/jira/browse/KAFKA
Done. https://issues.apache.org/jira/browse/KAFKA-1360
On Thu, Apr 3, 2014 at 9:13 PM, Neha Narkhede wrote:
> >> Is there a maven repo for pulling snapshot CI builds from?
>
> We still need to get the CI build setup going, could you please file a JIRA
> for this?
> Meanwhile, you will have to ju
>> Is there a maven repo for pulling snapshot CI builds from?
We still need to get the CI build setup going, could you please file a JIRA
for this?
Meanwhile, you will have to just build the code yourself for now,
unfortunately.
Thanks,
Neha
On Thu, Apr 3, 2014 at 12:01 PM, Clark Breyman wrote
Thanks Neha - Is there a maven repo for pulling snapshot CI builds from?
Sorry if this is answered elsewhere.
On Wed, Apr 2, 2014 at 7:16 PM, Neha Narkhede wrote:
> I'm not so sure if I know the issue you are running into but we fixed a few
> bugs with similar symptoms and the fixes are on the 0.
I'm not sure I know the issue you are running into, but we fixed a few
bugs with similar symptoms and the fixes are on the 0.8.1 branch. It would
be great if you could give it a try to see if your issue is resolved.
Thanks,
Neha
On Wed, Apr 2, 2014 at 12:59 PM, Clark Breyman wrote:
> Was there a
Was there an answer for 0.8.1 getting stuck in preferred leader election?
I'm seeing this as well. Is there a JIRA ticket on this issue?
On Fri, Mar 21, 2014 at 1:15 PM, Ryan Berdeen wrote:
> So, for 0.8 without "controlled.shutdown.enable", why would ShutdownBroker
> and restarting cause under
So, for 0.8 without "controlled.shutdown.enable", why would ShutdownBroker
and restarting cause under-replication and producer exceptions? How can I
upgrade gracefully?
What's up with 0.8.1 getting stuck in preferred leader election?
On Fri, Mar 21, 2014 at 12:18 AM, Neha Narkhede wrote:
> Whic
Which brings up the question - Do we need ShutdownBroker anymore? It seems
like the config should handle controlled shutdown correctly anyway.
Thanks,
Neha
On Thu, Mar 20, 2014 at 9:16 PM, Jun Rao wrote:
> We haven't been testing the ShutdownBroker command in 0.8.1 rigorously
> since in 0.8.1,
We haven't been testing the ShutdownBroker command in 0.8.1 rigorously
since in 0.8.1, one can do the controlled shutdown through the new config
"controlled.shutdown.enable". Instead of running the ShutdownBroker command
during the upgrade, you can also wait until the under-replicated partition
count drops to 0.
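For reference, the ShutdownBroker command is invoked via kafka-run-class.sh,
roughly like this (the broker id and ZooKeeper string are placeholders; see
the Replication tools wiki page for the full option list):
bin/kafka-run-class.sh kafka.admin.ShutdownBroker --zookeeper zk1:2181 --broker 2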
Using "controlled shut down" means Kafka will try first to migrate the
partition leaderships from the broker being shut down before really shut it
down so that the partitions will not be unavailable. Disabling it would
mean that during the time when the broker is done until the controller
noticed i
A replica is dropped out of the ISR if (1) it hasn't issued a fetch request for
some time, or (2) it's behind the leader by some number of messages. The replica
will be added back to the ISR once neither condition is true any longer.
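Those two conditions map to broker configs; for example (the values shown are
the 0.8 defaults, so tune them for your workload):
replica.lag.time.max.ms=10000
replica.lag.max.messages=4000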
The actual value depends on the application. For example, if there is a
spike and the fol
Jun,
Can you explain the difference between "failed" and "slow"? In either
case, the follower drops out of the ISR, and can come back later if it
catches up, no?
In the configuration doc, it seems to describe them both with the same
language: "if …, the leader will remove the follower from
replica.lag.time.max.ms is used to detect a failed broker.
replica.lag.max.messages is used to detect a slow broker.
Thanks,
Jun
On Fri, Nov 1, 2013 at 10:36 PM, Jason Rosenberg wrote:
> In response to Joel's point, I think I do understand that messages can be
> lost, if in fact we have dropp
In response to Joel's point, I think I do understand that messages can be
lost, if in fact we have dropped down to only 1 member in the ISR at the
time the message is written, and then that 1 node goes down.
What I'm not clear on is the conditions under which a node can drop out of
the ISR. You
To support more durability at the expense of availability, we have a
JIRA that we will fix on trunk. This will allow you to configure the
default as well as per-topic durability vs. availability behavior -
https://issues.apache.org/jira/browse/KAFKA-1028
Thanks,
Neha
On Fri, Nov 1, 2013 at 1
Unclean shutdown could result in data loss, since you are moving
leadership to a replica that has fallen out of the ISR; i.e., its log end
offset is behind the last committed message to this partition.
>>> But if data is written with 'request.required.acks=-1', no data s
I've filed: https://issues.apache.org/jira/browse/KAFKA-1108
On Tue, Oct 29, 2013 at 4:29 PM, Jason Rosenberg wrote:
> Here's another exception I see during controlled shutdown (this time there
> was not an unclean shutdown problem). Should I be concerned about this
> exception? Is any data lo
Here's another exception I see during controlled shutdown (this time there
was not an unclean shutdown problem). Should I be concerned about this
exception? Is any data loss possible with this? This one happened after
the first "Retrying controlled shutdown after the previous attempt
failed..." m
On Fri, Oct 25, 2013 at 9:16 PM, Joel Koshy wrote:
>
> Unclean shutdown could result in data loss - since you are moving
> leadership to a replica that has fallen out of ISR. i.e., it's log end
> offset is behind the last committed message to this partition.
>
>
But if data is written with 'reque
On Fri, Oct 25, 2013 at 3:22 PM, Jason Rosenberg wrote:
> It looks like when the controlled shutdown failes with an IOException, the
> exception is swallowed, and we see nothing in the logs:
>
> catch {
> case ioe: java.io.IOException =>
> channel.disconne
It looks like when the controlled shutdown fails with an IOException, the
exception is swallowed, and we see nothing in the logs:
catch {
  case ioe: java.io.IOException =>
    channel.disconnect()
    channel = null
    // ignore and try again
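Even just logging it before retrying would make this visible; a rough sketch
of what I mean (warn() here assumes the Logging trait that Kafka's server
classes mix in):
catch {
  case ioe: java.io.IOException =>
    // sketch: surface the error instead of silently swallowing it
    warn("Controlled shutdown request failed, will retry: " + ioe.getMessage)
    channel.disconnect()
    channel = null
}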
Neha,
It looks like the StateChangeLogMergerTool takes state change logs as
input. I'm not sure I know where those live. (Maybe update the doc on
that wiki page to describe that!)
Thanks,
Jason
On Fri, Oct 25, 2013 at 12:38 PM, Neha Narkhede wrote:
> Jason,
>
> The state change log tool is desc
Jason,
The state change log tool is described here -
https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-7.StateChangeLogMergerTool
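Invocation looks roughly like this (the log file path and topic are
placeholders; each broker writes a state-change.log per the default log4j
config, and the tool's --help output has the exact option names):
bin/kafka-run-class.sh kafka.tools.StateChangeLogMergerTool \
  --logs /var/log/kafka/state-change.log --topic mytopic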
I'm curious what the IOException is and if we can improve error reporting.
Can you send around the stack trace?
Thanks,
Neha
On
Ok,
Looking at the controlled shutdown code, it appears that it can fail with
an IOException too, in which case it won't report the remaining partitions
to replicate, etc. (I think that might be what I'm seeing, since I never
saw the log line for "controlled shutdown failed, X remaining partition
Controlled shutdown can fail if the cluster has a non-zero under-replicated
partition count, since that means the leaders may not be able to move off of
the broker being shut down, causing the controlled shutdown to fail. The
backoff might help if the under-replication is just temporary due to a spike
in traffic. Th
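One quick way to check for that before (or during) a shutdown is the topic
tool (the ZooKeeper string is a placeholder; on older 0.8.0 builds the script
is kafka-list-topic.sh rather than kafka-topics.sh):
bin/kafka-topics.sh --zookeeper zk1:2181 --describe --under-replicated-partitions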
On Fri, Oct 25, 2013 at 1:18 AM, Jason Rosenberg wrote:
> I'm running into an issue where sometimes, the controlled shutdown fails to
> complete after the default 3 retry attempts. This ended up in one case,
> with a broker undergoing an unclean shutdown, and then it was in a rather
> bad state
You can send it a SIGTERM signal (SIGKILL won't work).
Thanks,
Joel
On Wed, Aug 14, 2013 at 8:05 AM, Yu, Libo wrote:
> Hi,
>
> In this link
> https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools
> two ways to do controlled shutdown are mentioned. "The first approach is to
> set