While we do this, the Kafka controlled shutdown hangs.
The same issue was not seen with 25 topics.
Please let us know if a solution to this issue is known.
Thanks
Martin
The broker has the following parameters related to controlled shutdown:
controlled.shutdown.enable — Enable controlled shutdown of the server
    (type: boolean, default: true, importance: medium)
controlled.shutdown.max.retries — Controlled shutdown can fail for multiple
    reasons. This determines the number of
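A server.properties sketch of the controlled-shutdown settings discussed in this thread (the values shown are the documented defaults; treat them as illustrative):

```properties
# Migrate partition leadership off the broker before it exits
# (default: true from 0.8.2 onwards).
controlled.shutdown.enable=true
# Number of retries when a controlled shutdown attempt fails (default: 3).
controlled.shutdown.max.retries=3
# Pause between retry attempts, in milliseconds (default: 5000).
controlled.shutdown.retry.backoff.ms=5000
```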
ents.producer.internals.Sender.run(Sender.java:128)
> [vault_deploy.jar:na]
> 	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_66]
> DEBUG [2016-01-05 21:49:18,159] [] o.apache.kafka.clients.NetworkClient -
> Node 0 disconnected.
Do you happen to have broker-logs and state-change logs from the controlled
shutdown attempt?
In theory, the producer should not really see a disconnect - it should get
NotALeader exception (because leaders are re-assigned before the shutdown)
that will cause it to get the metadata. I am guessing
Yes, that was my intention and we have both of those configs turned on. For
some reason, however, the controlled shutdown wasn't transferring
leadership of all partitions, which caused the issues I described in my
initial email.
On Wed, Jan 13, 2016 at 12:05 AM, Ján Koščo <3k.stan...@g
> > > luke.steen...@braintreepayments.com> wrote:
> > >
> > > > Hello,
> > > >
> > > > We've run into a bit of a head-scratcher with a new kafka deployment
> > and
> > > > I'm curious if anyone has any ideas.
>
>
> > > I was able to work around the issue by waiting 60 seconds between
> > shutting
> > > down the broker and terminating that instance, as well as raising
> > > request.timeout.ms on the producer to 2x our zookeeper timeout. This
> > gives
> > > the broker a much quicker "connection refused" error instead of the
> > > connection timeout and seems to give enough time for normal failure
> > > detection and leader election to kick in before requests are timed out.
> > >
> > > So two questions really: (1) are there any known issues that would
> cause
> > a
> > > controlled shutdown to fail to release leadership of all partitions?
> and
> > > (2) should the producer be timing out connection attempts more
> > proactively?
> > >
> > > Thanks,
> > > Luke
> > >
> >
>
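The workaround above boils down to producer-side timeout/retry settings plus a deploy-time delay. A hypothetical producer config sketch (the numbers are illustrative, assuming a 6-second ZooKeeper session timeout; only request.timeout.ms comes from the thread itself):

```properties
# Illustrative producer settings (not copied from the original deployment).
# Time out requests well before instance termination; the thread suggests
# ~2x the ZooKeeper session timeout.
request.timeout.ms=12000
# Let the producer retry against the newly elected leader.
retries=3
retry.backoff.ms=500
```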
anyone has any ideas.

A little bit of background: this deployment uses "immutable infrastructure"
on AWS, so instead of configuring the host in-place, we stop the broker,
tear down the instance, and replace it wholesale. My understanding was that
controlled shutdown combined with producer retries would allow this
operation to be zero-downtime. Unfortunately, things aren't working quite
as I expected.

After poring over the logs, I pieced together the following ch
Thanks, Prabhjot
I know that running out of space on disks can cause a Kafka shutdown, but
that is not the case here; there is plenty of free space.
On Thu, Nov 5, 2015 at 6:08 AM, Prabhjot Bharaj
wrote:
> Hi Vadim,
>
> Did you see your hard disk partition getting full where kafka data
> directory i
Hi Gleb,
No, no zookeeper-related errors. The only suspicious line I see immediately
preceding the shutdown is this:
2015-11-03 01:53:20,810] INFO Reconnect due to socket error:
java.nio.channels.ClosedChannelException (kafka.consumer.SimpleConsumer)
which makes me think it could be some networ
Hi Vadim,
Did you see your hard disk partition getting full where kafka data
directory is present ?
It could be because you have set log retention to a larger value, whereas
your input data may be taking up full disk space. In that case, move some
data out from that disk partition, set log retenti
Hi, Vadim. Do you see something like this: "zookeeper state changed
(Expired)" in kafka's logs?
On Wed, Nov 4, 2015 at 6:33 PM, Vadim Bobrov wrote:
Hi,
does anyone know in what cases Kafka will take itself down? I have a
cluster of 2 nodes that went down (not crashed) this night in a controlled
and orderly shutdown as far as I can tell, except it wasn't controlled by me
Thanks
Vadim
Hi Federico,
What is your replica.lag.time.max.ms?
When acks=-1, the ProducerResponse won't return until all the brokers in the
ISR get the message. During controlled shutdown, the shutting-down broker is
doing a lot of leader migration and could slow down on fetching data. The
broker won't
Hi,
I have a few Java async producers sending data to a 4-node Kafka cluster,
version 0.8.2, containing a few thousand topics, all with replication factor
2. When I use acks=1 and trigger a controlled shutdown + restart on one
broker, the producers will send data to the new leader, reporting a very
Ah, thank you, SIGTERM is what I was looking for. The docs are unclear on
that, it would be useful to fix those. Thanks!
You can initiate a controlled shutdown by running bin/kafka-server-stop.sh.
This will send a SIGTERM to the broker to tell it to do the controlled shutdown.
I also got confused before and had to look at the code to figure that out.
I think it would be better if we added this to the documentation.
-Binh
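Outside Kafka, the same SIGTERM-then-cleanup pattern can be sketched in a few lines of Python (a toy stand-in, not broker code):

```python
import signal

shutdown_steps = []

def on_sigterm(signum, frame):
    # Stand-in for the broker's shutdown hook: in Kafka this is where
    # controlled shutdown (leadership migration) would run before exit.
    shutdown_steps.append("leadership migrated")
    shutdown_steps.append("broker stopped")

signal.signal(signal.SIGTERM, on_sigterm)
# Deliver SIGTERM to ourselves, as kafka-server-stop.sh does to the broker
# process; a SIGKILL would bypass the handler entirely.
signal.raise_signal(signal.SIGTERM)

assert shutdown_steps == ["leadership migrated", "broker stopped"]
```

This is also why SIGKILL (`kill -9`) cannot trigger a controlled shutdown: the process never gets a chance to run its handler.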
On Mon, Jul
Controlled shutdown is built into the broker: when this config is set to true,
the broker makes a request to the controller to initiate the controlled
shutdown, waits until the request succeeds, and in case of failure retries the
shutdown controlled.shutdown.max.retries times.
https://github.com/apache/kafka/blob
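The retry loop described here can be sketched as follows (a toy model; the real logic lives in the broker's shutdown path, and the helper names are made up):

```python
def controlled_shutdown_with_retries(attempt_shutdown, max_retries=3):
    # Mirrors the broker's loop: try the controller request up to
    # max_retries times (cf. controlled.shutdown.max.retries, default 3).
    for _ in range(max_retries):
        if attempt_shutdown():
            return True   # controller confirmed leadership moved off this broker
    return False          # out of retries; caller falls back to unclean shutdown

# Usage: a fake attempt that fails twice, then succeeds on the third try.
outcomes = iter([False, False, True])
assert controlled_shutdown_with_retries(lambda: next(outcomes)) is True
```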
Thanks!
But how do I initiate a controlled shutdown on a running broker? Editing
server.properties is not going to cause this to happen. Don’t I have to tell
the broker to shut down nicely? All I really want to do is tell the controller
to move leadership to other replicas, so I can shutdown
You can set controlled.shutdown.enable to true in Kafka’s server.properties;
this is enabled by default in 0.8.2 onwards.
You can also set the max retries using controlled.shutdown.max.retries, which
defaults to 3.
Thanks,
Harsha
On July 27, 2015 at 11:42:32 AM, Andrew Otto (ao...@wikimedia.org)
I’m working on packaging 0.8.2.1 for Wikimedia, and in doing so I’ve noticed
that kafka.admin.ShutdownBroker doesn’t exist anymore. From what I can tell,
this has been intentionally removed in favor of a JMX(?) config
“controlled.shutdown.enable”. It is unclear from the documentation how one i
We fixed a couple issues related to automatic leader balancing and
controlled shutdown. Would you mind trying out 0.8.2-beta?
On Fri, Nov 7, 2014 at 11:52 AM, Solon Gordon wrote:
> We're using 0.8.1.1 with auto.leader.rebalance.enable=true.
>
> On Fri, Nov 7, 2014 at 2:35 PM,
Solon,
Which version of Kafka are you running and are you enabling auto leader
rebalance at the same time?
Guozhang
On Fri, Nov 7, 2014 at 8:41 AM, Solon Gordon wrote:
> Hi all,
>
> My team has observed that if a broker process is killed in the middle of
> the controlled shutdo
Hi all,
My team has observed that if a broker process is killed in the middle of
the controlled shutdown procedure, the remaining brokers start spewing
errors and do not properly rebalance leadership. The cluster cannot recover
without major manual intervention.
Here is how to reproduce the
Thanks for clarifying.
When I increase the replication factor, enable controlled shutdown and want
to do a controlled shutdown, do I still issue the same shutdown (SIGTERM)?
On Thu, Aug 14, 2014 at 11:40 AM, Joel Koshy wrote:
> Controlled shutdown does not really help in your case since y
Controlled shutdown does not really help in your case since your
replication factor is one.
> What does the -1 for Leader and blank Isr indicate? Do I need to run
It means the partition is unavailable (since there are no other
replicas).
So you should either use a higher replication factor
Running 0.8.1 and am unable to do a controlled shutdown as part of a
rolling bounce.
Is this the primary reference for this task?
https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-1.ControlledShutdown
I've set the config to enable controlled shu
> rberd...@hubspot.com>
> > > > > wrote:
> > > > >
> > > > > > So, for 0.8 without "controlled.shutdown.enable", why would
> > > > > ShutdownBroker
> > > > > > and restarting cause
acefully?
> > > > >
> > > > > What's up with 0.8.1 getting stuck in preferred leader election?
> > > > >
> > > > >
> > > > > On Fri, Mar 21, 2014 at 12:18 AM, Neha Narkhede <
> > > neha.narkh...@g
Which brings up the question - Do we need ShutdownBroker anymore? It seems
like the config should handle controlled shutdown correctly anyway.
Thanks,
Neha
On Thu, Mar 20, 2014 at 9:16 PM, Jun Rao wrote:
> We haven't been testing the ShutdownBroker command in 0.8.1 rigorously
> sin
We haven't been testing the ShutdownBroker command in 0.8.1 rigorously
since in 0.8.1, one can do the controlled shutdown through the new config
"controlled.shutdown.enable". Instead of running the ShutdownBroker command
during the upgrade, you can also wait until under replicated
While upgrading from 0.8.0 to 0.8.1 in place, I observed some surprising
behavior using kafka.admin.ShutdownBroker. At the start, there were no
underreplicated partitions. After running
bin/kafka-run-class.sh kafka.admin.ShutdownBroker --broker 10 ...
Partitions that had replicas on broker 10 w
l#brokerconfigs
On Mon, Dec 2, 2013 at 4:23 PM, Nitzan Harel
wrote:
> The default value for "controlled.shutdown.enable" is false.
> Does that mean that stopping a broker without a controlled shutdown and
> using a "kill -9" might lead to an "UnderReplicatedPartitions" state?
>
--
-- Guozhang
The default value for "controlled.shutdown.enable" is false.
Does that mean that stopping a broker without a controlled shutdown and using a
"kill -9" might lead to an "UnderReplicatedPartitions" state?
A replica is dropped out of the ISR if (1) it hasn't issued a fetch request for
some time, or (2) it's behind the leader by some number of messages. The
replica will be added back to the ISR once neither condition is true any longer.
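A toy model of the two conditions (the thresholds mirror the old defaults, 10000 ms and 4000 messages, but are illustrative):

```python
def in_isr(ms_since_last_fetch, messages_behind_leader,
           lag_time_max_ms=10_000, lag_max_messages=4_000):
    # A replica stays in the ISR only if it has fetched recently
    # (not "failed", cf. replica.lag.time.max.ms) AND is close enough
    # to the leader's log end (not "slow", cf. replica.lag.max.messages).
    failed = ms_since_last_fetch > lag_time_max_ms
    slow = messages_behind_leader > lag_max_messages
    return not (failed or slow)

assert in_isr(500, 10) is True        # healthy follower
assert in_isr(20_000, 10) is False    # "failed": no fetch for too long
assert in_isr(500, 9_000) is False    # "slow": too far behind the leader
assert in_isr(500, 10) is True        # caught up again -> re-added to ISR
```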
The actual value depends on the application. For example, if there is a
spike and the fol
Jun,
Can you explain the difference between "failed" and "slow"? In either
case, the follower drops out of the ISR, and can come back later if they
catch up, no?
In the configuration doc, it seems to describe them both with the same
language: "if …, the leader will remove the follower from
replica.lag.time.max.ms is used to detect a failed broker.
replica.lag.max.messages is used to detect a slow broker.
Thanks,
Jun
On Fri, Nov 1, 2013 at 10:36 PM, Jason Rosenberg wrote:
In response to Joel's point, I think I do understand that messages can be
lost, if in fact we have dropped down to only 1 member in the ISR at the
time the message is written, and then that 1 node goes down.
What I'm not clear on is the conditions under which a node can drop out of
the ISR. You
For supporting more durability at the expense of availability, we have a
JIRA that we will fix on trunk. This will allow you to configure the
default as well as per topic durability vs availability behavior -
https://issues.apache.org/jira/browse/KAFKA-1028
Thanks,
Neha
On Fri, Nov 1, 2013 at 1
Unclean shutdown could result in data loss - since you are moving
leadership to a replica that has fallen out of the ISR, i.e., its log end
offset is behind the last committed message to this partition.
>>> But if data is written with 'request.required.acks=-1', no data s
I've filed: https://issues.apache.org/jira/browse/KAFKA-1108
On Tue, Oct 29, 2013 at 4:29 PM, Jason Rosenberg wrote:
> Here's another exception I see during controlled shutdown (this time there
> was not an unclean shutdown problem). Should I be concerned about this
> exc
Here's another exception I see during controlled shutdown (this time there
was not an unclean shutdown problem). Should I be concerned about this
exception? Is any data loss possible with this? This one happened after
the first "Retrying controlled shutdown after the previous atte
On Fri, Oct 25, 2013 at 9:16 PM, Joel Koshy wrote:
>
> Unclean shutdown could result in data loss - since you are moving
> leadership to a replica that has fallen out of the ISR, i.e., its log end
> offset is behind the last committed message to this partition.
>
>
But if data is written with 'reque
It looks like when the controlled shutdown fails with an IOException, the
exception is swallowed, and we see nothing in the logs:

catch {
  case ioe: java.io.IOException =>
    channel.disconnect()
    channel = null
    // ignore
Ok,
Looking at the controlled shutdown code, it appears that it can fail with
an IOException too, in which case it won't report the remaining partitions
to replicate, etc. (I think that might be what I'm seeing, since I never
saw the log line for "controlled shutdown fail
Controlled shutdown can fail if the cluster has a non-zero under-replicated
partition count, since that means the leaders may not be able to move off of
the broker being shut down, causing controlled shutdown to fail. The backoff
might help if the under-replication is just temporary, due to a spike in
traffic.
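That failure mode can be modeled in a few lines: controlled shutdown can only hand leadership to another in-sync replica, so a partition whose ISR contains only the shutting-down broker blocks it (a toy sketch, not Kafka code):

```python
def can_release_leadership(partitions, shutting_down_broker):
    # Controlled shutdown succeeds only if every partition led by the
    # shutting-down broker has another in-sync replica to take over.
    blocked = [
        p["name"] for p in partitions
        if p["leader"] == shutting_down_broker
        and not any(r != shutting_down_broker for r in p["isr"])
    ]
    return len(blocked) == 0, blocked

partitions = [
    {"name": "t0-0", "leader": 1, "isr": [1, 2]},  # replica 2 can take over
    {"name": "t0-1", "leader": 1, "isr": [1]},     # under-replicated: blocks
]
ok, blocked = can_release_leadership(partitions, shutting_down_broker=1)
assert ok is False and blocked == ["t0-1"]
```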
I'm running into an issue where sometimes the controlled shutdown fails to
complete after the default 3 retry attempts. This ended up, in one case,
with a broker undergoing an unclean shutdown, and then it was in a rather
bad state after restart. Producers would connect to the metadat
You can send it a SIGTERM signal (SIGKILL won't work).
Thanks,
Joel
On Wed, Aug 14, 2013 at 8:05 AM, Yu, Libo wrote:
Hi,
In this link https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools
two ways to do controlled shutdown are mentioned. "The first approach is to set
"controlled.shutdown.enable" to true in the broker. Then, the broker will try to
move all leaders on it to other brok
Nice!
Thanks,
Jason
On Wed, Jun 19, 2013 at 9:16 PM, Jun Rao wrote:
Actually, we recently added the option to enable controlled shutdown in the
broker shutdown hook. I have updated our wiki accordingly (
https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools).
Thanks,
Jun
On Wed, Jun 19, 2013 at 6:32 PM, Jason Rosenberg wrote:
Was just reading about Controlled Shutdown here:
https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools
Is this something that can be invoked from code, from within a container
running the KafkaServer?
Currently I launch kafka.server.KafkaServer directly from our java app
container