Re: Connection reset by peer

Jun Rao Thu, 28 Mar 2013 07:52:20 -0700

The zk session timeout only kicks in if you force kill the consumer.
Otherwise, consumer will close ZK session properly on clean shutdown.


The problem with GC is that if the consumer pauses for a long time, ZK
server won't receive pings from the client and thus can expire a still
existing session.

The best thing to do here is to fix the GC issue since it may have other
implications. To start with, you probably want to enable GC logging and see
how long and how frequent your GCs are.

Thanks,

Jun

On Thu, Mar 28, 2013 at 12:23 AM, Yonghui Zhao <zhaoyong...@gmail.com>wrote:

> I used zookeeper-3.3.4 in kafka.
>
> Default tickTime is 3 seconds, minSesstionTimeOut is 6 seconds.
> Now I change tickTime to 5 seconds. minSesstionTimeOut to 10 seconds
> But if we change timeout to a larger one,
> "you have shutdown this broker and restarted it faster than the zookeeper
> timeout so it appears to be re-registering."
> this could happened more easily
>
> Do you think consumer GC will affect kafka server and zk connection?
>
>
>
> 2013/3/28 Jun Rao <jun...@gmail.com>
>
> > Not sure why the re-registration fails. Are you using ZK 3.3.4 or above?
> >
> > It seems that you consumer still GCs, which is the root cause. So, you
> will
> > need to tune the GC setting further. Another way to avoid ZK session
> > timeout is to increase the session timeout config.
> >
> > Thanks,
> >
> > Jun
> >
> > On Wed, Mar 27, 2013 at 8:35 PM, Yonghui Zhao <zhaoyong...@gmail.com>
> > wrote:
> >
> > > Now I used GC like this:
> > >
> > > -server -Xms1536m -Xmx1536m -XX:NewSize=128m -XX:MaxNewSize=128m
> > > -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
> > > -XX:CMSInitiatingOccupancyFraction=70
> > >
> > >
> > > But it still happened.  It seems kafka server reconnect with zk, but
> the
> > > old node was still there. So kafka server stopped.
> > > Can kafka server retry to connect with zk?
> > >
> > >
> > > 2013-03-27 22:15:03,529] INFO Opening socket connection to server
> > > localhost/
> > > 127.0.0.1:2181 (org.apache.zookeeper.ClientCnxn)
> > > [2013-03-27 22:15:03,529] INFO Socket connection established to
> > localhost/
> > > 127.0.0.1:2181, initiating session (org.apache.zookeeper.ClientCnxn)
> > > [2013-03-27 22:15:05,855] INFO Session establishment complete on server
> > > localhost/127.0.0.1:2181, sessionid = 0x13da6d94abf00aa, negotiated
> > > timeout
> > > = 6000 (org.apache.zookeeper.ClientCnxn)
> > > [2013-03-27 22:15:05,942] INFO zookeeper state changed (SyncConnected)
> > > (org.I0Itec.zkclient.ZkClient)
> > > [2013-03-27 22:15:14,912] INFO conflict in /brokers/ids/0 data:
> > > 127.0.0.1-1364393691770:127.0.0.1:9093 stored data: null
> > > (kafka.utils.ZkUtils$)
> > > [2013-03-27 22:15:14,942] ERROR Error handling event ZkEvent[New
> session
> > > event sent to
> kafka.server.KafkaZooKeeper$SessionExpireListener@18f389bc
> > ]
> > > (org.I0Itec.zkclient.ZkEventThread)
> > > java.lang.RuntimeException: A broker is already registered on the path
> > > /brokers/ids/0. This probably indicates that you either have
> configured a
> > > brokerid that is already in use, or else you have shutdown this broker
> > and
> > > restarted it faster than the zookeeper timeout so it appears to be
> > > re-registering.
> > >     at
> > > kafka.server.KafkaZooKeeper.registerBrokerInZk(KafkaZooKeeper.scala:57)
> > >     at
> > >
> > >
> >
> kafka.server.KafkaZooKeeper$SessionExpireListener.handleNewSession(KafkaZooKeeper.scala:100)
> > >     at org.I0Itec.zkclient.ZkClient$4.run(ZkClient.java:472)
> > >     at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
> > > [2013-03-27 22:15:33,736] INFO Closing socket connection to /127.0.0.1
> .
> > > (kafka.network.Processor)
> > >
> > >
> > >
> > >
> > >
> > > 2013/3/27 Neha Narkhede <neha.narkh...@gmail.com>
> > >
> > > > The kafka-server-start.sh script doesn't have the mentioned GC
> > > > settings and heap size configured. However, probably doing that is a
> > > > good idea.
> > > >
> > > > Thanks,
> > > > Neha
> > > >
> > > > On Tue, Mar 26, 2013 at 9:47 AM, Yonghui Zhao <zhaoyong...@gmail.com
> >
> > > > wrote:
> > > > > kafka server is started by bin/kafka-server-start.sh.  No gc
> setting.
> > > > > 在 2013-3-26 下午11:40，"Neha Narkhede" <neha.narkh...@gmail.com>写道：
> > > > >
> > > > >> Did you have a gc pause around that time on the server ? What are
> > your
> > > > >> server's current gc settings ?
> > > > >>
> > > > >> Thanks,
> > > > >> Neha
> > > > >>
> > > > >> On Mon, Mar 25, 2013 at 8:48 PM, Yonghui Zhao <
> > zhaoyong...@gmail.com>
> > > > >> wrote:
> > > > >> > Thanks Neha, btw have you seen this exception.  We didn't
> restart
> > > any
> > > > >> > service it happens in deep night.
> > > > >> >
> > > > >> > java.lang.RuntimeException: A broker is already registered on
> the
> > > path
> > > > >> > /brokers/ids/0. This probably indicates that you either have
> > > > configured a
> > > > >> > brokerid that is already in use, or else you have shutdown this
> > > broker
> > > > >> and
> > > > >> > restarted it faster than the zookeeper timeout so it appears to
> be
> > > > >> > re-registering.
> > > > >> >     at
> > > > >> >
> > > >
> kafka.server.KafkaZooKeeper.registerBrokerInZk(KafkaZooKeeper.scala:57)
> > > > >> >     at
> > > > >> >
> > > > >>
> > > >
> > >
> >
> kafka.server.KafkaZooKeeper$SessionExpireListener.handleNewSession(KafkaZooKeeper.scala:100)
> > > > >> >     at org.I0Itec.zkclient.ZkClient$4.run(ZkClient.java:472)
> > > > >> >     at
> > org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
> > > > >> > [2013-03-26 02:07:19,155] INFO re-registering broker info in ZK
> > for
> > > > >> broker
> > > > >> > 0 (kafka.server.KafkaZooKeeper)
> > > > >> > [2013-03-26 02:07:19,155] INFO Registering broker /brokers/ids/0
> > > > >> > (kafka.server.KafkaZooKeeper)
> > > > >> > [2013-03-26 02:07:19,611] INFO conflict in /brokers/ids/0 data:
> > > > >> > 127.0.0.1-1364234839275:127.0.0.1:9093 stored data:
> > > > >> 127.0.0.1-1364227372971:
> > > > >> > 127.0.0.1:9093 (kafka.utils.ZkUtils$)
> > > > >> > [2013-03-26 02:07:19,611] ERROR Error handling event ZkEvent[New
> > > > session
> > > > >> > event sent to
> > > > kafka.server.KafkaZooKeeper$SessionExpireListener@40f8c9bf
> > > > >> ]
> > > > >> > (org.I0Itec.zkclient.ZkEventThread)
> > > > >> > java.lang.RuntimeException: A broker is already registered on
> the
> > > path
> > > > >> > /brokers/ids/0. This probably indicates that you either have
> > > > configured a
> > > > >> > brokerid that is already in use, or else you have shutdown this
> > > broker
> > > > >> and
> > > > >> > restarted it faster than the zookeeper timeout so it appears to
> be
> > > > >> > re-registering.
> > > > >> >     at
> > > > >> >
> > > >
> kafka.server.KafkaZooKeeper.registerBrokerInZk(KafkaZooKeeper.scala:57)
> > > > >> >     at
> > > > >> >
> > > > >>
> > > >
> > >
> >
> kafka.server.KafkaZooKeeper$SessionExpireListener.handleNewSession(KafkaZooKeeper.scala:100)
> > > > >> >     at org.I0Itec.zkclient.ZkClient$4.run(ZkClient.java:472)
> > > > >> >     at
> > org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > 2013/3/26 Neha Narkhede <neha.narkh...@gmail.com>
> > > > >> >
> > > > >> >> That really depends on your consumer application's memory
> > > allocation
> > > > >> >> patterns. If it is a thin wrapper over a Kafka consumer, I
> would
> > > > imagine
> > > > >> >> you can get away with using CMS for the tenured generation and
> > > > parallel
> > > > >> >> collector for the new generation with a small heap like 1gb or
> > so.
> > > > >> >>
> > > > >> >> Thanks,
> > > > >> >> Neha
> > > > >> >>
> > > > >> >> On Monday, March 25, 2013, Yonghui Zhao wrote:
> > > > >> >>
> > > > >> >> > Any suggestion on consumer side?
> > > > >> >> > 在 2013-3-25 下午9:49，"Neha Narkhede" <neha.narkh...@gmail.com
> > > > >> <javascript:;>
> > > > >> >> > >写道：
> > > > >> >> >
> > > > >> >> > > For Kafka 0.7 in production at Linkedin, we use a heap of
> > size
> > > > 3G,
> > > > >> new
> > > > >> >> > gen
> > > > >> >> > > 256 MB, CMS collector with occupancy of 70%.
> > > > >> >> > >
> > > > >> >> > > Thanks,
> > > > >> >> > > Neha
> > > > >> >> > >
> > > > >> >> > > On Sunday, March 24, 2013, Yonghui Zhao wrote:
> > > > >> >> > >
> > > > >> >> > > > Hi Jun,
> > > > >> >> > > >
> > > > >> >> > > > I used kafka-server-start.sh to start kafka, there is
> only
> > > one
> > > > jvm
> > > > >> >> > > setting
> > > > >> >> > > > "-Xmx512M“
> > > > >> >> > > >
> > > > >> >> > > > Do you have some recommend GC setting?   Usually our
> sever
> > > has
> > > > >> 32GB
> > > > >> >> or
> > > > >> >> > > 64GB
> > > > >> >> > > > RAM.
> > > > >> >> > > >
> > > > >> >> > > > 2013/3/22 Jun Rao <jun...@gmail.com>
> > > > >> >> > > >
> > > > >> >> > > > > A typical reason for many rebalancing is the consumer
> > side
> > > > GC.
> > > > >> If
> > > > >> >> so,
> > > > >> >> > > you
> > > > >> >> > > > > will see logs in the consume saying sth like "expired
> > > > session"
> > > > >> for
> > > > >> >> > ZK.
> > > > >> >> > > > > Occasional rebalances are fine. Too many rebalances can
> > > slow
> > > > >> down
> > > > >> >> the
> > > > >> >> > > > > consumption and you will need to tune your GC setting.
> > > > >> >> > > > >
> > > > >> >> > > > > Thanks,
> > > > >> >> > > > >
> > > > >> >> > > > > Jun
> > > > >> >> > > > >
> > > > >> >> > > > > On Thu, Mar 21, 2013 at 11:07 PM, Yonghui Zhao <
> > > > >> >> > zhaoyong...@gmail.com
> > > > >> >> > > > > >wrote:
> > > > >> >> > > > >
> > > > >> >> > > > > > Yes, before consumer exception:
> > > > >> >> > > > > >
> > > > >> >> > > > > > 2013/03/21 12:07:17.909 INFO
> > [ZookeeperConsumerConnector]
> > > > []
> > > > >> >> > > > > > 0_lg-mc-db01.bj-1363784482043-f98c7868 *end
> rebalancing
> > > > >> >> > > > > > consumer*0_lg-mc-db01.bj-1363784482043-f98c7868 try
> #0
> > > > >> >> > > > > > 2013/03/21 12:07:17.911 INFO
> > [ZookeeperConsumerConnector]
> > > > []
> > > > >> >> > > > > > 0_lg-mc-db01.bj-1363784482043-f98c7868 *begin
> > rebalancing
> > > > >> >> > > > > > consumer*0_lg-mc-db01.bj-1363784482043-f98c7868 try
> #0
> > > > >> >> > > > > > 2013/03/21 12:07:17.934 INFO [FetcherRunnable] []
> > > > >> FetchRunnable-0
> > > > >> >> > > start
> > > > >> >> > > > > > fetching topic: sms part: 0 offset: 43667888259 from
> > > > >> >> > 127.0.0.1:9093
> > > > >> >> > > > > > 2013/03/21 12:07:17.940 INFO [SimpleConsumer] []
> > > Reconnect
> > > > in
> > > > >> >> > > > multifetch
> > > > >> >> > > > > > due to socket error:
> > > > >> >> > > > > > java.nio.channels.*ClosedByInterruptException*
> > > > >> >> > > > > >         at
> > > > >> java.nio.channels.spi.*AbstractInterruptibleChannel*
> > > > >> >> > > > > > .end(AbstractInterruptibleChannel.java:201)
> > > > >> >> > > > > >
> > > > >> >> > > > > >
> > > > >> >> > > > > > 2013/03/21 12:07:17.978 INFO
> > [ZookeeperConsumerConnector]
> > > > []
> > > > >> >> > > > > > 0_lg-mc-db01.bj-1363784482043-f98c7868 *end
> rebalancing
> > > > >> >> > > > > > consumer*0_lg-mc-db01.bj-1363784482043-f98c7868 try
> #0
> > > > >> >> > > > > > 2013/03/21 12:07:18.004 INFO [FetcherRunnable] []
> > > > >> FetchRunnable-0
> > > > >> >> > > start
> > > > >> >> > > > > > fetching topic: sms part: 0 offset: 43667888259 from
> > > > >> >> > 127.0.0.1:9093
> > > > >> >> > > > > > 2013/03/21 12:07:18.066 INFO
> > [ZookeeperConsumerConnector]
> > > > []
> > > > >> >> > > > > > 0_lg-mc-db01.bj-1363784482043-f98c7868 *begin
> > rebalancing
> > > > >> >> consume*r
> > > > >> >> > > > > > 0_lg-mc-db01.bj-1363784482043-f98c7868 try #0
> > > > >> >> > > > > > 2013/03/21 12:07:18.176 INFO [SimpleConsumer] []
> > > Reconnect
> > > > in
> > > > >> >> > > > multifetch
> > > > >> >> > > > > > due to socket error:
> > > > >> >> > > > > > java.nio.channels.*ClosedByInterruptException*
> > > > >> >> > > > > >         at
> > > > >> java.nio.channels.spi.*AbstractInterruptibleChannel*
> > > > >> >> > > > > > .end(AbstractInterruptibleChannel.java:201)
> > > > >> >> > > > > >
> > > > >> >> > > > > >
> > > > >> >> > > > > > So you think it is normal? How can we avoid this
> > > exception?
> > > > >> >> > > > > >
> > > > >> >> > > > > > I used 4 partitions in kafka,  use only 1 partition？
> > > > >> >> > > > > >
> > > > >> >> > > > > >
> > > > >> >> > > > > >
> > > > >> >> > > > > > 2013/3/22 Jun Rao <jun...@gmail.com>
> > > > >> >> > > > > >
> > > > >> >> > > > > > > Do you see any rebalances in the consumer? Each
> > > rebalance
> > > > >> will
> > > > >> >> > > > > interrupt
> > > > >> >> > > > > > > existing fetcher threads first.
> > > > >> >> > > > > > >
> > > > >> >> > > > > > > Thanks,
> > > > >> >> > > > > > >
> > > > >> >> > > > > > > Jun
> > > > >> >> > > > > > >
> > > > >> >> > > > > > > On Thu, Mar 21, 2013 at 9:40 PM, Yonghui Zhao <
> > > > >> >> > > zhaoyong...@gmail.com
> > > > >> >> >
> > > > >> >>
> > > > >>
> > > >
> > >
> >
>

Re: Connection reset by peer

Reply via email to