I'm using Kafka 2.9.2-0.8.1.1 and ZooKeeper 3.4.6. I noticed that only one broker is going down. My message size is less than 3 KB, and the JVM settings are:

KAFKA_HEAP_OPTS="-Xmx512M"
KAFKA_JVM_PERFORMANCE_OPTS="-server -XX:+UseCompressedOops -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+CMSScavengeBeforeRemark -XX:+DisableExplicitGC -Djava.awt.headless=true"
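The oldest quoted message in this thread ends with "Could not reserve enough space for object heap", which the JVM prints at startup when it cannot reserve the heap requested with -Xmx. A minimal sketch of how the heap setting could be compared against the memory the host reports as available follows. The helper names `xmx_bytes` and `mem_available_bytes` are hypothetical, and the sketch assumes a Linux host where /proc/meminfo is readable.

```python
import re

# Multipliers for the size suffixes the JVM accepts on -Xmx (k, m, g).
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def xmx_bytes(heap_opts):
    """Return the -Xmx size in bytes, or None if no -Xmx flag is present."""
    match = re.search(r"-Xmx(\d+)([kKmMgG]?)", heap_opts)
    if match is None:
        return None
    size, unit = match.groups()
    return int(size) * _UNITS[unit.lower()]


def mem_available_bytes(meminfo_text):
    """Read MemAvailable (falling back to MemFree) from /proc/meminfo text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            # The memory fields read below are reported in kB.
            fields[name.strip()] = int(parts[0]) * 1024
    return fields.get("MemAvailable", fields.get("MemFree"))
```

On the broker host, the text of /proc/meminfo can be passed to `mem_available_bytes` and the result compared with `xmx_bytes("-Xmx512M")`. If the available figure is below the requested heap at the moment supervisord retries the start, that would be consistent with the startup failure, though it does not explain why the broker went down the first time.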
Do you mean that a Kafka broker should never go down? And does a broker start automatically after failing?

I see only these errors on both brokers. 10.0.0.11 is the broker that is going down.

ERROR Closing socket for /10.0.0.11 because of error (kafka.network.Processor)
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
        at sun.nio.ch.IOUtil.read(IOUtil.java:171)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
        at kafka.utils.Utils$.read(Utils.scala:375)
        at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
        at kafka.network.Processor.read(SocketServer.scala:347)
        at kafka.network.Processor.run(SocketServer.scala:245)
        at java.lang.Thread.run(Thread.java:662)
[2015-01-16 11:01:48,173] INFO Closing socket connection to /10.0.0.11. (kafka.network.Processor)
[2015-01-16 11:03:08,164] ERROR Closing socket for /10.0.0.11 because of error (kafka.network.Processor)
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
        at sun.nio.ch.IOUtil.read(IOUtil.java:171)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
        at kafka.utils.Utils$.read(Utils.scala:375)
        at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
        at kafka.network.Processor.read(SocketServer.scala:347)
        at kafka.network.Processor.run(SocketServer.scala:245)
        at java.lang.Thread.run(Thread.java:662)
[2015-01-16 11:03:08,280] INFO Closing socket connection to /10.0.0.11. (kafka.network.Processor)
[2015-01-16 11:03:48,369] ERROR Closing socket for /10.0.0.11 because of error (kafka.network.Processor)
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
        at sun.nio.ch.IOUtil.read(IOUtil.java:171)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
        at kafka.utils.Utils$.read(Utils.scala:375)
        at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
        at kafka.network.Processor.read(SocketServer.scala:347)
        at kafka.network.Processor.run(SocketServer.scala:245)
        at java.lang.Thread.run(Thread.java:662)

On Thu, Jan 15, 2015 at 7:49 PM, Harsha <ka...@harsha.io> wrote:

> Tousif,
> Which version of Kafka and ZooKeeper are you using, and what are your
> message size and the JVM size that you allocated for the Kafka brokers?
> There is only 1 ZooKeeper node; if it's a production cluster I recommend
> you have a quorum of ZooKeeper nodes. Both Kafka & Storm are heavy
> users of ZooKeeper. Also, supervisord is recommended for Storm; I am not
> sure you need to have it for Kafka. For Storm it's the fail-fast nature
> of workers that requires supervisord to restart them.
> When Kafka goes down the first time, i.e. before supervisord restarts it,
> do you see the same OOM error? Check the logs to see why it's going down
> for the first time.
> -Harsha
>
>
> On Wed, Jan 14, 2015, at 10:50 PM, Tousif wrote:
> > Hello Chia-Chun Shih,
> >
> > There are multiple issues.
> > First thing is I don't see the out of memory error every time, and OOM
> > happens after supervisord keeps retrying to start Kafka.
> > It goes down when it tries to add a partition fetcher.
> >
> > It starts with
> >
> > *conflict in /controller data: {"version":1,"brokerid":0,"timestamp":"1421296052741"} stored data: {"version":1,"brokerid":1,"timestamp":"1421291998088"} (kafka.utils.ZkUtils$)*
> >
> > ERROR Conditional update of path /brokers/topics/realtimestreaming/partitions/1/state with data {"controller_epoch":34,"leader":0,"version":1,"leader_epoch":54,"isr":[0]} and expected version 90 failed due to org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /brokers/topics/realtimestreaming/partitions/1/state (kafka.utils.ZkUtils$)
> >
> > and then
> >
> > [ReplicaFetcherManager on broker 0] Removed fetcher for partitions [realtimestreaming,0],[realtimestreaming,1] (kafka.server.ReplicaFetcherManager)
> > [2015-01-15 09:57:34,350] INFO Truncating log realtimestreaming-0 to offset 846. (kafka.log.Log)
> > [2015-01-15 09:57:34,351] INFO Truncating log realtimestreaming-1 to offset 957. (kafka.log.Log)
> > [2015-01-15 09:57:34,650] INFO [ReplicaFetcherManager on broker 0] *Added fetcher for partitions ArrayBuffer([[realtimestreaming,0], initOffset 846 to broker id:1,host:realtimeslave1.novalocal,port:9092] , [[realtimestreaming,1], initOffset 957 to broker id:1,host:realtimeslave1.novalocal,port:9092] ) (kafka.server.ReplicaFetcherManager)*
> > [2015-01-15 09:57:34,654] INFO [ReplicaFetcherThread-0-1], Starting (kafka.server.ReplicaFetcherThread)
> > [2015-01-15 09:57:34,747] INFO [ReplicaFetcherThread-1-1], Starting (kafka.server.ReplicaFetcherThread)
> > [2015-01-15 09:58:14,156] INFO Closing socket connection to /10.0.0.11. (kafka.network.Processor)
> >
> >
> > On Thu, Jan 15, 2015 at 12:01 PM, Chia-Chun Shih <chiachun.s...@gmail.com> wrote:
> >
> > > You can use tools (e.g., VisualVM) to diagnose the OOM problem.
> > >
> > > 2015-01-15 14:15 GMT+08:00 Tousif Khazi <tou...@senseforth.com>:
> > >
> > > > I see this error:
> > > >
> > > > ERROR [ReplicaFetcherThread-0-1], Error for partition [realtimestreaming,1] to broker 1:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
> > > > [2015-01-15 10:00:04,348] INFO [ReplicaFetcherManager on broker 0] Removed fetcher for partitions [realtimestreaming,1] (kafka.server.ReplicaFetcherManager)
> > > > [2015-01-15 10:00:04,355] INFO Closing socket connection to /10.0.0.11. (kafka.network.Processor)
> > > > [2015-01-15 10:00:04,444] WARN [KafkaApi-0] Fetch request with correlation id 0 from client ReplicaFetcherThread-0-0 on partition [realtimestreaming,1] failed due to Leader not local for partition [realtimestreaming,1] on broker 0 (kafka.server.KafkaApis)
> > > > [2015-01-15 10:00:04,545] INFO [ReplicaFetcherThread-0-1], Shutting down (kafka.server.ReplicaFetcherThread)
> > > > [2015-01-15 10:00:04,848] INFO [ReplicaFetcherThread-0-1], Stopped (kafka.server.ReplicaFetcherThread)
> > > > [2015-01-15 10:00:04,849] INFO [ReplicaFetcherThread-0-1], Shutdown completed (kafka.server.ReplicaFetcherThread)
> > > > [2015-01-15 10:00:39,256] ERROR Closing socket for /10.0.0.11 because of error (kafka.network.Processor)
> > > > java.io.IOException: Connection reset by peer
> > > >         at sun.nio.ch.FileDispatcher.read0(Native Method)
> > > >         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
> > > >         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
> > > >         at sun.nio.ch.IOUtil.read(IOUtil.java:171)
> > > >         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
> > > >
> > > > On Wed, Jan 14, 2015 at 10:12 PM, Tousif <tousif.pa...@gmail.com> wrote:
> > > > > Thanks Harsha for the quick response.
> > > > > I don't see any other error.
> > > > > I used to see a replica fetcher error, but it seems to have
> > > > > disappeared after setting replica fetcher threads to 2, as I have 2
> > > > > partitions. Sometimes I see ZooKeeper session expiration.
> > > > > On Jan 14, 2015 8:02 PM, "Harsha" <ka...@harsha.io> wrote:
> > > > >
> > > > >> Tousif,
> > > > >> Do you see any other errors in server.log
> > > > >> -Harsha
> > > > >>
> > > > >> On Wed, Jan 14, 2015, at 01:51 AM, Tousif wrote:
> > > > >> > Hello,
> > > > >> >
> > > > >> > I have configured Kafka nodes to run via supervisord and see the
> > > > >> > following exceptions, and eventually the brokers go out of memory.
> > > > >> > I have given enough memory and process 1 event/second. Kafka goes
> > > > >> > down every day.
> > > > >> >
> > > > >> > I'm wondering what configuration is missing or needs to be added.
> > > > >> >
> > > > >> > Here are my cluster details:
> > > > >> > 2 brokers
> > > > >> > 1 zookeeper
> > > > >> > and 2 node apache storm
> > > > >> >
> > > > >> >
> > > > >> > INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient.ZkClient)
> > > > >> > ERROR Closing socket for /10.0.0.11 because of error (kafka.network.Processor)
> > > > >> > java.io.IOException: Connection reset by peer
> > > > >> >         at sun.nio.ch.FileDispatcher.read0(Native Method)
> > > > >> >         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
> > > > >> >         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
> > > > >> >         at sun.nio.ch.IOUtil.read(IOUtil.java:171)
> > > > >> >         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
> > > > >> >         at kafka.utils.Utils$.read(Utils.scala:375)
> > > > >> >         at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
> > > > >> >         at kafka.network.Processor.read(SocketServer.scala:347)
> > > > >> >         at kafka.network.Processor.run(SocketServer.scala:245)
> > > > >> >         at java.lang.Thread.run(Thread.java:662)
> > > > >> > [2015-01-13 23:43:37,962] INFO Closing socket connection to /10.0.0.11. (kafka.network.Processor)
> > > > >> > Error occurred during initialization of VM
> > > > >> > Could not reserve enough space for object heap
> > > > >> > Error occurred during initialization of VM
> > > > >> > Could not reserve enough space for object heap
> > > > >> >
> > > > >> > --
> > > > >> > Regards,
> > > > >> > Tousif
> > > > >> > +918050227279
> > > > >> >
> > > > >> > --
> > > > >> >
> > > > >> > Regards
> > > > >> > Tousif Khazi
> > > >
> > > > --
> > > > Regards,
> > > > Tousif
> > > > +918050227279
> >
> > --
> >
> > Regards
> > Tousif Khazi
>

--
Regards
Tousif Khazi