Any help?

On Mon, Jan 19, 2015 at 11:43 AM, Tousif <tousif.pa...@gmail.com> wrote:
> Here are the logs from broker id 0 and 1, captured when broker 1 went down.
>
> http://paste.ubuntu.com/9782553/
> http://paste.ubuntu.com/9782554/
>
> I'm using netty in storm and here are the configs:
> storm.messaging.transport: "backtype.storm.messaging.netty.Context"
> storm.messaging.netty.buffer_size: 209715200
> storm.messaging.netty.max_retries: 10
> storm.messaging.netty.max_wait_ms: 5000
> storm.messaging.netty.min_wait_ms: 10000
>
> On Sat, Jan 17, 2015 at 1:24 AM, Harsha <ka...@harsha.io> wrote:
>
>> Tousif,
>> I meant to say that if the kafka broker is going down often, it's better to
>> analyze the root cause of the crash. Using supervisord
>> to monitor the kafka broker is fine, sorry about the confusion.
>> -Harsha
>> On Fri, Jan 16, 2015, at 11:25 AM, Gwen Shapira wrote:
>> > Those errors are expected - if broker 10.0.0.11 went down, it will
>> > reset the connection and the other broker will close the socket.
>> > However, it looks like 10.0.0.11 crashes every two minutes?
>> >
>> > Do you have the logs from 10.0.0.11?
>> >
>> > On Thu, Jan 15, 2015 at 9:51 PM, Tousif <tousif.pa...@gmail.com> wrote:
>> > > I'm using kafka 2.9.2-0.8.1.1 and zookeeper 3.4.6.
>> > > I noticed that only one broker is going down.
>> > > My message size is less than 3 KB and KAFKA_HEAP_OPTS="-Xmx512M"
>> > > and KAFKA_JVM_PERFORMANCE_OPTS="-server -XX:+UseCompressedOops
>> > > -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled
>> > > -XX:+CMSScavengeBeforeRemark -XX:+DisableExplicitGC
>> > > -Djava.awt.headless=true".
>> > >
>> > > Do you mean a kafka broker never goes down, and does the broker start
>> > > automatically after failing?
>> > > I see only these errors on both the brokers.
>> > >
>> > > 10.0.0.11 is the broker which is going down.
>> > >
>> > > ERROR Closing socket for /10.0.0.11 because of error (kafka.network.Processor)
>> > > java.io.IOException: Connection reset by peer
>> > >     at sun.nio.ch.FileDispatcher.read0(Native Method)
>> > >     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>> > >     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
>> > >     at sun.nio.ch.IOUtil.read(IOUtil.java:171)
>> > >     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
>> > >     at kafka.utils.Utils$.read(Utils.scala:375)
>> > >     at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
>> > >     at kafka.network.Processor.read(SocketServer.scala:347)
>> > >     at kafka.network.Processor.run(SocketServer.scala:245)
>> > >     at java.lang.Thread.run(Thread.java:662)
>> > > [2015-01-16 11:01:48,173] INFO Closing socket connection to /10.0.0.11. (kafka.network.Processor)
>> > > [2015-01-16 11:03:08,164] ERROR Closing socket for /10.0.0.11 because of error (kafka.network.Processor)
>> > > java.io.IOException: Connection reset by peer
>> > >     at sun.nio.ch.FileDispatcher.read0(Native Method)
>> > >     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>> > >     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
>> > >     at sun.nio.ch.IOUtil.read(IOUtil.java:171)
>> > >     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
>> > >     at kafka.utils.Utils$.read(Utils.scala:375)
>> > >     at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
>> > >     at kafka.network.Processor.read(SocketServer.scala:347)
>> > >     at kafka.network.Processor.run(SocketServer.scala:245)
>> > >     at java.lang.Thread.run(Thread.java:662)
>> > > [2015-01-16 11:03:08,280] INFO Closing socket connection to /10.0.0.11.
>> > > (kafka.network.Processor)
>> > > [2015-01-16 11:03:48,369] ERROR Closing socket for /10.0.0.11 because of error (kafka.network.Processor)
>> > > java.io.IOException: Connection reset by peer
>> > >     at sun.nio.ch.FileDispatcher.read0(Native Method)
>> > >     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>> > >     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
>> > >     at sun.nio.ch.IOUtil.read(IOUtil.java:171)
>> > >     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
>> > >     at kafka.utils.Utils$.read(Utils.scala:375)
>> > >     at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
>> > >     at kafka.network.Processor.read(SocketServer.scala:347)
>> > >     at kafka.network.Processor.run(SocketServer.scala:245)
>> > >     at java.lang.Thread.run(Thread.java:662)
>> > >
>> > > On Thu, Jan 15, 2015 at 7:49 PM, Harsha <ka...@harsha.io> wrote:
>> > >
>> > >> Tousif,
>> > >> Which version of kafka and zookeeper are you using, and what's your
>> > >> message size and the jvm size that you allocated for the kafka brokers?
>> > >> There is only 1 zookeeper node; if it's a production cluster I recommend
>> > >> you have a quorum of zookeeper nodes. Both kafka & storm are heavy
>> > >> users of zookeeper. Also, supervisord is recommended for storm; I am not
>> > >> sure you need to have it for kafka. For storm it's the fail-fast nature
>> > >> of workers that requires supervisord to restart them.
>> > >> When kafka goes down the first time, i.e. before supervisord restarts it, do
>> > >> you see the same OOM error? Check the logs to see why it's going down for the
>> > >> first time.
>> > >> -Harsha
>> > >>
>> > >> On Wed, Jan 14, 2015, at 10:50 PM, Tousif wrote:
>> > >> > Hello Chia-Chun Shih,
>> > >> >
>> > >> > There are multiple issues.
>> > >> > First, I don't see the out of memory error every time; the OOM happens
>> > >> > after supervisord keeps retrying to start kafka.
>> > >> > It goes down when it tries to add the partition fetcher.
>> > >> >
>> > >> > It starts with
>> > >> >
>> > >> > *conflict in /controller data:
>> > >> > {"version":1,"brokerid":0,"timestamp":"1421296052741"} stored data:
>> > >> > {"version":1,"brokerid":1,"timestamp":"1421291998088"}
>> > >> > (kafka.utils.ZkUtils$)*
>> > >> >
>> > >> > ERROR Conditional update of path
>> > >> > /brokers/topics/realtimestreaming/partitions/1/state with data
>> > >> > {"controller_epoch":34,"leader":0,"version":1,"leader_epoch":54,"isr":[0]}
>> > >> > and expected version 90 failed due to
>> > >> > org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode =
>> > >> > BadVersion for /brokers/topics/realtimestreaming/partitions/1/state
>> > >> > (kafka.utils.ZkUtils$)
>> > >> >
>> > >> > and then
>> > >> >
>> > >> > [ReplicaFetcherManager on broker 0] Removed fetcher for partitions [realtimestreaming,0],[realtimestreaming,1] (kafka.server.ReplicaFetcherManager)
>> > >> > [2015-01-15 09:57:34,350] INFO Truncating log realtimestreaming-0 to offset 846. (kafka.log.Log)
>> > >> > [2015-01-15 09:57:34,351] INFO Truncating log realtimestreaming-1 to offset 957.
>> > >> > (kafka.log.Log)
>> > >> > [2015-01-15 09:57:34,650] INFO [ReplicaFetcherManager on broker 0] *Added fetcher for partitions ArrayBuffer([[realtimestreaming,0], initOffset 846 to broker id:1,host:realtimeslave1.novalocal,port:9092] , [[realtimestreaming,1], initOffset 957 to broker id:1,host:realtimeslave1.novalocal,port:9092] ) (kafka.server.ReplicaFetcherManager)*
>> > >> > [2015-01-15 09:57:34,654] INFO [ReplicaFetcherThread-0-1], Starting (kafka.server.ReplicaFetcherThread)
>> > >> > [2015-01-15 09:57:34,747] INFO [ReplicaFetcherThread-1-1], Starting (kafka.server.ReplicaFetcherThread)
>> > >> > [2015-01-15 09:58:14,156] INFO Closing socket connection to /10.0.0.11. (kafka.network.Processor)
>> > >> >
>> > >> > On Thu, Jan 15, 2015 at 12:01 PM, Chia-Chun Shih <chiachun.s...@gmail.com> wrote:
>> > >> >
>> > >> > > You can use tools (e.g., VisualVM) to diagnose the OOM problem.
>> > >> > >
>> > >> > > 2015-01-15 14:15 GMT+08:00 Tousif Khazi <tou...@senseforth.com>:
>> > >> > >
>> > >> > > > I see this error:
>> > >> > > >
>> > >> > > > ERROR [ReplicaFetcherThread-0-1], Error for partition [realtimestreaming,1] to broker 1:class kafka.common.NotLeaderForPartitionException (kafka.server.ReplicaFetcherThread)
>> > >> > > > [2015-01-15 10:00:04,348] INFO [ReplicaFetcherManager on broker 0] Removed fetcher for partitions [realtimestreaming,1] (kafka.server.ReplicaFetcherManager)
>> > >> > > > [2015-01-15 10:00:04,355] INFO Closing socket connection to /10.0.0.11.
>> > >> > > > (kafka.network.Processor)
>> > >> > > > [2015-01-15 10:00:04,444] WARN [KafkaApi-0] Fetch request with correlation id 0 from client ReplicaFetcherThread-0-0 on partition [realtimestreaming,1] failed due to Leader not local for partition [realtimestreaming,1] on broker 0 (kafka.server.KafkaApis)
>> > >> > > > [2015-01-15 10:00:04,545] INFO [ReplicaFetcherThread-0-1], Shutting down (kafka.server.ReplicaFetcherThread)
>> > >> > > > [2015-01-15 10:00:04,848] INFO [ReplicaFetcherThread-0-1], Stopped (kafka.server.ReplicaFetcherThread)
>> > >> > > > [2015-01-15 10:00:04,849] INFO [ReplicaFetcherThread-0-1], Shutdown completed (kafka.server.ReplicaFetcherThread)
>> > >> > > > [2015-01-15 10:00:39,256] ERROR Closing socket for /10.0.0.11 because of error (kafka.network.Processor)
>> > >> > > > java.io.IOException: Connection reset by peer
>> > >> > > >     at sun.nio.ch.FileDispatcher.read0(Native Method)
>> > >> > > >     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>> > >> > > >     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
>> > >> > > >     at sun.nio.ch.IOUtil.read(IOUtil.java:171)
>> > >> > > >     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
>> > >> > > >
>> > >> > > > On Wed, Jan 14, 2015 at 10:12 PM, Tousif <tousif.pa...@gmail.com> wrote:
>> > >> > > > > Thanks Harsha for the quick response.
>> > >> > > > > I don't see any other error. I used to see a replica fetcher error, but
>> > >> > > > > it seems to have disappeared after setting replica fetcher threads to 2, as I
>> > >> > > > > have 2 partitions. Sometimes I see zookeeper session expiration.
>> > >> > > > > On Jan 14, 2015 8:02 PM, "Harsha" <ka...@harsha.io> wrote:
>> > >> > > > >
>> > >> > > > >> Tousif,
>> > >> > > > >> Do you see any other errors in server.log?
>> > >> > > > >> -Harsha
>> > >> > > > >>
>> > >> > > > >> On Wed, Jan 14, 2015, at 01:51 AM, Tousif wrote:
>> > >> > > > >> > Hello,
>> > >> > > > >> >
>> > >> > > > >> > I have configured kafka nodes to run via supervisord and see the
>> > >> > > > >> > following exceptions, and eventually the brokers run out of memory.
>> > >> > > > >> > I have given enough memory and process 1 event/second. Kafka goes
>> > >> > > > >> > down every day.
>> > >> > > > >> >
>> > >> > > > >> > I'm wondering what configuration is missing or needs to be added.
>> > >> > > > >> >
>> > >> > > > >> > Here are my cluster details:
>> > >> > > > >> > 2 brokers
>> > >> > > > >> > 1 zookeeper
>> > >> > > > >> > and 2 node apache storm
>> > >> > > > >> >
>> > >> > > > >> > INFO zookeeper state changed (SyncConnected) (org.I0Itec.zkclient.ZkClient)
>> > >> > > > >> > ERROR Closing socket for /10.0.0.11 because of error (kafka.network.Processor)
>> > >> > > > >> > java.io.IOException: Connection reset by peer
>> > >> > > > >> >     at sun.nio.ch.FileDispatcher.read0(Native Method)
>> > >> > > > >> >     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>> > >> > > > >> >     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
>> > >> > > > >> >     at sun.nio.ch.IOUtil.read(IOUtil.java:171)
>> > >> > > > >> >     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
>> > >> > > > >> >     at kafka.utils.Utils$.read(Utils.scala:375)
>> > >> > > > >> >     at kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
>> > >> > > > >> >     at kafka.network.Processor.read(SocketServer.scala:347)
>> > >> > > > >> >     at
>> > >> > > > >> >     kafka.network.Processor.run(SocketServer.scala:245)
>> > >> > > > >> >     at java.lang.Thread.run(Thread.java:662)
>> > >> > > > >> > [2015-01-13 23:43:37,962] INFO Closing socket connection to /10.0.0.11. (kafka.network.Processor)
>> > >> > > > >> > Error occurred during initialization of VM
>> > >> > > > >> > Could not reserve enough space for object heap
>> > >> > > > >> > Error occurred during initialization of VM
>> > >> > > > >> > Could not reserve enough space for object heap

--
Regards
Tousif Khazi
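
A minimal sketch for checking how regularly the connection resets recur (the "crashes every two minutes?" question above), assuming broker log lines in the timestamped format quoted in this thread. The function name reset_gaps and the sample list are illustrative only and are not part of Kafka; log entries pasted without a timestamp are simply skipped.

```python
import re
from datetime import datetime

# Matches timestamped broker log lines such as
# "[2015-01-16 11:03:08,164] ERROR Closing socket for /10.0.0.11 because of error ..."
LINE_RE = re.compile(
    r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\] ERROR Closing socket for /(\S+)"
)


def reset_gaps(lines, peer):
    """Return the seconds between consecutive 'Closing socket' errors for peer."""
    times = []
    for line in lines:
        m = LINE_RE.search(line)
        if m and m.group(2) == peer:
            times.append(datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S"))
    # Pair each error time with the next one and take the difference.
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


# Sample lines copied from the logs quoted in this thread.
sample = [
    "[2015-01-16 11:01:48,173] INFO Closing socket connection to /10.0.0.11. (kafka.network.Processor)",
    "[2015-01-16 11:03:08,164] ERROR Closing socket for /10.0.0.11 because of error (kafka.network.Processor)",
    "[2015-01-16 11:03:48,369] ERROR Closing socket for /10.0.0.11 because of error (kafka.network.Processor)",
]

if __name__ == "__main__":
    print(reset_gaps(sample, "10.0.0.11"))
```

Running it over the full server.log of the surviving broker, rather than the three sample lines, would show whether the gaps are roughly constant, which would point at a restart loop rather than random network drops.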