I will add some more important info for this case:

1. I tailed the logs and found that we suddenly started getting these error messages:
2017-02-19T09:01:56.114Z INFO  [SimpleConsumer] Reconnect due to error:
java.nio.channels.ClosedChannelException: null
        at kafka.network.BlockingChannel.send(BlockingChannel.scala:110) ~[graylog.jar:?]
        at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:85) [graylog.jar:?]
        at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:83) [graylog.jar:?]
        at kafka.consumer.SimpleConsumer.getOffsetsBefore(SimpleConsumer.scala:149) [graylog.jar:?]
        at kafka.consumer.SimpleConsumer.earliestOrLatestOffset(SimpleConsumer.scala:188) [graylog.jar:?]
        at kafka.consumer.ConsumerFetcherThread.handleOffsetOutOfRange(ConsumerFetcherThread.scala:84) [graylog.jar:?]
        at kafka.server.AbstractFetcherThread$$anonfun$addPartitions$2.apply(AbstractFetcherThread.scala:187) [graylog.jar:?]
        at kafka.server.AbstractFetcherThread$$anonfun$addPartitions$2.apply(AbstractFetcherThread.scala:182) [graylog.jar:?]
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733) [graylog.jar:?]
        at scala.collection.immutable.Map$Map1.foreach(Map.scala:116) [graylog.jar:?]
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732) [graylog.jar:?]
        at kafka.server.AbstractFetcherThread.addPartitions(AbstractFetcherThread.scala:182) [graylog.jar:?]
        at kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:88) [graylog.jar:?]
        at kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:78) [graylog.jar:?]
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733) [graylog.jar:?]
        at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:221) [graylog.jar:?]
        at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:428) [graylog.jar:?]
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732) [graylog.jar:?]
        at kafka.server.AbstractFetcherManager.addFetcherForPartitions(AbstractFetcherManager.scala:78) [graylog.jar:?]
        at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:95) [graylog.jar:?]
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63) [graylog.jar:?]
2017-02-19T09:02:26.144Z WARN  [ConsumerFetcherManager$LeaderFinderThread] [graylog2_graylog-1487494433777-5108bdee-leader-finder-thread], Failed to add leader for partitions ....; will retry
java.nio.channels.ClosedChannelException: null
        at kafka.network.BlockingChannel.send(BlockingChannel.scala:110) ~[graylog.jar:?]
        at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:98) ~[graylog.jar:?]
        at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:83) ~[graylog.jar:?]
        at kafka.consumer.SimpleConsumer.getOffsetsBefore(SimpleConsumer.scala:149) ~[graylog.jar:?]
        at kafka.consumer.SimpleConsumer.earliestOrLatestOffset(SimpleConsumer.scala:188) ~[graylog.jar:?]
        at kafka.consumer.ConsumerFetcherThread.handleOffsetOutOfRange(ConsumerFetcherThread.scala:84) ~[graylog.jar:?]
        at kafka.server.AbstractFetcherThread$$anonfun$addPartitions$2.apply(AbstractFetcherThread.scala:187) ~[graylog.jar:?]
        at kafka.server.AbstractFetcherThread$$anonfun$addPartitions$2.apply(AbstractFetcherThread.scala:182) ~[graylog.jar:?]
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733) ~[graylog.jar:?]
        at scala.collection.immutable.Map$Map1.foreach(Map.scala:116) ~[graylog.jar:?]
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732) ~[graylog.jar:?]
        at kafka.server.AbstractFetcherThread.addPartitions(AbstractFetcherThread.scala:182) ~[graylog.jar:?]
        at kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:88) ~[graylog.jar:?]
        at kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:78) ~[graylog.jar:?]
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733) ~[graylog.jar:?]
        at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:221) ~[graylog.jar:?]
        at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:428) ~[graylog.jar:?]
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732) ~[graylog.jar:?]
        at kafka.server.AbstractFetcherManager.addFetcherForPartitions(AbstractFetcherManager.scala:78) ~[graylog.jar:?]
        at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:95) [graylog.jar:?]
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63) [graylog.jar:?]
2017-02-19T09:02:26.145Z INFO  [ConsumerFetcherThread] [ConsumerFetcherThread-graylog2_graylog-1487494433777-5108bdee-0-3], Shutting down
2017-02-19T09:02:26.145Z INFO  [ConsumerFetcherThread] [ConsumerFetcherThread-graylog2_graylog-1487494433777-5108bdee-0-3], Stopped
2017-02-19T09:02:26.145Z INFO  [ConsumerFetcherThread] [ConsumerFetcherThread-graylog2_graylog-1487494433777-5108bdee-0-3], Shutdown completed

2. A few minutes later, the processing buffer starts filling up and never drains.

3. Once the processing buffer is full, the output rate drops to 0 msg/s.

Can you please help?

Best,
Nitzan

On Thursday, February 16, 2017 at 3:37:02 PM UTC+2, Nitzan Haimovich wrote:
>
> Hi,
>
> We have a cluster of 3 Graylog nodes. Each node has 8 cores and 32 GB of
> memory.
> The cluster works pretty well; we get very nice throughput (around
> 40,000 msg/s for both input and output).
> We encounter a very strange problem, though: sometimes, for no clear reason,
> one or two nodes suddenly stop processing messages and outputting to ES.
> Then we have two options:
> 1. Wait for it to come back on its own. That usually happens after the
> journal gets filled.
> 2. Restart the Graylog service.
>
> Any idea why such a thing could happen?
> Let me know if you need me to attach more info.
>
> Thanks!
>
> Nitzan

--
You received this message because you are subscribed to the Google Groups "Graylog Users" group. To view this discussion on the web visit https://groups.google.com/d/msgid/graylog2/5d2d0160-f906-4add-a5af-38d05a17daa9%40googlegroups.com.
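One way to confirm the stall described in item 3 is to poll the affected node's REST API for journal statistics (Graylog 2.x exposes a GET /system/journal resource) and check whether messages are still being appended to the journal while nothing is being read out of it. Below is a minimal sketch of that check in Python; the field names in the sample payload are assumptions, so verify them against the API browser on your own node before relying on them:

```python
import json

# Example payload shaped like a GET /api/system/journal response.
# These field names are assumptions -- confirm them in your node's
# API browser before using this against a real cluster.
SAMPLE = json.dumps({
    "uncommitted_journal_entries": 5000000,
    "append_events_per_second": 40000,
    "read_events_per_second": 0,
})

def journal_backing_up(stats):
    # The journal is backing up when inputs keep appending messages
    # but the processing side reads nothing out (the 0 msg/s output
    # symptom from item 3).
    return (stats["append_events_per_second"] > 0
            and stats["read_events_per_second"] == 0)

stats = json.loads(SAMPLE)
print(journal_backing_up(stats))  # True: appends continue while reads are stalled
```

While the processing buffer is full, it would also be worth taking a thread dump of the Graylog JVM with `jstack <pid>` to see where the processing-buffer threads are blocked; that is often more telling than the Kafka reconnect noise in the logs above.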
