I'd like to add some more important info on this case:

1. I tailed the logs and found that we suddenly started getting these error 
messages:

2017-02-19T09:01:56.114Z INFO  [SimpleConsumer] Reconnect due to error:
java.nio.channels.ClosedChannelException: null
    at kafka.network.BlockingChannel.send(BlockingChannel.scala:110) ~[graylog.jar:?]
    at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:85) [graylog.jar:?]
    at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:83) [graylog.jar:?]
    at kafka.consumer.SimpleConsumer.getOffsetsBefore(SimpleConsumer.scala:149) [graylog.jar:?]
    at kafka.consumer.SimpleConsumer.earliestOrLatestOffset(SimpleConsumer.scala:188) [graylog.jar:?]
    at kafka.consumer.ConsumerFetcherThread.handleOffsetOutOfRange(ConsumerFetcherThread.scala:84) [graylog.jar:?]
    at kafka.server.AbstractFetcherThread$$anonfun$addPartitions$2.apply(AbstractFetcherThread.scala:187) [graylog.jar:?]
    at kafka.server.AbstractFetcherThread$$anonfun$addPartitions$2.apply(AbstractFetcherThread.scala:182) [graylog.jar:?]
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733) [graylog.jar:?]
    at scala.collection.immutable.Map$Map1.foreach(Map.scala:116) [graylog.jar:?]
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732) [graylog.jar:?]
    at kafka.server.AbstractFetcherThread.addPartitions(AbstractFetcherThread.scala:182) [graylog.jar:?]
    at kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:88) [graylog.jar:?]
    at kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:78) [graylog.jar:?]
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733) [graylog.jar:?]
    at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:221) [graylog.jar:?]
    at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:428) [graylog.jar:?]
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732) [graylog.jar:?]
    at kafka.server.AbstractFetcherManager.addFetcherForPartitions(AbstractFetcherManager.scala:78) [graylog.jar:?]
    at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:95) [graylog.jar:?]
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63) [graylog.jar:?]

2017-02-19T09:02:26.144Z WARN  [ConsumerFetcherManager$LeaderFinderThread] [graylog2_graylog-1487494433777-5108bdee-leader-finder-thread], Failed to add leader for partitions ....; will retry
java.nio.channels.ClosedChannelException: null
    at kafka.network.BlockingChannel.send(BlockingChannel.scala:110) ~[graylog.jar:?]
    at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:98) ~[graylog.jar:?]
    at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:83) ~[graylog.jar:?]
    at kafka.consumer.SimpleConsumer.getOffsetsBefore(SimpleConsumer.scala:149) ~[graylog.jar:?]
    at kafka.consumer.SimpleConsumer.earliestOrLatestOffset(SimpleConsumer.scala:188) ~[graylog.jar:?]
    at kafka.consumer.ConsumerFetcherThread.handleOffsetOutOfRange(ConsumerFetcherThread.scala:84) ~[graylog.jar:?]
    at kafka.server.AbstractFetcherThread$$anonfun$addPartitions$2.apply(AbstractFetcherThread.scala:187) ~[graylog.jar:?]
    at kafka.server.AbstractFetcherThread$$anonfun$addPartitions$2.apply(AbstractFetcherThread.scala:182) ~[graylog.jar:?]
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733) ~[graylog.jar:?]
    at scala.collection.immutable.Map$Map1.foreach(Map.scala:116) ~[graylog.jar:?]
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732) ~[graylog.jar:?]
    at kafka.server.AbstractFetcherThread.addPartitions(AbstractFetcherThread.scala:182) ~[graylog.jar:?]
    at kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:88) ~[graylog.jar:?]
    at kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:78) ~[graylog.jar:?]
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733) ~[graylog.jar:?]
    at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:221) ~[graylog.jar:?]
    at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:428) ~[graylog.jar:?]
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732) ~[graylog.jar:?]
    at kafka.server.AbstractFetcherManager.addFetcherForPartitions(AbstractFetcherManager.scala:78) ~[graylog.jar:?]
    at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:95) [graylog.jar:?]
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63) [graylog.jar:?]

2017-02-19T09:02:26.145Z INFO  [ConsumerFetcherThread] [ConsumerFetcherThread-graylog2_graylog-1487494433777-5108bdee-0-3], Shutting down
2017-02-19T09:02:26.145Z INFO  [ConsumerFetcherThread] [ConsumerFetcherThread-graylog2_graylog-1487494433777-5108bdee-0-3], Stopped
2017-02-19T09:02:26.145Z INFO  [ConsumerFetcherThread] [ConsumerFetcherThread-graylog2_graylog-1487494433777-5108bdee-0-3], Shutdown completed
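For what it's worth, the ClosedChannelException suggests the consumer's TCP connection to the Kafka broker is being dropped, so we ran a quick reachability check from each Graylog node to rule out basic network/firewall issues. A minimal sketch of what we use (the broker host and port are placeholders, not our real values):

```python
import socket

def broker_reachable(host, port, timeout=5.0):
    """Return True if a plain TCP connection to the Kafka broker succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Replace host/port with your actual broker address; 9092 is the Kafka default.
print(broker_reachable("127.0.0.1", 9092))
```

This only proves the port accepts connections, of course; it says nothing about idle connections being cut later (e.g. by a firewall or broker-side `connections.max.idle.ms`), which is what we suspect here.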



2. After a few minutes, the processing buffer fills up and never drains.
3. Once the processing buffer is full, the output rate drops to 0 msg/s.
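To confirm points 2 and 3 without watching the web UI, we poll each node's buffer stats from the REST API (`/system/buffers`) and flag anything pegged at 100%. A rough sketch of the check; the JSON shape below is a simplified assumption based on what our 2.x nodes return, so the field names may differ on other versions:

```python
import json

# Hypothetical /system/buffers response (shape assumed, not authoritative).
sample = '''
{
  "buffers": {
    "input":   {"utilization_percent": 3.2},
    "process": {"utilization_percent": 100.0},
    "output":  {"utilization_percent": 0.0}
  }
}
'''

def full_buffers(payload, threshold=99.0):
    """Return the names of buffers at or above the utilization threshold."""
    data = json.loads(payload)
    return [name for name, stats in data["buffers"].items()
            if stats["utilization_percent"] >= threshold]

# On a stuck node we see exactly this pattern: process pegged, output idle.
print(full_buffers(sample))
```

A pegged process buffer with an idle output buffer is consistent with the output path (ES writes) being blocked rather than the inputs.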

Can you please help?

Best

Nitzan

On Thursday, February 16, 2017 at 3:37:02 PM UTC+2, Nitzan Haimovich wrote:
>
> Hi,
>
> We have a cluster of 3 Graylog nodes. Each node has 8 cores and 32 GB of 
> memory.
> The cluster works pretty well, and we get very good throughput (around 
> 40,000 msg/s for both input and output).
> We hit a very strange problem, though: sometimes, for no clear reason, one 
> or two nodes suddenly stop processing messages and outputting to ES.
> Then, we have two options:
> 1. Wait for it to recover on its own, which usually happens after the 
> journal fills up.
> 2. Restart the Graylog service.
>
> Any idea why such a thing could happen?
> Let me know if you need me to attach more info.
>
> Thanks!
>
> Nitzan
>
