I have not dug into the code, but judging from the property name, the data 
structure is related to recovery from failures. Are these out-of-memory 
errors happening around the same time as other problems? Are you seeing 
network issues? Do you see “long JVM pauses” in the logs?
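
If the logs are inconclusive, the JVM's own GC counters are another place to look. The following is only a minimal sketch using the standard `java.lang.management` API; the class name `GcStats` is made up for illustration, and as written it reports on the JVM it runs in, so it would have to run inside the server process (or be adapted to read the same beans over JMX) to say anything about an Ignite node.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcStats {
    /**
     * Sums the accumulated collection time (in ms) over all collectors.
     * A collector may report -1 when the value is undefined; those are skipped.
     */
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Print per-collector totals since JVM start.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d, time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("total GC time: " + totalGcTimeMillis() + " ms");
    }
}
```

Sampling these totals periodically and comparing consecutive readings gives a rough idea of how much time goes into GC between samples.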

> On 30 Nov 2021, at 12:28, Eduard Llull Pou <eduard.ll...@bluekiri.com> wrote:
> 
> Hi Ibrahim,
> 
> We'll test it, but even if your suggested parameters reduce the number of 
> OOMs, the instance of the 
> org.apache.ignite.spi.communication.tcp.internal.GridNioServerWrapper class 
> will still retain a lot of memory: the nodes of the `recoveryDescs` 
> ConcurrentHashMap are not weak references, so as long as the 
> ConcurrentHashMap references them they won't be collected by the garbage 
> collector.
> 
> A proper solution would be to find a way to reduce the number of entries in 
> the `recoveryDescs` ConcurrentHashMap.
> 
> Going deeper, the values of the `recoveryDescs` ConcurrentHashMap are 
> instances of org.apache.ignite.internal.util.nio.GridNioRecoveryDescriptor, 
> which contain the `msgReqs` ArrayDeque, and most of the memory is retained 
> because of the elements of that ArrayDeque. I see that the elements of the 
> `msgReqs` ArrayDeque are instances of 
> org.apache.ignite.internal.util.nio.GridNioServer$WriteRequestImpl.
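
To make the retention pattern described above concrete, here is a simplified sketch. It is not Ignite's code: `Descriptor`, `onSent`, `onAck` and `retainedAfter` are hypothetical names, and the assumption that queued messages are released only once the peer acknowledges them is mine, not something confirmed in this thread. It only illustrates how a map of per-connection queues keeps every queued message strongly reachable.

```java
import java.util.ArrayDeque;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class RecoveryQueueSketch {
    /** Hypothetical stand-in for a per-connection recovery descriptor. */
    static final class Descriptor {
        private final ArrayDeque<byte[]> unacked = new ArrayDeque<>();
        private long acked; // messages acknowledged so far

        /** Remember a sent message until the peer acknowledges it. */
        synchronized void onSent(byte[] msg) {
            unacked.addLast(msg);
        }

        /** The peer reports it has received rcvCnt messages in total. */
        synchronized void onAck(long rcvCnt) {
            while (acked < rcvCnt && !unacked.isEmpty()) {
                unacked.pollFirst();
                acked++;
            }
        }

        /** Payload bytes still strongly referenced by the queue. */
        synchronized long retainedBytes() {
            long sum = 0;
            for (byte[] m : unacked) {
                sum += m.length;
            }
            return sum;
        }
    }

    /** Sends `sent` messages of `msgSize` bytes, acks `acked` of them, returns retained bytes. */
    static long retainedAfter(int sent, int acked, int msgSize) {
        Map<String, Descriptor> descs = new ConcurrentHashMap<>();
        Descriptor d = descs.computeIfAbsent("node-1", k -> new Descriptor());
        for (int i = 0; i < sent; i++) {
            d.onSent(new byte[msgSize]);
        }
        d.onAck(acked);
        return d.retainedBytes();
    }

    public static void main(String[] args) {
        System.out.println("no acks:   " + retainedAfter(1000, 0, 1024) + " bytes retained");
        System.out.println("all acked: " + retainedAfter(1000, 1000, 1024) + " bytes retained");
    }
}
```

In this model the map entry itself is tiny; the memory sits in the queue, and it is freed only when acknowledgements arrive or the descriptor is dropped from the map.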
> 
> <image.png>
> 
> On Tue, 30 Nov 2021 at 12:44, Ibrahim Altun (<ibrahim.al...@segmentify.com>) 
> wrote:
> Hi,
> 
> We faced the same problem for a long time; 
> https://medium.com/@hoan.nguyen.it/how-did-g1gc-tuning-flags-affect-our-back-end-web-app-c121d38dfe56
> helped a lot in solving it on our side. We added the following GC 
> parameters, and that solved the problem in our case:
> 
> -XX:ParallelGCThreads=6 -XX:ConcGCThreads=2 -XX:MaxGCPauseMillis=200 
> -XX:InitiatingHeapOccupancyPercent=40
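
When trying flags like these, it can be worth confirming that they actually reached the server JVM, since startup scripts sometimes override or drop options. A small sketch using the standard `RuntimeMXBean.getInputArguments()` call follows; the class name `JvmFlagCheck` and the helper `hasFlag` are made up for illustration, and the check has to run inside the JVM being inspected.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class JvmFlagCheck {
    /** Returns true if any JVM argument starts with the given prefix. */
    static boolean hasFlag(List<String> jvmArgs, String prefix) {
        for (String arg : jvmArgs) {
            if (arg.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Arguments passed to this JVM at startup (excluding arguments to main).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        String[] expected = {
            "-XX:ParallelGCThreads=",
            "-XX:ConcGCThreads=",
            "-XX:MaxGCPauseMillis=",
            "-XX:InitiatingHeapOccupancyPercent="
        };
        for (String prefix : expected) {
            System.out.println(prefix + " -> " + (hasFlag(jvmArgs, prefix) ? "set" : "not set"));
        }
    }
}
```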
> 
> 
> 
> On Tue, 30 Nov 2021 at 14:22, Eduard Llull Pou <eduard.ll...@bluekiri.com> 
> wrote:
> Hello Igniters,
> 
> We have an Apache Ignite 2.10.0 cluster with several server nodes and a bunch 
> of thick client nodes. At least once a week, at least one of the server 
> nodes crashes with a "java.lang.OutOfMemoryError: Java heap space".
> 
> The server JVMs are started with the ignite.sh script, setting:
> JVM_OPTS=-server -Xms6g -Xmx6g -XX:+AlwaysPreTouch -XX:+UseG1GC 
> -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC 
> -Djava.net.preferIPv4Stack=true
> 
> This is the heap usage of one of the servers
> <image.png>
> 
> Strangely, not all servers have this memory usage. Most of them never go 
> above 4.5GB of heap.
> 
> I have a memory dump of one of the servers, taken when it had stayed at 5GB 
> of heap usage for several minutes. Using the Eclipse Memory Analyzer I can 
> see that, of the 3.8GB of live heap, 3.3GB are allocated in an instance of the 
> org.apache.ignite.spi.communication.tcp.internal.GridNioServerWrapper class.
> 
> <image.png>
> 
> And almost all of the 3.3GB of that GridNioServerWrapper instance are 
> retained because of the recoveryDescs ConcurrentHashMap nodes:
> <image.png>
> 
> Is there anything we can configure to keep this map from growing that large? 
> Is it a bug?
> 
> I'm assuming that the ~2GB of difference between the memory dump size (3.8GB) 
> and the Xmx value (6GB) are short-lived objects, which don't appear in the 
> dump because we used the `jmap -dump:live,...` command to generate it.
> 
> 
> Thank you.
> 
> -- 
> 
> 
> Eduard Llull | Technical Architect 
> eduard.ll...@bluekiri.com <mailto:eduard.ll...@bluekiri.com> | +34 971925981
> Bluekiri 
> https://bluekiri.com <https://bluekiri.com/>
> Blaise Pascal, ParcBit - Edificio Europa, bajos 07121 Palma (Spain)
>  <https://cloud.bluekiri.com/>  
> <https://cloud.withgoogle.com/partners/detail/?id=CIGAgICAgICzQg%3D%3D&language=en>
>   
> <https://medium.com/bluekiri/bluekiri-is-now-silver-microsoft-partner-69887ad25d82>
>   <https://medium.com/bluekiri/announcing-iso-27001-certification-b0923982441>
> This email may be confidential and privileged. If you received this 
> communication by mistake, please don't forward it to anyone else, please 
> erase all copies and attachments, and please let me know that it has gone to 
> the wrong person. The above terms reflect a potential business arrangement, 
> are provided solely as a basis for further discussion, and are not intended 
> to be and do not constitute a legally binding obligation. No legally binding 
> obligations will be created, implied, or inferred until an agreement in final 
> form is executed in writing by all parties involved.
> 
> 
> 
> -- 
>  <https://www.segmentify.com/>
> İbrahim Halil Altun
> Senior Software Engineer
> 
> +90 536 3327510 • segmentify.com → <https://www.segmentify.com/>
> UK • Germany • Turkey
> 
>  <https://www.segmentify.com/ecommerce-growth-show> 
> <https://www.g2.com/products/segmentify/reviews>
> 
> 

