If you reduce MaxSize to some small number, like 10, does that solve the problem? Could you also run jvisualvm and see what happens with the JVM heap?
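Something along these lines, just adapting the configuration you posted below and shrinking MaxSize (the cache name is whatever you use today):

var nearCacheCfg = new NearCacheConfiguration
{
    // Deliberately tiny near cache: with LRU eviction, no more than 10 entries
    // should ever be kept on the client side.
    EvictionPolicy = new LruEvictionPolicy
    {
        MaxSize = 10
    }
};
return _ignite.GetOrCreateNearCache<TKey, TValue>(cacheName, nearCacheCfg);

If the memory still grows with such a small near cache, the near cache entries themselves are probably not what is leaking.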
On Wed, Feb 5, 2020 at 12:28 PM Eduard Llull <edu...@llull.net> wrote:

> Hi Pavel,
>
> We have six servers, but these don't have any issue, and 40 client nodes
> (the Ignite node is started with IgniteConfiguration.ClientMode = true).
>
> The 40 client nodes are the ones where we are having the memory issue.
>
> The _ignite.GetOrCreateNearCache is executed on the client nodes. We also
> tried using the following code, but the memory issue was the same:
>
> var nearCacheCfg = new NearCacheConfiguration
> {
>     // Use LRU eviction policy to automatically evict entries whenever the near cache exceeds MaxSize.
>     EvictionPolicy = new LruEvictionPolicy
>     {
>         MaxSize = 5000, // 5000 elements
>         MaxMemorySize = 500000000
>     }
> };
> return _ignite.GetOrCreateCache<TKey, TValue>(new CacheConfiguration(cacheName), nearCacheCfg);
>
>
> On Wed, Feb 5, 2020 at 10:02 AM Pavel Tupitsyn <ptupit...@apache.org> wrote:
>
>> Hi Eduard,
>>
>> Do you have any client nodes (IgniteConfiguration.ClientMode=true), or
>> just servers?
>>
>> Is the following line executed on an Ignite server node?
>> _ignite.GetOrCreateNearCache
>>
>> On Wed, Feb 5, 2020 at 11:44 AM Eduard Llull <edu...@llull.net> wrote:
>>
>>> Hi everyone,
>>>
>>> We have been using Ignite and Ignite.NET for the last few months in a
>>> project. We currently have six Ignite servers (started with ignite.sh)
>>> and a bunch of thick clients split across two .NET Core applications
>>> deployed on 30 servers.
>>>
>>> We store de-normalized data in the Ignite data grid: one of the .NET
>>> Core applications puts data into the cache and the other is a gRPC
>>> service that just reads that data to compute a response. The data is
>>> split across a dozen caches, which are created programmatically by the
>>> application that writes into them.
>>>
>>> The caches are PARTITIONED and TRANSACTIONAL, and the partitions have
>>> two backups.
>>>
>>> It's been working fine so far, but we identified that one particular
>>> cache is read the most, so to reduce network usage and improve the
>>> response time of the gRPC service we decided to use a near cache. That
>>> particular cache has ~2300 entries which occupy ~110MB of space, and the
>>> near cache is configured with maxSize=5000 and maxMemorySize=500000000.
>>>
>>> [image: image.png]
>>>
>>> The embedded JVM in the gRPC .NET Core application is started with the
>>> following parameters:
>>> -Xmx=1024
>>> -Xms=1024
>>> -Djava.net.preferIPv4Stack=true
>>> -Xrs
>>> -XX:+AlwaysPreTouch
>>> -XX:+UseG1GC
>>> -XX:+ScavengeBeforeFullGC
>>> -XX:+DisableExplicitGC
>>> -DIGNITE_NO_SHUTDOWN_HOOK=true
>>> -Dcom.sun.management.jmxremote
>>> -Dcom.sun.management.jmxremote.ssl=false
>>> -Dcom.sun.management.jmxremote.authenticate=false
>>> -Dcom.sun.management.jmxremote.local.only=false
>>> -Dcom.sun.management.jmxremote.port=12345
>>>
>>> If we don't use the near cache, on every gRPC call the server receives
>>> it executes the following code to get the cache (this works fine):
>>> return _ignite.GetCache<TKey, TValue>(cacheName);
>>>
>>> And if we want to use the near cache, that line is changed to:
>>> var nearCacheCfg = new NearCacheConfiguration
>>> {
>>>     // Use LRU eviction policy to automatically evict entries whenever the near cache exceeds MaxSize.
>>>     EvictionPolicy = new LruEvictionPolicy
>>>     {
>>>         MaxSize = 5000, // 5000 elements
>>>         MaxMemorySize = 500000000
>>>     }
>>> };
>>> return _ignite.GetOrCreateNearCache<TKey, TValue>(cacheName, nearCacheCfg);
>>>
>>> But since we added the near cache, the application memory usage never
>>> stabilizes: without the near cache the application uses ~2.5GB of RAM on
>>> every server, but when we use the near cache the application memory
>>> usage never stops growing.
>>>
>>> This is the memory usage of one of the servers with the gRPC application.
>>> [image: image.png]
>>>
>>> In the graph above, the version with the near cache was deployed on
>>> February 3rd at 17:00. At 01:30 on February 4th the server started
>>> swapping, and at around 7:45 the application crashed. This is a detail:
>>> [image: image.png]
>>>
>>> I would very much like to create a reproducer, but it looks like it
>>> would take a very long time to reproduce the issue: the gRPC application
>>> needs several hours to use up all the memory, and given that every
>>> server with the gRPC application receives around 90 requests per second,
>>> if there is a memory leak it is a very slow one.
>>>
>>> Does anybody have any idea where the problem could be, or how to find it?
>>>
>>>
>>> Thank you very much
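P.S. It may also be worth logging the local near cache size on the client from time to time, to see whether LRU eviction is kicking in at all. A rough sketch (names are placeholders; it assumes the ICache.GetLocalSize overload that takes CachePeekMode, which should be available in recent Ignite.NET versions):

using System;
using Apache.Ignite.Core.Cache;

// Logs how many entries are currently held in the local near cache.
// "cache" is the ICache<TKey, TValue> returned by GetOrCreateNearCache.
static void LogNearCacheSize<TKey, TValue>(ICache<TKey, TValue> cache)
{
    var nearEntries = cache.GetLocalSize(CachePeekMode.Near);
    Console.WriteLine($"Near cache entries: {nearEntries}");
}

If that number stays around 5000 while the process keeps growing, the leak is most likely somewhere other than the near cache entries themselves.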