Good to hear from you. I have had the same issue for quite a long time and am still looking for a fix.
What exactly do you think resolved the heap starvation issue: the GC-related configuration or the thread pool configuration?

Is the default thread pool size the number of cores on the server? If so, we don't need to specify any configuration for these thread pools.

Thanks
Naveen

On Wed, Sep 29, 2021 at 2:35 PM Ibrahim Altun <ibrahim.al...@segmentify.com> wrote:

> After many configuration changes and optimizations, I think I've solved
> the heap problem.
>
> Here are the changes that I applied to the system.
>
> JVM changes: this article helped a lot:
> https://medium.com/@hoan.nguyen.it/how-did-g1gc-tuning-flags-affect-our-back-end-web-app-c121d38dfe56
>
> The nodes are running on 12-core, 64 GB servers. I've added the following
> JVM parameters:
>
> -XX:ParallelGCThreads=6
> -XX:ConcGCThreads=2
> -XX:MaxGCPauseMillis=200
> -XX:InitiatingHeapOccupancyPercent=40
>
> In the Ignite configuration I've changed all thread pool sizes, which were
> much larger than these:
>
> <property name="systemThreadPoolSize" value="12"/>
> <property name="publicThreadPoolSize" value="12"/>
> <property name="queryThreadPoolSize" value="12"/>
> <property name="serviceThreadPoolSize" value="12"/>
> <property name="stripedPoolSize" value="12"/>
> <property name="dataStreamerThreadPoolSize" value="12"/>
> <property name="rebalanceThreadPoolSize" value="12"/>
>
> Here is the GC report covering 16 hours:
>
> https://gceasy.io/diamondgc-report.jsp?p=c2hhcmVkLzIwMjEvMDkvMjkvLS1nYy5sb2cuMC5jdXJyZW50LS04LTU4LTMx&channel=WEB
>
> On 2021/09/27 17:11:21, Ilya Korol <llivezk...@gmail.com> wrote:
> > Actually, the Query interface doesn't define a close() method, but QueryCursor
> > does.
> > In your snippets you're using the try-with-resources construct for SELECT
> > queries, which is good, but when you run a MERGE INTO query you also
> > get a QueryCursor as a result of
> >
> > igniteCacheService.getCache(ID, IgniteCacheType.LABEL).query(insertQuery);
> >
> > so maybe these QueryCursor objects still hold some resources/memory.
> > The Javadoc for QueryCursor states that you should always close cursors.
> >
> > To simplify cursor closing, there is a cursor.getAll() method that will
> > do this for you under the hood.
> >
> > On 2021/09/13 06:17:21, Ibrahim Altun <i...@segmentify.com> wrote:
> > > Hi Ilya,
> > >
> > > Since this is a production environment, I could not risk taking a heap
> > > dump for now, but I will try to convince my superiors to let me get one
> > > and analyze it.
> > >
> > > Queries are heavily used in our system, but aren't they AutoCloseable
> > > objects? Do we have to close them anyway?
> > >
> > > Here are some usage examples from our system.
> > >
> > > The insert query looks like this:
> > > MERGE INTO "ProductLabel" ("productId", "label", "language") VALUES (?, ?, ?)
> > > igniteCacheService.getCache(ID, IgniteCacheType.LABEL).query(insertQuery);
> > >
> > > Another usage example, with an SqlFieldsQuery:
> > > String sql = "SELECT _val FROM \"UserRecord\" WHERE \"email\" IN (?)";
> > > SqlFieldsQuery sqlFieldsQuery = new SqlFieldsQuery(sql);
> > > sqlFieldsQuery.setLazy(true);
> > > sqlFieldsQuery.setArgs(emails.toArray());
> > >
> > > try (QueryCursor<List<?>> ignored = igniteCacheService.getCache(ID,
> > > IgniteCacheType.USER).query(sqlFieldsQuery)) {...}
> > >
> > > On 2021/09/12 20:28:09, Shishkov Ilya <sh...@gmail.com> wrote:
> > > > > Hi, Ibrahim!
> > > > > Have you analyzed the heap dump of the server node JVMs?
> > > > > If your application executes queries, are their cursors closed?
> > > > >
> > > > > Fri, Sep 10, 2021 at 11:54, Ibrahim Altun <ib...@segmentify.com>:
> > > > >
> > > > > > Igniters, any comment on this issue? We are facing huge GC problems in
> > > > > > our production environment. Please advise.
> > > > > >
> > > > > > On 2021/09/07 14:11:09, Ibrahim Altun <ib...@segmentify.com> wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > 400-600K reads/writes/updates in total
> > > > > > > 12 cores
> > > > > > > 64 GB RAM
> > > > > > > no iowait
> > > > > > > 10 nodes
> > > > > > >
> > > > > > > On 2021/09/07 12:51:28, Piotr Jagielski <pj...@touk.pl> wrote:
> > > > > > > > Hi,
> > > > > > > > Can you provide some information on how you use the cluster? How many
> > > > > > > > reads/writes/updates per second? Also the CPU / RAM spec of the cluster nodes?
> > > > > > > >
> > > > > > > > We observed full GC / CPU load / OOM killer when loading a large amount of
> > > > > > > > data (15 million records, data streamer + allowOverwrite=true). We've seen
> > > > > > > > 200-400k updates per second in JMX metrics, but load up to 10 on the nodes
> > > > > > > > and iowait up to 30%. Our cluster is 3 x 4 CPU, 16 GB RAM (already upgrading
> > > > > > > > to 8 CPU, 32 GB RAM).
> > > > > > > > Ignite 2.10.
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Piotr
> > > > > > > >
> > > > > > > > On 2021/09/02 08:36:07, Ibrahim Altun <ib...@segmentify.com> wrote:
> > > > > > > > > After upgrading from version 2.7.1 to 2.10.0, Ignite nodes are facing
> > > > > > > > > huge full GC operations 24-36 hours after node start.
> > > > > > > > >
> > > > > > > > > We tried increasing the heap size, but no luck. Here is the start
> > > > > > > > > configuration for the nodes:
> > > > > > > > >
> > > > > > > > > JVM_OPTS="$JVM_OPTS -Xms12g -Xmx12g -server
> > > > > > > > > -javaagent:/etc/prometheus/jmx_prometheus_javaagent-0.14.0.jar=8090:/etc/prometheus/jmx.yml
> > > > > > > > > -Dcom.sun.management.jmxremote
> > > > > > > > > -Dcom.sun.management.jmxremote.authenticate=false
> > > > > > > > > -Dcom.sun.management.jmxremote.port=49165
> > > > > > > > > -Dcom.sun.management.jmxremote.host=localhost
> > > > > > > > > -XX:MaxMetaspaceSize=256m -XX:MaxDirectMemorySize=1g
> > > > > > > > > -DIGNITE_SKIP_CONFIGURATION_CONSISTENCY_CHECK=true
> > > > > > > > > -DIGNITE_WAL_MMAP=true -DIGNITE_BPLUS_TREE_LOCK_RETRIES=100000
> > > > > > > > > -Djava.net.preferIPv4Stack=true"
> > > > > > > > >
> > > > > > > > > JVM_OPTS="$JVM_OPTS -XX:+AlwaysPreTouch -XX:+UseG1GC
> > > > > > > > > -XX:+ScavengeBeforeFullGC -XX:+DisableExplicitGC
> > > > > > > > > -XX:+UseStringDeduplication -Xloggc:/var/log/apache-ignite/gc.log
> > > > > > > > > -XX:+PrintGCDetails -XX:+PrintGCDateStamps
> > > > > > > > > -XX:+PrintTenuringDistribution -XX:+PrintGCCause
> > > > > > > > > -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
> > > > > > > > > -XX:GCLogFileSize=100M"
> > > > > > > > >
> > > > > > > > > Here is the GC analysis report covering 80 hours:
> > > > > > > > >
> > > > > > > > > https://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMjEvMDgvMzEvLS1nYy5sb2cuMC5jdXJyZW50LnppcC0tNS01MS0yOQ==&channel=WEB
> > > > > > > > >
> > > > > > > > > Do we need a larger heap, or is there a bug that we need to be aware of?
> > > > > > > > >
> > > > > > > > > Here is the node configuration:
> > > > > > > > >
> > > > > > > > > <?xml version="1.0" encoding="UTF-8"?>
> > > > > > > > > <beans xmlns="http://www.springframework.org/schema/beans"
> > > > > > > > >        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> > > > > > > > >        xsi:schemaLocation="
> > > > > > > > >        http://www.springframework.org/schema/beans
> > > > > > > > >        http://www.springframework.org/schema/beans/spring-beans.xsd">
> > > > > > > > >   <bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
> > > > > > > > >     <property name="gridLogger">
> > > > > > > > >       <bean class="org.apache.ignite.logger.log4j2.Log4J2Logger">
> > > > > > > > >         <constructor-arg type="java.lang.String" value="/etc/apache-ignite/ignite-log4j2.xml"/>
> > > > > > > > >       </bean>
> > > > > > > > >     </property>
> > > > > > > > >     <property name="communicationSpi">
> > > > > > > > >       <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
> > > > > > > > >         <property name="usePairedConnections" value="true"/>
> > > > > > > > >       </bean>
> > > > > > > > >     </property>
> > > > > > > > >     <property name="failureDetectionTimeout" value="60000"/>
> > > > > > > > >     <property name="systemThreadPoolSize" value="128"/>
> > > > > > > > >     <property name="publicThreadPoolSize" value="128"/>
> > > > > > > > >     <property name="queryThreadPoolSize" value="128"/>
> > > > > > > > >     <property name="serviceThreadPoolSize" value="128"/>
> > > > > > > > >     <property name="stripedPoolSize" value="128"/>
> > > > > > > > >     <property name="dataStreamerThreadPoolSize" value="4"/>
> > > > > > > > >     <property name="rebalanceThreadPoolSize" value="16"/>
> > > > > > > > >
> > > > > > > > >     <!-- Explicitly enable peer class loading. -->
> > > > > > > > >     <property name="peerClassLoadingEnabled" value="true"/>
> > > > > > > > >
> > > > > > > > >     <!-- Enable deploymentSpi, /usr/share/apache-ignite/libs/segmentify directory will be
> > > > > > > > >          checked every 5 seconds for changed files -->
> > > > > > > > >     <property name="deploymentSpi">
> > > > > > > > >       <bean class="org.apache.ignite.spi.deployment.uri.UriDeploymentSpi">
> > > > > > > > >         <property name="temporaryDirectoryPath" value="/tmp/temp_ignite_libs"/>
> > > > > > > > >         <property name="uriList">
> > > > > > > > >           <list>
> > > > > > > > >             <value>file://freq=5000@localhost/usr/share/apache-ignite/libs/segmentify/</value>
> > > > > > > > >           </list>
> > > > > > > > >         </property>
> > > > > > > > >       </bean>
> > > > > > > > >     </property>
> > > > > > > > >
> > > > > > > > >     <property name="cacheConfiguration">
> > > > > > > > >       <list>
> > > > > > > > >         <!-- Partitioned cache example configuration (Atomic mode). -->
> > > > > > > > >         <bean class="org.apache.ignite.configuration.CacheConfiguration">
> > > > > > > > >           <property name="name" value="default"/>
> > > > > > > > >           <property name="atomicityMode" value="ATOMIC"/>
> > > > > > > > >           <property name="backups" value="1"/>
> > > > > > > > >         </bean>
> > > > > > > > >       </list>
> > > > > > > > >     </property>
> > > > > > > > >
> > > > > > > > >     <!-- Explicitly configure TCP discovery SPI to provide list of initial nodes. -->
> > > > > > > > >     <property name="discoverySpi">
> > > > > > > > >       <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
> > > > > > > > >         <property name="networkTimeout" value="60000"/>
> > > > > > > > >         <property name="ipFinder">
> > > > > > > > >           <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
> > > > > > > > >             <property name="addresses">
> > > > > > > > >               <list>
> > > > > > > > >                 <!-- THERE ARE 10 NODES -->
> > > > > > > > >               </list>
> > > > > > > > >             </property>
> > > > > > > > >           </bean>
> > > > > > > > >         </property>
> > > > > > > > >       </bean>
> > > > > > > > >     </property>
> > > > > > > > >
> > > > > > > > >     <!-- Enabling Apache Ignite native persistence. -->
> > > > > > > > >     <property name="dataStorageConfiguration">
> > > > > > > > >       <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
> > > > > > > > >         <property name="defaultDataRegionConfiguration">
> > > > > > > > >           <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
> > > > > > > > >             <property name="persistenceEnabled" value="true"/>
> > > > > > > > >             <property name="checkpointPageBufferSize" value="#{ 2L * 1024 * 1024 * 1024}"/>
> > > > > > > > >             <property name="maxSize" value="#{ 40L * 1024 * 1024 * 1024 }"/>
> > > > > > > > >           </bean>
> > > > > > > > >         </property>
> > > > > > > > >         <property name="storagePath" value="/srv/ignite/persist"/>
> > > > > > > > >         <property name="walPath" value="/srv/ignite/wal"/>
> > > > > > > > >         <property name="walArchivePath" value="/srv/ignite/wal"/>
> > > > > > > > >         <property name="walMode" value="LOG_ONLY"/>
> > > > > > > > >         <property name="walSegmentSize" value="#{ 256L * 1024 * 1024 }"/>
> > > > > > > > >         <property name="walFlushFrequency" value="5000"/>
> > > > > > > > >         <property name="maxWalArchiveSize" value="#{ 512L * 1024 * 1024 }"/>
> > > > > > > > >         <property name="writeThrottlingEnabled" value="true"/>
> > > > > > > > >         <property name="checkpointFrequency" value="300000"/>
> > > > > > > > >         <property name="checkpointWriteOrder" value="SEQUENTIAL" />
> > > > > > > > >       </bean>
> > > > > > > > >     </property>
> > > > > > > > >   </bean>
> > > > > > > > > </beans>
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > İbrahim Halil Altun
> > > > > > > > > Senior Software Engineer
> > > > > > > > > +90 536 3327510 • segmentify.com <https://www.segmentify.com/>
> > > > > > > > > UK • Germany • Turkey

--
Thanks & Regards,
Naveen Bandaru
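
On the default thread pool question raised at the top of this thread: as far as I know, Ignite 2.x sizes most of its pools (public, system, query, striped) as max(8, number of available cores) by default rather than exactly the core count, so on the 12-core servers discussed here those defaults would already come out as 12. Below is a minimal sketch of that arithmetic only. The class and method names are hypothetical, this is not Ignite's own code, and the formula is an assumption that should be checked against the IgniteConfiguration Javadoc for your version (rebalanceThreadPoolSize in particular follows a different default).

```java
// Hypothetical helper, not part of the Ignite API: it only illustrates the
// assumed default sizing rule max(8, cores) for most Ignite 2.x thread pools.
class PoolSizeSketch {

    /** Returns the assumed default pool size for a machine with the given core count. */
    public static int assumedDefaultPoolSize(int cores) {
        return Math.max(8, cores);
    }

    public static void main(String[] args) {
        // Core count as seen by the JVM on the current machine.
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("cores=" + cores
            + ", assumed default pool size=" + assumedDefaultPoolSize(cores));
    }
}
```

If that assumption holds, explicitly setting a pool to the core count only changes anything on machines with fewer than 8 cores, where it would shrink the pool below the default.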