To collect the information Zhenya is asking for more easily and speed up troubleshooting, you can set up simple monitoring: https://www.gridgain.com/docs/tutorials/management-monitoring/ignite-storage-monitoring
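If a full monitoring stack is not available right away, a minimal in-process sketch can at least show whether heap growth and GC time line up with your workload. It uses only the standard `java.lang.management` API. The class name `GcHeapSampler` and the 10-second interval are illustrative assumptions, not part of Ignite or the tutorial above. The platform MXBeans describe the JVM the code runs in, so it would have to run inside the server node's process (or the same beans can be read remotely over JMX). The JVM's own GC log (for example `-Xlog:gc*` on JDK 9 and later) is another way to get pause durations.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: samples heap usage and GC activity of the current JVM. */
final class GcHeapSampler {
    // Per-collector {count, timeMs} seen at the previous sample.
    private final Map<String, long[]> last = new HashMap<>();

    /** Returns one line: current heap usage plus GC count/time accumulated since the last call. */
    public String sample() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        StringBuilder sb = new StringBuilder();
        // Values are converted from bytes to MB; max is -1 if the JVM reports it as undefined.
        sb.append(String.format("heap used=%dMB committed=%dMB max=%dMB",
                heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20));
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();   // cumulative since JVM start
            long timeMs = gc.getCollectionTime();   // cumulative elapsed GC time in ms
            long[] prev = last.getOrDefault(gc.getName(), new long[] {0, 0});
            sb.append(String.format("; %s +%d collections, +%d ms",
                    gc.getName(), count - prev[0], timeMs - prev[1]));
            last.put(gc.getName(), new long[] {count, timeMs});
        }
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        GcHeapSampler sampler = new GcHeapSampler();
        // Print a sample every 10 seconds for one minute; compare spikes with the
        // schedule of SQL queries, CQ listeners, and client activity.
        for (int i = 0; i < 6; i++) {
            System.out.println(sampler.sample());
            Thread.sleep(10_000);
        }
    }
}
```

A large jump in GC time between two samples, together with heap `used` staying close to `max`, would point to the kind of full-GC pauses described in the thread.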
- Denis

On Fri, Jul 30, 2021 at 2:32 PM Zhenya Stanilovsky <arzamas...@mail.ru> wrote:

> Hi, Devakumar J
> There is not enough information for analysis.
> Do you have any monitoring? If not, please enable it and try to understand
> how the high CPU consumption and possible GC pauses correlate with your tasks.
> Do you have enough heap (-Xmx param)? Which processes consume the
> most heap?
> Without all this info we can't move forward with the analysis.
>
> Thanks!
>
>
> Hi,
>
> We have a 3 server + 2 client cluster setup. We also have 2 completely
> different clusters for different regions.
>
> Both have a similar set of integrations in terms of SQL queries, CQ
> listeners, and client connections.
>
> The VM hardware/OS settings are also the same.
>
> In cluster 1, even though we have 20GB of data on disk, the cluster performance
> is really good and heap/CPU usage is optimal.
>
> In cluster 2 we have less data on disk, but there are heavy
> fluctuations in heap usage and a lot of full GCs, pausing the JVM for 7 to 8
> seconds every minute. Only a restart helps in this case.
>
>
> The only difference we noticed between the machines is memory page cache
> utilization. We cleaned up the page cache and restarted the cluster, and page
> cache utilization grew to 105 GB out of 126 GB RAM within a day.
>
> Please find the metrics below and suggest any debugging steps to carry
> out or documentation to refer to.
>
>
> Cluster 1:
>
> Metrics for local node (to disable set 'metricsLogFrequency' to 0)
> ^-- Node [id=27529ecd, name=server-node-3, uptime=2 days, 17:48:09.803]
> ^-- H/N/C [hosts=3, nodes=5, CPUs=24]
> ^-- CPU [cur=1.33%, avg=3.05%, GC=0%]
> ^-- PageMemory [pages=3375870]
> ^-- Heap [used=4372MB, free=73.31%, comm=5600MB]
> ^-- Off-heap [used=13341MB, free=20.03%, comm=16584MB]
> ^-- sysMemPlc region [used=0MB, free=99.99%, comm=100MB]
> ^-- metastoreMemPlc region [used=0MB, free=99.85%, comm=0MB]
> ^-- TxLog region [used=0MB, free=100%, comm=100MB]
> ^-- DefaultRegion region [used=13341MB, free=18.57%, comm=16384MB]
> ^-- Ignite persistence [used=20052MB]
> ^-- sysMemPlc region [used=0MB]
> ^-- metastoreMemPlc region [used=0MB]
> ^-- TxLog region [used=0MB]
> ^-- DefaultRegion region [used=20052MB]
> ^-- Outbound messages queue [size=0]
> ^-- Public thread pool [active=0, idle=0, qSize=0]
> ^-- System thread pool [active=0, idle=7, qSize=0]
>
> Cluster 2:
> Metrics for local node (to disable set 'metricsLogFrequency' to 0)
> ^-- Node [id=5905afb7, name=server-node-1, uptime=2 days, 05:49:04.925]
> ^-- H/N/C [hosts=3, nodes=5, CPUs=24]
> ^-- CPU [cur=1.23%, avg=6.4%, GC=0%]
> ^-- PageMemory [pages=1173731]
> ^-- Heap [used=13043MB, free=20.39%, comm=16384MB]
> ^-- Off-heap [used=4638MB, free=72.2%, comm=16584MB]
> ^-- sysMemPlc region [used=0MB, free=99.99%, comm=100MB]
> ^-- metastoreMemPlc region [used=0MB, free=99.91%, comm=0MB]
> ^-- TxLog region [used=0MB, free=100%, comm=100MB]
> ^-- DefaultRegion region [used=4638MB, free=71.69%, comm=16384MB]
> ^-- Ignite persistence [used=5423MB]
> ^-- sysMemPlc region [used=0MB]
> ^-- metastoreMemPlc region [used=0MB]
> ^-- TxLog region [used=0MB]
> ^-- DefaultRegion region [used=5422MB]
> ^-- Outbound messages queue [size=0]
> ^-- Public thread pool [active=0, idle=0, qSize=0]
> ^-- System thread pool [active=0, idle=5, qSize=0]
>
>
> Thanks & Regards,
> Devakumar J
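One thing that can be read directly off the two snapshots: assuming Ignite reports the heap `free` percentage relative to the maximum heap (`-Xmx`), both nodes imply about the same max heap of roughly 16 GB (4372 / (1 - 0.7331) ≈ 16381 MB and 13043 / (1 - 0.2039) ≈ 16384 MB). So the heap setting appears to be identical, but the cluster 2 node has about 80% of it in use and fully committed, while the cluster 1 node uses about 27%. That is consistent with the frequent full GCs reported for cluster 2. The hypothetical helper below (`HeapLineMath` is an illustrative name, not an Ignite API) does the same arithmetic on a metrics line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: derives the implied max heap from an Ignite "Heap [...]" metrics line. */
final class HeapLineMath {
    // Matches e.g. "Heap [used=13043MB, free=20.39%, comm=16384MB]".
    // Case-sensitive, so it does not match the "Off-heap [...]" line.
    private static final Pattern HEAP = Pattern.compile(
            "Heap \\[used=(\\d+)MB, free=([\\d.]+)%, comm=(\\d+)MB\\]");

    /**
     * Assumes free% = (max - used) / max * 100, so max = used / (1 - free / 100).
     * Returns the implied max heap in MB, or -1 if the line does not match.
     */
    static long impliedMaxHeapMb(String line) {
        Matcher m = HEAP.matcher(line);
        if (!m.find()) {
            return -1;
        }
        long used = Long.parseLong(m.group(1));
        double freePct = Double.parseDouble(m.group(2));
        return Math.round(used / (1.0 - freePct / 100.0));
    }

    public static void main(String[] args) {
        String cluster1 = "^-- Heap [used=4372MB, free=73.31%, comm=5600MB]";
        String cluster2 = "^-- Heap [used=13043MB, free=20.39%, comm=16384MB]";
        System.out.println("cluster 1 implied max heap: " + impliedMaxHeapMb(cluster1) + "MB");
        System.out.println("cluster 2 implied max heap: " + impliedMaxHeapMb(cluster2) + "MB");
    }
}
```

Because the logged percentages are rounded to two decimals, the result is only accurate to a few MB.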