Hi Devakumar J,
There is not enough information for analysis.
Do you have any monitoring in place? If not, please enable it and try to
understand how the high CPU consumption and possible GC pauses correlate with
your tasks.
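If you have no monitoring yet, here is a minimal sketch of how GC time could be sampled with the standard JMX beans. It assumes the code runs inside the server node's JVM (for example from a background thread); run standalone, it only measures its own process. The class name GcSampler and the 10 second interval are made up for illustration. For concurrent collectors the reported time is not always pure pause time, so treat the numbers as a rough signal only.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcSampler {
    /** Accumulated GC time in ms across all collectors of this JVM. */
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) total += t;
        }
        return total;
    }

    /** Share of wall-clock time spent in GC between two samples, in percent. */
    static double gcSharePercent(long gcMsBefore, long gcMsAfter, long wallMs) {
        if (wallMs <= 0) return 0.0;
        return 100.0 * (gcMsAfter - gcMsBefore) / wallMs;
    }

    public static void main(String[] args) throws InterruptedException {
        long intervalMs = 10_000; // hypothetical sampling interval
        long prev = totalGcTimeMillis();
        for (int i = 0; i < 6; i++) {
            Thread.sleep(intervalMs);
            long now = totalGcTimeMillis();
            System.out.printf("GC time in last %d ms: %d ms (%.1f%%)%n",
                intervalMs, now - prev, gcSharePercent(prev, now, intervalMs));
            prev = now;
        }
    }
}
```

For scale: a 7 second pause every minute, as you describe for cluster 2, would mean roughly 12% of wall-clock time spent in GC. Enabling GC logging on the nodes (-Xlog:gc* on JDK 9+, or -XX:+PrintGCDetails with -Xloggc on JDK 8) gives the same picture with exact pause times.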
Do you have enough heap (the -Xmx parameter)? What kind of processes consume
the most heap?
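As a rough illustration of the heap check, the sketch below reads used, committed and max heap from the JVM's MemoryMXBean. The class name HeapCheck is hypothetical, and like any in-process check it reports the JVM it runs in, so it would have to run inside the server node to be meaningful.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapCheck {
    /** Used heap as a percentage of the maximum; -1 if the maximum is undefined. */
    static double usedPercent(long used, long max) {
        if (max <= 0) return -1.0; // MemoryUsage.getMax() returns -1 when undefined
        return 100.0 * used / max;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long mb = 1024L * 1024L;
        System.out.printf("Heap used=%dMB, committed=%dMB, max=%dMB (%.1f%% of max)%n",
            heap.getUsed() / mb, heap.getCommitted() / mb, heap.getMax() / mb,
            usedPercent(heap.getUsed(), heap.getMax()));
    }
}
```

To see which objects actually occupy the heap, a class histogram from the JDK's jmap tool (jmap -histo:live <pid>) is usually the quickest start; note that the :live option forces a full GC on the target JVM.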
Without this information we can't move forward with the analysis.
Thanks!
>Hi,
>
>We have 3server+2client cluster setup. Also we have 2 completely different
>clusters for different regions.
>
>Both have a similar set of integrations in terms of SQL queries, CQ listeners
>and client connections.
>
>The VM hardware and OS settings are also the same.
>
>In cluster 1, though we have 20GB of data on disk, the cluster performance is
>really good and heap usage/CPU usage is optimal.
>
>In cluster 2 we have less data on disk, but there are heavy fluctuations
>in heap usage and a lot of full GCs, pausing the JVM for 7 to 8 secs every
>minute. Only a restart helps in this case.
>
>
>The only difference noticed between the machines is memory page cache
>utilization. We did a page cache cleanup and restarted the cluster, and page
>cache utilization reached 105 GB out of 126GB RAM within a day.
>
>Please find the metrics below and suggest any debugging steps to carry
>out or documentation to refer to.
>
>
>Cluster 1:
>
>Metrics for local node (to disable set 'metricsLogFrequency' to 0)
> ^-- Node [id=27529ecd, name=server-node-3, uptime=2 days, 17:48:09.803]
> ^-- H/N/C [hosts=3, nodes=5, CPUs=24]
> ^-- CPU [cur=1.33%, avg=3.05%, GC=0%]
> ^-- PageMemory [pages=3375870]
> ^-- Heap [used=4372MB, free=73.31%, comm=5600MB]
> ^-- Off-heap [used=13341MB, free=20.03%, comm=16584MB]
> ^-- sysMemPlc region [used=0MB, free=99.99%, comm=100MB]
> ^-- metastoreMemPlc region [used=0MB, free=99.85%, comm=0MB]
> ^-- TxLog region [used=0MB, free=100%, comm=100MB]
> ^-- DefaultRegion region [used=13341MB, free=18.57%, comm=16384MB]
> ^-- Ignite persistence [used=20052MB]
> ^-- sysMemPlc region [used=0MB]
> ^-- metastoreMemPlc region [used=0MB]
> ^-- TxLog region [used=0MB]
> ^-- DefaultRegion region [used=20052MB]
> ^-- Outbound messages queue [size=0]
> ^-- Public thread pool [active=0, idle=0, qSize=0]
> ^-- System thread pool [active=0, idle=7, qSize=0]
>
>Cluster 2:
>Metrics for local node (to disable set 'metricsLogFrequency' to 0)
> ^-- Node [id=5905afb7, name=server-node-1, uptime=2 days, 05:49:04.925]
> ^-- H/N/C [hosts=3, nodes=5, CPUs=24]
> ^-- CPU [cur=1.23%, avg=6.4%, GC=0%]
> ^-- PageMemory [pages=1173731]
> ^-- Heap [used=13043MB, free=20.39%, comm=16384MB]
> ^-- Off-heap [used=4638MB, free=72.2%, comm=16584MB]
> ^-- sysMemPlc region [used=0MB, free=99.99%, comm=100MB]
> ^-- metastoreMemPlc region [used=0MB, free=99.91%, comm=0MB]
> ^-- TxLog region [used=0MB, free=100%, comm=100MB]
> ^-- DefaultRegion region [used=4638MB, free=71.69%, comm=16384MB]
> ^-- Ignite persistence [used=5423MB]
> ^-- sysMemPlc region [used=0MB]
> ^-- metastoreMemPlc region [used=0MB]
> ^-- TxLog region [used=0MB]
> ^-- DefaultRegion region [used=5422MB]
> ^-- Outbound messages queue [size=0]
> ^-- Public thread pool [active=0, idle=0, qSize=0]
> ^-- System thread pool [active=0, idle=5, qSize=0]
>
> Thanks & Regards ,
>Devakumar J