[ 
https://issues.apache.org/jira/browse/IGNITE-25008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Korotkov updated IGNITE-25008:
-------------------------------------
    Description: 
In the load ducktest with:
 * 3 server nodes
 * 1 client node
 * 5 GB heap on nodes

Problems for scale=1.0 on the client:
 * The query did not finish within 5 minutes and was cancelled.
 * GC takes almost all of the time on the client: the client does useful work only 5.68% of the time (throughput_%).
 * 764 garbage collector pauses, 99% of which were full GC pauses.
 * The average pause is 566 ms, the max is 1330 ms.
 * About 16 GB were allocated in total before the query was cancelled.



The client's GC log and the summary GC statistics calculated by GCViewer 
are attached for reference.

 
|*benchmark*|*metric*|*scale=0.01*|*scale=0.1*|*scale=1.0*|
|Q05|client_gc_fullGCPausePc_%_mean|0.00|0.00|99.20|
|Q05|client_gc_inTestPauseAverage_ms_mean|10.67|21.12|566.23|
|Q05|client_gc_inTestPauseCount_-_mean|1.00|11.00|764.00|
|Q05|client_gc_inTestPauseSum_ms_mean|10.67|232.33|432,596.00|
|Q05|client_gc_pauseMax_ms_mean|16.97|36.92|1,330.86|
|Q05|client_gc_throughput_%_mean|98.63|97.48|5.68|
|Q05|client_gc_totalHeapUsedMax_MB_mean|501.67|1,014.00|5,120.00|
|Q05|client_gc_totalHeapUsedMaxpc_%_mean|9.83|19.80|100.00|
|Q05|client_heap_inTestAllocatedPerCall_MB/call_mean|25.60|1,958.88|16,798.14|
|Q05|client_heap_inTestTotalAllocated_MB_mean|675.25|3,917.75|16,798.14|
|Q05|count|26.67|2.00|1.00|
|Q05|failed_count| | |1.00|
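
The scale=1.0 figures can be cross-checked against each other with a little arithmetic. The sketch below is only a sanity check, not part of the ducktest: `GcFigures` and its methods are hypothetical names, and the wall-time estimate assumes that throughput is defined as the share of observed time not spent in GC pauses and that the throughput and pause-sum metrics cover the same interval, which the report does not confirm.

```java
import java.util.Locale;

/** Hypothetical helper that cross-checks the scale=1.0 GC figures quoted in the table. */
final class GcFigures {
    /** Total pause time implied by the pause count and the average pause, in ms. */
    static double impliedPauseSumMs(double pauseCount, double avgPauseMs) {
        return pauseCount * avgPauseMs;
    }

    /**
     * Observed wall time implied by a pause total and a throughput percentage, in ms.
     * Assumption: throughput = 100 * (1 - pauseSum / wallTime).
     */
    static double impliedWallTimeMs(double pauseSumMs, double throughputPct) {
        return pauseSumMs / (1.0 - throughputPct / 100.0);
    }

    public static void main(String[] args) {
        // Values copied from the scale=1.0 column of the table.
        double count = 764.00;
        double avgMs = 566.23;
        double sumMs = 432_596.00;
        double throughputPct = 5.68;

        System.out.printf(Locale.ROOT, "count * average = %.2f ms (reported sum: %.2f ms)%n",
            impliedPauseSumMs(count, avgMs), sumMs);
        System.out.printf(Locale.ROOT, "implied observed time = %.1f s%n",
            impliedWallTimeMs(sumMs, throughputPct) / 1000.0);
    }
}
```

The product of pause count and average pause lands within a few milliseconds of the reported pause sum, so those three metrics are mutually consistent. The implied observed time is only an estimate under the assumptions stated above.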

 

***

The JMH test simply crashes with an OOM:

https://github.com/apache/ignite/blob/master/modules/benchmarks/src/main/java/org/apache/ignite/internal/benchmarks/jmh/sql/tpch/TpchBenchmark.java

{noformat}
class org.apache.ignite.IgniteException: Unexpected exception
        at org.apache.ignite.internal.processors.query.calcite.exec.ExecutionContext.lambda$execute$1(ExecutionContext.java:434)
        at org.apache.ignite.internal.processors.query.calcite.exec.task.AbstractQueryTaskExecutor$SecurityAwareTask.run(AbstractQueryTaskExecutor.java:75)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.OutOfMemoryError: Java heap space
        at java.base/java.util.Arrays.copyOf(Arrays.java:3689)
        at java.base/java.util.PriorityQueue.grow(PriorityQueue.java:305)
        at java.base/java.util.PriorityQueue.offer(PriorityQueue.java:344)
        at java.base/java.util.PriorityQueue.add(PriorityQueue.java:326)
        at org.apache.ignite.internal.processors.query.calcite.exec.rel.SortNode.push(SortNode.java:135)
        at org.apache.ignite.internal.processors.query.calcite.exec.rel.MergeJoinNode$InnerJoin.join(MergeJoinNode.java:388)
        at org.apache.ignite.internal.processors.query.calcite.exec.rel.MergeJoinNode.pushRight(MergeJoinNode.java:176)
        at org.apache.ignite.internal.processors.query.calcite.exec.rel.MergeJoinNode$2.push(MergeJoinNode.java:134)
        at org.apache.ignite.internal.processors.query.calcite.exec.rel.SortNode.flush(SortNode.java:197)
        at org.apache.ignite.internal.processors.query.calcite.exec.rel.SortNode$$Lambda$2708/0x0000000840d12c40.runx(Unknown Source)
        at org.apache.ignite.internal.util.lang.RunnableX.run(RunnableX.java:37)
        at org.apache.ignite.internal.processors.query.calcite.exec.ExecutionContext.lambda$execute$1(ExecutionContext.java:429)
        at org.apache.ignite.internal.processors.query.calcite.exec.ExecutionContext$$Lambda$2484/0x0000000840c60840.run(Unknown Source)
        ... 4 more
{noformat}
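
In the trace, the heap is exhausted while a `java.util.PriorityQueue` grows inside `SortNode.push`, which is fed by the output of `MergeJoinNode`. To illustrate why a sort that buffers its whole input is sensitive to input size, here is a minimal sketch. `BufferingSort` is a hypothetical class written for this note, not Ignite's `SortNode`; it only assumes that rows are collected in a `PriorityQueue` and emitted once the input ends, as the frames above suggest.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Consumer;

/** Hypothetical buffering sort operator; an illustration only, not Ignite's SortNode. */
final class BufferingSort<T> {
    private final PriorityQueue<T> rows;
    private final Consumer<T> downstream;

    BufferingSort(Comparator<T> cmp, Consumer<T> downstream) {
        this.rows = new PriorityQueue<>(cmp);
        this.downstream = downstream;
    }

    /** Buffers one input row; nothing is emitted until the input ends. */
    void push(T row) {
        rows.add(row); // the queue's backing array is reallocated as rows accumulate
    }

    /** Number of rows currently held on the heap. */
    int buffered() {
        return rows.size();
    }

    /** Called at end of input: drains all rows to the downstream consumer in sorted order. */
    void end() {
        while (!rows.isEmpty())
            downstream.accept(rows.poll());
    }

    public static void main(String[] args) {
        List<Integer> out = new ArrayList<>();
        BufferingSort<Integer> sort = new BufferingSort<>(Comparator.<Integer>naturalOrder(), out::add);

        for (int v : new int[] {5, 1, 4, 2, 3})
            sort.push(v);

        System.out.println("buffered before end: " + sort.buffered());
        sort.end();
        System.out.println("sorted output: " + out);
    }
}
```

In this sketch every pushed row stays on the heap until `end()` is called, so the memory held by the operator grows linearly with the number of input rows. If the sort in the failing plan behaves similarly, a join that produces many rows at scale=1.0 would be enough to fill the heap.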
 



> Calcite. TPC-H query #5: scale=1.0: too long execTime, high heap usage, full 
> GC pauses
> --------------------------------------------------------------------------------------
>
>                 Key: IGNITE-25008
>                 URL: https://issues.apache.org/jira/browse/IGNITE-25008
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Sergey Korotkov
>            Priority: Major
>              Labels: ise, tpch
>         Attachments: gc.log, gc.log-summary.csv, gc.log-summary.json
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
