[ 
https://issues.apache.org/jira/browse/FLINK-31104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695585#comment-17695585
 ] 

Weijie Guo edited comment on FLINK-31104 at 3/2/23 9:33 AM:
------------------------------------------------------------

Hi all,

To be honest, the root cause of this ticket is difficult to investigate because 
not enough context information is provided here, such as a thread dump or heap 
dump. However, I have reproduced the TPC-DS timeout on our internal cluster, and 
I suspect it has the same cause as this ticket.

After some investigation, I found that we do have a bug in 
{{LocalBufferPool}}, but it appears to have been introduced by FLINK-26762, 
which has been in master since 1.16.

To fix this bug, I created FLINK-31293. At the same time, I found that we do 
not actually need the overdraft buffer in the batch scenario. I will disable it 
in FLINK-31288, which should make our TPC-DS test stable.

Since the problem was not introduced in 1.17 and can be temporarily fixed 
through FLINK-31288, I suggest lowering the priority to unblock the 1.17 release 
process. [~renqs] [~mapohl] WDYT?


> TPC-DS test timed out in query 36
> ---------------------------------
>
>                 Key: FLINK-31104
>                 URL: https://issues.apache.org/jira/browse/FLINK-31104
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / Runtime, Tests
>    Affects Versions: 1.17.0
>            Reporter: Matthias Pohl
>            Assignee: Weijie Guo
>            Priority: Blocker
>              Labels: test-stability
>
> A timeout happened in 
> [apache-flink:flink-end-to-end-tests/flink-tpcds-test/tpcds-tool/query/query36.sql|https://github.com/apache/flink/blob/20c983c26262057c4d59bd591aed89969a8ff525/flink-end-to-end-tests/flink-tpcds-test/tpcds-tool/query/query36.sql]
>  of the TPC-DS test suite:
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46202&view=logs&j=6e8542d7-de38-5a33-4aca-458d6c87066d&t=5846934b-7a4f-545b-e5b0-eb4d8bda32e1&l=880
> {code}
> [...]
> Feb 16 04:58:23 [INFO]Run TPC-DS query 36 ...
> Feb 16 04:58:23 Job has been submitted with JobID 
> 4d0c1e6cbde9f0b6ae8b9f9afd159c06
> {code}
> Unfortunately, no further logs are provided.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
