[ https://issues.apache.org/jira/browse/FLINK-31104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17695585#comment-17695585 ]
Weijie Guo edited comment on FLINK-31104 at 3/2/23 9:33 AM:
------------------------------------------------------------

Hi all,

To be honest, the root cause of this ticket is difficult to investigate because not enough context information, such as a thread dump or heap dump, is provided here. However, I have reproduced the TPC-DS timeout on our internal cluster, and I guess it has the same cause as this ticket.

After some investigation, I found that we do have a bug in {{LocalBufferPool}}, but it appears to come from FLINK-26762, which has been in master since 1.16. To fix this bug, I created FLINK-31293.

At the same time, I found that we do not actually need the overdraft buffer in the batch scenario. I will disable it in FLINK-31288, which should make our TPC-DS test stable.

Since the problem was not introduced in 1.17 and can be temporarily fixed through FLINK-31288, I suggest lowering the priority to unblock the 1.17 release process.

[~renqs] [~mapohl] WDYT?
> TPC-DS test timed out in query 36
> ---------------------------------
>
> Key: FLINK-31104
> URL: https://issues.apache.org/jira/browse/FLINK-31104
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Runtime, Tests
> Affects Versions: 1.17.0
> Reporter: Matthias Pohl
> Assignee: Weijie Guo
> Priority: Blocker
> Labels: test-stability
>
> A timeout happened in
> [apache-flink:flink-end-to-end-tests/flink-tpcds-test/tpcds-tool/query/query36.sql|https://github.com/apache/flink/blob/20c983c26262057c4d59bd591aed89969a8ff525/flink-end-to-end-tests/flink-tpcds-test/tpcds-tool/query/query36.sql]
> of the TPC-DS test suite:
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=46202&view=logs&j=6e8542d7-de38-5a33-4aca-458d6c87066d&t=5846934b-7a4f-545b-e5b0-eb4d8bda32e1&l=880
> {code}
> [...]
> Feb 16 04:58:23 [INFO]Run TPC-DS query 36 ...
> Feb 16 04:58:23 Job has been submitted with JobID 4d0c1e6cbde9f0b6ae8b9f9afd159c06
> {code}
> Unfortunately, no further logs are provided.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)