[ https://issues.apache.org/jira/browse/FLINK-24169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653560#comment-17653560 ]
Gabor Somogyi commented on FLINK-24169:
---------------------------------------

YARN tests behave differently locally than on CI. Log file creation and log file search are both implemented with relative directories, which makes all of those tests flaky. When the log files are generated in a directory other than the one the tests search, the tests time out. I think it would be good to use full paths instead of relative ones to make the tests stable.

> Flaky local YARN tests relying on log files
> -------------------------------------------
>
>                 Key: FLINK-24169
>                 URL: https://issues.apache.org/jira/browse/FLINK-24169
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / YARN
>            Reporter: Matthias Pohl
>            Assignee: Zsombor Chikán
>            Priority: Major
>              Labels: pull-request-available, stale-assigned, test-stability
>             Fix For: 1.17.0
>
>
> While working on [PR #16989|https://github.com/apache/flink/pull/16989] for
> FLINK-23611, we experienced some flakiness when running
> {{YARNSessionCapacitySchedulerITCase.testDetachedPerJobYarnCluster}} locally.
> [~dmvk] discovered a bug in log4j (see
> [LOG4J2-3155|https://issues.apache.org/jira/browse/LOG4J2-3155]). The bug
> affects the tests because they check the log files for specific log messages.
> The log messages end up in the wrong log file if the rolling update
> mechanism is triggered. This does not seem to be an issue on AzureCI due to the
> slower hardware used for the worker machines.
> A solution to overcome this issue would be to add a custom log4j
> configuration that disables the {{appender.main.policies.startup.type =
> OnStartupTriggeringPolicy}} setting, which is present in {{flink-dist}}'s log4j
> configuration.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
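
To illustrate the suggestion in the comment above (full paths instead of relative ones), here is a minimal sketch of resolving the log directory to an absolute path once and searching only there. The class and method names (AbsoluteLogLookup, toAbsoluteDir, findLogContaining) and the file name and message in main are hypothetical and are not taken from Flink's test utilities; only standard java.nio.file calls are used.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.stream.Stream;

/** Hypothetical helper for illustration; not part of Flink. */
final class AbsoluteLogLookup {

    /** Resolves a possibly relative directory against the current working directory. */
    static Path toAbsoluteDir(String dir) {
        return Paths.get(dir).toAbsolutePath().normalize();
    }

    /** Returns a regular file under logDir whose content contains the marker, if any. */
    static Optional<Path> findLogContaining(Path logDir, String marker) throws IOException {
        if (!logDir.isAbsolute()) {
            // Fail fast instead of silently searching relative to the working directory.
            throw new IllegalArgumentException("Expected an absolute path: " + logDir);
        }
        try (Stream<Path> files = Files.walk(logDir)) {
            return files.filter(Files::isRegularFile)
                    .filter(f -> contains(f, marker))
                    .findFirst();
        }
    }

    private static boolean contains(Path file, String marker) {
        try {
            byte[] bytes = Files.readAllBytes(file);
            return new String(bytes, StandardCharsets.UTF_8).contains(marker);
        } catch (IOException e) {
            return false; // unreadable files are treated as non-matching
        }
    }

    public static void main(String[] args) throws IOException {
        // Writer and reader share the same absolute directory.
        Path logDir = Files.createTempDirectory("yarn-logs").toAbsolutePath();
        Files.write(
                logDir.resolve("jobmanager.log"),
                "Job execution finished".getBytes(StandardCharsets.UTF_8));
        System.out.println(findLogContaining(logDir, "Job execution finished").isPresent());
    }
}
```

If the code that writes the logs and the code that searches them both receive the same absolute Path, a difference in working directory between a local run and CI should no longer change where the search looks.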