ferenc-csaky opened a new pull request, #26508: URL: https://github.com/apache/flink/pull/26508
## What is the purpose of the change

A long-running YARN application will eventually fill the local disk with logs unless a rolling Log4j strategy is applied that limits the log files to X archives of Y size each. But if there is a policy to retain logs for months or years, discarding them too early is also a problem. YARN can aggregate specific files of running applications, so the rolled-over logs can be aggregated to external storage and kept there until the end of time, if that's the requirement. :)

## Brief change log

- Added new optional config options to define `include` and `exclude` regex patterns.
- Wired these values into YARN's `LogAggregationContext` during cluster deployment.
- Updated docs.

## Verifying this change

- Added a new unit test for the added logic.
- Existing unit tests guarantee that by default the deployment works exactly as before.
- Also E2E tested on a YARN cluster; log aggregation is triggered for the given files as expected.

## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
- The serializers: no
- The runtime per-record code paths (performance sensitive): no
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: yes
- The S3 file system connector: no

## Documentation

- Does this pull request introduce a new feature? yes
- If yes, how is the feature documented? docs
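
For illustration only, here is a minimal sketch of how an include/exclude pattern pair might select rolled-over log files for aggregation. It is not the code in this PR and does not call any Flink or YARN API. The class `RolledLogFilter`, its methods, and the sample file names and patterns are all hypothetical. It follows the rule documented for YARN's `LogAggregationContext`: a file matching both patterns ends up excluded. Using `find()` (substring match) rather than a full match is an assumption and should be checked against the Hadoop version in use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of Flink or YARN. */
final class RolledLogFilter {
    private final Pattern include; // null: no include pattern configured
    private final Pattern exclude; // null: no exclude pattern configured

    RolledLogFilter(String includeRegex, String excludeRegex) {
        this.include = isBlank(includeRegex) ? null : Pattern.compile(includeRegex);
        this.exclude = isBlank(excludeRegex) ? null : Pattern.compile(excludeRegex);
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    /** True if the file matches the include pattern and not the exclude pattern. */
    boolean shouldAggregate(String fileName) {
        // Assumption: without an include pattern nothing is selected,
        // mirroring the "works exactly as before" default.
        if (include == null || !include.matcher(fileName).find()) {
            return false;
        }
        // A file matching both patterns is excluded.
        return exclude == null || !exclude.matcher(fileName).find();
    }

    /** Returns the subset of file names that would be aggregated, in input order. */
    List<String> select(List<String> fileNames) {
        List<String> selected = new ArrayList<>();
        for (String name : fileNames) {
            if (shouldAggregate(name)) {
                selected.add(name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Hypothetical patterns: aggregate rolled JobManager/TaskManager logs,
        // but skip anything that still carries a ".tmp" suffix.
        RolledLogFilter filter =
                new RolledLogFilter("(jobmanager|taskmanager)\\.log\\.\\d+", "\\.tmp$");
        List<String> files = List.of(
                "jobmanager.log", "jobmanager.log.1", "taskmanager.log.2.tmp", "jobmanager.out");
        System.out.println(filter.select(files));
    }
}
```

With the sample patterns, only numbered rolled-over files are selected. The active `jobmanager.log` has no numeric suffix and therefore stays on local disk until Log4j rolls it.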