ferenc-csaky opened a new pull request, #26508:
URL: https://github.com/apache/flink/pull/26508

   ## What is the purpose of the change
   
   A long-running YARN application will eventually fill the local disk with 
logs unless a rolling Log4j strategy limits the log files to X archives of 
size Y. But if there is a policy to retain logs for months or years, discarding 
rolled logs that early is also problematic. YARN can aggregate specific files 
of running applications, so the rolled-over logs can be aggregated to external 
storage and kept there until the end of time, if that's the requirement. :)
   
   ## Brief change log
   
   - Added new optional config options to define `include` and `exclude` regex 
patterns.
   - Wired these values into YARN's `LogAggregationContext` during cluster 
deployment.
   - Updated docs.
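
To illustrate how an `include`/`exclude` pattern pair can narrow aggregation down to the rolled-over files, here is a minimal, self-contained sketch. It is not the code from this PR: the class `LogFilterSketch`, the method `select`, and the sample file names are hypothetical, and the matching rule (a regex "find" against the file name, with exclusion taking precedence) is an assumption about how YARN applies such patterns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class LogFilterSketch {

    /**
     * Hypothetical helper (not part of Flink or YARN): keeps the file names
     * that match the include pattern and do not match the exclude pattern.
     * Assumption: patterns are applied with find() on the bare file name.
     */
    static List<String> select(List<String> fileNames, String include, String exclude) {
        Pattern includePattern = Pattern.compile(include);
        // The exclude pattern is optional in this sketch.
        Pattern excludePattern = (exclude == null || exclude.isEmpty())
                ? null
                : Pattern.compile(exclude);

        List<String> selected = new ArrayList<>();
        for (String name : fileNames) {
            boolean included = includePattern.matcher(name).find();
            boolean excluded = excludePattern != null && excludePattern.matcher(name).find();
            if (included && !excluded) {
                selected.add(name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Sample container log directory content (made-up names).
        List<String> files = List.of(
                "jobmanager.log",    // file currently being written
                "jobmanager.log.1",  // rolled-over archive
                "jobmanager.log.2",  // rolled-over archive
                "jobmanager.out",
                "jobmanager.err");

        // Include everything that looks like a log file, but skip the active one.
        System.out.println(select(files, "\\.log", "^jobmanager\\.log$"));
    }
}
```

With these sample patterns, only the two rolled-over archives are selected, so the file that Log4j is still writing to would be left alone.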
   
   ## Verifying this change
   
   - Added a new unit test for the added logic.
   - Existing unit tests guarantee that by default the deployment will work 
exactly as before.
   - Also E2E tested on a YARN cluster, and log aggregation is triggered for 
the given files as expected.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): no
     - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
     - The serializers: no
     - The runtime per-record code paths (performance sensitive): no
     - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: yes
     - The S3 file system connector: no
   
   ## Documentation
   
     - Does this pull request introduce a new feature? yes
     - If yes, how is the feature documented? docs
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
