sam-dumont opened a new pull request, #62985:
URL: https://github.com/apache/airflow/pull/62985

   <!-- SPDX-License-Identifier: Apache-2.0
         https://www.apache.org/licenses/LICENSE-2.0 -->
   
   Fix `CloudwatchTaskHandler` not deleting local log files after streaming to 
CloudWatch.
   
   In Airflow 3, logs stream to CloudWatch in real-time via structlog 
processors, so `upload()` was a no-op. But `delete_local_copy` was never 
honoured: local log files kept accumulating on shared storage indefinitely.
   
   We hit this in production on EFS. Storage grew at ~31 GB/day from log 
accumulation alone. After deploying this fix, net growth dropped to effectively 
zero. A separate cleanup job removed the ~780 GB backlog that had built up.
   
   **What changed:**
   
   - `CloudWatchRemoteLogIO.upload()` deletes the local log parent directory 
when `delete_local_copy` is True
   - `CloudwatchTaskHandler.close()` calls `upload()` with the rendered log 
path stored during `set_context()`
   - Path traversal guard: `upload()` refuses to delete anything outside 
`base_log_folder`
   
   ---
   
   ##### Was generative AI tooling used to co-author this PR?
   
   - [X] Yes — Claude Code (Claude Opus 4.6)
   
   Generated-by: Claude Code (Claude Opus 4.6) following [the 
guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to