mhaure-touze opened a new issue, #55368: URL: https://github.com/apache/airflow/issues/55368
### Apache Airflow Provider(s)

cncf-kubernetes, amazon

### Versions of Apache Airflow Providers

apache-airflow-providers-amazon==9.12.0

apache-airflow-providers-cncf-kubernetes==10.7.0

### Apache Airflow version

2.10.3

### Operating System

Amazon Linux

### Deployment

Amazon (AWS) MWAA

### Deployment details

An EksPodOperator that launches a pod on an EKS cluster (v1.32).

### What happened

1. The operator launches the pod.
2. The triggerer pauses the task as "DEFERRED".
3. The triggerer sends a "running" event.
4. The `trigger_reentry` method is invoked.
5. Somehow the task then waits for pod completion.
6. The task stays alive until a heartbeat timeout kills it.

```
ip-172-29-129-41.eu-west-1.compute.internal
*** Reading remote log from Cloudwatch log_group: airflow-data-eng-mwaa-env-Task log_stream: dag_id=waititng/run_id=manual__2025-09-05T09_23_19.615897+00_00/task_id=waititng/attempt=10.log
2025-09-05T16:22:04.599378194Z
2025-09-05T16:22:04.653984267Z
2025-09-05T16:22:04.654174133Z
...
2025-09-05T23:50:51.065801031Z
2025-09-05T23:50:51.078800975Z
2025-09-05T23:50:51.125942120Z
[Invalid date] {local_task_job_runner.py:123} ▶ Pre task execution logs
[Invalid date] {base.py:84} INFO - Retrieving connection 'aws_eks_role'
[Invalid date] {baseoperator.py:416} WARNING - EksPodOperator.execute cannot be called outside TaskInstance!
[Invalid date] {pod.py:1280} INFO - Building pod waititng-9c5b6348-8893-405e-b769-5f0ffe3ee776-n48xxaw6 with labels: {'dag_id': 'waititng', 'task_id': 'waititng', 'run_id': 'manual__2025-09-05T092319.6158970000-3503ec696', 'kubernetes_pod_operator': 'True', 'try_number': '10'}
[Invalid date] {pod.py:572} INFO - Found matching pod waititng-9c5b6348-8893-405e-b769-5f0ffe3ee776-e0pnddo5 with labels {'airflow_kpo_in_cluster': 'False', 'airflow_version': '2.10.3', 'component': 'singleuser-server', 'dag_id': 'waititng', 'kubernetes_pod_operator': 'True', 'run_id': 'manual__2025-09-05T092319.6158970000-3503ec696', 'task_id': 'waititng', 'try_number': '7'}
[Invalid date] {pod.py:573} INFO - `try_number` of task_instance: 10
[Invalid date] {pod.py:574} INFO - `try_number` of pod: 7
[Invalid date] {pod.py:584} INFO - Reusing existing pod 'waititng-9c5b6348-8893-405e-b769-5f0ffe3ee776-e0pnddo5' (phase=Running, reason=) since it is not terminated or evicted.
[Invalid date] {taskinstance.py:288} INFO - Pausing task as DEFERRED. dag_id=waititng, task_id=waititng, run_id=manual__2025-09-05T09:23:19.615897+00:00, execution_date=20250905T092319, start_date=20250906T001827
[Invalid date] {taskinstance.py:340} ▶ Post task execution logs
[Invalid date] {pod.py:146} INFO - Checking pod 'waititng-9c5b6348-8893-405e-b769-5f0ffe3ee776-e0pnddo5' in namespace 'namespace'.
[Invalid date] {triggerer_job_runner.py:631} INFO - Trigger waititng/manual__2025-09-05T09:23:19.615897+00:00/waititng/-1/10 (ID 20) fired: TriggerEvent<{'status': 'running', 'last_log_time': None, 'namespace': 'namespace', 'name': 'waititng-9c5b6348-8893-405e-b769-5f0ffe3ee776-e0pnddo5', 'eks_cluster_name': 'cluster'}>
[Invalid date] {local_task_job_runner.py:123} ▶ Pre task execution logs
[Invalid date] {base.py:84} INFO - Retrieving connection 'aws_eks_role'
[Invalid date] {pod_manager.py:713} INFO - Pod waiting-9c5b6348-8893-405e-b769-5f0ffe3ee776-e0pnddo5 has phase Running
[Invalid date] {pod_manager.py:713} INFO - Pod waiting-9c5b6348-8893-405e-b769-5f0ffe3ee776-e0pnddo5 has phase Running
[Invalid date] {job.py:229} INFO - Heartbeat recovered after 71.80 seconds
[Invalid date] {local_task_job_runner.py:266} INFO - Task exited with return code -9. For more information, see https://airflow.apache.org/docs/apache-airflow/stable/troubleshooting.html#LocalTaskJob-killed
[Invalid date] {local_task_job_runner.py:245} ▲▲▲ Log group end
```

### What you think should happen instead

I expect the task to alternate between the running and deferred states until the pod completes or fails.

- Operator mode is `deferrable=True`
- `logging_interval` is set to 600 seconds

### How to reproduce

```
import datetime

from airflow.decorators import dag
from airflow.providers.amazon.aws.operators.eks import EksPodOperator


@dag(
    dag_id="wait",
    start_date=datetime.datetime(2025, 8, 4),
    schedule=None,
    catchup=False,
)
def wait() -> None:
    EksPodOperator(
        task_id="wait",
        aws_conn_id="aws_eks_role",
        cluster_name="cluster",
        deferrable=True,
        namespace="namespace",
        region="eu-west-1",
        # NOTE: pipeline_config is not defined in this snippet.
        pod_name=f"chromium-{pipeline_config.pipeline_id}",
        cmds=["/bin/sh", "-c"],
        arguments=["while true; do echo 'sleeping...'; sleep 2; done"],
        image="alpine:3.22.1",
        on_finish_action="delete_pod",
        poll_interval=60,
        logging_interval=600,
    )


wait()
```

### Anything else

_No response_

### Are you willing to submit PR?
- [x] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
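
The expected behaviour described in the report (the task alternating between running and deferred until the pod finishes) can be illustrated with a small toy model. This is only a sketch under stated assumptions: it is plain Python, it does not use or reproduce any Airflow or provider code, and the names `TaskTimeline` and `expected_lifecycle` are hypothetical. The pod phases are fed in by hand rather than read from a cluster.

```python
from dataclasses import dataclass, field
from typing import Iterable, List

# Hypothetical names throughout: this is a toy model of the expected
# behaviour, not code from Airflow or its providers.


@dataclass
class TaskTimeline:
    """Ordered list of task states, in the order they were entered."""

    states: List[str] = field(default_factory=list)

    def record(self, state: str) -> None:
        self.states.append(state)


def expected_lifecycle(pod_phases: Iterable[str]) -> TaskTimeline:
    """Model the task states for a sequence of reported pod phases.

    Each "Running" phase costs one short running slice followed by a
    new deferral; "Succeeded" or any other phase ends the task.
    """
    timeline = TaskTimeline()
    timeline.record("running")   # initial execution: launch or reuse the pod
    timeline.record("deferred")  # hand over to the triggerer
    for phase in pod_phases:
        timeline.record("running")  # re-entry after a trigger event
        if phase == "Running":
            timeline.record("deferred")  # pod not finished: defer again
        elif phase == "Succeeded":
            timeline.record("success")
            break
        else:  # "Failed" or anything unexpected
            timeline.record("failed")
            break
    return timeline


if __name__ == "__main__":
    timeline = expected_lifecycle(["Running", "Running", "Succeeded"])
    print(" -> ".join(timeline.states))
```

In the log above, by contrast, the task re-enters after the `running` event and keeps reporting `has phase Running` without a second "Pausing task as DEFERRED" line, which is the step this model would expect next.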