Hi,

I have been working on this PR
<https://github.com/apache/airflow/pull/46257> to update our documentation
on zombie tasks to reflect the terminology used in the user-facing event
logs in Airflow 2.10+. The event logs use the terminology "heartbeat
timeout" whereas the documentation uses the terminology "zombie tasks". I
would like to update the documentation to focus on the "heartbeat timeout"
terminology so that users are able to find and understand this
documentation easily when they see a "heartbeat timeout" in the event logs.

In the same vein, I think other user-facing configurations should also be
updated to use the same terminology. I am proposing that we make the
following changes to Airflow configuration variables:

scheduler_zombie_task_threshold --> scheduler_task_heartbeat_timeout_threshold
zombie_detection_interval --> task_heartbeat_timeout_detection_interval
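
To illustrate how such a rename could stay backward compatible, here is a minimal sketch of a lookup that prefers the proposed name and falls back to the current one with a deprecation warning. The names `RENAMED_OPTIONS` and `get_option` are hypothetical and exist only for illustration; this is not Airflow's configuration API, only the general pattern.

```python
import warnings

# Hypothetical mapping: proposed option name -> current option name.
RENAMED_OPTIONS = {
    "scheduler_task_heartbeat_timeout_threshold": "scheduler_zombie_task_threshold",
    "task_heartbeat_timeout_detection_interval": "zombie_detection_interval",
}


def get_option(settings, name, default=None):
    """Return settings[name], falling back to the pre-rename key with a warning."""
    if name in settings:
        return settings[name]
    old_name = RENAMED_OPTIONS.get(name)
    if old_name is not None and old_name in settings:
        # The user still has the old key set: honor it, but flag the rename.
        warnings.warn(
            f"Option '{old_name}' is deprecated; use '{name}' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return settings[old_name]
    return default
```

With this pattern, a deployment that still sets zombie_detection_interval keeps working after the rename and gets a warning that points to the new name.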

In addition to this, I propose that we also change the logs emitted by the
scheduler to use the "task heartbeat timeout" terminology.

For example, the log statement below
<https://github.com/apache/airflow/blob/dea2cc9afc61caf49621c3b1923bcf90e96e17e9/airflow/jobs/scheduler_job_runner.py#L2040>:
self.log.error(
    "Detected zombie job: %s "
    "(See https://airflow.apache.org/docs/apache-airflow/"
    "stable/core-concepts/tasks.html#zombie-tasks)",
    request,
)

should become:

self.log.error(
    "Detected task heartbeat timeout: %s "
    "(See https://airflow.apache.org/docs/apache-airflow/"
    "stable/core-concepts/tasks.html#zombie-tasks)",
    request,
)

I wanted to start this discussion to get everyone's thoughts on my
proposal. Do you agree (or disagree) that at least all user-facing elements
of Airflow should use the "task heartbeat timeout" terminology instead of
"zombie tasks" for uniformity?

I can add all of these changes to my PR.

Best,
Karen Braganza


<https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#zombie-detection-interval>
