I also agree with the idea that we should go for a name that's more accurate 
and easier to understand. Also, +1 to starting with Airflow 3.

Tbh "heartbeat" itself is an overused term/concept in Airflow. I think we 
already have 6 configurations with "heartbeat" in their names, and they refer 
to different types of heartbeats.

Anyways, I am against this name change: 
scheduler_zombie_task_threshold  -->  scheduler_task_heartbeat_timeout_threshold

We already have a scheduler heartbeat, so let's drop the word "scheduler" from 
this name so that users know it is the Task Instance heartbeat, not the 
scheduler's.

I also think we should combine "local_task_job_heartbeat_sec" with 
"scheduler_zombie_task_threshold". That configuration's description says that 
it already defaults to the zombie task threshold when set to 0. I haven't dug 
into the code to see why they are different, but I really hope our 
configuration documentation doesn't read like the text below in the future:
"local_task_job_heartbeat_sec: The frequency (in seconds) at which the 
LocalTaskJob should send heartbeat signals to the scheduler to notify it’s 
still alive. If this value is set to 0, the heartbeat interval will default to 
the value of [scheduler] scheduler_task_heartbeat_timeout_threshold." 
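
To make the overlap concrete, here is a minimal sketch of the fallback that 
the description implies. It is based only on the wording of that description, 
not on the Airflow source, and the function and parameter names are made up 
for illustration:

```python
def effective_heartbeat_interval(
    local_task_job_heartbeat_sec: int,
    zombie_task_threshold: int,
) -> int:
    """Return the heartbeat interval (seconds) implied by the two settings.

    Hypothetical helper: it mirrors the documented behaviour that a value
    of 0 falls back to the zombie task threshold. It is not Airflow code.
    """
    if local_task_job_heartbeat_sec == 0:
        # 0 means "not set": reuse the threshold value instead.
        return zombie_task_threshold
    return local_task_job_heartbeat_sec
```

If that is all the second option does, the two settings are coupled enough 
that a single, clearly named option may be easier to document.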

Thanks
Shubham

On 2025-02-11, 1:15 PM, "Karen Braganza" <karenbraganz...@gmail.com> wrote:


Hi,


I have been working on this PR
<https://github.com/apache/airflow/pull/46257> to update our documentation
on zombie tasks to reflect the terminology used in the user-facing event
logs in Airflow 2.10+. The event logs use the terminology "heartbeat
timeout" whereas the documentation uses the terminology "zombie tasks". I
would like to update the documentation to focus on the "heartbeat timeout"
terminology so that users are able to find and understand this
documentation easily when they see a "heartbeat timeout" in the event logs.


In the same vein, I think other user-facing configurations should also be
updated to use the same terminology. I am proposing that we make the
following changes to Airflow configuration variables:


scheduler_zombie_task_threshold --> scheduler_task_heartbeat_timeout_threshold
zombie_detection_interval --> task_heartbeat_timeout_detection_interval
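
One way such a rename could stay backward compatible is to keep reading the 
old names and warn when they are used. The sketch below is only an 
illustration under that assumption: RENAMED_OPTIONS and get_option are 
hypothetical names, and a plain dict stands in for the real configuration, so 
this is not Airflow's actual deprecation mechanism.

```python
import warnings

# Hypothetical table of proposed new names -> current names.
RENAMED_OPTIONS = {
    "scheduler_task_heartbeat_timeout_threshold": "scheduler_zombie_task_threshold",
    "task_heartbeat_timeout_detection_interval": "zombie_detection_interval",
}


def get_option(settings: dict, name: str):
    """Look up `name`, falling back to its old name with a warning."""
    if name in settings:
        return settings[name]
    old_name = RENAMED_OPTIONS.get(name)
    if old_name is not None and old_name in settings:
        warnings.warn(
            f"'{old_name}' is deprecated; use '{name}' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return settings[old_name]
    raise KeyError(name)
```

With this shape, a deployment that still sets zombie_detection_interval keeps 
working and gets a DeprecationWarning that points at the new name.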


In addition to this, I propose that we also change the logs emitted by the
scheduler to use the "task heartbeat timeout" terminology.


For example, the below logs
<https://github.com/apache/airflow/blob/dea2cc9afc61caf49621c3b1923bcf90e96e17e9/airflow/jobs/scheduler_job_runner.py#L2040>:
self.log.error(
    "Detected zombie job: %s "
    "(See https://airflow.apache.org/docs/apache-airflow/"
    "stable/core-concepts/tasks.html#zombie-tasks)",
    request,
)


should become:


self.log.error(
    "Detected task heartbeat timeout: %s "
    "(See https://airflow.apache.org/docs/apache-airflow/"
    "stable/core-concepts/tasks.html#zombie-tasks)",
    request,
)


I wanted to start this discussion to get everyone's thoughts on my
proposal. Do you agree (or disagree) that at least all user-facing elements
of Airflow should use the "task heartbeat timeout" terminology instead of
"zombie tasks" for uniformity?


I can add all of these changes to my PR.


Best,
Karen Braganza




<https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#zombie-detection-interval>




