Seems reasonable to me. Having both `scheduler` and `schedulers` is a bit
odd, but I see the reasoning for backward compatibility. We can eventually
deprecate `scheduler`.

Is there any way we can get some executor state returned in this new data,
since we're expanding it anyway?

Also, you tagged this as a [DISCUSS] thread in the subject but also
proposed a lazy consensus. Let's discuss it a bit first and then propose a
lazy consensus in a separate email thread.

Cheers,
Niko

On Wed, Jun 24, 2026 at 5:06 PM Jung-Hyun Kim <[email protected]>
wrote:

> The Problem
> In distributed Airflow environments running multiple schedulers, the
> current health endpoint contains a significant monitoring blind spot.
> Currently, the health check determines the status of the scheduler by
> querying the metadata database using the most_recent_job method found in
> job.py:
>
> @provide_session
> def most_recent_job(job_type: str, *, session: Session = NEW_SESSION) -> Job | None:
>     """
>     Return the most recent job of this type, if any, based on last heartbeat received.
>
>     Jobs in "running" state take precedence over others to make sure alive
>     job is returned if it is available.
>
>     :param job_type: job type to query for to get the most recent job for
>     :param session: Database session
>     :end_date: None
>     """
>     return session.scalar(
>         select(Job)
>         .where(Job.job_type == job_type)
>         .order_by(
>             # Put "running" jobs at the front.
>             case({JobState.RUNNING: 0}, value=Job.state, else_=1),
>             Job.latest_heartbeat.desc(),
>         )
>         .limit(1)
>     )
>
>
> This database query orders RUNNING jobs first, then sorts by latest
> heartbeat in descending order, and applies .limit(1), so it returns only a
> single job record. That result is then processed in airflow_health.py by
> the get_airflow_health function:
>
> def get_airflow_health() -> dict[str, Any]:
>     """Get the health for Airflow metadatabase, scheduler and triggerer."""
>     metadatabase_status = HEALTHY
>     latest_scheduler_heartbeat = None
>     latest_triggerer_heartbeat = None
>     latest_dag_processor_heartbeat = None
>
>     scheduler_status = UNHEALTHY
>     triggerer_status: str | None = UNHEALTHY
>     dag_processor_status: str | None = UNHEALTHY
>
>     try:
>         latest_scheduler_job = SchedulerJobRunner.most_recent_job()
>
>         if latest_scheduler_job:
>             if latest_scheduler_job.latest_heartbeat:
>                 latest_scheduler_heartbeat = latest_scheduler_job.latest_heartbeat.isoformat()
>             if latest_scheduler_job.is_alive():
>                 scheduler_status = HEALTHY
>     except Exception:
>         metadatabase_status = UNHEALTHY
>
>
> Because the health endpoint evaluates only the single job returned by
> most_recent_job(), the check can only ever validate the health of one
> scheduler at a time.
> In a distributed deployment with multiple active schedulers, as long as
> one instance is still heartbeating, the endpoint reports healthy even if
> every other scheduler instance has gone down.
> To get meaningful scheduler status from the health endpoint, it should
> monitor every scheduler in the deployment rather than a single one.
>
> The Proposed Solution
> To address this, we can add a new field called schedulers (plural, for
> multiple schedulers) to the health endpoint. It returns a three-tier
> aggregated status:
>
>   * HEALTHY: All registered scheduler instances are fully operational and
>     actively heartbeating.
>   * DEGRADED: At least one scheduler instance is down or failing, but at
>     least one remaining instance is still working.
>   * DOWN: All scheduler instances have failed or stopped working.
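
The three tiers above can be sketched as a small pure function. This is only an illustration of the rule as described, not code from Airflow: the names AggregateStatus and aggregate_scheduler_status are hypothetical, and treating "no registered schedulers" as DOWN is an assumption the proposal does not spell out.

```python
from enum import Enum


class AggregateStatus(str, Enum):
    """Hypothetical enum mirroring the three tiers in the proposal."""

    HEALTHY = "HEALTHY"
    DEGRADED = "DEGRADED"
    DOWN = "DOWN"


def aggregate_scheduler_status(alive_flags: list[bool]) -> AggregateStatus:
    """Collapse per-scheduler liveness flags into one aggregated status."""
    # Assumption: an empty list (no registered schedulers) counts as DOWN.
    if not any(alive_flags):
        return AggregateStatus.DOWN
    # Every registered scheduler is alive.
    if all(alive_flags):
        return AggregateStatus.HEALTHY
    # Mixed: at least one alive and at least one not.
    return AggregateStatus.DEGRADED
```

Each flag would presumably come from something like is_alive() on the corresponding scheduler job, which requires fetching all scheduler jobs rather than only the most recent one.
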
>
> Per-Instance Diagnostic Breakdown
> We should also add a per-instance breakdown as a nested list that shows
> the following for each scheduler:
>
>   1. hostname
>   2. status: Individual status
>   3. latest_heartbeat
>
> Example
>
> {
>   "metadatabase": {
>     "status": "healthy"
>   },
>   "scheduler": {
>     "scheduler_status": "healthy",
>     "latest_scheduler_heartbeat": "2026-06-24T23:15:02+00:00"
>   },
>   "schedulers": {
>     "status": "DEGRADED",
>     "instances": [
>       {
>         "hostname": "scheduler-ha-instance-1",
>         "status": "HEALTHY",
>         "latest_heartbeat": "2026-06-24T23:15:02+00:00"
>       },
>       {
>         "hostname": "scheduler-ha-instance-2",
>         "status": "DOWN",
>         "latest_heartbeat": "2026-06-24T23:10:14+00:00"
>       },
>       {
>         "hostname": "scheduler-ha-instance-3",
>         "status": "HEALTHY",
>         "latest_heartbeat": "2026-06-24T23:14:59+00:00"
>       }
>     ]
>   }
> }
>
> The response could end up looking roughly like the example above,
> resulting in a more meaningful health endpoint that makes it easier to
> diagnose issues with the scheduler. This is a LAZY CONSENSUS proposal.
>
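
To make the discussion concrete, here is a rough sketch of how the schedulers field in the example could be assembled from per-scheduler heartbeats. It is self-contained and does not use Airflow's models: SchedulerRecord and build_schedulers_field are hypothetical names, liveness is approximated as "heartbeat newer than a threshold", and the 30-second threshold and the value of now are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class SchedulerRecord:
    """Hypothetical stand-in for one scheduler job row."""

    hostname: str
    latest_heartbeat: datetime


def build_schedulers_field(records, now, threshold=timedelta(seconds=30)):
    """Build the proposed per-instance breakdown plus aggregated status."""
    instances = []
    for record in records:
        # Assumption: a scheduler is alive if its heartbeat is recent enough.
        alive = now - record.latest_heartbeat <= threshold
        instances.append(
            {
                "hostname": record.hostname,
                "status": "HEALTHY" if alive else "DOWN",
                "latest_heartbeat": record.latest_heartbeat.isoformat(),
            }
        )

    alive_count = sum(1 for item in instances if item["status"] == "HEALTHY")
    if alive_count == 0:
        status = "DOWN"
    elif alive_count == len(instances):
        status = "HEALTHY"
    else:
        status = "DEGRADED"
    return {"status": status, "instances": instances}


# Heartbeats taken from the example payload; "now" is an assumed check time.
now = datetime(2026, 6, 24, 23, 15, 5, tzinfo=timezone.utc)
records = [
    SchedulerRecord("scheduler-ha-instance-1", datetime(2026, 6, 24, 23, 15, 2, tzinfo=timezone.utc)),
    SchedulerRecord("scheduler-ha-instance-2", datetime(2026, 6, 24, 23, 10, 14, tzinfo=timezone.utc)),
    SchedulerRecord("scheduler-ha-instance-3", datetime(2026, 6, 24, 23, 14, 59, tzinfo=timezone.utc)),
]
field = build_schedulers_field(records, now)
```

With these inputs the second instance is almost five minutes stale, so it is marked DOWN and the aggregate comes out as DEGRADED, matching the example. A real implementation would presumably reuse the existing is_alive() check instead of the simplified threshold comparison here.
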
