jason810496 commented on issue #54479:
URL: https://github.com/apache/airflow/issues/54479#issuecomment-3995757161
Hi @chAnvesh,
May I ask you a few questions to help reproduce the error?
- If you’re using multiple schedulers, could you please share the following
scheduler configuration so I can reproduce the issue?
  - How many schedulers and API servers are you using in your deployment?
  - The `[scheduler] scheduler_health_check_threshold` configuration value.
- Which Airflow 2 version was your team using previously that did not
encounter this error?
- What is your long‑running use case doing in the task (e.g., simply
sleeping, polling for external state, performing actual compute work, etc.)?
Related issues/PRs that come to mind:
- https://github.com/apache/airflow/issues/57618
- https://github.com/apache/airflow/issues/58441
- https://github.com/apache/airflow/pull/60855
As a quick workaround until we identify and fix the root cause: if your
task is just sleeping or polling for external state (for example, after
submitting a long‑running job to an external compute system), I recommend
splitting the single task into two smaller tasks.
The first task only submits the job; the second waits for the external job
to reach a terminal state (success/failure) **using a Trigger**. The polling
task _should_ defer to a Trigger so that it does not hit the error behavior
described above.
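
To illustrate the waiting half of that split, here is a minimal, stand-alone sketch of the non-blocking poll loop. It deliberately uses plain `asyncio` with no Airflow imports so it runs anywhere; in a real deployment this loop would live in the `run` coroutine of a `BaseTrigger` subclass, and the second task would call `self.defer(...)` to hand off to it. The names `wait_for_terminal_state`, `fetch_state`, `TERMINAL_STATES`, and the `job-123` ID are all hypothetical and only serve the example.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable

# Assumed names for the external system's terminal job states.
TERMINAL_STATES = {"success", "failure"}


async def wait_for_terminal_state(
    job_id: str,
    fetch_state: Callable[[str], Awaitable[str]],  # hypothetical client call
    poll_interval: float = 30.0,
) -> AsyncIterator[dict]:
    """Poll the external job until it finishes, then yield a single event."""
    while True:
        state = await fetch_state(job_id)
        if state in TERMINAL_STATES:
            yield {"job_id": job_id, "state": state}
            return
        # Non-blocking sleep: the event loop can serve other waiters meanwhile.
        await asyncio.sleep(poll_interval)


async def demo() -> dict:
    # Fake external system: reports "running" twice, then "success".
    states = iter(["running", "running", "success"])

    async def fake_fetch_state(job_id: str) -> str:
        return next(states)

    async for event in wait_for_terminal_state(
        "job-123", fake_fetch_state, poll_interval=0.01
    ):
        return event
    raise RuntimeError("poller ended without an event")


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The point of the shape is that the waiting happens in an async loop rather than inside a running task process, so the long wait is no longer held by the worker executing the task.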
Thanks!