GitHub user simonjobs created a discussion: GlueJobOperator in deferred mode
does not include final status details
Hello,
Starting this discussion with the intention to share our use case, current
issues and to get any ideas or inspiration on how to best proceed.
### Background
We are currently using MWAA 2.10.3 to orchestrate, among other things, Glue
jobs. For this we use the GlueJobOperator to trigger runs of already defined
jobs with minimal arguments provided.
```
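from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

task = GlueJobOperator(
    task_id="task-id",
    job_name="job-name",
    region_name=AWS_DEFAULT_REGION,
    deferrable=True,  # free the worker slot while the Glue job runs
    retries=3,
)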
```
The key detail to note is that we are using `deferrable=True`. The main reason
is that we have long-running jobs and sensors, and we do not want to reserve
workers for them over long periods.
### Issue
We use `on_failure_callback` with a custom function that extracts the error
message from the context of a failed task (`exception = context.get('exception')`)
and posts it as a card to our Teams channel.
When a Glue job fails while the task is in deferred status, the task only picks
up that the state has failed, and our callback simply extracts "Trigger failure".
This is an issue because we want our Teams error notifications to show the
high-level cause of failure immediately. Currently we need to look it up in the
Glue logs, either directly or via the Airflow logs.
### Possible solutions
We have considered the following solutions or workarounds:
- `verbose=True`
While this should include all detailed logs in our Airflow tasks, we are not
confident that it will actually solve our issue, as the status check on the
final attempt will still fail. We are also hesitant to enable it, as it would
duplicate our existing logs 1:1.
- Wrap GlueJobOperator and its `execute_complete` function
This could possibly be a good way to modify the behaviour of that final
status check. However, we are hesitant to wrap the original operator, as that
would complicate future MWAA version upgrades for us.
- Enhance the custom callback with an additional `get_job_run` call based on
the job_run_id from the context
This is currently our preferred approach, with the caveat that the final error
message of the Glue job will not be included in the task logs. It will,
however, be included in our error notification in Teams.
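
To make the third option more concrete, here is a minimal sketch of what such a callback could look like. It is not tested against MWAA. The helper names `glue_failure_reason` and `post_to_teams` are hypothetical. It is also only an assumption that the job run id can be pulled from XCom under a key such as `glue_job_run_id`; how the id is exposed depends on the provider version, so that lookup would need to be adapted. The only AWS call used is the boto3 Glue client's `get_job_run`.

```python
from typing import Any


def glue_failure_reason(glue_client: Any, job_name: str, run_id: str) -> str:
    """Return a short failure reason for a single Glue job run."""
    # boto3's Glue client nests the run details under the "JobRun" key.
    job_run = glue_client.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]
    state = job_run.get("JobRunState", "UNKNOWN")
    # "ErrorMessage" is only present when the run ended with an error.
    message = job_run.get("ErrorMessage")
    return f"{state}: {message}" if message else state


def post_to_teams(text: str) -> None:
    """Hypothetical stand-in for the existing Teams card posting logic."""
    print(text)


def on_failure_callback(context: dict) -> None:
    """Hypothetical callback that enriches the notification with the Glue error."""
    import boto3  # imported lazily so the helper above can be tested without AWS

    ti = context["ti"]
    # Fall back to the generic exception text (e.g. "Trigger failure").
    reason = str(context.get("exception"))
    job_name = getattr(context.get("task"), "job_name", None)
    # ASSUMPTION: the run id is available in XCom under this key.
    run_id = ti.xcom_pull(task_ids=ti.task_id, key="glue_job_run_id")
    if job_name and run_id:
        # Region is resolved from the environment (e.g. AWS_DEFAULT_REGION).
        client = boto3.client("glue")
        reason = glue_failure_reason(client, job_name, run_id)
    post_to_teams(f"Task {ti.task_id} failed: {reason}")
```

Keeping the Glue lookup in its own function means it can be checked with a fake client, and the callback still falls back to the generic exception text when no run id is found.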
### Summary
Happy to receive any thoughts or input on the described issue. Let me know if
I have left out any essential part.
I am also interested to know whether adding this type of behaviour to
GlueJobOperator would be encouraged, or whether it has been a conscious
decision not to include it.
GitHub link: https://github.com/apache/airflow/discussions/63706