shivaam opened a new pull request, #64085:
URL: https://github.com/apache/airflow/pull/64085

   When a deferred AWS task hits a terminal failure state, `async_wait()` raises
   `AirflowException` with the error details. But `AwsBaseWaiterTrigger.run()` 
did
   not catch it — the exception propagated to the triggerer framework which 
replaced
   it with a generic "Trigger failure" message. `execute_complete()` was never 
called,
   so operators and `on_failure_callback` lost all error context.
   
   **Fix:** Catch `AirflowException` in `run()` and yield
   `TriggerEvent(status="error", message=str(e))`, routing failures through
   `execute_complete()` where operators handle non-success events.
   
   Also adds `"JobRun.ErrorMessage"` to `GlueJobCompleteTrigger.status_queries` 
so the
   actual Glue error text is included alongside `JobRunState`.
   
   related: https://github.com/apache/airflow/discussions/63706
   
   ## Tested on EC2 with real Glue job
   
   Ran a Glue job designed to fail with `deferrable=True`. Before the fix,
   `on_failure_callback` received `TaskDeferralError("Trigger failure")`. After:
   
   ```
   AirflowException: Error in glue job: {
     'status': 'error',
     'message': 'AWS Glue job failed.: FAILED - RuntimeError: GLUE_TEST: 
Intentional failure
       to test error propagation in deferrable mode\nWaiter job_complete 
failed: Waiter
       encountered a terminal failure state: For expression 
"JobRun.JobRunState" we matched
       expected path: "FAILED"',
     'run_id': 
'jr_a485d84c34f952e1c8d9d3199455d2d6eda07529034ceb7e1b554514d367f417'
   }
   ```
   
   `execute_complete()` was called, error details preserved, 
`on_failure_callback` received
   the actual Glue error.
   
   ## Operator compatibility audit — feedback requested
   
   This change means `execute_complete()` will now be called with 
`status="error"` events
   where previously it was never called on failure. I audited all 59 
`execute_complete`
   methods across AWS operators and sensors:
   
   - **47 are safe** — use `validated_event["status"] != "success"` with raise
   - **4 not affected** — their triggers override `run()` with own error 
handling
   - **8 have missing error handling** — would silently succeed on error events:
   
   | Operator | Issue |
   |----------|-------|
   | 
[DmsDeleteReplicationConfigOperator](https://github.com/apache/airflow/blob/main/providers/amazon/src/airflow/providers/amazon/aws/operators/dms.py#L512)
 | No status check |
   | 
[DmsStartReplicationOperator](https://github.com/apache/airflow/blob/main/providers/amazon/src/airflow/providers/amazon/aws/operators/dms.py#L707)
 | No status check |
   | 
[DmsStopReplicationOperator](https://github.com/apache/airflow/blob/main/providers/amazon/src/airflow/providers/amazon/aws/operators/dms.py#L800)
 | No status check |
   | 
[NeptuneStartDbClusterOperator](https://github.com/apache/airflow/blob/main/providers/amazon/src/airflow/providers/amazon/aws/operators/neptune.py#L189)
 | Reads status but never validates |
   | 
[NeptuneStopDbClusterOperator](https://github.com/apache/airflow/blob/main/providers/amazon/src/airflow/providers/amazon/aws/operators/neptune.py#L316)
 | Reads status but never validates |
   | 
[EmrServerlessStopApplicationOperator](https://github.com/apache/airflow/blob/main/providers/amazon/src/airflow/providers/amazon/aws/operators/emr.py#L1636)
 | `== "success"` with no else/raise |
   | 
[EmrServerlessDeleteApplicationOperator](https://github.com/apache/airflow/blob/main/providers/amazon/src/airflow/providers/amazon/aws/operators/emr.py#L1743)
 | `== "success"` with no else/raise |
   | 
[MwaaTaskSensor](https://github.com/apache/airflow/blob/main/providers/amazon/src/airflow/providers/amazon/aws/sensors/mwaa.py#L283)
 | `return None` unconditionally |
   
   **These are pre-existing bugs** (they never validated the event status), but 
merging the
   base trigger fix without addressing them would change their failure mode 
from "Trigger
   failure" crash to silent success — which is worse.
   
   **Reviewers:** Does the `TriggerEvent(status="error")` approach look right? 
If so, I'll
   add commits to fix all 8 operators in this PR before merge.
   
   ---
   
   ##### Was generative AI tooling used to co-author this PR?
   
   - [X] Yes — Claude Code (Claude Opus 4.6)
   
   Generated-by: Claude Code (Claude Opus 4.6) following [the 
guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to