dabla commented on PR #62922:
URL: https://github.com/apache/airflow/pull/62922#issuecomment-4005706579

   > Thanks for working on this — DTI is an interesting concept and I can see 
the use case. I've gone through the full diff and have a number of concerns, 
some are bugs that would crash at runtime, others are architectural questions 
worth discussing before this goes further.
   > 
   > A few high-level things:
   > 
   > 1. **No tests.** ~700 lines of new production code with zero test 
coverage. We need tests for `IterableOperator`, `TaskExecutor`, 
`MappedTaskInstance`, `HybridExecutor`, `XComIterable`, 
`DecoratedDeferredAsyncOperator`, and the `iterate`/`iterate_kwargs` methods — 
covering success, failure, retry, deferral, and edge cases.
   
   Thanks for pointing this out. As mentioned earlier on Slack, this PR is 
currently intended as an initial draft to demonstrate the concept and gather 
early architectural feedback.
   
   I agree that proper test coverage is essential before this can move forward. 
The plan is to add unit tests covering the components you mentioned 
(IterableOperator, TaskExecutor, MappedTaskInstance, HybridExecutor, 
XComIterable, DecoratedDeferredAsyncOperator, and the iterate/iterate_kwargs 
APIs), including scenarios for success, retries, failures, deferral, and edge 
cases.
   
   Once we converge on the architectural direction, I will add the 
corresponding test suite.
   
   > 2. **Architectural concern.** This builds a mini-executor inside an 
operator — running N tasks in threads with in-memory XCom, custom retry logic, 
and `sleep()`-based retry delays. The scheduler has no visibility into sub-task 
states, so if the worker dies mid-execution there's no record of which 
sub-tasks completed. This feels like it needs broader design discussion 
(probably an AIP) before merging, since it fundamentally changes how task 
execution works.
   
   I agree this is an important architectural concern and worth discussing 
further.
   
   The goal of this prototype is to explore a trade-off between observability 
and scheduling overhead, @ashb and @potiuk mentioned the same remark before. If 
we try to preserve the same visibility and lifecycle guarantees as Dynamic Task 
Mapping, we essentially end up re-implementing DTM semantics, which brings back 
the same scheduler overhead that this approach is trying to avoid.
   
   This proposal intentionally explores a different point in that trade-off 
space: executing iterations within a single task while allowing controlled 
parallelism. That does mean the scheduler has indeed less visibility (but also 
less load) into the internal execution units.
   
   > 3. **Thread safety.** Several shared mutable structures (`context` dict, 
`os.environ`) are accessed concurrently from multiple threads without 
synchronization.
   
   Good point — thread safety needs to be handled carefully here.
   
   Regarding the task context, my understanding is that operators already 
receive a per-task context instance, but you're right that when running 
iterations concurrently we should avoid sharing mutable structures across 
threads. One possible approach would be to create a shallow or deep copy of the 
context for each execution unit to ensure isolation.
   
   If you have concerns about specific structures (e.g., os.environ or others), 
I'm happy to address them and introduce appropriate synchronization or 
isolation mechanisms where needed.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to