> Good points but I think at a surface level, every SDK can be compared on principles / general goals. And that's it. Differences start coming in when you delve into any details.
Yep. That's my point as well. I would not single out Spark for that at all, simply :)

On Tue, Nov 25, 2025 at 8:04 AM Amogh Desai <[email protected]> wrote:

> Good points but I think at a surface level, every SDK can be compared on
> principles / general goals. And that's it. Differences start coming in
> when you delve into any details.
>
> Thanks & Regards,
> Amogh Desai
>
> On Sun, Nov 23, 2025 at 10:29 PM Jarek Potiuk <[email protected]> wrote:
>
> > Just to correct the two statements, as I see that my autocompletion
> > failed me. What I meant was "workload", not "workflows", in the context
> > of both SDKs:
> >
> > * In Spark, the "server" basically executes the *workloads* submitted
> >   by the client, because the Spark server is a *workload* execution
> >   engine.
> > * In Airflow, the "server" is the one that submits *workloads* to be
> >   executed by the client, because the Airflow server is an orchestrator
> >   that tells its workers what to do (and often those tasks delegate
> >   work to other servers such as Spark, which means that the worker
> >   often acts as a client for both: the orchestrating engine of Airflow,
> >   and an execution engine, for example Spark).
> >
> > On Sun, Nov 23, 2025 at 5:54 PM Jarek Potiuk <[email protected]> wrote:
> >
> > > And just to add - I would not draw too many analogies specifically
> > > for Spark SDK <> Airflow except the underlying "general IT design
> > > principles", which are sound regardless of the particular SDK
> > > implementation.
> > >
> > > Generally, the principles that we follow in the Task SDK apply to
> > > literally **any** SDK where you want to define a stable API for
> > > whoever the user is. It also follows the principles of basically
> > > **any** SDK where you utilise a client-server architecture and want
> > > to make use of the HTTP/security/proxy architecture and fine-grained
> > > permissions granted to the client. Those are just "common practices"
> > > we implemented with the Task SDK. There was no particular inspiration
> > > or parallel to the Spark SDK.
> > >
> > > Moreover, there are significant differences vs Spark on the "logic"
> > > level of the APIs, and I would definitely not compare the two,
> > > because people might be misguided if you say "it's like Spark SDK".
> > > Spark SDK and Airflow Task SDK are fundamentally different (on a
> > > logic level).
> > >
> > > While the underlying technologies and principles are shared
> > > (decoupling of the client from the server code, a very clear
> > > boundary, and an "SDK" that exposes only what's really possible to
> > > be done with the client), there are fundamental differences in what
> > > we do in Airflow.
> > >
> > > The main stark difference is the direction of workload submission
> > > and execution, which is basically 100% reversed between Spark and
> > > Airflow.
> > >
> > > * In Spark, the "server" basically executes the workloads submitted
> > >   by the client, because the Spark server is a workflow execution
> > >   engine.
> > > * In Airflow, the "server" is the one that submits workflows to be
> > >   executed by the client, because the Airflow server is an
> > >   orchestrator that tells its workers what to do (and often those
> > >   tasks delegate work to other servers such as Spark, which means
> > >   that the worker often acts as a client for both: the orchestrating
> > >   engine of Airflow, and an execution engine, for example Spark).
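To make the reversal Jarek describes above concrete, here is a minimal
sketch. The Spark half uses the real Spark Connect entry point and assumes
pyspark with Connect support plus a server listening at sc://localhost:15002;
the Airflow half is a hypothetical stand-in for a server-driven worker loop,
not the actual Airflow worker implementation.

    # Spark Connect: the CLIENT submits the workload; the SERVER executes it.
    # Assumes pyspark (with Connect) and a Spark Connect server are available.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
    spark.range(100).selectExpr("sum(id) AS total").show()  # runs server-side


    # Airflow 3 (sketch): the SERVER hands out workloads; the WORKER executes
    # them. `fetch_workload` and `run` are hypothetical placeholders for the
    # real worker machinery.
    def worker_loop(fetch_workload, run):
        while (workload := fetch_workload()) is not None:  # server decides
            run(workload)  # execution happens on the worker (client) side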
> > > This is 100% reverse of control - even if the underlying low-level
> > > principles (isolation, decoupling, HTTP communication, security) are
> > > similar, that is mostly because they just make sense in general
> > > engineering, not because those two SDKs do things in a similar way.
> > > They don't.
> > >
> > > J.
> > >
> > > On Sun, Nov 23, 2025 at 10:57 AM Amogh Desai <[email protected]>
> > > wrote:
> > >
> > > > Answering as one of the Airflow developers contributing to the
> > > > Task SDK.
> > > >
> > > > Q1: If Engine = Execution and API = Server side, the analogy is
> > > > comparable. The goal of the Task SDK is to decouple Dag authoring
> > > > from Airflow internals and to provide a version-agnostic, stable
> > > > interface for writing Dags.
> > > >
> > > > Q2: Yes, that's the intention. Custom executors might require some
> > > > adaptation when adopting AF3 for the first time, because Airflow 3
> > > > deals in *workloads
> > > > <https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/executor/index.html#workloads>*
> > > > vs CLI commands in < 3.0.
> > > >
> > > > Q3: You can compare / draw relations provided that the comparison
> > > > is in the context of the server-client separation and
> > > > future-proofing consumers against internal changes.
> > > >
> > > > Thanks & Regards,
> > > > Amogh Desai
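As a concrete illustration of the decoupling Amogh describes in Q1, here is
a minimal Dag written purely against the Task SDK's public authoring
surface. This is a sketch assuming Airflow 3 with the Task SDK package
installed; the author imports only from the stable `airflow.sdk` namespace
and never touches scheduler or executor internals.

    # Minimal Dag authored against the Task SDK surface (sketch, assuming
    # Airflow 3 with the Task SDK installed).
    from airflow.sdk import dag, task


    @dag(schedule=None)
    def hello_task_sdk():
        @task
        def say_hello() -> str:
            # Runs on a worker; this code depends only on the stable SDK,
            # not on Airflow internals.
            return "hello from the Task SDK"

        say_hello()


    hello_task_sdk()

For Q2, the shift from CLI commands to workloads can be pictured roughly as
below. The dataclass is a hypothetical simplification shown only to
contrast with the pre-3.0 contract of handing executors an
`airflow tasks run ...` command line; the real model lives in Airflow's
executor interfaces (see the workloads link above).

    # Hypothetical, simplified contrast between the two executor contracts.
    from dataclasses import dataclass


    @dataclass
    class TaskWorkload:  # sketch of what an Airflow 3 executor is handed
        dag_id: str
        task_id: str
        run_id: str
        token: str  # scoped credential for talking back to the API server


    # Pre-3.0 executors were handed a CLI command to run instead:
    legacy_command = ["airflow", "tasks", "run", "my_dag", "my_task", "manual_run"]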
> > > > On Sat, Nov 22, 2025 at 10:10 AM Kyungjun Lee
> > > > <[email protected]> wrote:
> > > >
> > > > > Hi Airflow developers,
> > > > >
> > > > > I've been studying the Airflow *Task SDK* in detail, and I find
> > > > > its direction very interesting, especially the idea of
> > > > > introducing a stable, user-facing API layer that is decoupled
> > > > > from the internal executor, scheduler, and runtime behavior.
> > > > >
> > > > > While going through the design notes and recent changes around
> > > > > the Task SDK, it reminded me of the architectural philosophy
> > > > > behind *Apache Spark Connect*, which also emphasizes:
> > > > >
> > > > > - separating user-facing APIs from the underlying execution
> > > > >   engine
> > > > > - providing a stable long-term public API surface
> > > > > - enabling flexible execution models
> > > > > - reducing coupling between API definitions and the actual
> > > > >   runtime environment
> > > > >
> > > > > This made me wonder whether the philosophical direction is
> > > > > similar or if I am drawing an incorrect analogy. I would like to
> > > > > ask a few questions to better understand Airflow's long-term
> > > > > intent:
> > > > > ------------------------------
> > > > > *Q1.* Is the Task SDK intentionally aiming for a form of
> > > > > *API-engine decoupling* similar to Spark Connect? Or is the
> > > > > motivation fundamentally different?
> > > > >
> > > > > *Q2.* Is the long-term vision that tasks will be defined through
> > > > > a stable Task SDK interface while the underlying
> > > > > scheduler/executor implementations evolve independently without
> > > > > breaking user code?
> > > > >
> > > > > *Q3.* *https://issues.apache.org/jira/browse/SPARK-39375* #
> > > > > spark-connect
> > > > > From the perspective of the Airflow dev community, does it make
> > > > > sense to compare Task SDK ↔ Spark Connect, or is the
> > > > > architectural direction of Airflow fundamentally different?
> > > > > ------------------------------
> > > > > I'm asking these questions because I want to *better understand
> > > > > the philosophy that Airflow is trying to pursue*, and confirm
> > > > > whether my interpretation of the Task SDK direction is accurate.
> > > > >
> > > > > Any insights or clarifications would be greatly appreciated.
> > > > > Thank you for your continued work on Airflow.
> > > > >
> > > > > Best regards,
> > > > > *Kyungjun Lee*
