Good points, but I think every SDK can be compared at a surface level on principles and general goals, and that's it. The differences start to appear as soon as you delve into any of the details.
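To make the surface-level comparison concrete, this is what the stable surface looks like on the Airflow side: the authoring API. A minimal sketch, assuming Airflow 3's `airflow.sdk` import path (illustrative, not normative):

```python
# A minimal Dag written purely against the Task SDK authoring surface.
# The point of the decoupling: user code imports only from airflow.sdk,
# never from scheduler or executor internals, so those internals can
# evolve without breaking this file.
from airflow.sdk import dag, task


@dag(schedule=None)
def hello_task_sdk():
    @task
    def greet() -> str:
        return "hello from the Task SDK"

    greet()


hello_task_sdk()
```

The contract this file depends on is only the `airflow.sdk` surface; everything behind `dag` and `task` is free to change between versions.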
Thanks & Regards,
Amogh Desai

On Sun, Nov 23, 2025 at 10:29 PM Jarek Potiuk <[email protected]> wrote:

> Just to correct the two statements, as I see that my autocompletion failed
> me. What I meant was "workloads", not "workflows", in the context of both
> SDKs:
>
> * In Spark, the "server" basically executes the workloads submitted by the
> client, because the Spark server is a *workload* execution engine.
> * In Airflow, the "server" is the one that submits *workloads* to be
> executed by the client, because the Airflow server is an orchestrator that
> tells its workers what to do (and those tasks often delegate work to other
> servers like Spark, which means that the worker often acts as a client for
> both: the orchestrating engine of Airflow and an execution engine such as
> Spark).
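To make the reversal Jarek describes concrete, here is a self-contained toy sketch of the two control flows. Every class and method name below is invented for illustration; this is neither Spark's nor Airflow's real API:

```python
# Toy sketch of who submits work to whom. All names are invented.

class SparkStyleServer:
    """Spark Connect direction: the server is the execution engine."""
    def execute(self, plan: str) -> str:
        return f"executed: {plan}"          # server runs the client's plan


class SparkStyleClient:
    def __init__(self, server: SparkStyleServer):
        self.server = server

    def run(self, plan: str) -> str:
        # client -> server: "execute this workload for me"
        return self.server.execute(plan)


class AirflowStyleServer:
    """Airflow direction: the server is the orchestrator handing out work."""
    def __init__(self, queue: list[str]):
        self.queue = queue

    def next_workload(self) -> str | None:
        # server -> worker: "here is what to do next"
        return self.queue.pop(0) if self.queue else None


class AirflowStyleWorker:
    def __init__(self, server: AirflowStyleServer):
        self.server = server

    def run_forever(self) -> None:
        while (workload := self.server.next_workload()) is not None:
            print(f"worker executing: {workload}")   # user task code runs here


# Same client/server plumbing in both halves; only the direction of
# "who asks whom to run something" is flipped.
print(SparkStyleClient(SparkStyleServer()).run("select count(*)"))
AirflowStyleWorker(AirflowStyleServer(["task_a", "task_b"])).run_forever()
```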
> On Sun, Nov 23, 2025 at 5:54 PM Jarek Potiuk <[email protected]> wrote:
>
>> And just to add: I would not draw too many analogies specifically between
>> the Spark SDK and Airflow beyond the underlying general IT design
>> principles, which are sound regardless of the particular SDK
>> implementation.
>>
>> Generally, the principles we follow in the Task SDK apply to literally
>> **any** SDK where you want to define a stable API for whoever the user
>> is. It also follows the principles of basically **any** SDK built on a
>> client-server architecture where you want to make use of HTTP, security,
>> and proxy infrastructure, plus fine-grained permissions granted to the
>> client. Those are just "common practices" we implemented with the Task
>> SDK. There was no particular inspiration from, or parallel to, the Spark
>> SDK.
>>
>> Moreover, there are significant differences vs. Spark at the "logic"
>> level of the APIs, and I would definitely not compare the two, because
>> people might be misled if you say "it's like the Spark SDK". The Spark
>> SDK and the Airflow Task SDK are fundamentally different (at the logic
>> level).
>>
>> While the underlying technologies and principles are similar (decoupling
>> of the client from server code, a very clear boundary, and an "SDK" that
>> exposes only what can really be done from the client), there are
>> fundamental differences in what we do in Airflow.
>>
>> The starkest difference is the direction of workload submission and
>> execution, which is basically 100% reversed between Spark and Airflow.
>>
>> * In Spark the "server" basically executes the workloads submitted by the
>> client - because Spark Server is a workflow execution engine
>> * In Airflow the "server" is the one that submits workflows to be
>> executed by the client - because Airflow Server is an orchestrator that
>> tells its workers what to do (and those tasks often delegate work to
>> other servers like Spark, which means that the worker often acts as a
>> client for both: the orchestrating engine of Airflow and an execution
>> engine such as Spark).
>>
>> This is a 100% reversal of control. Even though the underlying low-level
>> principles (isolation, decoupling, HTTP communication, security) are
>> similar, that is mostly because they just make sense in general
>> engineering, not because those two SDKs do things in a similar way. They
>> don't.
>>
>> J.
>>
>> On Sun, Nov 23, 2025 at 10:57 AM Amogh Desai <[email protected]> wrote:
>>
>>> Answering as one of the Airflow developers contributing to the Task SDK.
>>>
>>> Q1: If Engine = Execution and API = Server side, the analogy is
>>> comparable. The goal of the Task SDK is to decouple Dag authoring from
>>> Airflow internals and to provide a version-agnostic, stable interface
>>> for writing Dags.
>>>
>>> Q2: Yes, that's the intention. Custom executors might require some
>>> adaptation when first adopting Airflow 3, because Airflow 3 deals in
>>> *workloads*
>>> <https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/executor/index.html#workloads>
>>> rather than CLI commands as in pre-3.0.
>>>
>>> Q3: You can compare / draw relations, provided that the comparison stays
>>> in the context of the server-client separation and of future-proofing
>>> consumers against internal changes.
>>>
>>> Thanks & Regards,
>>> Amogh Desai
>>>
>>> On Sat, Nov 22, 2025 at 10:10 AM Kyungjun Lee <[email protected]> wrote:
>>>
>>>> Hi Airflow developers,
>>>>
>>>> I've been studying the Airflow *Task SDK* in detail, and I find its
>>>> direction very interesting, especially the idea of introducing a
>>>> stable, user-facing API layer that is decoupled from the internal
>>>> executor, scheduler, and runtime behavior.
>>>>
>>>> While going through the design notes and recent changes around the Task
>>>> SDK, I was reminded of the architectural philosophy behind *Apache
>>>> Spark Connect*, which also emphasizes:
>>>>
>>>> - separating user-facing APIs from the underlying execution engine
>>>> - providing a stable long-term public API surface
>>>> - enabling flexible execution models
>>>> - reducing coupling between API definitions and the actual runtime
>>>> environment
>>>>
>>>> This made me wonder whether the philosophical direction is similar or
>>>> if I am drawing an incorrect analogy. I would like to ask a few
>>>> questions to better understand Airflow's long-term intent:
>>>> ------------------------------
>>>> *Q1.*
>>>>
>>>> Is the Task SDK intentionally aiming for a form of *API–engine
>>>> decoupling* similar to Spark Connect? Or is the motivation
>>>> fundamentally different?
>>>>
>>>> *Q2.*
>>>>
>>>> Is the long-term vision that tasks will be defined through a stable
>>>> Task SDK interface while the underlying scheduler/executor
>>>> implementations evolve independently without breaking user code?
>>>>
>>>> *Q3.*
>>>>
>>>> https://issues.apache.org/jira/browse/SPARK-39375 # spark-connect
>>>>
>>>> From the perspective of the Airflow dev community, does it make sense
>>>> to compare Task SDK ↔ Spark Connect, or is the architectural direction
>>>> of Airflow fundamentally different?
>>>> ------------------------------
>>>>
>>>> I'm asking these questions because I want to *better understand the
>>>> philosophy that Airflow is trying to pursue* and to confirm whether my
>>>> interpretation of the Task SDK direction is accurate.
>>>>
>>>> Any insights or clarifications would be greatly appreciated.
>>>> Thank you for your continued work on Airflow.
>>>>
>>>> Best regards,
>>>> *Kyungjun Lee*
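Coming back to the Q2 point above, about executors receiving workloads rather than CLI commands: a rough sketch of the shape of that change. All names below are hypothetical; the real interface is in the executor docs linked in the thread:

```python
# Hypothetical before/after shape of porting a custom executor to Airflow 3.
# Every name here is invented for illustration.
from dataclasses import dataclass


@dataclass
class Workload:
    """Stand-in for the structured object Airflow 3 hands to executors."""
    dag_id: str
    task_id: str
    run_id: str


def queue_command_pre_3(command: list[str]) -> None:
    # Pre-3.0 shape: the executor received an `airflow tasks run ...` CLI
    # command and was responsible for getting that process started somewhere.
    print("would exec:", " ".join(command))


def queue_workload_airflow_3(workload: Workload) -> None:
    # 3.0 shape: the executor receives a structured workload and forwards it
    # to a worker, which then talks back to the server while running the task.
    print(f"would dispatch {workload.dag_id}.{workload.task_id} ({workload.run_id})")


queue_command_pre_3(["airflow", "tasks", "run", "my_dag", "my_task", "2025-11-22"])
queue_workload_airflow_3(Workload("my_dag", "my_task", "manual__2025-11-22"))
```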

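And for anyone who wants to see the Spark Connect side that Kyungjun describes, the client entry point looks like this. The `remote()` builder and the `sc://` scheme are real PySpark (3.4+) API as far as I know; the host and port below are placeholders:

```python
# Spark Connect: the client builds logical plans against a thin API and
# ships them to a remote server for execution (PySpark 3.4+).
from pyspark.sql import SparkSession

# The sc:// scheme selects the Spark Connect transport; host and port are
# placeholders for a real Spark Connect endpoint.
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()

df = spark.range(10)   # builds a plan on the client; nothing runs yet
print(df.count())      # the plan is sent to the server, which executes it
```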