Good points, but I think that at a surface level every SDK can be compared
on principles / general goals, and that's it. The differences start
appearing when you delve into the details.


Thanks & Regards,
Amogh Desai


On Sun, Nov 23, 2025 at 10:29 PM Jarek Potiuk <[email protected]> wrote:

> Just to correct the two statements, as I see that my autocompletion failed
> me. What I meant was "workload", not "workflows", in the context of both
> SDKs:
>
> * In Spark the "server" basically executes the workloads submitted by the
> client - because Spark Server is a *workload* execution engine
> * In Airflow the "server" is the one that submits *workloads* to be
> executed by the client - because Airflow Server is an orchestrator that
> tells its workers what to do (and often those tasks delegate the work to
> other servers like Spark, which means that the worker often acts as a
> client for both -> the orchestrating engine of Airflow, and the execution
> engine (for example Spark)).
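>
> To make the direction concrete, here is a tiny illustrative Python sketch
> (the classes are made up for this mail, not real Spark or Airflow APIs):
>
>     class SparkLikeServer:
>         """Execution engine: runs whatever workload the client sends it."""
>
>         def execute(self, workload: str) -> str:
>             return f"server executed: {workload}"
>
>     class AirflowLikeServer:
>         """Orchestrator: decides what work exists and hands it out."""
>
>         def __init__(self) -> None:
>             self.queue = ["task_a", "task_b"]
>
>         def next_workload(self) -> str | None:
>             return self.queue.pop(0) if self.queue else None
>
>     # Spark direction: the client pushes a workload TO the server.
>     spark = SparkLikeServer()
>     print(spark.execute("count rows"))
>
>     # Airflow direction: the worker (a client!) pulls workloads FROM the
>     # server.
>     airflow = AirflowLikeServer()
>     while (workload := airflow.next_workload()) is not None:
>         print(f"worker executed: {workload}")
>
> Same client-server plumbing, opposite direction of submission.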
>
> On Sun, Nov 23, 2025 at 5:54 PM Jarek Potiuk <[email protected]> wrote:
>
> > And just to add - I would not draw too many analogies specifically for
> > Spark SDK <> Airflow beyond the underlying "general IT design
> > principles", which are sound regardless of the particular SDK
> > implementation.
> >
> > Generally, the principles that we follow in the Task SDK apply to
> > literally **any** SDK where you want to define a stable API for whoever
> > the user is. It also follows the principles of basically **any** SDK
> > where you utilise a client-server architecture and want to make use of
> > the HTTP/Security/Proxy architecture and fine-grained permissions granted
> > to the client. Those are just "common practices" that we implemented with
> > the Task SDK. There was no particular inspiration from or parallel to
> > Spark SDK.
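> >
> > As a generic illustration of that "narrow client surface" idea - all
> > names and endpoints below are hypothetical, nothing Airflow-specific:
> >
> >     import urllib.request
> >
> >     class NarrowClient:
> >         """Exposes only the few operations this client may perform."""
> >
> >         def __init__(self, base_url: str, token: str) -> None:
> >             self.base_url = base_url
> >             self.token = token  # fine-grained, per-client credential
> >
> >         def _get(self, path: str) -> bytes:
> >             # Plain HTTP, so proxies / TLS / auth middleware all apply.
> >             req = urllib.request.Request(
> >                 f"{self.base_url}{path}",
> >                 headers={"Authorization": f"Bearer {self.token}"},
> >             )
> >             with urllib.request.urlopen(req) as resp:
> >                 return resp.read()
> >
> >         # Deliberately small surface: no DB handles, no server internals.
> >         def get_task_state(self, task_id: str) -> bytes:
> >             return self._get(f"/tasks/{task_id}/state")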
> >
> > Moreover, there are significant differences vs Spark on the "logic" level
> > of the APIs, and I would definitely not compare the two, because people
> > might be misguided if you say "it's like Spark SDK". Spark SDK and
> > Airflow Task SDK are fundamentally different (on a logic level).
> >
> > While the underlying technologies and principles are similar (decoupling
> > of the client from server code, a very clear boundary, and an "SDK" that
> > exposes only what can really be done from the client side), there are
> > fundamental differences in what we do in Airflow.
> >
> > The main stark difference is the direction of workload submission and
> > execution, which is basically 100% reversed between Spark and Airflow.
> >
> > * In Spark the "server" basically executes the workloads submitted by the
> > client - because Spark Server is a workflow execution engine
> > * In Airflow the "server" is the one that submits workflows to be
> > executed by the client - because Airflow Server is an orchestrator that
> > tells its workers what to do (and often those tasks delegate the work to
> > other servers like Spark, which means that the worker often acts as a
> > client for both -> the orchestrating engine of Airflow, and the execution
> > engine (for example Spark)).
> >
> > This is a 100% reversal of control. The underlying low-level principles
> > (isolation, decoupling, HTTP communication, security) are similar, but
> > mostly because they just make sense in general engineering, not because
> > those two SDKs do things in a similar way. They don't.
> >
> > J.
> >
> >
> > On Sun, Nov 23, 2025 at 10:57 AM Amogh Desai <[email protected]>
> > wrote:
> >
> >> Answering as one of the Airflow developers contributing to the Task SDK.
> >>
> >> Q1: If Engine = Execution and API = Server side, the analogy is
> >> comparable. The goal of the Task SDK is to decouple Dag authoring from
> >> Airflow internals and to provide a version-agnostic, stable interface
> >> for writing Dags.
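> >>
> >> For example, a Dag author only touches the stable import surface. A
> >> minimal sketch (the import matches the Airflow 3 docs as far as I
> >> remember, but double-check the decorator parameters):
> >>
> >>     from airflow.sdk import dag, task
> >>
> >>     @dag(schedule=None)
> >>     def my_pipeline():
> >>         @task
> >>         def hello():
> >>             print("hello from a Task SDK task")
> >>
> >>         hello()
> >>
> >>     my_pipeline()
> >>
> >> Nothing here imports scheduler or executor internals, which is exactly
> >> what lets those evolve independently.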
> >>
> >> Q2: Yes, that's the intention. Custom executors might require some
> >> adaptation when adopting AF3 for the first time, because Airflow 3 deals
> >> in *workloads* <
> >> https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/executor/index.html#workloads
> >> > vs CLI commands in < 3.0.
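> >>
> >> Roughly, from memory (see the linked docs for the real signatures):
> >>
> >>     # Airflow 2.x: the executor was handed a CLI command to run somewhere.
> >>     command = ["airflow", "tasks", "run", "my_dag", "my_task", "run_id"]
> >>     # executor.execute_async(key=ti_key, command=command)
> >>
> >>     # Airflow 3.x: the executor is handed a structured *workload* object
> >>     # instead of a shell command, along the lines of:
> >>     # workload = workloads.ExecuteTask(...)  # built by the scheduler
> >>     # executor.queue_workload(workload, session)
> >>
> >> So a custom executor ports from "spawn this command" to "run this
> >> workload object".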
> >>
> >> Q3: You can compare / draw relations provided that the comparison is in
> >> the context of the server-client separation and of future-proofing
> >> consumers against internal changes.
> >>
> >> Thanks & Regards,
> >> Amogh Desai
> >>
> >>
> >> On Sat, Nov 22, 2025 at 10:10 AM Kyungjun Lee <[email protected]>
> >> wrote:
> >>
> >> > Hi Airflow developers,
> >> >
> >> > I’ve been studying the Airflow *Task SDK* in detail, and I find its
> >> > direction very interesting - especially the idea of introducing a
> >> > stable, user-facing API layer that is decoupled from the internal
> >> > executor, scheduler, and runtime behavior.
> >> >
> >> > While going through the design notes and recent changes around the
> >> > Task SDK, it reminded me of the architectural philosophy behind
> >> > *Apache Spark Connect*, which also emphasizes:
> >> >
> >> >    - separating user-facing APIs from the underlying execution engine
> >> >    - providing a stable long-term public API surface
> >> >    - enabling flexible execution models
> >> >    - reducing coupling between API definitions and the actual runtime
> >> >      environment
> >> >
> >> > This made me wonder whether the philosophical direction is similar or
> >> > if I am drawing an incorrect analogy.
> >> > I would like to ask a few questions to better understand Airflow’s
> >> > long-term intent:
> >> > ------------------------------
> >> > *Q1.*
> >> >
> >> > Is the Task SDK intentionally aiming for a form of *API–engine
> >> > decoupling* similar to Spark Connect?
> >> > Or is the motivation fundamentally different?
> >> > *Q2.*
> >> >
> >> > Is the long-term vision that tasks will be defined through a stable
> >> > Task SDK interface while the underlying scheduler/executor
> >> > implementations evolve independently without breaking user code?
> >> > *Q3.*
> >> >
> >> > https://issues.apache.org/jira/browse/SPARK-39375  # spark-connect
> >> >
> >> > From the perspective of the Airflow dev community, does it make sense
> >> > to compare Task SDK ↔ Spark Connect, or is the architectural direction
> >> > of Airflow fundamentally different?
> >> > ------------------------------
> >> >
> >> > I’m asking these questions because I want to *better understand the
> >> > philosophy that Airflow is trying to pursue*, and confirm whether my
> >> > interpretation of the Task SDK direction is accurate.
> >> >
> >> > Any insights or clarifications would be greatly appreciated.
> >> > Thank you for your continued work on Airflow.
> >> >
> >> > Best regards,
> >> > *Kyungjun Lee*
> >> >
> >>
> >
>
