> Good points but I think at a surface level, every SDK can be compared on principles / general goals. And that's it. Differences start coming in when you delve into any details.
Yep. That's my point as well. I would not single out Spark for that at all, simply :)

On Tue, Nov 25, 2025 at 8:04 AM Amogh Desai <[email protected]> wrote:

> Good points but I think at a surface level, every SDK can be compared on
> principles / general goals. And that's it. Differences start coming in
> when you delve into any details.
>
> Thanks & Regards,
> Amogh Desai
>
> On Sun, Nov 23, 2025 at 10:29 PM Jarek Potiuk <[email protected]> wrote:
>
> > Just to correct the two statements, as I see that my autocompletion
> > failed me. What I meant was "workload", not "workflows", in the context
> > of both SDKs:
> >
> > * In Spark, the "server" basically executes the *workloads* submitted
> >   by the client, because the Spark server is a *workload* execution
> >   engine.
> > * In Airflow, the "server" is the one that submits *workloads* to be
> >   executed by the client, because the Airflow server is an orchestrator
> >   that tells its workers what to do (and often those tasks delegate
> >   work to other servers such as Spark, which means that the worker
> >   often acts as a client for both: the orchestrating engine of Airflow,
> >   and an execution engine, for example Spark).
> >
> > On Sun, Nov 23, 2025 at 5:54 PM Jarek Potiuk <[email protected]> wrote:
> >
> > > And just to add - I would not draw too many analogies specifically
> > > for Spark SDK <> Airflow except the underlying "general IT design
> > > principles", which are sound regardless of the particular SDK
> > > implementation.
> > >
> > > Generally, the principles that we follow in the Task SDK apply to
> > > literally **any** SDK where you want to define a stable API for
> > > whoever the user is. It also follows the principles of basically
> > > **any** SDK where you utilise a client-server architecture and want
> > > to make use of the HTTP/security/proxy architecture and fine-grained
> > > permissions granted to the client. Those are just "common practices"
> > > we implemented with the Task SDK. There was no particular inspiration
> > > or parallel to the Spark SDK.
> > >
> > > Moreover, there are significant differences vs Spark on the "logic"
> > > level of the APIs, and I would definitely not compare the two,
> > > because people might be misguided if you say "it's like Spark SDK".
> > > Spark SDK and Airflow Task SDK are fundamentally different (on a
> > > logic level).
> > >
> > > While the underlying technologies and principles are shared
> > > (decoupling of the client from the server code, a very clear
> > > boundary, and an "SDK" that exposes only what's really possible to
> > > be done with the client), there are fundamental differences in what
> > > we do in Airflow.
> > >
> > > The main stark difference is the direction of workload submission
> > > and execution, which is basically 100% reversed between Spark and
> > > Airflow.
> > >
> > > * In Spark, the "server" basically executes the workloads submitted
> > >   by the client, because the Spark server is a workflow execution
> > >   engine.
> > > * In Airflow, the "server" is the one that submits workflows to be
> > >   executed by the client, because the Airflow server is an
> > >   orchestrator that tells its workers what to do (and often those
> > >   tasks delegate work to other servers such as Spark, which means
> > >   that the worker often acts as a client for both: the orchestrating
> > >   engine of Airflow, and an execution engine, for example Spark).
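To make the reversal Jarek describes above concrete, here is a minimal
sketch. The Spark half uses the real Spark Connect entry point and assumes
pyspark with Connect support plus a server listening at sc://localhost:15002;
the Airflow half is a hypothetical stand-in for a server-driven worker loop,
not the actual Airflow worker implementation.

    # Spark Connect: the CLIENT submits the workload; the SERVER executes it.
    # Assumes pyspark (with Connect) and a Spark Connect server are available.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
    spark.range(100).selectExpr("sum(id) AS total").show()  # runs server-side


    # Airflow 3 (sketch): the SERVER hands out workloads; the WORKER executes
    # them. `fetch_workload` and `run` are hypothetical placeholders for the
    # real worker machinery.
    def worker_loop(fetch_workload, run):
        while (workload := fetch_workload()) is not None:  # server decides
            run(workload)  # execution happens on the worker (client) side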
> > > This is 100% reverse of control - even if the underlying low-level
> > > principles (isolation, decoupling, HTTP communication, security) are
> > > similar, that is mostly because they just make sense in general
> > > engineering, not because those two SDKs do things in a similar way.
> > > They don't.
> > >
> > > J.
> > >
> > > On Sun, Nov 23, 2025 at 10:57 AM Amogh Desai <[email protected]>
> > > wrote:
> > >
> > > > Answering as one of the Airflow developers contributing to the
> > > > Task SDK.
> > > >
> > > > Q1: If Engine = Execution and API = Server side, the analogy is
> > > > comparable. The goal of the Task SDK is to decouple Dag authoring
> > > > from Airflow internals and to provide a version-agnostic, stable
> > > > interface for writing Dags.
> > > >
> > > > Q2: Yes, that's the intention. Custom executors might require some
> > > > adaptation when adopting AF3 for the first time, because Airflow 3
> > > > deals in *workloads
> > > > <https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/executor/index.html#workloads>*
> > > > vs CLI commands in < 3.0.
> > > >
> > > > Q3: You can compare / draw relations provided that the comparison
> > > > is in the context of the server-client separation and
> > > > future-proofing consumers against internal changes.
> > > >
> > > > Thanks & Regards,
> > > > Amogh Desai
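As a concrete illustration of the decoupling Amogh describes in Q1, here is
a minimal Dag written purely against the Task SDK's public authoring
surface. This is a sketch assuming Airflow 3 with the Task SDK package
installed; the author imports only from the stable `airflow.sdk` namespace
and never touches scheduler or executor internals.

    # Minimal Dag authored against the Task SDK surface (sketch, assuming
    # Airflow 3 with the Task SDK installed).
    from airflow.sdk import dag, task


    @dag(schedule=None)
    def hello_task_sdk():
        @task
        def say_hello() -> str:
            # Runs on a worker; this code depends only on the stable SDK,
            # not on Airflow internals.
            return "hello from the Task SDK"

        say_hello()


    hello_task_sdk()

For Q2, the shift from CLI commands to workloads can be pictured roughly as
below. The dataclass is a hypothetical simplification shown only to
contrast with the pre-3.0 contract of handing executors an
`airflow tasks run ...` command line; the real model lives in Airflow's
executor interfaces (see the workloads link above).

    # Hypothetical, simplified contrast between the two executor contracts.
    from dataclasses import dataclass


    @dataclass
    class TaskWorkload:  # sketch of what an Airflow 3 executor is handed
        dag_id: str
        task_id: str
        run_id: str
        token: str  # scoped credential for talking back to the API server


    # Pre-3.0 executors were handed a CLI command to run instead:
    legacy_command = ["airflow", "tasks", "run", "my_dag", "my_task", "manual_run"]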
> > > > On Sat, Nov 22, 2025 at 10:10 AM Kyungjun Lee
> > > > <[email protected]> wrote:
> > > >
> > > > > Hi Airflow developers,
> > > > >
> > > > > I've been studying the Airflow *Task SDK* in detail, and I find
> > > > > its direction very interesting, especially the idea of
> > > > > introducing a stable, user-facing API layer that is decoupled
> > > > > from the internal executor, scheduler, and runtime behavior.
> > > > >
> > > > > While going through the design notes and recent changes around
> > > > > the Task SDK, it reminded me of the architectural philosophy
> > > > > behind *Apache Spark Connect*, which also emphasizes:
> > > > >
> > > > > - separating user-facing APIs from the underlying execution
> > > > >   engine
> > > > > - providing a stable long-term public API surface
> > > > > - enabling flexible execution models
> > > > > - reducing coupling between API definitions and the actual
> > > > >   runtime environment
> > > > >
> > > > > This made me wonder whether the philosophical direction is
> > > > > similar or if I am drawing an incorrect analogy. I would like to
> > > > > ask a few questions to better understand Airflow's long-term
> > > > > intent:
> > > > > ------------------------------
> > > > > *Q1.* Is the Task SDK intentionally aiming for a form of
> > > > > *API-engine decoupling* similar to Spark Connect? Or is the
> > > > > motivation fundamentally different?
> > > > >
> > > > > *Q2.* Is the long-term vision that tasks will be defined through
> > > > > a stable Task SDK interface while the underlying
> > > > > scheduler/executor implementations evolve independently without
> > > > > breaking user code?
> > > > >
> > > > > *Q3.* *https://issues.apache.org/jira/browse/SPARK-39375* #
> > > > > spark-connect
> > > > > From the perspective of the Airflow dev community, does it make
> > > > > sense to compare Task SDK ↔ Spark Connect, or is the
> > > > > architectural direction of Airflow fundamentally different?
> > > > > ------------------------------
> > > > > I'm asking these questions because I want to *better understand
> > > > > the philosophy that Airflow is trying to pursue*, and confirm
> > > > > whether my interpretation of the Task SDK direction is accurate.
> > > > >
> > > > > Any insights or clarifications would be greatly appreciated.
> > > > > Thank you for your continued work on Airflow.
> > > > >
> > > > > Best regards,
> > > > > *Kyungjun Lee*
