I realized that I got too immersed in the decoupled server–client
architecture and ended up missing the essence, focusing only on fragmented
details.

Thank you very much for your thoughtful explanation.

On Tue, Nov 25, 2025 at 5:41 PM Jarek Potiuk <[email protected]> wrote:

> > Good points but I think at a surface level, every SDK can be compared on
> > principles / general goals. And that's it. Differences start coming in
> > when you delve into any details.
>
>
> Yep. That's my point as well. I would not single out Spark for that at all,
> simply :)
>
> On Tue, Nov 25, 2025 at 8:04 AM Amogh Desai <[email protected]> wrote:
>
> > Good points but I think at a surface level, every SDK can be compared on
> > principles / general goals. And that's it. Differences start coming in
> > when you delve into any details.
> >
> >
> > Thanks & Regards,
> > Amogh Desai
> >
> >
> > On Sun, Nov 23, 2025 at 10:29 PM Jarek Potiuk <[email protected]> wrote:
> >
> > > Just to correct the two statements, as I see that my autocompletion
> > > failed me. What I meant was "workload", not "workflows", in the
> > > context of both SDKs:
> > >
> > > * In Spark the "server" basically executes the workloads submitted by
> > > the client - because Spark Server is a *workload* execution engine
> > > * In Airflow the "server" is the one that submits *workloads* to be
> > > executed by the client - because Airflow Server is an orchestrator
> > > that tells its workers what to do (and often those tasks delegate the
> > > work to other servers like Spark, which means that the worker often
> > > acts as a client for both -> the orchestrating engine of Airflow, and
> > > an execution engine such as Spark).
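> > >
> > > A minimal sketch of that "client for both" pattern (illustrative
> > > only; it assumes the Airflow 3 Task SDK @task decorator and the Spark
> > > Connect remote() entry point, and the host/port are made up):
> > >
> > >     from airflow.sdk import task          # Task SDK authoring surface
> > >     from pyspark.sql import SparkSession  # Spark Connect client
> > >
> > >     @task
> > >     def aggregate_orders():
> > >         # This runs on an Airflow worker, which is a client of the
> > >         # Airflow API server (that submitted the workload to it)...
> > >         spark = SparkSession.builder.remote(
> > >             "sc://spark-connect.example.com:15002"  # illustrative host
> > >         ).getOrCreate()
> > >         # ...and at the same time a client of the Spark Connect
> > >         # server, which executes the workload the task submits.
> > >         return spark.sql("SELECT count(*) FROM orders").first()[0]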
> > >
> > > On Sun, Nov 23, 2025 at 5:54 PM Jarek Potiuk <[email protected]> wrote:
> > >
> > > > And just to add - I would not draw too many analogies specifically
> > > > for Spark SDK <> Airflow beyond the underlying "general IT design
> > > > principles", which are sound regardless of the particular SDK
> > > > implementation.
> > > >
> > > > Generally, the principles that we follow in the Task SDK apply to
> > > > literally **any** SDK where you want to define a stable API for
> > > > whoever the user is. It also follows the principles of basically
> > > > **any** SDK where you utilise a client-server architecture and want
> > > > to make use of HTTP/security/proxy infrastructure and fine-grained
> > > > permissions granted to the client. Those are just "common practices"
> > > > we implemented with the Task SDK. There was no particular
> > > > inspiration or parallel to Spark SDK.
> > > > There was no particular inspiration or parallel to Spark SDK,
> > > >
> > > > Moreover, there are significant differences vs Spark at the "logic"
> > > > level of the APIs, and I would definitely not compare the two,
> > > > because people might be misguided if you say "it's like Spark SDK".
> > > > Spark SDK and Airflow Task SDK are fundamentally different (on a
> > > > logic level).
> > > >
> > > > While the underlying technologies and principles are shared
> > > > (decoupling of the client from server code, a very clear boundary,
> > > > and an "SDK" that exposes only what can really be done from the
> > > > client), there are fundamental differences in what we do in Airflow.
> > > >
> > > > The main stark difference is the direction of workload submission
> > > > and execution, which is basically 100% reversed between Spark and
> > > > Airflow.
> > > >
> > > > * In Spark the "server" basically executes the workloads submitted
> > > > by the client - because Spark Server is a workflow execution engine
> > > > * In Airflow the "server" is the one that submits workflows to be
> > > > executed by the client - because Airflow Server is an orchestrator
> > > > that tells its workers what to do (and often those tasks delegate
> > > > the work to other servers like Spark, which means that the worker
> > > > often acts as a client for both -> the orchestrating engine of
> > > > Airflow, and an execution engine such as Spark).
> > > >
> > > > This is a 100% reversal of control. Even if the underlying
> > > > low-level principles (isolation, decoupling, HTTP communication,
> > > > security) are similar, that is mostly because they just make sense
> > > > in general engineering, not because those two SDKs do things in a
> > > > similar way. They don't.
> > > >
> > > > J.
> > > >
> > > >
> > > > On Sun, Nov 23, 2025 at 10:57 AM Amogh Desai <[email protected]>
> > > > wrote:
> > > >
> > > >> Answering as one of the Airflow developers contributing to the
> > > >> Task SDK.
> > > >>
> > > >> Q1: If Engine = Execution and API = Server side, the analogy is
> > > >> comparable. The goal of the Task SDK is to decouple Dag authoring
> > > >> from Airflow internals and to provide a version-agnostic, stable
> > > >> interface for writing Dags.
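> > > >>
> > > >> For illustration, a minimal sketch of that decoupled authoring
> > > >> surface (assuming the Airflow 3 `airflow.sdk` import path; the dag
> > > >> and task names are made up):
> > > >>
> > > >>     from airflow.sdk import dag, task  # no scheduler/executor imports
> > > >>
> > > >>     @dag(schedule=None)
> > > >>     def hello_task_sdk():
> > > >>         @task
> > > >>         def hello():
> > > >>             # Runs under whatever executor the deployment uses;
> > > >>             # the Dag file never touches Airflow internals.
> > > >>             print("hello from the Task SDK")
> > > >>
> > > >>         hello()
> > > >>
> > > >>     hello_task_sdk()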
> > > >>
> > > >> Q2: Yes, that's the intention. Custom executors might require some
> > > >> adaptation when adopting AF3 for the first time, because Airflow 3
> > > >> deals in *workloads*
> > > >> <https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/executor/index.html#workloads>
> > > >> vs CLI commands in < 3.0.
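> > > >>
> > > >> Roughly, the shift for an executor looks like this (an
> > > >> illustrative sketch only - the workload fields below are made up,
> > > >> not the exact Airflow 3 schema; see the workloads doc linked
> > > >> above):
> > > >>
> > > >>     # Airflow 2.x: the executor was handed a CLI command to run.
> > > >>     command = ["airflow", "tasks", "run", "my_dag", "my_task",
> > > >>                "manual__2025-11-22"]
> > > >>
> > > >>     # Airflow 3: the executor is handed a serialized workload that
> > > >>     # describes the task instance; the worker then talks to the
> > > >>     # API server over HTTP to execute it.
> > > >>     workload = {
> > > >>         "ti": {"dag_id": "my_dag", "task_id": "my_task",
> > > >>                "try_number": 1},
> > > >>         "token": "<short-lived token for the execution API>",
> > > >>     }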
> > > >>
> > > >> Q3: You can compare / draw relations, provided that the comparison
> > > >> is in the context of the server-client separation and of
> > > >> future-proofing consumers against internal changes.
> > > >>
> > > >> Thanks & Regards,
> > > >> Amogh Desai
> > > >>
> > > >>
> > > >> On Sat, Nov 22, 2025 at 10:10 AM Kyungjun Lee
> > > >> <[email protected]> wrote:
> > > >>
> > > >> > Hi Airflow developers,
> > > >> >
> > > >> > I’ve been studying the Airflow *Task SDK* in detail, and I find
> > > >> > its direction very interesting - especially the idea of
> > > >> > introducing a stable, user-facing API layer that is decoupled
> > > >> > from the internal executor, scheduler, and runtime behavior.
> > > >> >
> > > >> > While going through the design notes and recent changes around
> > > >> > the Task SDK, I was reminded of the architectural philosophy
> > > >> > behind *Apache Spark Connect*, which also emphasizes:
> > > >> >
> > > >> >    - separating user-facing APIs from the underlying execution
> > > >> >      engine
> > > >> >    - providing a stable long-term public API surface
> > > >> >    - enabling flexible execution models
> > > >> >    - reducing coupling between API definitions and the actual
> > > >> >      runtime environment
> > > >> >
> > > >> > This made me wonder whether the philosophical direction is
> > > >> > similar, or if I am drawing an incorrect analogy. I would like
> > > >> > to ask a few questions to better understand Airflow’s long-term
> > > >> > intent:
> > > >> > ------------------------------
> > > >> > *Q1.*
> > > >> >
> > > >> > Is the Task SDK intentionally aiming for a form of *API–engine
> > > >> > decoupling* similar to Spark Connect? Or is the motivation
> > > >> > fundamentally different?
> > > >> >
> > > >> > *Q2.*
> > > >> >
> > > >> > Is the long-term vision that tasks will be defined through a
> > > >> > stable Task SDK interface while the underlying scheduler/executor
> > > >> > implementations evolve independently without breaking user code?
> > > >> >
> > > >> > *Q3.*
> > > >> >
> > > >> > https://issues.apache.org/jira/browse/SPARK-39375  # spark-connect
> > > >> >
> > > >> > From the perspective of the Airflow dev community, does it make
> > > >> > sense to compare Task SDK ↔ Spark Connect, or is the
> > > >> > architectural direction of Airflow fundamentally different?
> > > >> > ------------------------------
> > > >> >
> > > >> > I’m asking these questions because I want to *better understand
> > > >> > the philosophy that Airflow is trying to pursue*, and to confirm
> > > >> > whether my interpretation of the Task SDK direction is accurate.
> > > >> >
> > > >> > Any insights or clarifications would be greatly appreciated.
> > > >> > Thank you for your continued work on Airflow.
> > > >> >
> > > >> > Best regards,
> > > >> > *Kyungjun Lee*
> > > >> >
> > > >>
> > > >
> > >
> >
>
