Re: what Jed wrote: I think the concern about local git / GitHub is not a big
issue (a custom FSSpec implementation can be added, as TP mentioned), but I
fully agree with the second concern, and I think FSSpec is a bad choice to
base this work on.

Versioning of DAGs is way more than versioning files. We should also
version various metadata if task isolation is going to be in place. Passing
connections, variables, and potentially XCom results (or handles to those) from
previous tasks to isolated tasks should eventually result in "bundles" of
execution. I called them "snapshots" in some earlier comments, but the more
I think about the isolation case, the more "bundle" seems the better name, as
it will be much more than a single "commit" snapshot.
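
As an illustration only, the "bundle" idea could be sketched roughly like
this in Python. The class name ExecutionBundle, its fields and the
bundle_id() helper are all hypothetical and not part of Airflow; the only
point is that the version identity covers the metadata as well as the code
commit.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class ExecutionBundle:
    """Hypothetical: everything an isolated task needs, versioned as one unit."""

    # e.g. a commit id; None when no Python files are shipped at all
    code_version: Optional[str]
    # conn_id -> connection URI or opaque handle
    connections: Dict[str, str] = field(default_factory=dict)
    variables: Dict[str, str] = field(default_factory=dict)
    # key -> reference to a result produced by an upstream task
    xcom_handles: Dict[str, str] = field(default_factory=dict)

    def bundle_id(self) -> str:
        """Stable identifier derived from all parts, not just the code version."""
        payload = json.dumps(
            {
                "code_version": self.code_version,
                "connections": self.connections,
                "variables": self.variables,
                "xcom_handles": self.xcom_handles,
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode()).hexdigest()[:12]
```

Two bundles built from the same commit but with different variables get
different identifiers here, which is exactly what a plain "commit" snapshot
cannot express.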

I think FSSpec (and a filesystem in general) is simply the wrong abstraction
for what Airflow needs in the new world of Airflow 3. FSSpec, even with
versioning, is basically a fancier version of the DAG synchronization we
already have in Airflow 2, and sticking to the "file" abstraction mostly
roots us in the past that we want to move away from.

I think whatever abstraction we come up with in Airflow 3 for task
isolation and DAG versioning should be open to multiple cases (even if we
will not implement all of them in Airflow 3.0 straight away):

1) back-compatibility with the "shared FS" abstraction we currently have in
Airflow 2, known as the "DAG Folder"
2) bundling only the Python files needed to run the task, together with
the Connections, Variables and XComs it needs
3) bundling **just** Connections, Variables and XComs without passing any
Python files at all (i.e. running tasks native to any language, with an
Airflow API to access those)
4) "virtual" tasks, where tasks are actually mapped onto externally executing
workflows (similar to the Cosmos case, where dbt model DAG tasks are mapped
onto Airflow tasks), without running anything at all as an "airflow task"

So while 1) and 2) are indeed mostly "FS" backed, and the FSSpec abstraction
could be somewhat OK there (maybe connections and variables could be passed
via some tags/metadata), it pretty much breaks completely in cases
3) and 4), where we generally have "no FS at all".
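
To make the four cases concrete, here is a minimal Python sketch of an
abstraction that does not assume files exist. Every name in it (TaskBundle,
code_root, the four subclasses, needs_filesystem) is made up for
illustration and is not an existing or proposed Airflow API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, Optional


class TaskBundle(ABC):
    """Hypothetical base: what a task run needs, without assuming files exist."""

    @abstractmethod
    def code_root(self) -> Optional[Path]:
        """Directory with Python files, or None when there is no filesystem part."""

    def context(self) -> Dict[str, str]:
        """Connections, Variables, XCom handles for the task (empty by default)."""
        return {}


@dataclass
class DagFolderBundle(TaskBundle):  # case 1: shared "DAG Folder"
    dag_folder: Path

    def code_root(self) -> Optional[Path]:
        return self.dag_folder


@dataclass
class PythonFilesBundle(TaskBundle):  # case 2: needed files plus metadata
    root: Path
    ctx: Dict[str, str] = field(default_factory=dict)

    def code_root(self) -> Optional[Path]:
        return self.root

    def context(self) -> Dict[str, str]:
        return self.ctx


@dataclass
class ContextOnlyBundle(TaskBundle):  # case 3: metadata only, no Python files
    ctx: Dict[str, str] = field(default_factory=dict)

    def code_root(self) -> Optional[Path]:
        return None

    def context(self) -> Dict[str, str]:
        return self.ctx


@dataclass
class VirtualTaskBundle(TaskBundle):  # case 4: task mapped to an external workflow
    external_ref: str  # opaque reference to the externally executing job

    def code_root(self) -> Optional[Path]:
        return None


def needs_filesystem(bundle: TaskBundle) -> bool:
    """True only for bundles that a file-based backend could serve."""
    return bundle.code_root() is not None
```

In this sketch only the first two bundle types return a path, so a purely
file-based backend has nothing to offer the other two.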

J




On Sun, Jun 9, 2024 at 12:34 AM Tzu-ping Chung <t...@astronomer.io.invalid>
wrote:

> On Git (and other VCS for that matter) specifically, I believe fsspec only
> supports GitHub because it uses the GitHub API instead of Git. I’m only
> guessing, but using Git for random access would have terrible performance
> and is likely not an option for them.
>
> Airflow does not have the same problems though since we are allowed to
> cache things on-disk and do a batch update on each DAG-parsing iteration. I
> believe it is possible to add custom backends to fsspec, so Airflow should
> be able to extend the feature to cover arbitrary repositories without too
> much work.
>
>
> > On Jun 8, 2024, at 23:25, Jed Cunningham <jedcunning...@apache.org> wrote:
> >
> > Sorry for the delayed reply here. I've been chewing on this one a bit
> > though.
> >
> > One concern I have is that I highly value having a provider agnostic
> > remote git integration. fsspec, however, has local git or github - no
> > arbitrary remote git support. That means Airflow, in my view, can't just
> > rely on fsspec alone, but will have to also deal with cloning arbitrary
> > repos. Or we expect a "ready to go" cloned repo on local disk, which might
> > be a good tradeoff. This was something we'd have to tackle for AIP-63
> > anyways, though.
> >
> > Another concern, as I've been considering Airflow 3, is that I see wanting
> > to version "more" than just the DAG folder - possibly a whole venv (or
> > similar). Doing that versioning file-by-file doesn't feel particularly
> > practical to me. Which means that, I think, we could end up using fsspec
> > to get zips/tarballs/"assets", which is still very helpful imo. But it
> > does limit the "slickness" of being sorta behind the scenes like a custom
> > module loader and resource loader would be.
> >
> > I actually see this as also being coupled to Ash's upcoming “Task
> > Execution interface”.
> >
> > I admittedly didn't quite follow the VersionedFS idea you all were
> > discussing. I don't see how it could easily allow Airflow's goal of "this
> > version of the dag folder and maybe more" to a backend that is versioned
> > by file. Maybe you can eli5 and connect the dots for me?
> >
> > All that said, my goal this next week is to spend some quality time in
> > this area so we can all start nailing down options for Airflow 3.