Re: what Jed wrote: I think the concern about local git / GitHub is not a big issue (a custom FSSpec implementation can be added, as TP mentioned), but I fully agree with the second concern - and I think FSSpec is a bad choice to base this work on.

Versioning of DAGs is way more than versioning files. We should also version various metadata if task isolation is going to be in place. Passing connections, variables, and potentially XCom results (or handles to those) from previous tasks to isolated tasks should eventually result in "bundles" of execution. I called them "snapshots" in some earlier comments, but the more I think about the isolation case, the more "bundle" seems a better name - as it will be much more than a single "commit" snapshot.

I think FSSpec (and the filesystem in general) is simply the wrong abstraction for what Airflow needs in the new world of Airflow 3. FSSpec - even with versioning - is basically a fancier way of doing the DAG synchronization we already have in Airflow 2, and sticking to the "file" abstraction mostly roots us in the past that we want to move away from.

I think whatever abstraction we come up with in Airflow 3 for task isolation and DAG versioning should be open to multiple cases (even if we will not implement them in Airflow 3.0 straight away):

1) back-compatibility with the "shared FS" abstraction we currently have in Airflow 2 - known as the "DAG Folder"
2) bundling only the Python files needed to run the task, together with the Connections, Variables and XComs it needs
3) bundling **just** Connections, Variables and XComs without even passing any Python files (i.e. running tasks native to any language, with the Airflow API to access those)
4) "virtual" tasks, where tasks are actually mapped into externally executing workflows (similar to the Cosmos case, where DBT model DAG tasks are mapped into Airflow Tasks) - without running anything at all as an "airflow task"

So while 1) and 2) are indeed mostly "FS" backed and the FSSpec abstraction could be somewhat OK (maybe connections and variables could be passed by some tags/metadata) - it pretty much breaks completely in cases 3) and 4) - where we generally have "no fs at all".
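
To make the four cases a bit more concrete, here is a rough Python sketch of what such a "bundle" could carry. None of this exists in Airflow: the `Bundle` class, its field names and `needs_filesystem` are all made up for illustration, and the values are placeholders. The only point is that files become one optional part of the bundle rather than the thing the abstraction is built on.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Bundle:
    """Hypothetical versioned unit of execution (not an Airflow API)."""

    version: str
    # Case 1: a shared "DAG Folder", as in Airflow 2.
    dag_folder: Optional[str] = None
    # Case 2: only the Python files a given task needs.
    python_files: tuple = ()
    # Cases 2 and 3: metadata (or handles to it) passed to the isolated task.
    connections: dict = field(default_factory=dict)
    variables: dict = field(default_factory=dict)
    xcoms: dict = field(default_factory=dict)
    # Case 4: reference to a workflow that executes outside Airflow.
    external_ref: Optional[str] = None

    def needs_filesystem(self) -> bool:
        """True only when the bundle actually ships files (cases 1 and 2)."""
        return self.dag_folder is not None or bool(self.python_files)


# Placeholder values, one bundle per case that matters for the argument.
shared = Bundle(version="v1", dag_folder="/opt/airflow/dags")
metadata_only = Bundle(
    version="v2",
    connections={"my_db": "conn-handle"},
    variables={"env": "prod"},
)
virtual = Bundle(version="v3", external_ref="external-workflow/model_a")

for bundle in (shared, metadata_only, virtual):
    print(bundle.version, "needs a filesystem:", bundle.needs_filesystem())
```

In this sketch the metadata-only and "virtual" bundles report that they need no filesystem at all, which is exactly where a file-based abstraction has nothing to offer.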

J

On Sun, Jun 9, 2024 at 12:34 AM Tzu-ping Chung <t...@astronomer.io.invalid> wrote:

> On Git (and other VCS for that matter) specifically, I believe fsspec only
> supports GitHub because it uses the GitHub API instead of Git. I’m only
> guessing, but using Git for random access would have terrible performance
> and is likely not an option for them.
>
> Airflow does not have the same problems though since we are allowed to
> cache things on-disk and do a batch update on each DAG-parsing iteration.
> I believe it is possible to add custom backends to fsspec, so Airflow
> should be able to extend the feature to cover arbitrary repositories
> without too much work.
>
> > On Jun 8, 2024, at 23:25, Jed Cunningham <jedcunning...@apache.org> wrote:
> >
> > Sorry for the delayed reply here. I've been chewing on this one a bit
> > though.
> >
> > One concern I have is that I highly value having a provider agnostic
> > remote git integration. fsspec, however, has local git or github - no
> > arbitrary remote git support. That means Airflow, in my view, can't just
> > rely on fsspec alone, but will have to also deal with cloning arbitrary
> > repos. Or we expect a "ready to go" cloned repo on local disk, which
> > might be a good tradeoff. This was something we'd have to tackle for
> > AIP-63 anyways, though.
> >
> > Another concern, as I've been considering Airflow 3, is that I see
> > wanting to version "more" than just the DAG folder - possibly a whole
> > venv (or similar). Doing that versioning file-by-file doesn't feel
> > particularly practical to me. Which means that, I think, we could end up
> > using fsspec to get zips/tarballs/"assets", which is still very helpful
> > imo. But it does limit the "slickness" of being sorta behind the scenes
> > like a custom module loader and resource loader would be.
> >
> > I actually see this as also being coupled to Ash's upcoming “Task
> > Execution interface”.
> >
> > I admittedly didn't quite follow the VersionedFS idea you all were
> > discussing. I don't see how it could easily allow Airflow's goal of
> > "this version of the dag folder and maybe more" to a backend that is
> > versioned by file. Maybe you can eli5 and connect the dots for me?
> >
> > All that said, my goal this next week is to spend some quality time in
> > this area so we can all start nailing down options for Airflow 3.