For these, you would need to make your code Path-aware instead of relying on the OS. Note that the change is intended to be backwards compatible, so if your DAG folder path is local, everything should keep working as before. The challenge arises only with remote storage, because this cannot be done transparently on cloud storage.
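To make "Path-aware" a bit more concrete, here is a minimal sketch, not what the AIP or the PR implements. The helper name `read_asset` and the file layout are hypothetical, and it assumes the remote path abstraction supports the same `/` and `read_text()` operations as `pathlib.Path`:

```python
from pathlib import Path


def read_asset(dag_folder, relative_name: str) -> str:
    """Read a text asset (for example a SQL file) stored next to the DAGs.

    `dag_folder` can be any Path-like object that supports `/` and
    `read_text()`. pathlib.Path works for a local DAG folder; the
    assumption (not verified here) is that a remote path object
    offers the same two operations.
    """
    # Path-aware: joining and reading go through the path object itself
    # instead of os.path.join() and the built-in open(), which only work
    # on the local filesystem.
    return (dag_folder / relative_name).read_text(encoding="utf-8")
```

With a local DAG folder this behaves like a plain `open()`; the point is only that the code no longer assumes the folder lives on the local filesystem.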
Alternatively, the Loader could read a manifest / requirements file to make sure that these files become available in the DAG's cache. While not intended for a first iteration, this might be a good idea, and it could work nicely with AIP-63 as well as with your security improvements.

B

> On 26 May 2024, at 13:34, Ash Berlin-Taylor <a...@apache.org> wrote:
>
>> Non DAG, Non module assets as part of the DAG folder are out of scope. So
>> say for example for some reason you include a GIF. This will not
>> automatically be available without changes to your code.
>
> What about SQL files a task uses, either as a template or via something else
> such as dbt? How about YAML based dag generators?
>
> (This might be mentioned in the wiki page, but it's not loading for me right
> now)
>
> -ash
>
>> On 26 May 2024 08:55:11 BST, Bolke de Bruin <bdbr...@gmail.com> wrote:
>> Hi All,
>>
>> I would like to discuss a new AIP aimed at enhancing the DAG loading
>> mechanism to support reading DAGs from ephemeral storage solutions. This
>> proposal is intended to supersede AIP-5 Remote DAG Fetcher and provide a
>> more flexible and scalable approach and to prepare for AIP-63.
>>
>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-71+Generalizing+DAG+Loader+and+Processor+for+Ephemeral+Storage
>>
>> *Abstract*
>> This proposal aims to generalize the DAG loader and processor to use
>> pathlib.Path for file operations instead of assuming direct OS filesystem
>> access. It includes implementing a custom module loader that supports
>> loading from ObjectStoragePath locations and other Path-like abstractions,
>> with caching capabilities provided by fsspec. Furthermore, while this AIP
>> does not directly implement DAG versioning, it creates a foundational layer
>> that can be extended to support DAG versioning as outlined in AIP-63.
>>
>> A work in progress PR can be found here:
>> https://github.com/apache/airflow/pull/39647
>>
>> *Key points for discussion*
>>
>> Previous proposals, like AIP-5, suggested using a Fetcher mechanism. Kind
>> of like an in-process git-sync. This proposal is about making that
>> redundant by fully supporting object storage locations by leveraging
>> ObjectStoragePath and fsspec caching mechanisms.
>>
>> Earlier feedback on AIP-5 was that we thought that having an additional
>> Fetcher process was out of scope of the project. With the transient
>> integration of pathlib.Path and ObjectStoragePath I think this argument
>> does not hold anymore and the demand is there. In addition the added
>> flexibility allows AIP-63 to be implemented easier (what that looks like
>> remains to be seen).
>>
>> Airflow scans DAGs often. This very likely requires a caching mechanism for
>> both the DAGs and their modules. Fsspec does implement caching, and it is
>> planned to leverage this.
>>
>> Non DAG, Non module assets as part of the DAG folder are out of scope. So
>> say for example for some reason you include a GIF. This will not
>> automatically be available without changes to your code.
>>
>> I kindly request your thoughts :-).
>>
>> Bolke
>>
>> --
>> Bolke de Bruin
>> bdbr...@gmail.com