Hi all,

I'm happy to announce that we have completed the first phase of DAG
Serialization: the Webserver is now stateless and can run without
access to the DAG files.

The two limitations we had in 1.10.7 through 1.10.9
(https://airflow.apache.org/docs/1.10.7/dag-serialization.html#limitations)
have been resolved.

Special thanks to @ash for his continuous guidance and contributions.

A special mention also goes to Anita Fronczak and Zhou Fang for their
contributions along the way.

The next step is to remove the SimpleDag representation from the Scheduler
and replace it with the Serialized DAG (WIP PR:
https://github.com/apache/airflow/pull/7694).

*Advantages*:

   - *Reduction in Webserver startup time* for a large number of DAGs.
   Without DAG Serialization, all DAGs are loaded into the DagBag during
   Webserver startup. With DAG Serialization, an empty DagBag is created and
   DAGs are loaded from the DB only when needed (e.g. when a particular DAG
   is clicked on the home page).
   - *No DAG Parsing / Consistency*: with DAG Serialization turned on, the
   Webserver loads DAGs from the DB and does not need the DAG files at all.
   DAGs are parsed, serialized and stored in the DB by the Scheduler, so the
   Webserver shows the same DAG definitions that the Scheduler parsed.
   - *Rendered Templates* for TaskInstances that have already run now
   correctly display the values rendered at the time of the run instead of
   the current values.
   - Paves the way for *DAG Versioning* (more details when I create a
   separate AIP or update an existing one) and *Scheduler HA* (AIP-15
   <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103092651>).
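
To illustrate the lazy-loading idea from the first two points, here is a
minimal Python sketch. It is not Airflow's implementation: `SerializedDagStore`,
`LazyDagBag`, the `serialized_dag` table schema and the plain dict standing in
for a DAG are all hypothetical, and an in-memory SQLite database stands in for
the metadata DB. The "Scheduler" side writes each parsed DAG as JSON; the
"Webserver" side starts with an empty bag and reads a DAG from the DB only the
first time it is requested.

```python
import json
import sqlite3


class SerializedDagStore:
    """Hypothetical stand-in for a serialized-DAG table (not Airflow's API)."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS serialized_dag "
            "(dag_id TEXT PRIMARY KEY, data TEXT)"
        )

    def write(self, dag_id, dag_dict):
        # "Scheduler" side: store the parsed DAG as JSON, replacing any old copy.
        self.conn.execute(
            "INSERT OR REPLACE INTO serialized_dag (dag_id, data) VALUES (?, ?)",
            (dag_id, json.dumps(dag_dict)),
        )
        self.conn.commit()

    def read(self, dag_id):
        # Returns the deserialized DAG dict, or None if the DAG is unknown.
        row = self.conn.execute(
            "SELECT data FROM serialized_dag WHERE dag_id = ?", (dag_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None


class LazyDagBag:
    """Hypothetical "Webserver" bag: starts empty, loads DAGs on first access."""

    def __init__(self, store):
        self.store = store
        self.dags = {}      # nothing is loaded at startup
        self.db_reads = 0   # counts trips to the DB, for illustration only

    def get_dag(self, dag_id):
        if dag_id not in self.dags:
            self.db_reads += 1
            dag = self.store.read(dag_id)
            if dag is None:
                return None
            self.dags[dag_id] = dag  # cache so later requests skip the DB
        return self.dags[dag_id]


store = SerializedDagStore(sqlite3.connect(":memory:"))
store.write("example_dag", {"dag_id": "example_dag", "tasks": ["extract", "load"]})

bag = LazyDagBag(store)
print(len(bag.dags))                         # 0: empty at startup
print(bag.get_dag("example_dag")["tasks"])   # ['extract', 'load']
bag.get_dag("example_dag")                   # served from the in-memory cache
print(bag.db_reads)                          # 1
```

Caching the deserialized DAG means repeated views of the same DAG hit the DB
once per process; a real implementation would also need to refresh entries
when the Scheduler writes a newer version, which this sketch ignores.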


I will create new JIRA issues for the next steps on DAG Serialization and
DAG Versioning, and we will discuss them in our next sig-dag-serialization
call later this month.

Regards,
Kaxil