> 1) *Calendars are dynamic*. Special events cannot be avoided: 9/11 -
> financial markets and air flights were cancelled for a while in the US,
> country wide protests due to law changes in France right now, etc. These
> are immediate changes to "the calendar".
> We need to be able to push calendar definition(s) to Airflow in a file on a
> daily basis (or on demand) without restarts to the
> Airflow environment (scheduler) and have these take effect when we say so
> without impacting DAGs in flight.

Sounds like a fantastic idea for another timetable that reads its schedule
from a file placed in the DAGS folder (pointed at by the timetable class).
That would follow the Airflow philosophy and respond to your needs
perfectly.
As usual, PRs are most welcome - maybe you would like to contribute one?
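
To make that a bit more concrete, here is a minimal sketch of just the
file-reading part. It assumes a hypothetical plain-text calendar file with
one ISO 8601 timestamp per line; the file format and the names
load_run_times and next_run_after are made up for illustration and are not
part of Airflow. Wiring this into an actual Timetable subclass is left out.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Optional


def load_run_times(path: Path) -> List[datetime]:
    """Parse one ISO 8601 timestamp per line; skip blanks and '#' comments."""
    run_times = []
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        moment = datetime.fromisoformat(line)
        if moment.tzinfo is None:
            # Assumption: naive timestamps in the file are meant as UTC.
            moment = moment.replace(tzinfo=timezone.utc)
        run_times.append(moment)
    return sorted(run_times)


def next_run_after(
    run_times: List[datetime], last_run: Optional[datetime]
) -> Optional[datetime]:
    """Return the first listed run strictly after last_run, or None if exhausted."""
    for moment in run_times:
        if last_run is None or moment > last_run:
            return moment
    return None
```

If the file is re-read every time the next run is computed, a newly pushed
calendar would be picked up without a restart. Whether that is cheap enough
in practice would need to be checked in a real implementation.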

> 2) It is not clear why DAG definitions include schedule information ...
> Those are scheduling/calendar concepts completely separate from a DAG,
> which should just stick to describing the data pipeline.

Let me explain. It's clear to me, but maybe not that obvious, so let me
try. Airflow's whole philosophy is different. In the Airflow world, DAGs,
tasks, schedule, configuration - all come from the same Python file
(usually). One of the reasons is the flexibility it brings. Your proposal
moves DAGs into more of the "declarative" realm, but our DAGs are purely
imperative (and, thanks to Python, Turing-complete). This also has the
benefit that you can split things up if you want, which is impossible
when you start from the other direction. There is nothing wrong with
separating it right now:

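from airflow import models

# my_schedule is your own module; SCHEDULE and START_DATE are defined there
from my_schedule import SCHEDULE, START_DATE

with models.DAG(
    DAG_ID,  # defined elsewhere in this file
    schedule=SCHEDULE,
    start_date=START_DATE,
    catchup=False,
    tags=["example", "ads"],
) as dag:
    ...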

And keep your schedule separate.

You can even place it in a separate package and have it dynamically
selected based on env variables, or whatever you want. That gives you a
nice separation of concerns if you think it's worth it. On the other
hand, you can also recreate a DAG object with different schedules in a loop
and have essentially the same DAG run on several different schedules if you
want (Python's flexibility and dynamic DAG generation cover that).
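
As a rough sketch of both ideas: the snippet below picks a schedule from an
environment variable and derives one DAG id per schedule. The SCHEDULES
mapping, the CALENDAR_REGION variable, and the cron strings are all
hypothetical placeholders. The DAG-creating loop is only shown in comments,
so the snippet runs without Airflow installed.

```python
import os
from typing import Dict, Optional

# Hypothetical schedules; in practice these could live in a separate package.
SCHEDULES: Dict[str, str] = {
    "us": "0 6 * * 1-5",
    "fr": "0 7 * * 1-5",
}


def select_schedule(region: Optional[str] = None) -> str:
    """Pick a schedule by region, defaulting to the CALENDAR_REGION env variable."""
    key = region or os.environ.get("CALENDAR_REGION", "us")
    return SCHEDULES[key]


def dag_variants(base_dag_id: str) -> Dict[str, str]:
    """Map one DAG id to each schedule, ready for a loop that creates DAGs."""
    return {f"{base_dag_id}_{name}": cron for name, cron in SCHEDULES.items()}


# In a DAG file the loop would look roughly like this (not executed here):
#
# for dag_id, cron in dag_variants("ads_report").items():
#     with DAG(dag_id, schedule=cron, start_date=..., catchup=False) as dag:
#         ...
#     globals()[dag_id] = dag
```

The pipeline definition stays in one place while the schedules live in
their own mapping, which can be swapped without touching the tasks.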

Everything about a DAG, including its schedule, is part of the Python code
that defines it. This is the foundation of how Airflow works, and it has
some properties that Airflow builds on. In our "corner of the world" (a
rather big one, but still a corner) a DAG is intrinsically connected with
the Python code that defines it (as a whole). If you want to split it, you
are proposing to tear the basic fabric that Airflow is built on into
pieces. The "backbone" of an Airflow DAG is that the complete DAG, schedule
included, is a Python object, defined by parsing Python.

J.



On Fri, Mar 31, 2023 at 5:35 PM David Robinson <drobin1...@gmail.com> wrote:

> I also like this feature suggestion and think it steps in the right
> direction.
> This idea is crucial for businesses who schedule their data pipelines and
> other actions based on Calendars.  We use multiple calendars depending on
> the countries or even regions in which we operate.
>
> In addition to the other examples of calendars already listed, Prefect
> RRules is another example, in case it was not yet mentioned.
>
> Two additional thoughts:
> 1) *Calendars are dynamic*. Special events cannot be avoided: 9/11 -
> financial markets and air flights were cancelled for a while in the US,
> country wide protests due to law changes in France right now, etc.  These
> are immediate changes to "the calendar".
> We need to be able to push calendar definition(s) to Airflow in a file on a
> daily basis (or on demand) without restarts to the
> Airflow environment (scheduler) and have these take effect when we say so
> without impacting DAGs in flight.
>
> 2) It is not clear why DAG definitions include schedule information in them
> at all.  schedule start date, schedule interval, catch up, retry, even
> a pointer to a custom timetable?
> Those are scheduling/calendar concepts completely separate from a DAG,
> which should just stick to describing the data pipeline.
> Extending Timetable doesn't seem to go far enough for this feature for
> implementing a true Calendar concept?
> From a simplistic view there are 3 objects in play... a calendar, a DAG,
> and a separate schedule meta data thing that associates the calendar and
> the DAG.  To change things about a DAG's schedule or calendar should only
> require manipulation of "the meta data thing" leaving the DAG definition
> unchanged.
>
>
> On Fri, Mar 31, 2023 at 4:49 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>
> > Yep. I also think it's a good addition. Though again - I don't think
> > AIP is something that we need here, simply a PR where it could be
> > iterated on, should be enough IMHO. It's **just** another timetable to
> > add, we are not changing any architectural decision here.
> >
> > I'd also separate out the discussion on any UI component. That one (if
> > it flies) might require AIP indeed as it is not in line with the
> > current philosophy of Airflow where we change behaviours of DAG via
> > code in the DAGs folder. But the timetable proposed would be extremely
> > useful on its own, being configurable as described by others.
> >
> > On Fri, Mar 31, 2023 at 9:27 AM Mehta, Shubham
> > <shu...@amazon.com.invalid> wrote:
> > >
> > > Hi Malthe
> > >
> > > I really like this feature suggestion, especially since it's crucial for
> > > customers who are transitioning from Control-M or Autosys [1] to Airflow.
> > > Implementing this feature has the potential to bring Airflow's scheduling
> > > capabilities on par with these tools.
> > >
> > > > I actually added some sort of forecasting functionality to the
> > > > calendar view a while back
> > > This is incredible! I was going to bring it up as this becomes essential
> > > when scheduling tasks such as "all days except US holidays". Glad that you
> > > already took care of it.
> > >
> > > > if we had a nicely generated string representation describing the
> > > > resulting schedule, perhaps that would be adequate
> > > While I understand the importance of having a human-readable string, I
> > > agree with Niko that UI configuration is important here. In the same vein
> > > as forecasting, having configuration in UI will enable any user to set up
> > > these schedules without making any mistakes.
> > >
> > > Reference:
> > > 1. https://www.youtube.com/watch?v=qrwwxDKAY-w&ab_channel=BMCdocs
> > >
> > > Cheers,
> > > Shubham
> > >
> > > On 2023-03-30, 6:09 AM, "Malthe Borch" <mal...@apache.org> wrote:
> > >
> > >
> > > > I think an interesting topic to discuss for this project would be some
> > > > kind of UI based date/calendar picker to help users construct these logical
> > > > compositions. Something like `days("D1", "D2", "THU-SAT", "4>", "L1")` is
> > > > quite inscrutable.
> > >
> > >
> > > Agreed – and really, the syntax is something to consider further. I
> > > think implementing a human-readable `__str__` could perhaps go a long way
> > > here!
> > >
> > >
> > > > A UI component to at least visualize the composition you've created
> > > > would be super powerful, and if it could be used to modify or create the
> > > > compositions that would be an even better user experience.
> > >
> > >
> > > It might be useful, but if we had a nicely generated string
> > > representation describing the resulting schedule, perhaps that would be
> > > adequate. This could be enough to confirm to the developer that they've done
> > > it correctly.
> > >
> > >
> > > Anticipating this "rich scheduling" capability, I actually added some
> > > sort of forecasting functionality to the calendar view a while back:
> > > https://github.com/apache/airflow/pull/22055.
> > >
> > >
> > > Cheers
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > > For additional commands, e-mail: dev-h...@airflow.apache.org
> > >
