The problem with “data date” is that the date does not correspond to data. The date is, fundamentally, only a date chosen to represent a DAG run, and it simply happens that the arbitrary logic chose the beginning date of the “data period”. However, since the logic is arbitrary, it can equally choose some date that’s not related to the processed data; in fact, aside from the fact that custom timetables are allowed to use any date as the logical date, Airflow also does not use a data-related date for manually triggered DAG runs (it uses the date on which the user triggers the run).
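To make that concrete, here is a minimal plain-Python sketch of the two cases. It is not Airflow code: `DagRunDates`, `scheduled_daily_run` and `manual_run` are hypothetical names used only for illustration, and the data interval given to the manual run (the last complete day before the trigger) is an assumption of this sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DagRunDates:
    """Hypothetical stand-in for the dates attached to a DAG run."""

    logical_date: datetime
    data_interval_start: datetime
    data_interval_end: datetime


def scheduled_daily_run(day_start: datetime) -> DagRunDates:
    # Scheduled run: the logical date happens to be the start of the data period.
    return DagRunDates(day_start, day_start, day_start + timedelta(days=1))


def manual_run(triggered_at: datetime) -> DagRunDates:
    # Manual run: the logical date is the moment the user triggered the run.
    # Assumption for this sketch: the data interval is the last complete day
    # before the trigger.
    end = triggered_at.replace(hour=0, minute=0, second=0, microsecond=0)
    return DagRunDates(triggered_at, end - timedelta(days=1), end)


scheduled = scheduled_daily_run(datetime(2022, 1, 26, tzinfo=timezone.utc))
manual = manual_run(datetime(2022, 1, 27, 5, 37, tzinfo=timezone.utc))

# Same field, two different meanings:
print(scheduled.logical_date == scheduled.data_interval_start)  # True
print(manual.logical_date == manual.data_interval_start)        # False
```

In both cases the data interval fields say what data is being processed; only in the scheduled case does the logical date happen to agree with one of them.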
One thing I “like” (in the ironic sense) about “logical date” is that literally nothing about it is logical 😆 and my advice to any newcomers trying to make sense of the value is, well, just don’t; use something else instead (e.g. the data interval). The value is inherently so ill-designed that no single name can describe the meaning behind it. In a more serious sense, logical date is good IMO because it describes the value in the vaguest way possible: we can’t describe the value, so let’s not, because a wrong explanation is worse than no explanation.

TP

On Jan 27 2022, at 5:37 am, Howard Yoo <howard....@astronomer.io.invalid> wrote:
> Hi TP,
>
> I do have another opinion about this change.
> I agree that ‘schedule_date’ is not the right word to describe it, but I also
> believe that `logical date` is also a bit confusing.
> My argument is that there is not that much ‘logical’ thing about the period
> of which the DAG run will base its data processing against, other than the
> fact that the date is not the ‘actual’ run date. Sometimes, the adjective
> ‘logical’ describes the concept of something that’s virtual or only existing
> conceptually (e.g. logical drive, logical deletion, etc.). However, one of
> the problems that I see of using ‘logical’ is that people (who are new to
> Airflow) can still find it quite confusing to understand the meaning of ‘logical.’
> They might wonder: ‘if this date is the ‘logical’ date, then are all the
> other dates illogical dates? What then, would be the real dates?’
>
> My opinion is that perhaps using the term ‘Data Date’ can actually be clearer
> to describe this period.
> Airflow DAGs are meant, mostly, for Data Orchestration, and hence the DAG
> runs are generally for processing DATA.
> In that case, the start and end
> period of this processing can be called Data start and Data end date, to more
> clearly describe that the data that the DAG is going to be ‘processing’ are
> the ones between the data start and end date, hence less confusion as users
> would definitely know what they represent more clearly.
>
> I’ve also seen a similar term being used in Airflow’s DAG run UI page, where
> the popup actually has ‘data start date and end date’ and when I saw that, I
> was able to immediately understand what it meant, rather than a rather
> ambiguous ‘logical date’ (I still would have asked someone about the meaning
> of ‘logical’ if I saw that).
> I’m not sure this decision has been finalized and the team would like to move
> forward with the ‘logical date,’ but since I have an opinion that that too
> can be confusing for the first time users, I just wanted to let out my
> opinion on that, and propose something different.
> Hope this helps,
> Howard
>
> On 2021/08/03 09:35:15 Tzu-ping Chung wrote:
> > Hi all,
> >
> > I want to give a heads-up on a minor modification I just made to AIP-39.
> > AIP-39 originally proposed renaming execution_date to schedule_date since
> > the old name was confusing (it’s not when the DAG run is actually
> > executed). However, while implementing the AIP and drafting documentation
> > to it, I realised schedule_date is also quite confusing: the date is also
> > not when the DAG run is scheduled to run.
> > I went through the current documentation to get an idea how it currently
> > explains execution_date, and found multiple instances the adjective
> > “logical” is used:
> > Concepts → DAG
> > (https://airflow.apache.org/docs/apache-airflow/stable/concepts/dags.html#running-dags):
> > “[Each DAG run] has a defined execution_date, which identifies the logical
> > date and time it is running for - not the actual time when it was started.”
> > Tutorial
> > (https://airflow.apache.org/docs/apache-airflow/stable/tutorial.html): “The
> > date specified in this context is called execution_date. This is the
> > logical date, which simulates the scheduler running your task or dag at a
> > specific date and time […].”
> >
> > The GCS operator
> > (https://airflow.apache.org/docs/apache-airflow-providers-google/stable/operators/cloud/gcs.html):
> > “The time span is defined by the DAG instance logical execution timestamp
> > (execution_date, start of time span) and the timestamp when the next DAG
> > instance execution is scheduled (end of time span).”
> >
> > So, after talking to Ash, I have renamed the field to logical_date. This
> > would make the name more consistent to the term used to describe it, and
> > hopefully the concept easier to understand.
> >
> > TP