Hi TP,

I have a different opinion about this change.
I agree that ‘schedule_date’ is not the right word to describe it, but I
believe that ‘logical date’ is also a bit confusing.
My argument is that there is not much that is ‘logical’ about the period
that the DAG run bases its data processing on, other than the fact that the
date is not the ‘actual’ run date. Sometimes the adjective ‘logical’
describes something that is virtual or exists only conceptually (e.g.
logical drive, logical deletion). However, one problem I see with using
‘logical’ is that people who are new to Airflow can still find it
confusing. They might wonder: ‘If this date is the “logical” date, are all
the other dates illogical? What, then, would be the real dates?’

My opinion is that the term ‘Data Date’ may describe this period more
clearly.
Airflow DAGs are meant mostly for data orchestration, so DAG runs generally
exist to process data. The start and end of the period being processed
could then be called the data start date and data end date, which makes it
clear that the DAG run will process the data falling between those two
dates. Users would be less likely to be confused about what the dates
represent.
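
To make the distinction concrete, here is a minimal Python sketch of how I
read the naming, assuming a daily schedule. It uses only the standard
library and is not Airflow's API: the function name daily_data_interval and
the variable names are hypothetical, chosen just to mirror the terms above.

```python
from datetime import datetime, timedelta


def daily_data_interval(logical_date):
    """Return (data_start, data_end) for a hypothetical daily schedule.

    Assumption: the date currently called execution_date / logical_date
    marks the START of the period whose data the run processes.
    """
    data_start = logical_date
    data_end = logical_date + timedelta(days=1)
    return data_start, data_end


logical_date = datetime(2021, 8, 3)
data_start, data_end = daily_data_interval(logical_date)

# The run cannot start before the period it processes has closed, so the
# earliest actual run time is the data end date, not the logical date.
earliest_run = data_end

print("data start date:", data_start.isoformat())
print("data end date:  ", data_end.isoformat())
print("earliest run:   ", earliest_run.isoformat())
```

In this sketch the ‘logical date’ and the data start date are the same
value, which is why I think naming it after the data it covers would be
easier to follow.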

I’ve also seen a similar term used on Airflow’s DAG run UI page, where the
popup shows ‘data start date and end date.’ When I saw that, I immediately
understood what it meant, unlike the rather ambiguous ‘logical date’ (I
would still have had to ask someone what ‘logical’ meant if I had seen
that).

I’m not sure whether this decision has been finalized and the team would
like to move forward with ‘logical date,’ but since I think that name can
also confuse first-time users, I wanted to share my opinion and propose
something different.

Hope this helps,
Howard

On 2021/08/03 09:35:15 Tzu-ping Chung wrote:
> Hi all,
> 
> I want to give a heads-up on a minor modification I just made to AIP-39.
> AIP-39 originally proposed renaming execution_date to schedule_date since the 
> old name was confusing (it’s not when the DAG run is actually executed). 
> However, while implementing the AIP and drafting documentation for it, I 
> realised schedule_date is also quite confusing: the date is also not when 
> the DAG run is scheduled to run.
> I went through the current documentation to get an idea of how it 
> currently explains execution_date, and found multiple instances where the 
> adjective “logical” is used:
> Concepts → DAG 
> (https://airflow.apache.org/docs/apache-airflow/stable/concepts/dags.html#running-dags):
>  “[Each DAG run] has a defined execution_date, which identifies the logical 
> date and time it is running for - not the actual time when it was started.”
> Tutorial 
> (https://airflow.apache.org/docs/apache-airflow/stable/tutorial.html): “The 
> date specified in this context is called execution_date. This is the logical 
> date, which simulates the scheduler running your task or dag at a specific 
> date and time […].”
> 
> The GCS operator 
> (https://airflow.apache.org/docs/apache-airflow-providers-google/stable/operators/cloud/gcs.html):
>  “The time span is defined by the DAG instance logical execution timestamp 
> (execution_date, start of time span) and the timestamp when the next DAG 
> instance execution is scheduled (end of time span).”
> 
> So, after talking to Ash, I have renamed the field to logical_date. This 
> would make the name more consistent with the term used to describe it, and 
> hopefully make the concept easier to understand.
> 
> TP
> 
