IMHO it does not really matter whether they are the same, or which of the
interval dates the logical date happens to equal. This is actually the
beauty of the "abstract" and "vague" logical_date. Those are different
"concepts" that you use in different cases.

The logical date **might** be the same as one of the interval_dates. It's
just an "abstract" representation of the particular "run_id" - and you
should not care, because "logical_date" makes sense for some cases, but
"data_interval_start/end" for other cases.

* If your task is about "data_interval" - by all means use the
data_interval_start and end.
* If your task is not about "interval" - use the "logical_date".
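
To make the two cases concrete, here is a minimal sketch in plain Python. It
does not import Airflow: the `context` dict is a hand-built stand-in for the
template context a task would receive, and both function names, the `events`
table and the `infra-state-` prefix are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def build_extract_query(data_interval_start, data_interval_end, **_):
    """Interval-oriented task: only the interval boundaries matter."""
    return (
        "SELECT * FROM events "
        f"WHERE ts >= '{data_interval_start.isoformat()}' "
        f"AND ts < '{data_interval_end.isoformat()}'"
    )


def build_snapshot_name(logical_date, **_):
    """Non-interval task: the run is labelled by its logical date only."""
    return f"infra-state-{logical_date:%Y-%m-%d}"


# Hand-built stand-in for the context of one daily run (an assumption,
# not produced by Airflow itself).
start = datetime(2022, 2, 5, tzinfo=timezone.utc)
context = {
    "logical_date": start,
    "data_interval_start": start,
    "data_interval_end": start + timedelta(days=1),
}

query = build_extract_query(**context)
snapshot = build_snapshot_name(**context)
print(query)
print(snapshot)
```

In this made-up run the logical date happens to equal the interval start, but
neither function relies on that: each one reads only the value that matches
its own case.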

That is how I see it at least. By using a different value for each kind of
case, users can free themselves from the "mental-mapping" - they do not
have to map the "logical_date" to either "start" or "end". It does not
matter. But if they process a data interval, they have a very clear
("start" <-> "end") range that they can use without even thinking about
how "logical_date" maps to it.

For me - those are completely different cases and they are orthogonal to
each other (even if some of those values are the same).

J.

On Sun, Feb 6, 2022 at 7:00 PM Howard Yoo <[email protected]> wrote:

> I see, thank you for the info.
> I didn’t know about the existence of the data_interval_start and end
> dates. I briefly looked at those definitions, and was wondering… wouldn’t
> they be equal to the logical dates? I do see those variables mentioned in
> https://airflow.apache.org/docs/apache-airflow/stable/templates-ref.html,
> and also see the ds and ts meaning logical dates. In practice, are those
> dates and timestamps supposed to be the same?
>
> Wonder also, if the ‘data_’ prefix would be necessary if Airflow would be
> used to orchestrate far more things in the future (perhaps this may be
> another thread), but in general, we should have a continuous discussion to
> further clearly define all those dates for the improved usage of Airflow.
>
> Howard
>
> Sent from my iPhone
>
> On Feb 6, 2022, at 11:15 AM, Jarek Potiuk <[email protected]> wrote:
>
> 
> We already have `data_interval_start` and `data_interval_end` as fields,
> and we need something else that can have more "abstract" meaning to apply
> to the whole run as "single thing". Using interval_date would be a bit
> ambiguous.
>
> "Did you mean start or end actually when you mentioned interval date?" -
> is the question that I anticipate happening a lot if we mix those.
>
> J.
>
>
>
> On Sun, Feb 6, 2022 at 6:04 PM Howard Yoo <[email protected]> wrote:
>
>> Now I can understand why the data_date may not be a perfect fit to
>> describe the term.
>>
>> This is not to be against the logical_date, but what about
>> ‘interval_date?’ We have the schedule interval, which defines the duration
>> of the interval (e.g. 1day), so wouldn’t interval start and end date be a
>> better representation of it rather than the logical date?
>>
>> Just want to hear whether that has been brought up already or not.
>>
>> Howard
>>
>> Sent from my iPhone
>>
>> On Feb 6, 2022, at 10:25 AM, Jarek Potiuk <[email protected]> wrote:
>>
>> 
>> I wholeheartedly agree with TP on that one.  I think while some time
>> ago "data date" could make sense, Airflow's future is much more than just
>> processing data intervals.
>> This is the primary use case and this is where Airflow shines of course,
>> but there are good examples of Airflow being used differently out there.
>> While we are not really encouraging them, those uses are not only
>> legitimate, but also something that I hope Airflow will treat as
>> first-class citizens soon (and it kind of already does with custom
>> timetables).
>>
>> Just an example here - for me one of the most eye-opening talks at last
>> year's Airflow Summit was:
>> https://airflowsummit.org/sessions/2021/provision-as-a-service/
>> In this talk Cloudflare engineers explain how they manage the Cloudflare
>> infrastructure using Airflow.
>>
>> The "data date" has no meaning in this case. But the "logical date"
>> (which is the vaguest-possible one as TP explained) continues to have one.
>> This is the "logical date of the infrastructure provisioning". Thanks
>> to Airflow (as I understand it) Cloudflare is able to re-provision their
>> services to "yesterday's logical date infrastructure" today - for example.
>>
>> That would not fly with "data date".
>>
>> J.
>>
>>
