I think the process_state model would be useful in the HttpEventTrigger
that I am working on. The HttpEventTrigger sends requests to an API and
triggers an event based on a user-defined response_check function. If the
response_check function needs to evaluate multiple API responses
cumulatively, the trigger needs a way to store and retrieve past API
responses. It would also be useful for task instances to retrieve the
process_state data; for the HttpEventTrigger, that would let task
instances retrieve and act on API response data received within the trigger.
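
To make the use case concrete, here is a rough sketch of what I have in
mind. All the names here (save_state, get_state, the backend class) are
hypothetical, not an existing Airflow API -- just an illustration of a
trigger accumulating responses so a response_check can evaluate them
cumulatively:

```python
class InMemoryStateBackend:
    """Stand-in for a pluggable state store (could be database, S3, etc.)."""

    def __init__(self):
        self._store = {}

    def save_state(self, key, value):
        self._store[key] = value

    def get_state(self, key, default=None):
        return self._store.get(key, default)


class HttpEventTriggerSketch:
    """Hypothetical trigger that persists past API responses between polls."""

    def __init__(self, response_check, backend, state_key="responses"):
        self.response_check = response_check
        self.backend = backend
        self.state_key = state_key

    def record_response(self, response):
        # Append the new response to the persisted history, then run the
        # user-defined check over the whole history, not just this response.
        history = self.backend.get_state(self.state_key, [])
        history.append(response)
        self.backend.save_state(self.state_key, history)
        return self.response_check(history)


# Example: fire the event once at least two responses report "done".
backend = InMemoryStateBackend()
trigger = HttpEventTriggerSketch(
    response_check=lambda hist: sum(r["status"] == "done" for r in hist) >= 2,
    backend=backend,
)
print(trigger.record_response({"status": "pending"}))  # False
print(trigger.record_response({"status": "done"}))     # False
print(trigger.record_response({"status": "done"}))     # True
```

A task instance could then read the same state_key from the backend to act
on the responses the trigger collected.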

I'm sure there would be similar use cases in other EventTriggers as well.

On Thu, Jun 12, 2025 at 1:26 PM Daniel Standish
<daniel.stand...@astronomer.io.invalid> wrote:

> Alright since I was summoned...
>
> When I was an airflow user, I did a lot of incremental processes.  Pretty
> much everything was incremental.  Data warehousing / analytics shop /
> e-commerce reporting / integrations, this kind of thing.
>
> One common use case is implementing something like a Fivetran, which I did
> a few times.
>
> For me, execution date was almost entirely useless.  Execution date is
> there for partition-driven workloads.
>
> For incremental, you need to track your state somehow.
>
> That's why I experimented with various state storage interfaces, and
> developed a watermark operator, which we used a lot.  And I demoed a
> version of them here <https://github.com/apache/airflow/pull/19051>, and
> authored AIP-30
> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-30%3A+State+persistence>.
>
> I wrote AIP-30 when I was still contributing to Airflow for funsies, and
> didn't get a ton of engagement on it so it sort of languished, then when I
> became full time airflow dev, there were other priorities.
>
> But to me the use case is still pretty obvious.  Nothing we have added
> since then really explicitly supports incremental workflows.
>
> To me the question is (as it was then, and I think I mentioned this in the
> AIP), do you provide a generic interface where user controls namespace and
> name of the state you are trying to persist?  Or instead do you provide
> mechanisms to store state on existing objects.  So e.g. on trigger, on
> task, on whatever, you can do `self.save_state(key...)` etc.  In my
> proposal I think I leaned towards generic, and it seems Jake leans the same
> way.  There are pros and cons.
>
> In terms of the underlying storage mechanism, it seems pretty reasonable to
> allow this to be pluggable like everything else.  I used different
> "backends" at different times -- s3, or database.  Typically you don't need
> mega low latency with the type of tasks Airflow is used for.
>
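
For what it's worth, the two options Daniel describes could look roughly
like this. Everything below is a hypothetical sketch (none of these names
exist in Airflow today): a generic interface where the user controls
namespace and key, versus an object-scoped convenience that derives the
namespace from an existing object like a trigger:

```python
import abc


class StateBackend(abc.ABC):
    """Pluggable storage interface, as Daniel suggests (database, S3, ...)."""

    @abc.abstractmethod
    def set(self, namespace, key, value): ...

    @abc.abstractmethod
    def get(self, namespace, key, default=None): ...


class DictStateBackend(StateBackend):
    """In-memory backend, for illustration only."""

    def __init__(self):
        self._data = {}

    def set(self, namespace, key, value):
        self._data[(namespace, key)] = value

    def get(self, namespace, key, default=None):
        return self._data.get((namespace, key), default)


backend = DictStateBackend()

# Option A: generic interface -- the user picks namespace and key directly,
# e.g. for a watermark-style incremental load.
backend.set("my_pipeline", "high_watermark", "2025-06-12T00:00:00")

# Option B: state hung off an existing object -- the object derives the
# namespace from its own identity and exposes save_state/get_state.
class StatefulTrigger:
    def __init__(self, trigger_id, backend):
        self._ns = f"trigger:{trigger_id}"
        self._backend = backend

    def save_state(self, key, value):
        self._backend.set(self._ns, key, value)

    def get_state(self, key, default=None):
        return self._backend.get(self._ns, key, default)


t = StatefulTrigger("http-event-1", backend)
t.save_state("last_response", {"status": "done"})
print(t.get_state("last_response"))  # {'status': 'done'}
```

Note that option B can be layered on top of option A, which is one argument
for starting with the generic interface.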
