I commented on the AIP. Overall, I agree on the issue and that this is something we should solve. This is something I brought up in AIP-82 when I wrote it as "infinite scheduling issue" (without offering any solution :)). I also think this is a blocker to implement many different triggers based on non queue based assets.
If some are interested in that topic, it would be great to have more feedbacks :) On 2025/07/28 14:16:10 Jake Roach wrote: > Guangyang and I have created a draft AIP, which you can find here: > https://cwiki.apache.org/confluence/display/AIRFLOW/%5BDRAFT%5D+AIP-93. > Curious to get folks' thoughts and opinions! > > > On Fri, Jul 25, 2025 at 9:12 PM Guangyang Li <lig...@gmail.com> wrote: > > > Jake and I have put down the details in this AIP draft doc > > < > > https://docs.google.com/document/d/1gnGpTDhTpxpC48-kvr3jxL4GWyagMKzItWV-30GZu2U/edit?tab=t.0 > > > > > . > > > > Instead of calling this feature state persisting, we decided to call it > > asset watermark, > > as it's mainly an enhancement of asset-oriented processing. You can find > > the details > > and examples in the doc. Please comment. We plan to convert it to an > > official AIP > > doc later once it's stable. > > > > > > Guangyang > > > > On Thu, Jun 12, 2025 at 4:48 PM Karen Braganza <karenbraganz...@gmail.com> > > wrote: > > > > > I think the process_state model would be useful in the HttpEventTrigger > > > that I am working on. The HttpEventTrigger sends requests to an API and > > > triggers an event based on a user-defined response_check function. If the > > > response_check function needs to evaluate multiple API responses > > > cumulatively, it would be useful to store and retrieve past API > > responses. > > > It would also be useful for task instances to retrieve the process_state > > > data. For the HttpEventTrigger, this would mean enabling task instances > > to > > > retrieve and act on API response data received within the trigger. > > > > > > I'm sure there would be similar use cases in other EventTriggers as well. > > > > > > On Thu, Jun 12, 2025 at 1:26 PM Daniel Standish > > > <daniel.stand...@astronomer.io.invalid> wrote: > > > > > > > Alright since I was summoned... > > > > > > > > When I was an airflow user, I did a lot of incremental processes. > > Pretty > > > > much everything was incremental. Data warehousing / analytics shop / > > > > e-commerce reporting / integrations this kind of thing. > > > > > > > > One common use case is implementing something like a fivetran, which I > > > did > > > > a few times. > > > > > > > > For me, execution date was almost entirely useless. Execution date is > > > > there for partition-driven workloads. > > > > > > > > For incremental, you need to track your state somehow. > > > > > > > > That's why I experimented with various state storage interfaces, and > > > > developed a watermark operator, which we used a lot. And I demoed a > > > > version of them here <https://github.com/apache/airflow/pull/19051>, > > and > > > > authored AIP-30 > > > > < > > > > > > > > > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-30%3A+State+persistence > > > > > > > > > . > > > > > > > > I wrote AIP-30 when I was still contributing to Airflow for funsies, > > and > > > > didn't get a ton of engagement on it so it sort of languished, then > > when > > > I > > > > became full time airflow dev, there were other priorities. > > > > > > > > But to me the use case is still pretty obvious. Nothing we have added > > > > since then really explicitly supports incremental workflows. > > > > > > > > To me the question is (as it was then, and I think I mentioned this in > > > the > > > > AIP), do you provide a generic interface where user controls namespace > > > and > > > > name of the state you are trying to persist? Or instead do you provide > > > > mechanisms to store state on existing objects. So e.g. on trigger, on > > > > task, on whatever, you can do `self.save_state(key...)` etc. In my > > > > proposal I think I leaned towards generic, and it seems Jake leans the > > > same > > > > way. There are pros and cons. > > > > > > > > In terms of the underlying storage mechanism, it seems pretty > > reasonable > > > to > > > > allow this to be pluggable like everything else. I used different > > > > "backends" at different times -- s3, or database. Typically you don't > > > need > > > > mega low latency with the type of tasks Airflow is used for. > > > > > > > > > > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org For additional commands, e-mail: dev-h...@airflow.apache.org