Re: Unaligned checkpoint blocked by long Async operation

2024-03-17 Thread Zakelly Lan
I agree. Also create https://issues.apache.org/jira/browse/FLINK-34704 for tracking and further discussion. Best, Zakelly On Fri, Mar 15, 2024 at 2:59 PM Gyula Fóra wrote: > Posting this to dev as well... > > Thanks Zakelly, > Sounds like a solution could be to add a new different version of y

Re: Unaligned checkpoint blocked by long Async operation

2024-03-15 Thread Gyula Fóra
Posting this to dev as well... Thanks Zakelly, Sounds like a solution could be to add a new different version of yield that would actually yield to the checkpoint barrier too. That way operator implementations could decide whether any state modification may or may not have happened and can optiona

Re: Unaligned checkpoint blocked by long Async operation

2024-03-14 Thread Zakelly Lan
Hi Gyula, Processing checkpoint halfway through `processElement` is problematic. The current element will not be included in the input in-flight data, and we cannot assume it has taken effect on the state by user code. So the best way is to treat `processElement` as an 'atomic' operation. I guess

Re: Unaligned checkpoint blocked by long Async operation

2024-03-14 Thread Gyula Fóra
Thank you for the detailed analysis Zakelly. I think we should consider whether yield should process checkpoint barriers because this puts quite a serious limitation on the unaligned checkpoints in these cases. Do you know what is the reason behind the current priority setting? Is there a problem

Re: Unaligned checkpoint blocked by long Async operation

2024-03-14 Thread Zakelly Lan
Hi Gyula, Well I tried your example in local mini-cluster, and it seems the source can take checkpoints but it will block in the following AsyncWaitOperator. IIUC, the unaligned checkpoint barrier should wait until the current `processElement` finishes its execution. In your example, the element q

Unaligned checkpoint blocked by long Async operation

2024-03-14 Thread Gyula Fóra
Hey all! I encountered a strange and unexpected behaviour when trying to use unaligned checkpoints with AsyncIO. If the async operation queue is full and backpressures the pipeline completely, then unaligned checkpoints cannot be completed. To me this sounds counterintuitive because one of the be