[jira] [Created] (FLINK-24694) Translate "Checkpointing under backpressure" page into Chinese

2021-10-29 Thread Piotr Nowojski (Jira)
Piotr Nowojski created FLINK-24694:
Summary: Translate "Checkpointing under backpressure" page into Chinese
Key: FLINK-24694
URL: https://issues.apache.org/jira/browse/FLINK-24694
Proj

Re: Checkpointing under backpressure

2020-07-31 Thread Piotr Nowojski
buffers, we could first log and process these buffers, and if they do not have buffers, we can still process the buffers from the channels that have seen barriers. Therefore, it seems prioritizing

Re: Checkpointing under backpressure

2020-07-30 Thread Arvid Heise
"drained" during the timeout, as pointed out by Stephan. With such a timeout, we very likely do not need to snapshot the input buffers, which would be very similar to the current aligned checkpoint mechanism.

Re: Checkpointing under backpressure

2019-12-04 Thread Thomas Weise
ay get "drained" during the timeout, as pointed out by Stephan. With such a timeout, we very likely do not need to snapshot the input buffers, which would be very similar to the current aligned checkpoint mechanism. Best,

Re: Checkpointing under backpressure

2019-10-02 Thread Arvid Heise
implement so that the task can continue processing buffers as soon as possible. Thanks for the further explanation of the requirements for speeding up checkpoints in the backpressure scenario. Making the savepoint finish quickly and then tuning the settings to avoid backpressure is really a practical case. I think this s

Re: Checkpointing under backpressure

2019-10-02 Thread Arvid Heise
From: zhijiang
Send Time: 2019 Aug. 15 (Thu.) 02:22
To: dev
Subject: Re: Checkpointing under backpressure
> For the checkpoint to complete, any buffer that arrived prior to the barrier would need to be part of the checkpointed state.
Yes, I agree

Re: Checkpointing under backpressure

2019-08-15 Thread Yun Gao
Subject: Re: Checkpointing under backpressure
> For the checkpoint to complete, any buffer that arrived prior to the barrier would need to be part of the checkpointed state.
Yes, I agree.
> So wouldn't it be important to finish persisting these buffers as fast as possible by pri

Re: Checkpointing under backpressure

2019-08-15 Thread Stephan Ewen
snapshot isolation. Also, the original snapshotting gives a lot of potential for Flink to make proper transactional commits externally.

Re: Checkpointing under backpressure

2019-08-14 Thread zhijiang
ss issue also applies to the savepoint use case. We would need to be able to take a savepoint fast in order to roll forward a fix that can alleviate the backpressure (like changing parallelism or making a different configuration change). Best, Zhijiang

Re: Checkpointing under backpressure

2019-08-14 Thread Thomas Weise
savepoint use case. We would need to be able to take a savepoint fast in order to roll forward a fix that can alleviate the backpressure (like changing parallelism or making a different configuration change). Best, Zhijiang
From: Stephan Ewen

Re: Checkpointing under backpressure

2019-08-14 Thread zhijiang
ign. E.g. the sink commit delay might not be covered by the unaligned solution. Best, Zhijiang
From: Stephan Ewen
Send Time: 2019 Aug. 14 (Wed.) 17:43
To: dev
Subject: Re: Checkpointing under backpressure
Quick note: The current implementation is Align ->

Re: Checkpointing under backpressure

2019-08-14 Thread Stephan Ewen
Paris: Thanks for the explanation Paris. I’m starting to understand this more and I like the idea of snapshot

Re: Checkpointing under backpressure

2019-08-14 Thread Piotr Nowojski
Another thing is that from the wiki description I understood that the initial checkpointing is not initialised by any checkpoint barrier, but by an independent call/message from the Observer. I haven’t played with this idea a lot, but I had some discussion with Nico and it s

Re: Checkpointing under backpressure

2019-08-14 Thread Paris Carbone
ata)
b) for any input channel for which it hasn’t yet received a checkpoint barrier, the data are being added to the checkpoint
c) once a channel (for example I1) receives a checkpoint barrier, the Task blocks reads from that channel (?)

Re: Checkpointing under backpressure

2019-08-14 Thread Stephan Ewen
slowest Tasks. Right?
Couple of intriguing thoughts are:
3. checkpoint barriers overtaking the output buffers
4. can we keep processing some data (in order to not waste CPU cycles) after we have taken the snapshot of the

Re: Checkpointing under backpressure

2019-08-14 Thread Paris Carbone
intriguing. But probably there is also a benefit to not continuing to read I1, since that could speed up retrieval from I2. Also, if the user code is the cause of backpressure, this would avoid pumping more data into the pro

Re: Checkpointing under backpressure

2019-08-14 Thread Paris Carbone
n a fixed interval with less IO overhead.
Best, Yun
From: Piotr Nowojski
Send Time: 2019 Aug. 14 (Wed.) 18:38
To: Paris Carbone
Cc: dev; zhijiang; Nico Kruber
Subject: Re: Checkpointing under backpressure

Re: Checkpointing under backpressure

2019-08-14 Thread Stephan Ewen
y similar to the way of overtaking we proposed before. There are some tiny differences: The way of overtaking might need to snapshot all the input/output queues. Chandy Lamport seems to only need to snapshot (n-1) input

Re: Checkpointing under backpressure

2019-08-14 Thread Yun Gao
(Wed.) 18:38
To: Paris Carbone
Cc: dev; zhijiang; Nico Kruber
Subject: Re: Checkpointing under backpressure
Hi again, Zhu Zhu let me think about this more. Maybe as Paris is writing, we do not need to block any channels at all, at least assuming credit-based flow control. Regarding what should

Re: Checkpointing under backpressure

2019-08-14 Thread Piotr Nowojski
he state size a bit. But normally there should be fewer buffers for the first input channel with a barrier. The output barrier still follows the regular data stream in Chandy Lamport, the same way as in current Flink. For the overtaking way, we need to pay

Re: Checkpointing under backpressure

2019-08-14 Thread Paris Carbone
channel, so the Chandy Lamport could benefit well. But for the case of all balanced heavy load input channels, I mean the first arrived barrier might still take much time, then the overtaking way could still fit well to speed up the checkpoint. A

Re: Checkpointing under backpressure

2019-08-14 Thread Zhu Zhu
on downstream side. In the backpressure caused by data skew, the first barrier in an almost empty input channel should arrive much earlier than the one in the last heavy load input channel, so the Chandy Lamport could benefit well. But for the case of all

Re: Checkpointing under backpressure

2019-08-14 Thread Piotr Nowojski
lly considering some implementation details. Best, Zhijiang
From: Paris Carbone
Send Time: 2019 Aug. 13 (Tue.) 14:03
To: dev
Cc: zhijiang
Subj

Re: Checkpointing under backpressure

2019-08-13 Thread zhijiang
jiang
From: Thomas Weise
Send Time: 2019 Aug. 14 (Wed.) 06:00
To: dev; zhijiang
Cc: Paris Carbone
Subject: Re: Checkpointing under backpressure
Great discussion! I'm excited that this is already under consideration! Are there any JIRAs or other traces of discussion to follow? Paris, if I

Re: Checkpointing under backpressure

2019-08-13 Thread Thomas Weise
From: Paris Carbone
Send Time: 2019 Aug. 13 (Tue.) 14:03
To: dev
Cc: zhijiang
Subject: Re: Checkpointing under backpressure
yes! It’s quite similar I think. Though mind that the devil is in the details, i.

Re: Checkpointing under backpressure

2019-08-13 Thread zhijiang
proposed suggestion is helpful on my side, especially considering some implementation details. Best, Zhijiang
From: Paris Carbone
Send Time: 2019 Aug. 13 (Tue.) 14:03
To: dev
Cc: zhijiang
Subject: Re: Checkpointing under backpressure

Re: Checkpointing under backpressure

2019-08-13 Thread Paris Carbone
yes! It’s quite similar I think. Though mind that the devil is in the details, i.e., the temporal order in which actions are taken. To clarify, let us say you have a task T with two input channels I1 and I2. The Chandy Lamport execution flow is the following: 1) T receives barrier from I1 and... 2) ..

Re: Checkpointing under backpressure

2019-08-13 Thread Piotr Nowojski
Thanks for the input. Regarding the Chandy-Lamport snapshots, don’t you still have to wait for the “checkpoint barrier” to arrive in order to know when you have already received all possible messages from the upstream tasks/operators? So instead of processing the “in flight” messages (as the Flin

Re: Checkpointing under backpressure

2019-08-13 Thread Paris Carbone
Interesting problem! Thanks for bringing it up Thomas. Ignore/Correct me if I am wrong but I believe Chandy-Lamport snapshots [1] would help solve this problem more elegantly without sacrificing correctness. - They do not need alignment, only (async) logging for in-flight records between th

Re: Checkpointing under backpressure

2019-08-13 Thread Piotr Nowojski
Hi Thomas, As Zhijiang has responded, we are now in the process of discussing how to address this issue and one of the solutions that we are discussing is exactly what you are proposing: checkpoint barriers overtaking the in-flight data and making the in-flight data part of the checkpoint. If eve

Re: Checkpointing under backpressure

2019-08-12 Thread zhijiang
Hi Thomas, Thanks for raising this concern. Barrier alignment takes a long time under backpressure, which could cause several problems:
1. Checkpoint timeout, as you mentioned.
2. The recovery cost is high on failover, because much data needs to be replayed.
3. The delay for commit-based s

Checkpointing under backpressure

2019-08-12 Thread Thomas Weise
Hi, One of the major operational difficulties we observe with Flink is checkpoint timeouts under backpressure. I'm looking for both confirmation of my understanding of the current behavior as well as pointers for future improvement work: Prior to the introduction of credit-based flow control in the