Thanks for the clarification Gen. On Fri, Dec 23, 2022, 14:20 Gen Luo <luogen...@gmail.com> wrote:
> Hi Raihan, > If the job is a streaming job, all exchanges are pipeline exchanges. That > means the upstream and downstream tasks must be running at the same time > and exchange data realtimely. While blocking data exchange means the > upstream task will store the result somewhere(on the local disk by > default), and when the downstream task starts, it will consume the result > from there. Blocking data exchanges only exist in the batch jobs. > > >Is windowing in a streaming job considered a blocking data exchange? > No, it's pipelined. The window operator stores the input data in the > state, while the input data comes from the upstream task realtimely. > > >In a job with a DAG such as A->B, B->C, B->D where all edges are forward > edges, how many pipelined regions are there? > Since all edges are forward edges, A, B, C and D must have the same > parallelism, and there will be the same pipelined regions as the > parallelism. > > On Wed, Dec 21, 2022 at 5:14 PM Raihan Sunny <raihan.su...@selise.ch> > wrote: > >> Hello Gen, >> >> I do have a basic understanding of fault tolerance and how checkpointing >> is used to achieve it. What I'm confused about from reading the blog post >> [1] is what are considered blocking data exchanges. Is windowing in a >> streaming job considered a blocking data exchange? In a job with a DAG such >> as A->B, B->C, B->D where all edges are forward edges, how many pipelined >> regions are there? >> >> [1] >> https://flink.apache.org/2020/12/15/pipelined-region-sheduling.html#intermediate-results >> >> On Tue, Dec 20, 2022 at 3:26 PM Gen Luo <luogen...@gmail.com> wrote: >> >>> Hi Raihan, >>> >>> As the description of PipelinedRegion says, a pipelined region is a set >>> of vertices connected via pipelined data exchanges. For example in a job >>> with such a dag A->B, both of the tasks have two subtasks. If the edge >>> between A and B is a forward edge, there are two pipelined regions: (A1, >>> B1) and (A2, B2). While if the edge is a keyby edge or a rebalance edge, >>> there'll be only one pipelined region containing all 4 subtasks. >>> >>> PipelinedRegion is an important concept in fault tolerance. When one >>> subtask fails, all subtasks in the pipelined region it belongs to must >>> restart and recover from the most recent checkpoint together to ensure the >>> result is correct, while other pipelined regions can keep running. You can >>> learn more about the fault tolerance from the following articles, which I'm >>> sure will be helpful to understand the pipelined region. >>> >>> [1] >>> https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/learn-flink/fault_tolerance/ >>> [2] >>> https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/concepts/stateful-stream-processing/ >>> >>> On Tue, Dec 20, 2022 at 4:32 PM Schwalbe Matthias < >>> matthias.schwa...@viseca.ch> wrote: >>> >>>> Hi Sunny, >>>> >>>> >>>> >>>> Welcome to Flink đ. >>>> >>>> >>>> >>>> The next thing for you to consider is to setup checkpointing [1] which >>>> allows a failing job to pick up from where it stopped. >>>> >>>> >>>> >>>> Sincere greetings from the supposed close-by Zurich đ >>>> >>>> >>>> >>>> Thias >>>> >>>> >>>> >>>> >>>> >>>> [1] >>>> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/checkpointing/ >>>> >>>> >>>> >>>> >>>> >>>> *From:* Raihan Sunny <raihan.su...@selise.ch> >>>> *Sent:* Tuesday, December 20, 2022 6:30 AM >>>> *To:* user@flink.apache.org >>>> *Subject:* Understanding pipelined regions >>>> >>>> >>>> >>>> â *EXTERNAL MESSAGE â **CAUTION: Think Before You Click *â >>>> >>>> >>>> >>>> Hi, >>>> >>>> >>>> >>>> I'm quite new to the world of stream and batch processing. I've been >>>> reading about pipelined regions in Flink and am quite confused by what it >>>> means. My specific problem involves a streaming job that looks like the >>>> following: >>>> >>>> >>>> >>>> 1. There is a Kafka source that takes in an input data that sets off a >>>> series of operations >>>> >>>> 2. As part of the first operation, I have an operator that produces >>>> multiple values, each of which has to be fed into several different >>>> operators in parallel >>>> >>>> 3. The operators each produce a result which I keyBy and merge together >>>> using the union operator >>>> >>>> 4. The merged result is then written to a Kafka sink >>>> >>>> >>>> >>>> The problem is that when one of the parallel operators throws an >>>> exception, all the tasks in the entire pipeline gets restarted including >>>> the source which then replays the input data and the process starts off >>>> once again. My question is if it's possible to make the tasks of only the >>>> branch that failed restart rather than the whole job. I do realize that it >>>> is possible to split up the job such that the first operator produces its >>>> output to a sink and having that as the source to the subsequent operations >>>> can mitigate the problem. I was just wondering if it's possible in the >>>> scenario that I have described above. In general, how can I "create" a >>>> pipelined region? >>>> >>>> >>>> >>>> >>>> >>>> Thanks, >>>> >>>> Sunny >>>> >>>> >>>> >>>> [image: Image removed by sender.] >>>> >>>> Secure Link Services Group >>>> ZĂŒrich: The Circle 37, 8058 ZĂŒrich-Airport, Switzerland >>>> Munich: Tal 44, 80331 MĂŒnchen, Germany >>>> Dubai: Building 3, 3rd Floor, Dubai Design District, Dubai, United Arab >>>> Emirates >>>> Dhaka: Midas Center, Road 16, Dhanmondi, Dhaka 1209, Bangladesh >>>> Thimphu: Bhutan Innovation Tech Center, Babesa, P.O. Box 633, Thimphu, >>>> Bhutan >>>> >>>> Visit us: www.selise.ch >>>> >>>> >>>> >>>> *Important Note:** This e-mail and any attachment are confidential and >>>> may contain trade secrets and may well also be legally privileged or >>>> otherwise protected from disclosure. If you have received it in error, you >>>> are on notice of its status. Please notify us immediately by reply e-mail >>>> and then delete this e-mail and any attachment from your system. If you are >>>> not the intended recipient please understand that you must not copy this >>>> e-mail or any attachment or disclose the contents to any other person. >>>> Thank you for your cooperation.* >>>> Diese Nachricht ist ausschliesslich fĂŒr den Adressaten bestimmt und >>>> beinhaltet unter UmstĂ€nden vertrauliche Mitteilungen. Da die >>>> Vertraulichkeit von e-Mail-Nachrichten nicht gewĂ€hrleistet werden kann, >>>> ĂŒbernehmen wir keine Haftung fĂŒr die GewĂ€hrung der Vertraulichkeit und >>>> Unversehrtheit dieser Mitteilung. Bei irrtĂŒmlicher Zustellung bitten wir >>>> Sie um Benachrichtigung per e-Mail und um Löschung dieser Nachricht sowie >>>> eventueller AnhĂ€nge. Jegliche unberechtigte Verwendung oder Verbreitung >>>> dieser Informationen ist streng verboten. >>>> >>>> This message is intended only for the named recipient and may contain >>>> confidential or privileged information. As the confidentiality of email >>>> communication cannot be guaranteed, we do not accept any responsibility for >>>> the confidentiality and the intactness of this message. If you have >>>> received it in error, please advise the sender by return e-mail and delete >>>> this message and any attachments. Any unauthorised use or dissemination of >>>> this information is strictly prohibited. >>>> >>> >> >> Secure Link Services Group >> ZĂŒrich: The Circle 37, 8058 ZĂŒrich-Airport, Switzerland >> Munich: Tal 44, 80331 MĂŒnchen, Germany >> Dubai: Building 3, 3rd Floor, Dubai Design District, Dubai, United Arab >> Emirates >> Dhaka: Midas Center, Road 16, Dhanmondi, Dhaka 1209, Bangladesh >> Thimphu: Bhutan Innovation Tech Center, Babesa, P.O. Box 633, Thimphu, >> Bhutan >> >> Visit us: www.selise.ch >> >> *Important Note: This e-mail and any attachment are confidential and may >> contain trade secrets and may well also be legally privileged or otherwise >> protected from disclosure. If you have received it in error, you are on >> notice of its status. Please notify us immediately by reply e-mail and then >> delete this e-mail and any attachment from your system. If you are not the >> intended recipient please understand that you must not copy this e-mail or >> any attachment or disclose the contents to any other person. Thank you for >> your cooperation.* >> > -- Secure Link Services Group ZĂŒrich: The Circle 37, 8058 ZĂŒrich-Airport, Switzerland Munich: Tal 44, 80331 MĂŒnchen, Germany Dubai: Building 3, 3rd Floor, Dubai Design District, Dubai, United Arab Emirates Dhaka: Midas Center, Road 16, Dhanmondi, Dhaka 1209, Bangladesh Thimphu: Bhutan Innovation Tech Center, Babesa, P.O. Box 633, Thimphu, Bhutan Visit us: www.selise.ch <http://www.selise.ch> -- *Important Note: This e-mail and any attachment are confidential and may contain trade secrets and may well also be legally privileged or otherwise protected from disclosure. If you have received it in error, you are on notice of its status. Please notify us immediately by reply e-mail and then delete this e-mail and any attachment from your system. If you are not the intended recipient please understand that you must not copy this e-mail or any attachment or disclose the contents to any other person. Thank you for your cooperation.*