Hi Raihan,

As the description of PipelinedRegion says, a pipelined region is a set of
vertices connected via pipelined data exchanges. For example in a job with
such a dag A->B, both of the tasks have two subtasks. If the edge between A
and B is a forward edge, there are two pipelined regions: (A1, B1) and (A2,
B2). While if the edge is a keyby edge or a rebalance edge, there'll be
only one pipelined region containing all 4 subtasks.

PipelinedRegion is an important concept in fault tolerance. When one
subtask fails, all subtasks in the pipelined region it belongs to must
restart and recover from the most recent checkpoint together to ensure the
result is correct, while other pipelined regions can keep running. You can
learn more about the fault tolerance from the following articles, which I'm
sure will be helpful to understand the pipelined region.

[1]
https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/learn-flink/fault_tolerance/
[2]
https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/concepts/stateful-stream-processing/

On Tue, Dec 20, 2022 at 4:32 PM Schwalbe Matthias <
matthias.schwa...@viseca.ch> wrote:

> Hi Sunny,
>
>
>
> Welcome to Flink 😊.
>
>
>
> The next thing for you to consider is to setup checkpointing [1] which
> allows a failing job to pick up from where it stopped.
>
>
>
> Sincere greetings from the supposed close-by  Zurich 😊
>
>
>
> Thias
>
>
>
>
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/checkpointing/
>
>
>
>
>
> *From:* Raihan Sunny <raihan.su...@selise.ch>
> *Sent:* Tuesday, December 20, 2022 6:30 AM
> *To:* user@flink.apache.org
> *Subject:* Understanding pipelined regions
>
>
>
> ⚠*EXTERNAL MESSAGE – **CAUTION: Think Before You Click *⚠
>
>
>
> Hi,
>
>
>
> I'm quite new to the world of stream and batch processing. I've been
> reading about pipelined regions in Flink and am quite confused by what it
> means. My specific problem involves a streaming job that looks like the
> following:
>
>
>
> 1. There is a Kafka source that takes in an input data that sets off a
> series of operations
>
> 2. As part of the first operation, I have an operator that produces
> multiple values, each of which has to be fed into several different
> operators in parallel
>
> 3. The operators each produce a result which I keyBy and merge together
> using the union operator
>
> 4. The merged result is then written to a Kafka sink
>
>
>
> The problem is that when one of the parallel operators throws an
> exception, all the tasks in the entire pipeline gets restarted including
> the source which then replays the input data and the process starts off
> once again. My question is if it's possible to make the tasks of only the
> branch that failed restart rather than the whole job. I do realize that it
> is possible to split up the job such that the first operator produces its
> output to a sink and having that as the source to the subsequent operations
> can mitigate the problem. I was just wondering if it's possible in the
> scenario that I have described above. In general, how can I "create" a
> pipelined region?
>
>
>
>
>
> Thanks,
>
> Sunny
>
>
>
> [image: Image removed by sender.]
>
> Secure Link Services Group
> Zürich: The Circle 37, 8058 Zürich-Airport, Switzerland
> Munich: Tal 44, 80331 München, Germany
> Dubai: Building 3, 3rd Floor, Dubai Design District, Dubai, United Arab
> Emirates
> Dhaka: Midas Center, Road 16, Dhanmondi, Dhaka 1209, Bangladesh
> Thimphu: Bhutan Innovation Tech Center, Babesa, P.O. Box 633, Thimphu,
> Bhutan
>
> Visit us: www.selise.ch
>
>
>
> *Important Note:** This e-mail and any attachment are confidential and
> may contain trade secrets and may well also be legally privileged or
> otherwise protected from disclosure. If you have received it in error, you
> are on notice of its status. Please notify us immediately by reply e-mail
> and then delete this e-mail and any attachment from your system. If you are
> not the intended recipient please understand that you must not copy this
> e-mail or any attachment or disclose the contents to any other person.
> Thank you for your cooperation.*
> Diese Nachricht ist ausschliesslich für den Adressaten bestimmt und
> beinhaltet unter Umständen vertrauliche Mitteilungen. Da die
> Vertraulichkeit von e-Mail-Nachrichten nicht gewährleistet werden kann,
> übernehmen wir keine Haftung für die Gewährung der Vertraulichkeit und
> Unversehrtheit dieser Mitteilung. Bei irrtümlicher Zustellung bitten wir
> Sie um Benachrichtigung per e-Mail und um Löschung dieser Nachricht sowie
> eventueller Anhänge. Jegliche unberechtigte Verwendung oder Verbreitung
> dieser Informationen ist streng verboten.
>
> This message is intended only for the named recipient and may contain
> confidential or privileged information. As the confidentiality of email
> communication cannot be guaranteed, we do not accept any responsibility for
> the confidentiality and the intactness of this message. If you have
> received it in error, please advise the sender by return e-mail and delete
> this message and any attachments. Any unauthorised use or dissemination of
> this information is strictly prohibited.
>

Reply via email to