Hi Till,

Thanks for your reply.

Is there any option to disconnect the side outputs from the pipelined data 
exchanges of the main stream?

Side outputs are very attractive in terms of performance and usability, and 
they fit our use case nicely. However, the pipelined connection to the main 
stream is a real concern, because all streams of the job are cancelled if one 
of them fails. With many side outputs, that is not something we can maintain 
at scale.

So we are looking for options to keep the side outputs but still get 
individual region-based failover to work.

The other thing I have noticed when using side outputs is that the Flink Web 
UI shows those streams as Unregistered_DataStream_. Setting .name and .uid on 
the side output stream does not change this. Is there any way to name the side 
output streams so that the custom name appears in the Flink Web UI?

Any help or tips are highly appreciated.

Many thanks in advance.

Cheers,

Patrick
--
Patrick Eifler

Senior Software Engineer (BI)

Cloud Gaming Engineering & Infrastructure
Sony Interactive Entertainment LLC

Wilhelmstraße 118, 10963 Berlin

Germany

E: patrick.eif...@sony.com

From: Till Rohrmann <trohrm...@apache.org>
Date: Tuesday, 24. November 2020 at 17:20
To: "Eifler, Patrick" <patrick.eif...@sony.com>
Cc: "user@flink.apache.org" <user@flink.apache.org>
Subject: Re: How to setup Regions for Fault Tolerance in Flink when using Side 
Outputs

Hi Patrick,

Flink supports regional failover [1], which restarts only the tasks connected 
via pipelined data exchanges to the failed task. Hence, with an embarrassingly 
parallel topology or a batch job, Flink should not restart the whole job in 
case of a task failure.
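
For reference, the strategy described in [1] is selected in the cluster 
configuration. A minimal fragment, assuming a Flink version that supports this 
key in flink-conf.yaml:

```
# flink-conf.yaml
# "region" restarts only the failed task's pipelined region; "full" restarts the whole job.
jobmanager.execution.failover-strategy: region
```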

However, in the case of side outputs, I think they are connected via pipelined 
data exchanges with the main stream and, hence, are part of the same failover 
region as the main stream.
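
To illustrate why that is, here is a small plain-Java sketch (no Flink 
dependency) that models a failover region as a set of tasks connected by 
pipelined edges. It is a simplified model for illustration only, not Flink's 
actual implementation; the class name, the helper methods and the task names 
are all hypothetical.

```java
import java.util.*;

public class FailoverRegionSketch {

    // Hypothetical helper: groups tasks into regions by treating every
    // pipelined edge as an undirected connection (connected components).
    static Map<String, Set<String>> regions(List<String> tasks, List<String[]> pipelinedEdges) {
        Map<String, String> parent = new HashMap<>();
        for (String t : tasks) {
            parent.put(t, t);
        }
        for (String[] edge : pipelinedEdges) {
            String a = find(parent, edge[0]);
            String b = find(parent, edge[1]);
            if (!a.equals(b)) {
                parent.put(a, b); // merge the two regions
            }
        }
        Map<String, Set<String>> byRoot = new TreeMap<>();
        for (String t : tasks) {
            byRoot.computeIfAbsent(find(parent, t), k -> new TreeSet<>()).add(t);
        }
        return byRoot;
    }

    // Union-find lookup with path halving.
    static String find(Map<String, String> parent, String x) {
        while (!parent.get(x).equals(x)) {
            parent.put(x, parent.get(parent.get(x)));
            x = parent.get(x);
        }
        return x;
    }

    // In this model, a failure restarts every task in the failed task's region.
    static Set<String> tasksToRestart(String failed, List<String> tasks, List<String[]> edges) {
        for (Set<String> region : regions(tasks, edges).values()) {
            if (region.contains(failed)) {
                return region;
            }
        }
        return Collections.emptySet();
    }

    public static void main(String[] args) {
        // One source, one process function, two side outputs (all pipelined).
        List<String> sideOutputJob = Arrays.asList("kafka-source", "process", "side-a", "side-b");
        List<String[]> sideOutputEdges = Arrays.asList(
                new String[] {"kafka-source", "process"},
                new String[] {"process", "side-a"},
                new String[] {"process", "side-b"});
        System.out.println(tasksToRestart("side-a", sideOutputJob, sideOutputEdges));

        // Embarrassingly parallel topology: two independent source -> sink chains.
        List<String> parallelJob = Arrays.asList("source-0", "sink-0", "source-1", "sink-1");
        List<String[]> parallelEdges = Arrays.asList(
                new String[] {"source-0", "sink-0"},
                new String[] {"source-1", "sink-1"});
        System.out.println(tasksToRestart("sink-0", parallelJob, parallelEdges));
    }
}
```

In this model, a failure in side-a pulls in all four tasks of the first job, 
because every side output is reachable from the main stream over pipelined 
edges. In the second job, a failure in sink-0 only affects source-0 and sink-0.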

[1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/task_failure_recovery.html#restart-pipelined-region-failover-strategy

Cheers,
Till

On Tue, Nov 24, 2020 at 5:15 PM Eifler, Patrick <patrick.eif...@sony.com> 
wrote:
Hi all,

We are trying to set up regions so that Flink only stops the failing tasks of 
a region instead of failing the entire stream.
We are using one main stream that reads from a Kafka topic and a number of 
side outputs that each process the events from that topic differently.
For the processing in the side outputs we use the process function provided by 
Flink.

So far when one side output stream failed, the whole stream job failed.

Is there anything that needs to be done or set on the Side Outputs so that 
Flink recognizes them as regions?
Is it even possible to have Flink handle side outputs as regions and restart 
only one specific side output stream on failure?

Many thanks in advance!

Cheers,

Patrick
