Hi Patrick,

At the moment it is not possible to disconnect side outputs from other
streaming operators. I guess what you would like to have is an operator
that consumes on a best-effort basis and may lose some data while it is
being restarted. This is currently not supported by Flink.
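
To illustrate why side outputs end up in the same failover region as the
main stream (see the quoted explanation further down), here is a minimal
sketch of the idea in plain Java. It is a simplified model and not Flink's
actual implementation or API: the class FailoverRegions and all task names
are made up for illustration, and the only assumption taken from the thread
is that tasks connected via pipelined data exchanges belong to one region.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model, NOT Flink code: tasks joined by a pipelined data
// exchange are merged into a single failover region (union-find).
class FailoverRegions {
    // Maps each task to its parent; a task that maps to itself is a region root.
    private final Map<String, String> parent = new HashMap<>();

    // Registers a task as its own region if it is not known yet.
    void addTask(String task) {
        parent.putIfAbsent(task, task);
    }

    // A pipelined edge merges the regions of both endpoints.
    void addPipelinedEdge(String from, String to) {
        addTask(from);
        addTask(to);
        parent.put(find(from), find(to));
    }

    // Follows parent links up to the root of the task's region.
    // The task must have been registered before.
    private String find(String task) {
        String root = task;
        while (!parent.get(root).equals(root)) {
            root = parent.get(root);
        }
        return root;
    }

    // True if a failure of task a would also restart task b in this model.
    boolean sameRegion(String a, String b) {
        return find(a).equals(find(b));
    }

    // Number of independent failover regions.
    int regionCount() {
        int count = 0;
        for (String task : parent.keySet()) {
            if (find(task).equals(task)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical job: one source, one main operator, two side outputs.
        FailoverRegions job = new FailoverRegions();
        job.addPipelinedEdge("kafka-source", "main-process");
        job.addPipelinedEdge("main-process", "side-output-a");
        job.addPipelinedEdge("main-process", "side-output-b");
        System.out.println(job.regionCount());                                // prints 1
        System.out.println(job.sameRegion("side-output-a", "side-output-b")); // prints true
    }
}
```

In this model every side output hangs off the main operator through a
pipelined edge, so the whole job collapses into one region. Only pipelines
that share no pipelined edge at all (an embarrassingly parallel topology)
would remain separate regions.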

Concerning the web UI problems, I would suggest creating a JIRA ticket [1]
because it sounds like a missing feature.

[1] https://issues.apache.org/jira/

Cheers,
Till

On Wed, Nov 25, 2020 at 11:15 AM Eifler, Patrick <patrick.eif...@sony.com>
wrote:

> Hi Till,
>
>
>
> Thanks for your reply.
>
>
>
> Is there any option to disconnect the side outputs from the pipelined data
> exchanges of the main stream?
>
>
>
> The benefit of side outputs is very high regarding performance and
> usability, and they fit the use case here very nicely. However, this
> pipelined connection to the main stream is a real concern, as all of the
> streams of the job are cancelled if one fails. With many side outputs, that
> is not something we can maintain at scale.
>
>
>
> So we are looking for options on how to preserve the side outputs but
> still get the individual region-based failover to work.
>
>
>
> The other thing I have noticed when using side outputs is that, in the
> Flink Web UI, those streams are just named Unregistered_DataStream_.
> Setting .name and .uid on the side output stream does not change this.
> Is there any way to name the side output streams so that the customized
> name appears in the Flink Web UI?
>
>
>
> Any help or tips are highly appreciated.
>
>
>
> Many thanks in advance.
>
>
>
> Cheers,
>
>
>
> Patrick
>
> --
>
> Patrick Eifler
>
>
>
> Senior Software Engineer (BI)
>
> Cloud Gaming Engineering & Infrastructure
> Sony Interactive Entertainment LLC
>
> Wilhelmstraße 118, 10963 Berlin
>
>
> Germany
>
> E: patrick.eif...@sony.com
>
>
>
> *From: *Till Rohrmann <trohrm...@apache.org>
> *Date: *Tuesday, 24. November 2020 at 17:20
> *To: *"Eifler, Patrick" <patrick.eif...@sony.com>
> *Cc: *"user@flink.apache.org" <user@flink.apache.org>
> *Subject: *Re: How to setup Regions for Fault Tolerance in Flink when
> using Side Outputs
>
>
>
> Hi Patrick,
>
>
>
> Flink supports regional failover [1], which restarts only the tasks that
> are connected to the failed task via pipelined data exchanges. Hence, either
> when having an embarrassingly parallel topology or running a batch job,
> Flink should not restart the whole job in case of a task failure.
>
>
>
> However, in the case of side outputs, I think they are connected via
> pipelined data exchanges with the main stream and, hence, are part of the
> same failover region as the main stream.
>
>
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/task_failure_recovery.html#restart-pipelined-region-failover-strategy
>
>
>
> Cheers,
>
> Till
>
>
>
> On Tue, Nov 24, 2020 at 5:15 PM Eifler, Patrick <patrick.eif...@sony.com>
> wrote:
>
> Hi all,
>
>
>
> We are trying to set up regions to enable Flink to only stop failing tasks
> based on region instead of failing the entire stream.
>
> We are using one main stream that is reading from a Kafka topic and a
> bunch of side outputs for processing each event from that topic differently.
>
> For the processing in the side outputs we use the process function
> provided by Flink.
>
>
>
> So far, whenever one side output stream failed, the whole streaming job failed.
>
>
>
> Is there anything that needs to be done or set on the side outputs so that
> Flink recognizes them as regions?
>
> Is it even possible to have Flink handle side outputs as regions and
> restart only one specific side output stream on failure?
>
>
>
> Many thanks in advance!
>
>
>
> Cheers,
>
>
>
> Patrick
>
>
