Hi Patrick,

At the moment it is not possible to disconnect side outputs from other streaming operators. I guess what you would like to have is an operator which consumes on a best-effort basis but which can also lose some data while it is being restarted. This is currently not supported by Flink.
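To illustrate why the side outputs share the fate of the main stream, here is a toy model in plain Java. It does not use any Flink API and is not how Flink implements this internally; the class name, the method name and the task names are made up for illustration. It only assumes that a failover region is the set of tasks connected via pipelined data exchanges, in either direction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of region-based failover. Hypothetical names, no Flink dependency. */
public class FailoverRegionSketch {

    /**
     * Returns every task that can be reached from failedTask over pipelined
     * edges, ignoring edge direction. In this model, that set is the region
     * that has to be restarted.
     */
    public static Set<String> tasksToRestart(List<String[]> pipelinedEdges, String failedTask) {
        // A pipelined exchange ties producer and consumer together,
        // so the adjacency list is built as an undirected graph.
        Map<String, List<String>> neighbours = new HashMap<>();
        for (String[] edge : pipelinedEdges) {
            neighbours.computeIfAbsent(edge[0], k -> new ArrayList<>()).add(edge[1]);
            neighbours.computeIfAbsent(edge[1], k -> new ArrayList<>()).add(edge[0]);
        }

        // Depth-first search starting at the failed task.
        Set<String> region = new TreeSet<>();
        List<String> stack = new ArrayList<>();
        stack.add(failedTask);
        while (!stack.isEmpty()) {
            String task = stack.remove(stack.size() - 1);
            if (region.add(task)) {
                stack.addAll(neighbours.getOrDefault(task, Collections.<String>emptyList()));
            }
        }
        return region;
    }

    public static void main(String[] args) {
        // One source, one main operator, two side outputs, all pipelined.
        List<String[]> edges = Arrays.asList(
                new String[] {"kafka-source", "main-process"},
                new String[] {"main-process", "side-output-a"},
                new String[] {"main-process", "side-output-b"});

        // prints [kafka-source, main-process, side-output-a, side-output-b]
        System.out.println(tasksToRestart(edges, "side-output-a"));
    }
}
```

In this model a failure in one side output pulls in the main operator, the source and every other side output, because they all hang off the same pipelined edges.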
Concerning the web UI problems, I would suggest creating a JIRA ticket [1], because it sounds like a missing feature.

[1] https://issues.apache.org/jira/

Cheers,
Till

On Wed, Nov 25, 2020 at 11:15 AM Eifler, Patrick <patrick.eif...@sony.com> wrote:

> Hi Till,
>
> Thanks for your reply.
>
> Is there any option to disconnect the side outputs from the pipelined data
> exchanges of the main stream?
>
> The benefit of side outputs is very high regarding performance and
> usability, plus it fits the use case here very nicely. However, this
> pipelined connection to the main stream is a real concern, as all of the
> streams of the job are cancelled if one fails. With many side outputs, that
> is not really an option we can maintain at scale.
>
> So we are looking for options on how to preserve the side outputs but
> still get the individual region-based failover to work.
>
> The other thing I have noticed is that, when using side outputs, those
> streams are just named Unregistered_DataStream_ in the Flink Web UI.
> Setting .name and .uid on the side output stream does not change this.
> Is there any way of naming the side output streams so that the customized
> name appears in the Flink Web UI?
>
> Any help or tips are highly appreciated.
>
> Many thanks in advance.
>
> Cheers,
>
> Patrick
>
> --
>
> Patrick Eifler
>
> Senior Software Engineer (BI)
> Cloud Gaming Engineering & Infrastructure
> Sony Interactive Entertainment LLC
>
> Wilhelmstraße 118, 10963 Berlin
> Germany
>
> E: patrick.eif...@sony.com
>
> *From: *Till Rohrmann <trohrm...@apache.org>
> *Date: *Tuesday, 24. November 2020 at 17:20
> *To: *"Eifler, Patrick" <patrick.eif...@sony.com>
> *Cc: *"user@flink.apache.org" <user@flink.apache.org>
> *Subject: *Re: How to setup Regions for Fault Tolerance in Flink when
> using Side Outputs
>
> Hi Patrick,
>
> Flink supports regional failover [1], which restarts only the tasks
> connected via pipelined data exchanges. Hence, either when having an
> embarrassingly parallel topology or running a batch job, Flink should not
> restart the whole job in case of a task failure.
>
> However, in the case of side outputs, I think they are connected via
> pipelined data exchanges with the main stream and, hence, are part of the
> same failover region as the main stream.
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/task_failure_recovery.html#restart-pipelined-region-failover-strategy
>
> Cheers,
>
> Till
>
> On Tue, Nov 24, 2020 at 5:15 PM Eifler, Patrick <patrick.eif...@sony.com>
> wrote:
>
> Hi all,
>
> We are trying to set up regions to enable Flink to stop only the failing
> tasks of a region instead of failing the entire stream.
>
> We are using one main stream that is reading from a Kafka topic and a
> bunch of side outputs for processing each event from that topic differently.
>
> For the processing in the side outputs we use the process function
> provided by Flink.
>
> So far, when one side output stream failed, the whole stream job failed.
>
> Is there anything that needs to be done or set on the side outputs so that
> Flink recognizes them as regions?
>
> Is it even possible to have Flink handle side outputs as regions and
> restart only one specific side output stream on failure?
>
> Many thanks in advance!
>
> Cheers,
>
> Patrick
>
> --
>
> Patrick Eifler
>
> Senior Software Engineer (BI)
> Cloud Gaming Engineering & Infrastructure
> Sony Interactive Entertainment LLC
>
> Wilhelmstraße 118, 10963 Berlin
> Germany
>
> E: patrick.eif...@sony.com