Thank you, Rong and Andrey. The blog and your explanations were very useful.

In my use case, the source stream (Kafka-based) contains messages that capture 
some “work” that needs to be done for a tenant. It’s a multi-tenant source 
stream. I need to queue up (and execute) this work per tenant in the order in 
which it was produced, and Flink provides this ordered queuing per tenant very 
elegantly. The only concern is that executing this “work” could be expensive in 
terms of compute/memory/time. Furthermore, per tenant there is a constraint 
that this work must be done serially; hence this question. I believe that if 
our Flink cluster has enough resources, it should work.
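
Roughly, the pipeline looks like the sketch below (WorkItem, the Kafka wiring 
and the schema class are placeholders for illustration, not my exact code):

       // Sketch only; imports and the WorkItem/WorkItemSchema types are placeholders.
       StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

       DataStream<WorkItem> inputStream = env.addSource(
           new FlinkKafkaConsumer<>("work-topic", new WorkItemSchema(), kafkaProps));

       inputStream
           // All work items for a tenant land on the same subtask, in produce order.
           .keyBy(item -> item.getTenantId())
           // processElement() executes the (possibly expensive) work serially per tenant.
           .process(new MyKeyedProcessFunction());

       env.execute("per-tenant work queue");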

But this leads to another, related question. If multiple Flink jobs share the 
same Flink cluster and one of those jobs sees a spike such that backpressure 
builds up all the way to the source, will that impact the other jobs as well? 
Is a task slot shared by multiple jobs? If not, my understanding is that this 
should not impact other Flink jobs. Is that correct?

Thanks.

Ajay

From: Andrey Zagrebin <and...@ververica.com>
Date: Thursday, February 14, 2019 at 5:09 AM
To: Rong Rong <walter...@gmail.com>
Cc: "Aggarwal, Ajay" <ajay.aggar...@netapp.com>, "user@flink.apache.org" 
<user@flink.apache.org>
Subject: Re: Impact of occasional big pauses in stream processing

Hi Ajay,

Technically, it will immediately block the thread of the MyKeyedProcessFunction 
subtask scheduled to some slot, and thus block processing of the whole key 
range assigned to that subtask.
Practically, I agree with Rong's answer: depending on the topology of your 
inputStream, it can eventually block a lot of other things.
In general, it is not recommended to perform blocking operations in 
record-processing functions. You could consider AsyncIO [1] to unblock the task 
thread.
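
A rough sketch of what that could look like (the async function, thread pool 
size and timeout values below are hypothetical, just to illustrate the pattern; 
if the per-tenant work must stay strictly serial, you would also need to limit 
concurrency accordingly):

       // Flink AsyncIO sketch (imports from org.apache.flink.streaming.api.functions.async.*,
       // org.apache.flink.streaming.api.datastream.AsyncDataStream and java.util.concurrent.*
       // omitted); WorkItem, Result and doExpensiveWork() are placeholders.
       public class MyAsyncWorkFunction extends RichAsyncFunction<WorkItem, Result> {
           private transient ExecutorService executor;

           @Override
           public void open(Configuration parameters) {
               // Dedicated pool for the blocking work, so the task thread stays free.
               executor = Executors.newFixedThreadPool(10);
           }

           @Override
           public void asyncInvoke(WorkItem item, ResultFuture<Result> resultFuture) {
               CompletableFuture
                   .supplyAsync(() -> doExpensiveWork(item), executor)
                   .thenAccept(result -> resultFuture.complete(Collections.singleton(result)));
           }

           @Override
           public void close() {
               executor.shutdown();
           }
       }

       // Wiring it into the pipeline; orderedWait emits results in input order.
       DataStream<Result> results = AsyncDataStream.orderedWait(
           inputStream,
           new MyAsyncWorkFunction(),
           10, TimeUnit.MINUTES,   // timeout per element
           100);                   // max in-flight elements per subtask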

Best,
Andrey

[1] 
https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/stream/operators/asyncio.html

On Thu, Feb 14, 2019 at 6:03 AM Rong Rong <walter...@gmail.com> wrote:
Hi Ajay,

Flink handles "backpressure" gracefully, so the pipeline does not break down 
when your processing is occasionally slowed.
I think the following articles will help [1,2].

In your specific case: the "keyBy" operation re-hashes the data so that it is 
reshuffled from all input consumers to all of your process operators (in this 
case the MyKeyedProcessFunction). If one of the process operators is 
backpressured, the backpressure propagates all the way back to the source.
So my understanding is that, because of the reshuffling, if one of the process 
functions is backpressured, it can potentially affect all of the source 
operators.

Thanks,
Rong

[1] https://www.ververica.com/blog/how-flink-handles-backpressure
[2] 
https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/back_pressure.html

On Wed, Feb 13, 2019 at 8:50 AM Aggarwal, Ajay <ajay.aggar...@netapp.com> wrote:
I was wondering what the impact is if one of the stream operator functions 
occasionally takes too long to process an event. Given the following simple 
Flink job:

       inputStream
           .keyBy("tenantId")
           .process(new MyKeyedProcessFunction())

if MyKeyedProcessFunction occasionally takes too long (say ~5-10 minutes) to 
process an incoming element, what is the impact on the overall pipeline? Is the 
impact limited to

  1.  The specific key for which MyKeyedProcessFunction is currently taking too 
long to process an element, or
  2.  The specific task slot where MyKeyedProcessFunction is currently taking 
too long to process an element, i.e. impacting multiple keys, or
  3.  The entire inputStream?

Also, what built-in resiliency is there in these cases? Is there a concept of a 
timeout for each operator function?

Ajay
