Hi Mason

Can you use the JVM CPU performance analysis tools?

JProfiler and Arthas (https://github.com/alibaba/arthas)

You can probably guess the reason for the high CPU load.
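
For example, attaching Arthas to the busy TaskManager JVM should show which
threads are eating the CPU (rough outline; pick the TaskManager PID when
prompted):

  java -jar arthas-boot.jar   # on the TaskManager host, attach to its JVM
  dashboard                   # in the arthas console: busiest threads, heap, GC overview
  thread -n 5                 # stack traces of the 5 most CPU-consuming threads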

Jake

> On Aug 6, 2020, at 12:25 PM, Chen, Mason <mason.c...@sony.com> wrote:
> 
> Thanks, Piotr, for the reply. I noticed the behavior you described when I 
> reduced the parallelism of the async I/O sink to 8—one task manager had its 
> slots completely taken and the other one had all its slots completely open. 
> To mitigate this behavior, I tried the setting 
> `cluster.evenly-spread-out-slots: true`, but it didn’t fix anything (I had 
> expected the job manager to split the task slot requirements evenly between 
> the two task managers). It seems like, in general, I should be extremely wary 
> of the parallelism and the number of task slots, and their effects on 
> CPU/memory usage…
> 
> I will use your workaround of a parallelism of 8—I can scale the capacity 
> of the async I/O accordingly, no problem there. For the filter function, I kept 
> it at 4 since there’s a cache involved and I noticed that the hit rate was worse 
> when the parallelism was higher—I will use a keyBy to mitigate this.
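> Roughly what I have in mind (just a sketch; the key and the class names are 
> placeholders):
> 
>     // Route events with the same key to the same filter subtask so its local
>     // cache keeps seeing the same keys (getDeviceId() is a placeholder key).
>     source
>         .keyBy(Event::getDeviceId)
>         .filter(new CachedFilterFunction())
>         .setParallelism(4);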
>  
> From: Piotr Nowojski <piotr.nowoj...@gmail.com>
> Date: Wednesday, August 5, 2020 at 10:36 AM
> To: "Chen, Mason" <mason.c...@sony.com>
> Cc: "user@flink.apache.org" <user@flink.apache.org>
> Subject: Re: Only One TaskManager Showing High CPU Usage
>  
> Hi,
>  
> What I guess is happening: since you have 16 slots in total (8 slots per 
> TM), while your operators have various levels of parallelism (8, 4, 16), 
> Flink is scheduling all of the operators with parallelism < 16 on whichever 
> TM becomes available to the scheduler first. That’s causing the visible load 
> skew. Keep in mind that, by default, different operators are allowed to share 
> the same task slot, unless you explicitly tell them not to [1].
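> For example (just a sketch; the stream and function names are made up), you 
> could also put the async operator into its own slot sharing group so it cannot 
> be co-located with everything else:
> 
>     // Sketch: give the async I/O operator its own slot sharing group, so its
>     // subtasks must go into slots that are not shared with the other operators.
>     AsyncDataStream
>         .unorderedWait(filtered, new HttpAsyncFunction(), 30, TimeUnit.SECONDS)
>         .setParallelism(16)
>         .slotSharingGroup("async");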
>  
> One obvious workaround would be to define the same parallelism for all of 
> the operators, and that’s the usual way to go unless you have a really good 
> reason not to. Can you try this out? Usually there is no harm in keeping more 
> operator instances than required, and in your case you already have the 
> highest parallelism in your async function (the one that allocates the most 
> resources?).
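> For example (sketch):
> 
>     // Sketch: set one default parallelism for the whole job and drop the
>     // per-operator overrides, so every operator gets 16 subtasks.
>     env.setParallelism(16);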
>  
> Till, is there a way to change this resource allocation/scheduling behaviour? 
> To not pack everything on the same TM?
>  
> Piotrek
>  
> [1] 
> https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/stream/operators/#task-chaining-and-resource-groups
>  
>  
>  
> On Wed, Aug 5, 2020 at 02:39 Chen, Mason <mason.c...@sony.com> wrote:
> Hi all,
> 
> The issue is that only one out of the two task managers experiences high CPU 
> usage.
> <image001.png>
> 
> I’m running a series of performance tests processing records at 50k rps. In 
> this setup, I have 1 job manager (1 core, 1 GB) and 2 task managers (8 cores, 
> 8 GB). Each of the task managers has 8 task slots, and we have a simple 
> pipeline that reads from Kafka, filters, and makes an HTTP request downstream 
> with an async I/O function.
> 
> All operators have a parallelism of 8, except the filter (parallelism of 4) and 
> the async I/O function (parallelism of 16). We do not have checkpointing turned 
> on.
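> 
> Roughly, the job looks like this (simplified sketch; the class, topic, and 
> variable names are placeholders):
> 
>     // Simplified sketch of the topology and the per-operator parallelism.
>     DataStream<Event> source = env
>         .addSource(new FlinkKafkaConsumer<>("events", new EventSchema(), kafkaProps))
>         .setParallelism(8);
> 
>     DataStream<Event> filtered = source
>         .filter(new CachedFilterFunction())   // keeps roughly 500 of the 50k rps
>         .setParallelism(4);
> 
>     AsyncDataStream
>         .unorderedWait(filtered, new HttpAsyncFunction(), 30, TimeUnit.SECONDS)  // downstream HTTP call
>         .setParallelism(16)
>         .addSink(new DiscardingSink<>());     // placeholder; the async operator is effectively the end of the pipeline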
> 
> I thought maybe operator chaining was causing issues in distributing the 
> load, so I disabled operator chaining after the filter (before the async I/O). 
> However, the issue persisted, and I saw a somewhat even distribution of 
> records both before and after this change.
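> Concretely, the change was along these lines (sketch):
> 
>     // Disable chaining on the filter so it is not fused into the same task
>     // as the async operator.
>     source
>         .filter(new CachedFilterFunction())
>         .setParallelism(4)
>         .disableChaining();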
> 
> Some potential problems: the HTTP client is not static, so it will be 
> recreated for each parallel instance of the async I/O operator (so there are 
> going to be a lot of executors). At the CPU peak, I see 10k threads, and the 
> count steadily grows to 40k by the end of the time period shown.
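> 
> What I suspect we need is something along these lines (rough sketch, assuming 
> Apache HttpAsyncClient; the class names are placeholders):
> 
>     // Sketch: share one async HTTP client per TaskManager JVM instead of
>     // creating a new one (with its own executor threads) in every subtask.
>     public class HttpAsyncFunction extends RichAsyncFunction<Event, Result> {
> 
>         // static => one client per JVM, shared by all parallel subtasks
>         private static volatile CloseableHttpAsyncClient client;
> 
>         @Override
>         public void open(Configuration parameters) {
>             synchronized (HttpAsyncFunction.class) {
>                 if (client == null) {
>                     client = HttpAsyncClients.createDefault();
>                     client.start();
>                 }
>             }
>         }
> 
>         @Override
>         public void asyncInvoke(Event event, ResultFuture<Result> resultFuture) {
>             // Placeholder: the real code executes the request on the shared
>             // client and completes the future from the response callback.
>             resultFuture.complete(Collections.emptyList());
>         }
>     }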
> 
> 
> Does anyone have any ideas? Of the 50k rps, about 500 of those events 
> need to hit the async I/O function (the filter drops the unrelated 
> events). I was doing fine before I added the unrelated events (just the 500 
> rps going to the async I/O).
>  
> Thanks,
> Mason
