[ 
https://issues.apache.org/jira/browse/FLINK-28838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17577116#comment-17577116
 ] 

Qingsheng Ren edited comment on FLINK-28838 at 8/9/22 3:30 AM:
---------------------------------------------------------------

Thanks for the ticket [~aitozi]! Yeah, we can definitely make some improvements, 
as not all source implementations work as expected. 

I think your first proposal makes sense. We can drop empty records earlier, 
before putting them into elementsQueue. I have some concerns about the second 
one (adding SleepTask), as we can hardly decide the length of the sleep given 
how much source implementations differ. For example, KafkaConsumer itself is 
able to block the thread if no data is available for polling, so it doesn't 
need the SleepTask at all. I prefer to leave it to the split reader 
implementation itself, as the doc of {{SplitReader#fetch}} is quite clear that 
it could be a blocking call. WDYT?
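
To make the first proposal concrete, here is a minimal sketch of dropping an empty fetch result before it reaches the queue. It is not Flink's actual FetchTask code: {{EmptyFetchFilterSketch}}, {{SimpleReader}} and {{fetchOnce}} are hypothetical names, and a plain list of strings stands in for {{RecordsWithSplitIds}}.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

/** Hypothetical stand-in for the fetch path; not Flink's real classes. */
public class EmptyFetchFilterSketch {

    /** Hypothetical simplified reader: fetch() may return an empty batch. */
    public interface SimpleReader {
        List<String> fetch();
    }

    /**
     * Runs one fetch and enqueues the batch only if it contains records.
     * Returns true if something was enqueued.
     */
    public static boolean fetchOnce(SimpleReader reader, Queue<List<String>> elementsQueue) {
        List<String> batch = reader.fetch();
        if (batch.isEmpty()) {
            // Nothing is enqueued, so a consumer waiting on the queue is not woken up.
            return false;
        }
        elementsQueue.add(batch);
        return true;
    }

    public static void main(String[] args) {
        Queue<List<String>> queue = new ArrayDeque<>();
        boolean first = fetchOnce(() -> List.of(), queue);          // empty result is dropped
        boolean second = fetchOnce(() -> List.of("a", "b"), queue); // non-empty result is enqueued
        System.out.println(first + " " + second + " " + queue.size());
    }
}
```

Note that the filter alone only avoids the consumer wake-ups. If the reader returns immediately with no data, the fetcher thread itself would still loop, which is why the blocking behaviour of the reader matters as well.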

BTW which source has this issue? We can check its implementation too. 


> Avoid to notify the elementQueue consumer when the fetch result is empty
> ------------------------------------------------------------------------
>
>                 Key: FLINK-28838
>                 URL: https://issues.apache.org/jira/browse/FLINK-28838
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / Common
>    Affects Versions: 1.15.0, 1.15.1
>            Reporter: Aitozi
>            Priority: Major
>             Fix For: 1.16.0
>
>         Attachments: 20220805165441.jpg
>
>
> When using the new source API, I found that a source with no data still 
> causes high CPU usage. 
> The reason is that FetchTask always gets a {{RecordsWithSplitIds}} back from 
> {{splitReader.fetch}} and adds it to the elementQueue, even when it is 
> empty. The consumer is then notified and woken up frequently.
> This keeps the thread busy running and waking up, which leads to high sys 
> and user CPU usage.
> I think not every SplitReader#fetch blocks until there is data. If it 
> returns immediately when there is no data, this problem will happen.
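
For comparison, a reader can avoid the immediate empty return by blocking inside its own fetch. The following is only a sketch under stated assumptions: {{BlockingFetchSketch}} and its {{fetch}} method are hypothetical, a {{BlockingQueue}} stands in for the external system, and the timeout value is chosen by the reader rather than by the framework.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical reader-side sketch; not a real Flink SplitReader. */
public class BlockingFetchSketch {

    /**
     * Waits up to timeoutMillis for the first record, then drains whatever
     * else is already available. Returns an empty list only after the wait.
     */
    public static List<String> fetch(BlockingQueue<String> source, long timeoutMillis) {
        List<String> batch = new ArrayList<>();
        try {
            // Blocks instead of returning immediately when no data is available.
            String first = source.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            if (first != null) {
                batch.add(first);
                source.drainTo(batch);
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag and return what we have (nothing).
            Thread.currentThread().interrupt();
        }
        return batch;
    }

    public static void main(String[] args) {
        BlockingQueue<String> source = new LinkedBlockingQueue<>();
        List<String> empty = fetch(source, 50); // waits for the timeout, then returns empty
        source.add("r1");
        source.add("r2");
        List<String> full = fetch(source, 50);  // returns without waiting for the timeout
        System.out.println(empty.size() + " " + full);
    }
}
```

With a reader like this, an idle source costs one wake-up per timeout instead of a tight loop.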



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
