[ https://issues.apache.org/jira/browse/FLINK-28838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17577116#comment-17577116 ]
Qingsheng Ren edited comment on FLINK-28838 at 8/9/22 3:30 AM:
---------------------------------------------------------------

Thanks for the ticket [~aitozi]! Yes, we can definitely make some improvements here, as not all source implementations work as expected.

Your first proposal makes sense to me: we can drop empty records early, before putting them into elementsQueue.

I have some concerns about the second one (adding SleepTask), as we can hardly decide the length of the sleep given how much source implementations differ. For example, KafkaConsumer itself can block the thread if no data is available for polling, so it doesn't need the SleepTask at all. I prefer to leave this to the split reader implementation itself, as the doc of {{SplitReader#fetch}} is quite clear that it can be a blocking call. WDYT?

BTW, which source has this issue? We can check its implementation too.
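To illustrate the "leave it to the split reader" option, here is a minimal sketch of a fetch that blocks for a bounded time instead of returning immediately. It uses only java.util.concurrent and is not Flink or Kafka code: the class {{BlockingFetchSketch}}, the {{upstream}} queue and the {{timeoutMillis}} parameter are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class BlockingFetchSketch {

    /**
     * Hypothetical fetch: waits up to timeoutMillis for the first record,
     * then drains whatever else is already available without waiting again.
     * Returns an empty batch only after the timeout has elapsed.
     */
    static List<String> fetch(BlockingQueue<String> upstream, long timeoutMillis) {
        List<String> batch = new ArrayList<>();
        try {
            // Blocks the fetcher thread here instead of spinning on empty results.
            String first = upstream.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            if (first != null) {
                batch.add(first);
                upstream.drainTo(batch);
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag and hand back what we have (nothing).
            Thread.currentThread().interrupt();
        }
        return batch;
    }

    public static void main(String[] args) {
        BlockingQueue<String> upstream = new LinkedBlockingQueue<>();
        upstream.add("a");
        upstream.add("b");

        System.out.println(fetch(upstream, 100)); // data is ready, returns at once
        System.out.println(fetch(upstream, 100)); // no data, waits about 100 ms first
    }
}
```

With a fetch like this, an idle source produces at most one empty result per timeout interval, and the wait length is chosen by the reader that knows its own source rather than by a generic SleepTask.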
> Avoid to notify the elementQueue consumer when the fetch result is empty
> ------------------------------------------------------------------------
>
>                 Key: FLINK-28838
>                 URL: https://issues.apache.org/jira/browse/FLINK-28838
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / Common
>    Affects Versions: 1.15.0, 1.15.1
>            Reporter: Aitozi
>            Priority: Major
>             Fix For: 1.16.0
>
>         Attachments: 20220805165441.jpg
>
>
> When using the new source API, I found that a source with no data still
> causes high CPU usage.
> The reason is that FetchTask always takes the {{RecordsWithSplitIds}}
> returned by {{splitReader.fetch}} and adds it to the elementQueue, even when
> it is empty. Each addition notifies the consumer, so it is woken up
> frequently.
> The thread therefore stays busy running and waking up, which leads to high
> sys and user CPU usage.
> I think not every SplitReader#fetch blocks until there is data; when it
> returns immediately with no data, this problem occurs.
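The effect described in the issue, and the proposal to skip empty results before they reach the queue, can be sketched in plain Java. This is not Flink's FetchTask: {{EmptyFetchSketch}}, {{Reader}} and {{runFetchRounds}} are hypothetical names, and a plain {{Queue}} stands in for the element queue, with each enqueue assumed to cost one consumer wake-up.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

public class EmptyFetchSketch {

    /** Hypothetical stand-in for a split reader whose fetch may return an empty batch. */
    interface Reader {
        List<String> fetch();
    }

    /**
     * Calls fetch `rounds` times and enqueues only non-empty batches.
     * Returns the number of batches enqueued, i.e. how often a consumer
     * waiting on the queue would have been notified.
     */
    static int runFetchRounds(Reader reader, Queue<List<String>> elementsQueue, int rounds) {
        int enqueued = 0;
        for (int i = 0; i < rounds; i++) {
            List<String> batch = reader.fetch();
            if (batch == null || batch.isEmpty()) {
                continue; // nothing to hand over, so do not wake the consumer
            }
            elementsQueue.add(batch);
            enqueued++;
        }
        return enqueued;
    }

    public static void main(String[] args) {
        // Scripted fetch results: two empty batches and two with data.
        Queue<List<String>> scripted = new ArrayDeque<>();
        scripted.add(List.of());
        scripted.add(List.of("a", "b"));
        scripted.add(List.of());
        scripted.add(List.of("c"));

        Queue<List<String>> elementsQueue = new ArrayDeque<>();
        int enqueued = runFetchRounds(scripted::poll, elementsQueue, 4);
        System.out.println("enqueued " + enqueued + " of 4 batches"); // enqueued 2 of 4 batches
    }
}
```

Note that filtering alone removes the consumer wake-ups but not the fetcher thread's own loop: if fetch returns immediately with no data, that thread still spins unless the reader blocks or waits.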