[ https://issues.apache.org/jira/browse/KUDU-3516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Song Jiacheng updated KUDU-3516:
--------------------------------
    Description: 
If we have many tables with many columns and each of them gets many update requests, the maintenance scheduler gets stuck calculating the perf improvement score of major compaction.
This tablet server has 6 maintenance manager threads but could only schedule 1 or 2 tasks at a time, even though the tablet server is actually under high memory pressure.

!image-2023-10-09-15-59-01-026.png|width=655,height=327!

According to the stack shown below, I found that the scheduler was stuck in AddColumnIdsWithUpdates for a long time, but there is no need to collect all the updated columns here.

!image-2023-10-09-15-58-47-267.png|width=690,height=218!

  was:
If we have many tables with many columns and each of them gets many update requests, the maintenance scheduler gets stuck calculating the perf improvement score of major compaction.
This tablet server has 6 maintenance manager threads but could only schedule 1 or 2 tasks at a time, even though the tablet server is actually under high memory pressure.

!image-2023-10-09-15-59-01-026.png|width=479,height=239!

According to the stack shown below, I found that the scheduler was stuck in AddColumnIdsWithUpdates for a long time, but there is no need to collect all the updated columns here.

!image-2023-10-09-15-58-47-267.png|width=690,height=218!


> Tserver: Maintenance scheduler might stuck in
> DeltaStats#AddColumnIdsWithUpdates
> ---------------------------------------------------------------------------------
>
>                 Key: KUDU-3516
>                 URL: https://issues.apache.org/jira/browse/KUDU-3516
>             Project: Kudu
>          Issue Type: Bug
>          Components: tserver
>            Reporter: Song Jiacheng
>            Priority: Major
>         Attachments: image-2023-10-09-15-58-47-267.png,
> image-2023-10-09-15-59-01-026.png
>
>
> If we have many tables with many columns and each of them gets many update
> requests, the maintenance scheduler gets stuck calculating the perf
> improvement score of major compaction.
> This tablet server has 6 maintenance manager threads but could only schedule
> 1 or 2 tasks at a time, even though the tablet server is actually under high
> memory pressure.
> !image-2023-10-09-15-59-01-026.png|width=655,height=327!
> According to the stack shown below, I found that the scheduler was stuck in
> AddColumnIdsWithUpdates for a long time, but there is no need to collect all
> the updated columns here.
> !image-2023-10-09-15-58-47-267.png|width=690,height=218!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
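The report's key observation is that scoring a major compaction should not require enumerating every updated column. Assuming (the report does not confirm this) that the score only needs to know whether *any* column has updates, the difference between full enumeration and an early exit can be sketched as below. `ToyDeltaStats`, `collect_updated_column_ids`, `has_any_updated_column`, and the two `needs_major_compaction_*` helpers are hypothetical names for illustration only, not Kudu APIs (Kudu itself is written in C++).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ToyDeltaStats:
    """Hypothetical stand-in for per-delta-store stats (not Kudu's DeltaStats).

    Maps a column id to the number of updates recorded for that column.
    """
    update_counts: Dict[int, int] = field(default_factory=dict)

    def collect_updated_column_ids(self, col_ids: Set[int]) -> None:
        # Full enumeration: visits every column and inserts each updated one.
        for col_id, count in self.update_counts.items():
            if count > 0:
                col_ids.add(col_id)

    def has_any_updated_column(self) -> bool:
        # Early exit: stops at the first column with at least one update.
        return any(count > 0 for count in self.update_counts.values())


def needs_major_compaction_full(stores: List[ToyDeltaStats]) -> bool:
    # Builds the complete set of updated column ids only to test emptiness.
    col_ids: Set[int] = set()
    for store in stores:
        store.collect_updated_column_ids(col_ids)
    return len(col_ids) > 0


def needs_major_compaction_early_exit(stores: List[ToyDeltaStats]) -> bool:
    # Same answer, but returns as soon as any store reports an updated column.
    return any(store.has_any_updated_column() for store in stores)


if __name__ == "__main__":
    wide = [ToyDeltaStats(), ToyDeltaStats({c: 5 for c in range(1000)})]
    # prints: True True
    print(needs_major_compaction_full(wide), needs_major_compaction_early_exit(wide))
    idle = [ToyDeltaStats({0: 0}), ToyDeltaStats()]
    # prints: False False
    print(needs_major_compaction_full(idle), needs_major_compaction_early_exit(idle))
```

Both helpers return the same answer, but with many wide tables under heavy update load the full version touches every updated column in every delta store on each scheduler pass, while the early-exit version can stop after the first hit.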