[jira] [Commented] (KUDU-1954) Improve maintenance manager behavior in heavy write workload

Yingchun Lai (Jira) Tue, 27 Jul 2021 23:30:09 -0700


    [ 
https://issues.apache.org/jira/browse/KUDU-1954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388492#comment-17388492
 ]


Yingchun Lai commented on KUDU-1954:
------------------------------------

Although we have tried to reduce the duration of a single compaction operation, 
in some environments compaction OPs can still run slower than data ingestion. 
For example, machines may have only spinning disks, or even a single spinning 
disk, with --maintenance_manager_num_threads set to 1. Once that thread starts 
a heavy compaction OP, flush OPs have to wait a long time before they are 
launched.

I think we can introduce separate flush threads dedicated to flush OPs, which 
is similar to how RocksDB works [1].

1. 
https://github.com/facebook/rocksdb/blob/4361d6d16380f619833d58225183cbfbb2c7a1dd/include/rocksdb/options.h#L599-L658
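
To illustrate the starvation described above, here is a minimal simulation 
sketch in Python. It is not Kudu code: the Op class, the simulate() function, 
the FIFO ordering, and the workload numbers are all assumptions made up for 
illustration (the real maintenance manager scores and picks ops rather than 
serving them FIFO). It only compares how long flush OPs wait to be launched 
with one shared thread versus one shared thread plus one dedicated flush 
thread.

```python
from dataclasses import dataclass


@dataclass
class Op:
    kind: str        # "flush" or "compaction"
    arrival: float   # time the op becomes runnable, in seconds
    duration: float  # time the op takes once launched, in seconds


def simulate(ops, shared_threads, flush_threads=0):
    """Return the launch delay of each flush op, in arrival order.

    Hypothetical model: ops are served FIFO. Flush ops use the dedicated
    flush pool when one exists (flush_threads > 0); all other ops, and
    flushes when there is no dedicated pool, use the shared pool.
    """
    shared_free = [0.0] * shared_threads  # time at which each thread is idle
    flush_free = [0.0] * flush_threads
    delays = []
    for op in sorted(ops, key=lambda o: o.arrival):
        pool = flush_free if (op.kind == "flush" and flush_free) else shared_free
        i = min(range(len(pool)), key=pool.__getitem__)  # earliest-idle thread
        start = max(op.arrival, pool[i])
        pool[i] = start + op.duration
        if op.kind == "flush":
            delays.append(start - op.arrival)
    return delays


# Made-up workload: one long compaction followed by two short flushes.
workload = [
    Op("compaction", arrival=0.0, duration=60.0),
    Op("flush", arrival=1.0, duration=2.0),
    Op("flush", arrival=5.0, duration=2.0),
]

single = simulate(workload, shared_threads=1)
split = simulate(workload, shared_threads=1, flush_threads=1)
print("flush delays, one shared thread:       ", single)
print("flush delays, plus one flush thread:   ", split)
```

In this toy model the flushes wait behind the 60-second compaction when there 
is a single shared thread, and are launched immediately once a dedicated flush 
thread exists. The trade-off not modeled here is the extra I/O contention on a 
single spinning disk when a flush and a compaction run at the same time.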

> Improve maintenance manager behavior in heavy write workload
> ------------------------------------------------------------
>
>                 Key: KUDU-1954
>                 URL: https://issues.apache.org/jira/browse/KUDU-1954
>             Project: Kudu
>          Issue Type: Improvement
>          Components: compaction, perf, tserver
>    Affects Versions: 1.3.0
>            Reporter: Todd Lipcon
>            Priority: Major
>              Labels: performance, roadmap-candidate, scalability
>         Attachments: mm-trace.png
>
>
> During the investigation in [this 
> doc|https://docs.google.com/document/d/1U1IXS1XD2erZyq8_qG81A1gZaCeHcq2i0unea_eEf5c/edit]
>  I found a few maintenance-manager-related issues during heavy writes:
> - we don't schedule flushes until we are already in the "backpressure" realm, 
> so we spend most of our time doing backpressure
> - even if we configure N maintenance threads, we typically are only using 
> ~50% of those threads due to the scheduling granularity
> - when we do hit the "memory-pressure flush" threshold, all threads quickly 
> switch to flushing, which then brings us far beneath the threshold
> - long running compactions can temporarily starve flushes
> - high volume of writes can starve compactions



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
