[ https://issues.apache.org/jira/browse/KUDU-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527697#comment-17527697 ]
Alexey Serbin commented on KUDU-3364:
-------------------------------------

I'm not sure I understand the reasoning. Periodic rescheduling can be done in the logic of a particular task, and that's how it's done in Kudu now: check out the various periodic tasks that already exist, which work fine without introducing any extra logic into the ThreadPool. Roughly speaking, ThreadPool is an entity that allows throwing a task onto a set of workers, and that's simple enough.

BTW, auto-rebalancing is already implemented in Kudu to some extent, but it's not enabled yet: check out the code in src/kudu/master/auto_rebalancer.cc

What's missing in the current approach to satisfy the use cases you mentioned?

> Add TimerThread to ThreadPool to support a category of problem
> --------------------------------------------------------------
>
>                 Key: KUDU-3364
>                 URL: https://issues.apache.org/jira/browse/KUDU-3364
>             Project: Kudu
>          Issue Type: New Feature
>            Reporter: shenxingwuying
>            Assignee: shenxingwuying
>            Priority: Minor
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> h1. Scenarios
> In general, I am talking about a category of problem.
> There are some periodic tasks or automatically triggered scheduling tasks in
> Kudu.
> For example, automatic rebalancing of cluster data, some GC tasks and
> compaction tasks.
> They are implemented with a Kudu Thread, maybe std::thread or ThreadPool, and
> the actual task is scheduled periodically, or triggered by an internal
> strategy, from inside.
> They are all internal, so we cannot control them.
> In fact, we need a way, under our control, to trigger these kinds of actions.
> Some scenarios are significant.
> Examples are below:
>
> h2. data rebalance
> There are two ways to rebalance:
> 1. enable auto rebalance
> 2. use the rebalance tool (1.14 and before).
> The two ways may conflict through racing operations, because the rebalance
> tool's logic is a little complex and runs in the tool, while auto rebalance
> runs on the master.
> In the future, auto rebalance on the master will become stable and become the
> main way to rebalance data. At the same time, admin operators need an
> external way to trigger a rebalance, just like auto rebalance does.
> But auto rebalance currently runs in a thread on a fixed time period.
> Although we could add an API to MasterService, that API would be synchronous
> and would cost a lot; we need an asynchronous method to trigger the rebalance.
> h2. auto compaction
> Another example is auto compaction.
> I have found that the compaction strategy is not always valid, so maybe we
> need a method controlled by admin users to trigger compaction.
> If we can do a RowSetInCompaction, we need not restart the Kudu cluster.
>
> h1. My Solution
> Add a timer to ThreadPool. This timer is a worker thread that schedules tasks
> to the specified thread according to time.
> We can restrict TimerThread so that only a SERIAL ThreadPoolToken can enable
> it.
> Pseudo code expresses my intention:
> {code:java}
> // code placeholder
> class TimerThread {
>   class Task {
>     ThreadPoolToken token;
>     std::function<void()> f;
>   };
>
>   void Schedule(Task task, int delay_ms) {
>     tasks_.insert(...);
>   }
>   void RunLoop() {
>     while (...) {
>       SleepFor(100ms);
>       tasks = FindTasks();
>       for (auto task : tasks) {
>         token = task.token;
>         token->Submit(task.f);
>         tasks_.erase...
>       }
>     }
>   }
>   scoped_refptr<Thread> thread_;
>   std::multimap<MonoTime, Task> tasks_;
> };
> class ThreadPool {
>   ...
>   TimerThread* timer_;
>   ...
> };
> class ThreadPoolToken {
>   void Scheduler();
> };
> {code}
> This scheme is compatible with the existing ThreadPool, since the timer is
> nullptr by default.
> For periodic tasks, we can use a control ThreadPool with a timer to refactor
> some code and make it clearer, avoiding the past problem of too many
> single-purpose threads.
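
For illustration only, the timer idea in the quoted pseudo code can be sketched with the C++ standard library alone. This is not Kudu code and it does not touch Kudu's ThreadPool or ThreadPoolToken: the class below is a hypothetical stand-in that reuses the TimerThread, Schedule and RunLoop names from the proposal. It assumes std::thread in place of Kudu's Thread, std::chrono::steady_clock in place of MonoTime, and it runs each due task directly on the timer thread at the point where the proposal would call token->Submit(). Instead of polling every 100ms, it sleeps on a condition variable until the earliest deadline.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <map>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for the proposed TimerThread; standard library only.
class TimerThread {
 public:
  using Clock = std::chrono::steady_clock;
  using Task = std::function<void()>;

  TimerThread() : thread_([this] { RunLoop(); }) {}

  // Stops the loop and joins the thread; tasks still pending are dropped.
  ~TimerThread() {
    {
      std::lock_guard<std::mutex> l(mu_);
      stop_ = true;
    }
    cv_.notify_one();
    thread_.join();
  }

  // Queues 'task' to run once 'delay' has elapsed.
  void Schedule(Task task, std::chrono::milliseconds delay) {
    assert(task);  // An empty std::function cannot be run.
    {
      std::lock_guard<std::mutex> l(mu_);
      tasks_.emplace(Clock::now() + delay, std::move(task));
    }
    cv_.notify_one();  // The new task may now be the earliest deadline.
  }

 private:
  void RunLoop() {
    std::unique_lock<std::mutex> l(mu_);
    while (!stop_) {
      if (tasks_.empty()) {
        cv_.wait(l);  // Woken by Schedule() or the destructor.
        continue;
      }
      const Clock::time_point deadline = tasks_.begin()->first;
      if (Clock::now() < deadline) {
        cv_.wait_until(l, deadline);  // Re-check state after any wakeup.
        continue;
      }
      Task task = std::move(tasks_.begin()->second);
      tasks_.erase(tasks_.begin());
      l.unlock();
      task();  // The proposal would do token->Submit(task.f) here instead.
      l.lock();
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  bool stop_ = false;
  std::multimap<Clock::time_point, Task> tasks_;  // Ordered by deadline.
  std::thread thread_;  // Declared last so the state above exists first.
};
```

Because the lock is released while a task runs, a task may call Schedule() on the same timer from inside its own body. That is one way to express the task-level periodic rescheduling described in the comment above, with no periodic logic in the timer itself.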
-- This message was sent by Atlassian Jira (v8.20.7#820007)