Hello, Igniters! Zhenya has already approved the patch that adds segmented-LRU and CLOCK page replacement modes [1] (thanks for the review!). Would anyone else like to review it? I have plans to merge it next Monday if nobody else will be interested in reviewing it.
[1]: https://issues.apache.org/jira/browse/IGNITE-13761 пт, 25 дек. 2020 г. в 12:26, Alex Plehanov <plehanov.a...@gmail.com>: > Guys, > > I've implemented Segmented-LRU page replacement algorithm and benchmarked > results, it gives some boost (5-10%) when page replacement is > heavily used, but, unfortunately, when replacement is not used it gives > about 2% drop compared to the current Random-LRU page replacement > implementation. Due to this, I think Segmented-LRU can't be used as the > only option or option by default. > > Also, I've implemented CLOCK page replacement algorithm (basic, > not scan-resistant version) and benchmark results are much better. It gives > about the same performance as Segmented-LRU when page replacement is used > and about the same performance as Random-LRU where there is no page > replacement. There are advanced modifications of CLOCK algorithm exists, > but usually, they require additional maintenance cost and we can again get > drop on environments without page replacements compared to Random-LRU. I've > written a benchmark with background full cache scans and even in this case > basic CLOCK version looks promising. > > So, my proposals: > 1. Include all 3 implementations (Random-LRU, Segmented-LRU, CLOCK) to the > product. > 2. Make page replacement algorithm configurable. > 3. Recommend to use Random-LRU for environments with no page replacements > (as it has zero maintenance cost). Recommend to use Segmented-LRU for > environments with a high page replacement rate and a large amount of > one-time scans (as it has near to optimal page to replace selection policy > and scan-resistant). Recommend to use CLOCK for all other cases (as it has > near to zero maintenance cost and efficiency of page replacement near to > Segmented-LRU). > 4. Set CLOCK as the default page replacement algorithm. > > WDYT? > > I've filled the IEP [1] for this discussion and create the pull request > [2] for the last proposal. I would appreciate for review of the patch. > > [1]: > https://cwiki.apache.org/confluence/display/IGNITE/IEP-62+Page+replacement+improvements > [2]: https://github.com/apache/ignite/pull/8513 > > пн, 23 нояб. 2020 г. в 11:12, Zhenya Stanilovsky > <arzamas...@mail.ru.invalid>: > >> >> >> Nikolay, i hope such case ignite users already observed) >> I suggest to: put data bigger then available, full scan, get data only >> for available inmem data in loop, check PageReplacement metric, how match >> iterations will bring to zero this metric. >> >> >Hello, Alex. >> > >> >> Perhaps we can implement a special benchmark for this case with >> manually triggered "batch page replacement" using yardstick (but I'm not >> sure if ducktape can help here). >> > >> >I think we should carefully describe the issues with the current >> approach and then we can choose right tool to benchmark improvements. >> >You said: >> > >> >> we use Random-LRU algorithm … it has many disadvantages and affects >> performance very much when replacement is started >> > >> >Can you list disadvantages of the Random-LRU? >> > >> >AFAIU the first benchmark should be: >> > >> >1. Start cluster with persistence and put data bigger then available RAM >> to it. >> >2. Measure performance of the queries that selects one entry from the >> cache. >> >3. Make some queries that will iterate throw all data - `SELECT SUM(x) >> FROM t` or something similar. >> >4. Repeat step 2. in this time performance of random queries should be >> lower due to the page replacement. >> > >> >Is this scenario correct? >> > >> >> 23 нояб. 2020 г., в 09:12, Alex Plehanov < plehanov.a...@gmail.com > >> написал(а): >> >> >> >> Nikolay, Zhenya, >> >> >> >> Benchmark from IGNITE-13034 is fully synthetic, it makes random puts >> >> uniformly. It can be used to compare different page replacement >> algorithms >> >> (random-LRU vs segmented-LRU), but it is totally inapplicable to >> benchmark >> >> batch page replacement. >> >> Perhaps we can implement a special benchmark for this case with >> manually >> >> triggered "batch page replacement" using yardstick (but I'm not sure >> >> if ducktape can help here). >> >> >> >>> I understand case you described, but who will pull the switch ? Human, >> >> artificial intelligence ? >> >> As I described before, we can implement both manual and automatic >> "batch >> >> page replacement" triggering. For automatic triggering, there is no >> >> artificial intelligence needed, just several conditions with reasonable >> >> thresholds. Automatic triggering also can be disabled by default. >> >> >> >> пт, 20 нояб. 2020 г. в 13:32, Zhenya Stanilovsky < >> arzamas...@mail.ru.invalid >> >>> : >> >> >> >>> >> >>> >> >>> >> >>> >> >>> >> >>> >> >>>> Zhenya, >> >>>> >> >>>>> Alexey, we already have changes that partially fixes this issue [1] >> >>>> IGNITE-13086 it's a minor improvement. We still have major problems >> with >> >>>> our page replacement algorithm (slow page selection and non-optimal >> >>>> page-fault rate). I think changing from random 5 pages to 7 will make >> >>>> things even worse (it's better for page-fault rate, but page >> selection >> >>> will >> >>>> be slower). >> >>> All this words above need to be proven, i hope. + 1 with Nikolay, we >> need >> >>> correct reproduces or some graphs from 2.9 ver. >> >>> >> >>>> >> >>>>> This approach still not applicable for real life >> >>>> Why do you think batch replacement is not applicable for real-life? >> It can >> >>>> be applied for workloads, where some big amount of data periodically >> used, >> >>>> but not very often. For example, when OLAP request over historical >> data >> >>>> raised pages to page-memory, and after such request this data is not >> >>> needed >> >>>> for a long time. Or when OLTP transactions mostly add new data and >> process >> >>>> recent data but rarely touch historical data. In these cases with the >> >>>> current approach, we will enter "page replacement mode" after some >> period >> >>>> of time and never leave it. With batch page replacement there is a >> chance >> >>>> to prevent random-LRU page replacement or postpone it. >> >>> I understand case you described, but who will pull the switch ? Human, >> >>> artificial intelligence ? >> >>> You approach assume some triggering from inner, i don`t like this. >> >>> >> >>>> >> >>>>> But request once more, do you really observe such problems with 2.9 >> ver >> >>> ? >> >>>> Any graphs maybe ? >> >>>> I don't have production usage feedback after IGNITE-13086, but I >> doubt >> >>>> something changed significantly. >> >>> >> >>> Lets wait ?:) In any case (Nikolay, Alex) IGNITE-13086 includes >> yardstik >> >>> bench for PR proven, we can use it once more. >> >>> >> >>> Thanks ! >> >>>> >> >>>> >> >>>> чт, 19 нояб. 2020 г. в 09:18, Zhenya Stanilovsky < >> >>> arzamas...@mail.ru.invalid >> >>>>> : >> >>>> >> >>>>> >> >>>>> Alexey, we already have changes that partially fixes this issue [1] >> >>>>> Easy way: >> >>>>> Looks like we already have converge in page replacement. >> >>>>> If we change 5 times touch iterator from random lru algo into, for >> >>>>> example — 7 we will obtain fast improvement from scratch. >> >>>>> >> >>>>> » Batch page replacement >> >>>>> This approach still not applicable for real life if you wan`t to >> observe >> >>>>> ugly people for threshold (i.e. 12 h) interval. And, of course, you >> >>>>> understand that dramatically reduce of such interval gives nothing? >> >>>>> >> >>>>> » Change the page replacement algorithm. >> >>>>> That`s way i vote for ) But request once more, do you really observe >> >>> such >> >>>>> problems with 2.9 ver ? Any graphs maybe ? >> >>>>> >> >>>>> thanks ! >> >>>>> >> >>>>> [1] https://issues.apache.org/jira/browse/IGNITE-13086 >> >>>>>> Hello, Igniters! >> >>>>>> >> >>>>>> Currently, for page replacement (page rotation between page-memory >> and >> >>>>>> disk) we use Random-LRU algorithm. It has a low maintenance cost >> and >> >>>>>> relatively simple implementation, but it has many disadvantages and >> >>>>> affects >> >>>>>> performance very much when replacement is started. We even have >> >>> warnings >> >>>>> in >> >>>>>> the log when page replacement started and a special event for >> this. I >> >>> know >> >>>>>> Ignite deployments where administrators force to restart cluster >> nodes >> >>>>>> periodically to avoid page replacement. >> >>>>>> >> >>>>>> I have a couple of proposals to improve page replacement in Ignite: >> >>>>>> >> >>>>>> *Batch page replacement.* >> >>>>>> >> >>>>>> Main idea: in some cases start background task to evict cold pages >> from >> >>>>>> page-memory (for example, pages, last touched more than 12 hours >> ago). >> >>>>>> >> >>>>>> The task can be started: >> >>>>>> - Automatically, triggered by some events, for example, when we >> expect >> >>> a >> >>>>>> start of Random-LRU page replacing soon (allocated more than 90% of >> >>>>>> page-memory) + we have enough amount of cold pages (we need some >> >>> metric to >> >>>>>> calculate the number of cold pages) + some time passed since last >> batch >> >>>>>> page replacement (to avoid too much resource consumption by >> background >> >>>>>> batch replacement). >> >>>>>> - Manually (JMX or control.sh), if an administrator wants to >> control >> >>> the >> >>>>>> time of batch replacement more precisely (for example, to avoid the >> >>> start >> >>>>>> of this task during peak time). >> >>>>>> >> >>>>>> Batch page replacement will be helpful in some workloads (when some >> >>> data >> >>>>>> much colder than another), it can prevent the starting of >> Random-LRU >> >>> page >> >>>>>> replacement, or if Random-LRU already started it can provide >> >>> conditions to >> >>>>>> stop it. >> >>>>>> >> >>>>>> *Change the page replacement algorithm.* >> >>>>>> >> >>>>>> Good page replacement algorithm should satisfy the requirements: >> >>>>>> - low page-fault rates for typical workload >> >>>>>> - low maintenance cost (low resource consumption to maintain >> additional >> >>>>>> structures required for page replacement) >> >>>>>> - fast searching of next page for replacement >> >>>>>> - sequential scans resistance (one sequential scan should not >> evict all >> >>>>>> relatively hot pages from page-memory) >> >>>>>> >> >>>>>> Our Random-LRU has low maintenance cost and sequential scan >> resistant, >> >>> but >> >>>>>> to find the next page for replacement in the best case we scan 5 >> >>> pages, in >> >>>>>> the worst case we can scan all data region segment. Also, due to >> random >> >>>>>> nature, it's not very effective in predicting the right page for >> >>>>>> replacement to minimize the page-fault rate. And it's much time >> >>> required >> >>>>> to >> >>>>>> totally evict old cold data. >> >>>>>> >> >>>>>> Usually, database management systems and operating systems use >> >>>>>> modifications of LRU algorithms. These algorithms have higher >> >>> maintenance >> >>>>>> costs (pages list should be modified on each page access), but >> often >> >>> they >> >>>>>> are effective from a "page-fault rate" point of view and have O(1) >> >>>>>> complexity for a searching page to replace. Simple LRU is not >> >>> sequential >> >>>>>> scan resistant, but modifications that utilize page access >> frequency >> >>> are >> >>>>>> resistant to sequential scan. >> >>>>>> >> >>>>>> We can try one of the modifications of LRU as well (for example, >> >>>>> "segmented >> >>>>>> LRU" seems suitable for Ignite). >> >>>>>> >> >>>>>> Ignite is a memory-centric product, so "low maintenance cost" is >> very >> >>>>>> critical. And there is a risk that page replacement algorithm can >> >>> affect >> >>>>>> workloads, where page replacement is not used (enough RAM to store >> all >> >>>>>> data). Of course, any page replacement solution should be carefully >> >>>>>> benchmarked. >> >>>>>> >> >>>>>> >> >>>>>> Igniters, WDYT? If any of these proposals look reasonable to you, I >> >>> will >> >>>>>> create IEP and start implementation. >> >>>>>> >> >>>>>> Also, I have a draft implementation of system view to determine >> how hot >> >>>>> are >> >>>>>> pages in page-memory [1]. I think it will be useful for any of >> these >> >>>>>> approaches (and even if we decide to left page replacement as is). >> >>>>>> >> >>>>>> [1]: https://issues.apache.org/jira/browse/IGNITE-13726 >> >>>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>> >> >>> >> >>> >> >>> >> >>> >> > >> >> >> >> > >