Hi,
On 2/13/19 3:37 PM, Andrey Borodin wrote:
> Hi, hackers!
>
> We have held an education project at the Sirius edu center (Sochi,
> Russia) with mentors from Yandex. A group of 5 students was working on
> improving the shared buffers eviction algorithm: Andrey Chausov, Yuriy
> Skakovskiy, Ivan Edigaryev, Arslan Gumerov, Daria Filipetskaya. I’ve
> been a mentor for the group. For two weeks we have been looking into
> known caching algorithms and tried to adapt some of them for the
> PostgreSQL codebase.
>
> While a lot of algorithms appeared to be too complex to be hacked in
> 2 weeks, we decided to implement and test a working version of a
> tweaked LRU eviction algorithm.
>
> ===How it works===
> Most of the buffers are organized into a linked list. Newly admitted
> pages jump in at the 5/8 point of the queue. The last 1/8 of the queue
> is governed by a clock sweep algorithm to improve concurrency.

Interesting. Where do these numbers (5/8 and 1/8) come from?

> ===So how we tested the patch===
> We used sampling on 4 Yandex.Cloud compute instances with 16 vCPU
> cores, 8GB of RAM, and a 200GB database in 30-minute YCSB-like runs
> with a Zipf distribution. We found that on a read-only workload our
> algorithm shows a consistent improvement over the current master
> branch. On read-write workloads we haven’t found performance
> improvements yet; there was too much noise from checkpoints and
> bgwriter (more on that in the TODO section).
> Charts are here: [0,1]

That TPS chart looks a bit ... wild. How come the master jumps so much
up and down? That's a bit suspicious, IMHO.

How do I reproduce this benchmark? I'm aware of pg_ycsb, but maybe
you've used some other tools? Also, have you tried some other
benchmarks (like regular TPC-B as implemented by pgbench, or read-only
pgbench)? We need such benchmarks with a range of access patterns to
check for regressions.

BTW, what do you mean by "sampling"?
> We used this config: [2]

That's only half the information - it doesn't say how many clients
were running the benchmark etc.

> ===TODO===
> We have taken some ideas expressed by Ben Manes on the pgsql-hackers
> list, but we could not implement all of them during the time of the
> program. For example, we tried to make LRU bumps less
> write-contentious by storing them in a circular buffer, but this
> feature was not stable enough.

Can you point us to the thread/email discussing those ideas? I have
tried searching through the archives, but I haven't found anything :-(

This message does not really explain the algorithms, and combined with
the absolute lack of comments in the linked commit, it's somewhat
difficult to form an opinion.

> The patch in its current form also requires improvements. We should
> reduce the number of locks overall (we have tried buffering for this,
> but not only that). The "clock sweep" algorithm on the last 1/8 of
> the logical sequence should be improved too (see
> ClockSweepTickAtTheAnd() and the places where it is used).

OK

> Unfortunately, we didn’t have more time to bring CAR and W-TinyLFU
> to a testing-ready state.

What is CAR? Did you mean ARC, perhaps?

> We have a working implementation of a frequency sketch [3] to make
> the transition between the admission cycle and the LRU more concise
> with a TinyLFU filter. Most probably, work in this direction will be
> continued.

OK

> Also, the current patch does not change bgwriter behavior: with a
> piece of knowledge from the LRU, we can predict that some pages will
> not be changed in the near future. This information should be used
> to schedule the background writes better.

Sounds interesting.

> We also think that with a proper eviction algorithm, shared buffers
> should be used instead of the specialized buffer rings.

Are you suggesting getting rid of the buffer rings we use for
sequential scans, for example? IMHO that's going to be tricky, e.g.
because of synchronized sequential scans.
> We will be happy to hear your feedback on our work! Thank you :)
>
> [0] LRU TPS https://yadi.sk/i/wNNmNw3id22nMQ
> [1] LRU hitrate https://yadi.sk/i/l1o710C3IX6BiA
> [2] Benchmark config https://yadi.sk/d/L-PIVq--ujw6NA
> [3] Frequency sketch code
> https://github.com/heyfaraday/postgres/commit/a684a139a72cd50dd0f9d031a8aa77f998607cf1
>
> With best regards, almost serious cache group.

cheers

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services