On Wed, 28 Feb 2018 15:55:19 +0300 Alexander Kuzmenkov <a.kuzmen...@postgrespro.ru> wrote:
> Hi David, > > I was able to reproduce the problem using your script. > analyze_counts.awk is missing, though. Attached now I hope. I think I also added it to the commitfest page. > The idea of using the result of ANALYZE as-is, without additional > averaging, was discussed when vac_estimate_reltuples() was introduced > originally. Ultimately, it was decided not to do so. You can find the > discussion in this thread: > https://www.postgresql.org/message-id/flat/BANLkTinL6QuAm_Xf8teRZboG2Mdy3dR_vw%40mail.gmail.com#banlktinl6quam_xf8terzbog2mdy3dr...@mail.gmail.com Well that was a long discussion. I'm not sure I would agree that there was a firm conclusion on what to do about ANALYZE results. There was some recognition that the case of ANALYZE is different than VACUUM and that is reflected in the original code comments too. However the actual code ended up being the same for both ANALYZE and VACUUM. This patch is about that. See messages: https://www.postgresql.org/message-id/BANLkTimVhdO_bKQagRsH0OLp7MxgJZDryg%40mail.gmail.com https://www.postgresql.org/message-id/BANLkTimaDj950K-298JW09RrmG0eJ_C%3DqQ%40mail.gmail.com https://www.postgresql.org/message-id/28116.1306609295%40sss.pgh.pa.us > The core problem here seems to be that this calculation of moving > average does not converge in your scenario. It can be shown that when > the number of live tuples is constant and the number of pages grows, the > estimated number of tuples will increase at each step. Do you think we > can use some other formula that would converge in this scenario, but > still filter the noise in ANALYZE results? I couldn't think of one yet. Besides the test data generated with the script I have parsed the analyze verbose output for several large production systems running complex applications and have found that for tables larger than the statistics sample size (300*default_statistics_target) the row count you can caculate from (pages/sample_pages) * live_rows is pretty accurate, within a few percent of the value from count(*). In theory the sample pages analyze uses should represent the whole table fairly well. We rely on this to generate pg_statistic and it is a key input to the planner. Why should we not believe in it as much only for reltuples? If the analyze sampling does not work, the fix would be to improve that, not to disregard it piecemeal. My motivation is that I have seen large systems fighting mysterious run-away bloat for years no matter how aggressively autovacuum is tuned. The fact that an inflated reltuples can cause autovacuum to simply ignore tables forever seems worth fixing. -dg -- David Gould da...@sonic.net If simplicity worked, the world would be overrun with insects.
analyze_counts.awk
Description: application/awk