On Fri, Mar 2, 2018 at 5:17 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> (1) do we really want to go over to treating ANALYZE's tuple density
> result as gospel, contradicting the entire thrust of the 2011 discussion?
>
>> This tables reltuples is 18 times the actual row count. It will never
>> converge because with 50000953 pages analyze can only adjust reltuples
>> by 0.0006 each time.
>
> But by the same token, analyze only looked at 0.0006 of the pages. It's
> nice that for you, that's enough to get a robust estimate of the density
> everywhere; but I have a nasty feeling that that won't hold good for
> everybody. The whole motivation of the 2011 discussion, and the issue
> that is also seen in some other nearby discussions, is that the density
> can vary wildly.
I think that viewing ANALYZE's result as fairly authoritative is probably a
good idea. If ANALYZE looked at only 0.0006 of the pages, that's because we
decided that 0.0006 of the pages were all it needed to look at in order to
come up with good estimates. Having decided that, we shouldn't then turn
around and decide that the resulting estimates are 99.94% bunk.

Now, it could be that there are data sets out there where the number of
tuples per page varies widely between different parts of the table, and
therefore sampling 0.0006 of the pages can return quite different estimates
depending on which ones we happen to pick. However, that's a lot like saying
that 0.0006 of the pages isn't really enough, and maybe the solution is to
sample more. Still, it doesn't seem unreasonable to do some kind of
smoothing, where we set

    new estimate = (newly computed estimate * X) + (previous estimate * (1 - X))

where X might be 0.25 or whatever; perhaps X might even be configurable.
(There's a rough sketch of this below.)

One thing to keep in mind is that VACUUM will, in many workloads, tend to
scan the same parts of the table over and over again. For example, consider
a database of chess players which is regularly updated with new ratings
information. The records for active players will be updated frequently, but
the ratings for deceased players will rarely change. Living but inactive
players may occasionally become active again, or may be updated occasionally
for one reason or another. So, VACUUM will keep scanning the pages that
contain records for active players but will rarely or never be asked to scan
the pages for dead players. If it so happens that these different groups of
players have rows of varying width -- perhaps we store more detailed data
about newer players but don't have full records for older ones -- then the
overall tuple density estimate will come to resemble more and more the
density of the rows that are actively being updated, rather than the overall
density of the whole table.

Even if all the tuples are the same width, consider a workload that
typically inserts a record, updates it N times, and then leaves it untouched
after that. Suppose we can fit 100 tuples into a page. On pages where all of
the records have reached their final state, there will be 100 tuples. But on
pages where updates are still happening there will -- after VACUUM -- be
fewer than 100 tuples per page, because some fraction of the tuples on the
page were dead row versions; that's why we were vacuuming them. Suppose
typically half the tuples on each page are getting removed by VACUUM. Then,
over time, if only VACUUM is ever run, our estimate of tuples per page will
converge to 50, but in reality, as the table grows, the real number is
converging to 100. (The toy simulation below illustrates this.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
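For concreteness, here is a minimal sketch of that smoothing rule (not
actual PostgreSQL code; the factor X = 0.25 and the starting numbers are
made-up illustrations, loosely echoing the 18x overestimate in the report
quoted at the top):

/*
 * Minimal sketch of the smoothing rule above.  Not PostgreSQL code;
 * X = 0.25 and the starting numbers are illustrative assumptions only.
 */
#include <stdio.h>

static double
smooth_reltuples(double prev_estimate, double analyze_estimate, double x)
{
    /* new estimate = (newly computed estimate * X) + (previous estimate * (1 - X)) */
    return analyze_estimate * x + prev_estimate * (1.0 - x);
}

int
main(void)
{
    double reltuples = 900e6;   /* stale, badly inflated estimate (assumed) */
    double measured  = 50e6;    /* density ANALYZE just computed (assumed) */

    for (int i = 1; i <= 12; i++)
    {
        reltuples = smooth_reltuples(reltuples, measured, 0.25);
        printf("after ANALYZE #%d: reltuples = %.0f\n", i, reltuples);
    }
    return 0;
}

With X = 0.25, each ANALYZE closes a quarter of the remaining gap, so even a
badly inflated estimate decays toward the measured value by a factor of 0.75
per run.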
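And a toy simulation of the insert-then-update example, under the
assumptions stated there (100 tuples fit on a page, half the tuples on
still-active pages are dead at VACUUM time, and VACUUM never revisits the
settled pages); the page counts are arbitrary:

/*
 * Toy simulation: rows are inserted, updated a few times, and then never
 * touched again.  All numbers are illustrative assumptions.  VACUUM only
 * sees the "active" pages, on which half the tuples are dead, so its
 * per-page density estimate stays near 50 while the table-wide density
 * creeps toward 100.
 */
#include <stdio.h>

#define TUPLES_PER_PAGE 100     /* page capacity (assumed) */
#define ACTIVE_PAGES    1000    /* pages VACUUM keeps revisiting (assumed) */

int
main(void)
{
    long settled_pages = 0;     /* pages whose rows reached final state */

    for (int round = 1; round <= 5; round++)
    {
        /* each round, another batch of pages settles at full density */
        settled_pages += 100000;

        long   total_pages = settled_pages + ACTIVE_PAGES;
        double live_tuples = (double) settled_pages * TUPLES_PER_PAGE +
                             (double) ACTIVE_PAGES * TUPLES_PER_PAGE / 2.0;
        double true_density = live_tuples / (double) total_pages;

        /* VACUUM scans only the active pages, where half the tuples are dead */
        double vacuum_density = TUPLES_PER_PAGE / 2.0;

        printf("round %d: table-wide density %.1f, VACUUM-observed density %.1f\n",
               round, true_density, vacuum_density);
    }
    return 0;
}

The table-wide density climbs toward 100 as settled pages accumulate, while
VACUUM keeps observing roughly 50, which is the drift described above.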