On Sun, Jun 19, 2011 at 11:12 AM, Cédric Villemain <cedric.villemain.deb...@gmail.com> wrote: >> Good point --- we would be making plan decisions based on the visibility >> map coverage. The big question is whether visibility map changes are >> more dynamic than the values we already plan against, like rows in the >> table, table size, and value distributions. I don't know the answer. > > Robert, I though of Covered-Index as just a usage of the vis map: > don't take the heap block if not needed. This look easier to do and > better in the long term (because read-only DB may quickly turn into a > no-heap access DB for example). Thus this is not real covered-index. > Did you want to implement real covered-index and did you have ideas on > how to do that ? Or just an optimization of the current > planner/executor on index usage ?
If by a "real" covered index you mean one that includes visibility info in the index - I have no plans to work on anything like that. If we were to do that, the index would become much larger and less efficient, whereas the approach of just optimizing the way our existing indexes are used doesn't have that disadvantage. It also sounds like a lot of work. Now, if someone else wants to demonstrate that it has advantages that are worth the costs and go do it, more power to said person, but I'm unexcited about it. > I don't know VM internals: > > * do we have a counter of ALL_VISIBLE flag set on a relation ? (this > should be very good for planner) > * do we need a pg_class.rel_vmvisible ?! (I have hands up, don't > shoot pleeaase) Evidently I'm developing a more frightening reputation than I would hope. :-( Anyway, yes, I do believe we need a table-level statistic for the percentage of the visibility map bits that are believed to be set. Having said that I think we need it, let me also say that I'm a bit skeptical about how well it will work. There are two problems: 1. Consider a query like "SELECT a, b FROM foo WHERE a = 1". To accurately estimate the cost of executing this query via an index-only scan (on an index over foo (a, b)), we need to know (i) the percentage of rows in the table for which a = 1 and (ii) the percentage *of those rows* which are on an all-visible page. We can assume that if 80% of the rows in the table are on all-visible pages, then 80% of the rows returned by this query will be on all-visible pages also, but that might be wildly wrong. This is similar to the problem of costing "SELECT * FROM foo WHERE a = 1 AND b = 1" - we know the fraction of rows where a = 1 and the fraction where b = 1, but there's no certainty that multiplying those values will produce an accurate estimate for the conjunction of those conditions. The problem here is not as bad as the general multi-column statistics problem because a mistake will only bollix the cost, not the row count estimate, but it's still not very nice. 2. Since VACUUM and ANALYZE often run together, we will be estimating the percentage of rows on all-visible pages just at the time when that percentage is highest. This is not exactly wonderful, either... I have a fair amount of hope that even with these problems we can come up with some adjustment to the planner that is better than just ignoring the problem, but I am not sure how difficult it will be. > * is it ok to parse VM for planning (I believe it is not) ? It doesn't seem like a good idea to me, but I just work here. I'm not sure what that would buy us. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers