Re: [HACKERS] the big picture for index-only scans

Robert Haas Sun, 19 Jun 2011 11:41:15 -0700

On Sun, Jun 19, 2011 at 11:12 AM, Cédric Villemain
<cedric.villemain.deb...@gmail.com> wrote:
>> Good point --- we would be making plan decisions based on the visibility
>> map coverage.  The big question is whether visibility map changes are
>> more dynamic than the values we already plan against, like rows in the
>> table, table size, and value distributions.  I don't know the answer.
>
> Robert, I though of Covered-Index as just a usage of the vis map:
> don't take the heap block if not needed. This look easier to do and
> better in the long term (because read-only DB may quickly turn into a
> no-heap access DB for example). Thus this is not real covered-index.
> Did you want to implement real covered-index and did you have ideas on
> how to do that ? Or just an optimization of the current
> planner/executor on index usage ?


If by a "real" covered index you mean one that includes visibility
info in the index - I have no plans to work on anything like that.  If
we were to do that, the index would become much larger and less
efficient, whereas the approach of just optimizing the way our
existing indexes are used doesn't have that disadvantage.  It also
sounds like a lot of work. Now, if someone else wants to demonstrate
that it has advantages that are worth the costs and go do it, more
power to said person, but I'm unexcited about it.

> I don't know VM internals:
>
>  * do we have a counter of ALL_VISIBLE flag set on a relation ? (this
> should be very good for planner)
>  * do we need a pg_class.rel_vmvisible ?! (I have hands up, don't
> shoot pleeaase)

Evidently I'm developing a more frightening reputation than I would hope.  :-(

Anyway, yes, I do believe we need a table-level statistic for the
percentage of the visibility map bits that are believed to be set.
Having said that I think we need it, let me also say that I'm a bit
skeptical about how well it will work.  There are two problems:

1. Consider a query like "SELECT a, b FROM foo WHERE a = 1".  To
accurately estimate the cost of executing this query via an index-only
scan (on an index over foo (a, b)), we need to know (i) the percentage
of rows in the table for which a = 1 and (ii) the percentage *of those
rows* which are on an all-visible page.  We can assume that if 80% of
the rows in the table are on all-visible pages, then 80% of the rows
returned by this query will be on all-visible pages also, but that
might be wildly wrong.  This is similar to the problem of costing
"SELECT * FROM foo WHERE a = 1 AND b = 1" - we know the fraction of
rows where a = 1 and the fraction where b = 1, but there's no
certainty that multiplying those values will produce an accurate
estimate for the conjunction of those conditions.  The problem here is
not as bad as the general multi-column statistics problem because a
mistake will only bollix the cost, not the row count estimate, but
it's still not very nice.

2. Since VACUUM and ANALYZE often run together, we will be estimating
the percentage of rows on all-visible pages just at the time when that
percentage is highest.  This is not exactly wonderful, either...

I have a fair amount of hope that even with these problems we can come
up with some adjustment to the planner that is better than just
ignoring the problem, but I am not sure how difficult it will be.

>  * is it ok to parse VM for planning (I believe it is not) ?

It doesn't seem like a good idea to me, but I just work here.  I'm not
sure what that would buy us.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] the big picture for index-only scans

Reply via email to