Thanks a lot for your response Tom. May I ask how do you debug those functions? Or is it just that you read the code and more or less guess what should be the value for each variable with information coming from querying Postgres tables?
Thanks a lot. El lun, 28 abr 2025 a las 17:07, Tom Lane (<t...@sss.pgh.pa.us>) escribió: > Laurenz Albe <laurenz.a...@cybertec.at> writes: > > On Mon, 2025-04-28 at 15:22 +0200, Felipe López Montes wrote: > >> Following the book PostgreSQL Query Optimization (Second Edition), > there is a > >> statement on page 90 talking about Partial Indexes that says that the > planner > >> will use the partial index rather than the full index on the flight > table, > >> however after doing my own tests I have checked that this is not true > and the > >> planner estimates that scanning the full index is cheaper than scanning > the > >> partial one and would like to understand why. > > > Which index is bigger (you can use \di+ in "psql")? > > I find that I can reproduce something similar with a very tiny > partial index: the cost estimate for the partial index comes out > higher than for an equivalent query using a non-partial index. > After tracing through it, the blame seems to affix to this > formula in genericcostestimate: > > /* > * Estimate the number of index pages that will be retrieved. > * > * We use the simplistic method of taking a pro-rata fraction of the > total > * number of index pages. In effect, this counts only leaf pages and > not > * any overhead such as index metapage or upper tree levels. > * > * In practice access to upper index levels is often nearly free > because > * those tend to stay in cache under load; moreover, the cost involved > is > * highly dependent on index type. We therefore ignore such costs here > * and leave it to the caller to add a suitable charge if needed. > */ > if (index->pages > 1 && index->tuples > 1) > numIndexPages = ceil(numIndexTuples * index->pages / > index->tuples); > else > numIndexPages = 1.0; > > (numIndexTuples is the estimated number of index entries to be > visited.) In the example I'm looking at, the query wants to > retrieve 5 rows and the index holds exactly those 5 rows, so > > (gdb) p numIndexTuples > $38 = 5 > (gdb) p index->pages > $39 = 2 > (gdb) p index->tuples > $40 = 5 > > and numIndexPages comes out to 2, ie we expect to visit the > whole index. But if we're considering a non-partial index, > > (gdb) p numIndexTuples > $44 = 5 > (gdb) p index->pages > $45 = 17 > (gdb) p index->tuples > $46 = 10000 > > and numIndexPages comes out to 1, so we estimate half as much > disk access cost and the partial index looks worse. > > I think what's wrong here is that index->pages is the entire > size of the index including the meta page, but the calculation > is being done (as the comment says) on the assumption that > only leaf pages are involved. If we were to exclude the meta > page from the calculation then we'd conclude numIndexPages = 1 > for both indexes. Felipe is considering a slightly larger index > but I bet it's fundamentally the same issue. With indexes having > more than a few hundred entries, the delta due to the meta page > would drop into the noise and we'd eventually prefer the partial > index. But I think the absolute index size only contributes to > our estimate of descentCost (in btcostestimate) so you'd need > fair-sized indexes before that reliably wins out. > > I don't consider this a serious defect: for this size of index > it barely matters which one the planner picks, as evidenced by > the fact that the true execution times are so close. But it could > be something to try to improve. We could trivially discount the > meta page for index types that have one. Discounting intermediate > upper pages would take more calculation (since we don't know a-priori > how many there are) and might not be worth the trouble. > > regards, tom lane >