You all know this FAQ: "Why does Postgres not use my index?" Half of the time this problem can easily be solved by casting a literal to the type of the respective column; this is not my topic here.
In many other cases it turns out that the planner over-estimates the cost of an index scan. Sometimes this can be worked around by lowering random_page_cost. "Of course, that's a hack that is quite unrelated to the real problem." I strongly agree ;-)

AFAICS (part of) the real problem is in costsize.c:cost_index(), where IO_cost is calculated from min_IO_cost, pages_fetched, random_page_cost, and indexCorrelation. The current implementation uses indexCorrelation^2 to interpolate between min_IO_cost and max_IO_cost, which IMHO gives results that are too close to max_IO_cost. This conjecture is supported by the fact that actual run times are often much lower than estimated when seqscans are disabled.

So we have to find a cost function such that

  . min_IO_cost <= cost <= max_IO_cost  for -1 <= indexCorrelation <= 1
  . cost --> min_IO_cost  for indexCorrelation --> +/- 1
  . cost --> max_IO_cost  for indexCorrelation --> 0
  . cost tends more towards min_IO_cost than with the current implementation

After playing around a bit I propose three functions satisfying the above conditions. All proposals use absC = abs(indexCorrelation).

Proposal 1: Use absC for linear interpolation.

    IO_cost = absC * min_IO_cost + (1 - absC) * max_IO_cost;

Proposal 2: First calculate estimates for the number of pages and the cost per page, then multiply the results.

    estPages = absC * minPages + (1 - absC) * maxPages;
    estPCost = absC * 1 + (1 - absC) * random_page_cost;
    /*                ^ sequential_page_cost */
    IO_cost  = estPages * estPCost;

Proposal 3: Interpolate geometrically, using absC.

    IO_cost = exp(absC * ln(min_IO_cost) + (1 - absC) * ln(max_IO_cost));

Here are some numbers for

    seq_page_cost    =    1  (constant)
    random_page_cost =    4  (GUC)
    minPages         =   61
    maxPages         = 1440

    corr     current        p1        p2        p3
    0        5760.00   5760.00   5760.00   5760.00
    0.1      5703.01   5190.10   4817.77   3655.22
    0.2      5532.04   4620.20   3958.28   2319.55
    0.3      5247.09   4050.30   3181.53   1471.96
    0.4      4848.16   3480.40   2487.52    934.08
    0.5      4335.25   2910.50   1876.25    592.76
    0.6      3708.36   2340.60   1347.72    376.16
    0.7      2967.49   1770.70    901.93    238.70
    0.8      2112.64   1200.80    538.88    151.48
    0.9      1143.81    630.90    258.57     96.13
    0.95      616.65    345.95    149.44     76.57
    0.99      174.41    117.99     77.03     63.84
    0.995     117.85     89.50     68.91     62.40
    0.999      72.39     66.70     62.57     61.28
    1          61.00     61.00     61.00     61.00

Another example for

    seq_page_cost    =    1  (constant)
    random_page_cost =   10  (GUC)
    minPages         =   20
    maxPages         = 938.58

    corr     current        p1        p2        p3
    0        9385.79   9385.79   9385.79   9385.79
    0.1      9292.14   8449.21   7705.17   5073.72
    0.2      9011.16   7512.64   6189.88   2742.73
    0.3      8542.87   6576.06   4839.94   1482.65
    0.4      7887.27   5639.48   3655.34    801.48
    0.5      7044.35   4702.90   2636.09    433.26
    0.6      6014.11   3766.32   1782.19    234.21
    0.7      4796.56   2829.74   1093.62    126.61
    0.8      3391.69   1893.16    570.40     68.44
    0.9      1799.50    956.58    212.53     37.00
    0.95      933.16    488.29     95.60     27.20
    0.99      206.38    113.66     31.81     21.27
    0.995     113.42     66.83     25.70     20.62
    0.999      38.72     29.37     21.11     20.12
    1          20.00     20.00     20.00     20.00

(If you want to play around with your own numbers, I can send my OOo spreadsheet privately or to the list.)

The second example shows that especially with proposal 3 we could afford to set random_page_cost to a *higher* value, which in contrast to previous recommendations seems to be appropriate; IIRC, benchmark results posted here showed values of up to 60.

As nobody knows how each of these proposals performs in real life under different conditions, I suggest leaving the current implementation in, adding all three algorithms, and supplying a GUC variable to select a cost function; a sketch of what that could look like follows below.
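For illustration, here is a minimal standalone sketch of such a selectable cost function. This is NOT the actual costsize.c code: the helper function, the enum, and their names are invented for this example, and seq_page_cost is assumed to be 1 as in the tables above, so that min_IO_cost = minPages and max_IO_cost = maxPages * random_page_cost.

/*
 * Standalone sketch -- not the real cost_index().  Names and structure
 * are hypothetical; only the four formulas match the text above.
 */
#include <math.h>
#include <stdio.h>

typedef enum CostMethod
{
    COST_CURRENT,               /* indexCorrelation^2 interpolation */
    COST_PROPOSAL1,             /* linear interpolation in absC */
    COST_PROPOSAL2,             /* estPages * estPCost */
    COST_PROPOSAL3              /* geometric interpolation */
} CostMethod;

static double
interpolated_IO_cost(double indexCorrelation,
                     double minPages,
                     double maxPages,
                     double random_page_cost,
                     CostMethod method)
{
    double      min_IO_cost = minPages;                    /* seq_page_cost = 1 */
    double      max_IO_cost = maxPages * random_page_cost;
    double      absC = fabs(indexCorrelation);

    switch (method)
    {
        case COST_CURRENT:
            {
                double      csq = indexCorrelation * indexCorrelation;

                return csq * min_IO_cost + (1.0 - csq) * max_IO_cost;
            }
        case COST_PROPOSAL1:
            return absC * min_IO_cost + (1.0 - absC) * max_IO_cost;
        case COST_PROPOSAL2:
            {
                double      estPages = absC * minPages + (1.0 - absC) * maxPages;
                double      estPCost = absC * 1.0 + (1.0 - absC) * random_page_cost;

                return estPages * estPCost;
            }
        case COST_PROPOSAL3:
            return exp(absC * log(min_IO_cost) +
                       (1.0 - absC) * log(max_IO_cost));
    }
    return max_IO_cost;         /* keep the compiler happy */
}

int
main(void)
{
    /* corr = 0.9, minPages = 61, maxPages = 1440, random_page_cost = 4 */
    printf("current %8.2f  p1 %8.2f  p2 %8.2f  p3 %8.2f\n",
           interpolated_IO_cost(0.9, 61, 1440, 4, COST_CURRENT),
           interpolated_IO_cost(0.9, 61, 1440, 4, COST_PROPOSAL1),
           interpolated_IO_cost(0.9, 61, 1440, 4, COST_PROPOSAL2),
           interpolated_IO_cost(0.9, 61, 1440, 4, COST_PROPOSAL3));
    return 0;
}

Compiled with cc -lm, this prints 1143.81, 630.90, 258.57, and 96.13, matching the corr = 0.9 row of the first table.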
Comments? Ideas? Objections?

Servus
Manfred