On Wed, May 27, 2020 at 8:01 PM Ashutosh Bapat <
ashutosh.ba...@2ndquadrant.com> wrote:

>
>
> On Wed, 27 May 2020 at 04:43, Andy Fan <zhihui.fan1...@gmail.com> wrote:
>
>> You can use the attached sql to reproduce this issue, but I'm not sure
>> you will get the above result on the first try.  That is because when the
>> optimizer thinks the 2 index scans have the same cost, it chooses the
>> first one it finds, and that order depends on RelationGetIndexList.  If
>> so, you may try dropping and recreating the j1_i_im5 index.
>>
>> The idea behind this patch is that we still use the cost-based optimizer;
>> only when we find that the 2 index scans have the same cost do we prefer
>> the index which has more quals in its Index Cond.  This is implemented by
>> making the qual cost of the index filter slightly higher.
>>
>
> Thanks for the example and the explanation.
>
> The execution time difference in your example is too high to be accounted
> for just by executing the filter on so many rows. My guess is this has to
> do with the heap access: to apply the filter, the entire row needs to be
> fetched from the heap. So we should investigate this case from that angle.
> Another guess I have is that the statistics are not correct and hence the
> cost is wrong.
>
>
I believe this is a statistics issue and that the cost is therefore wrong.
More characteristics of this issue are:

1). If a value is outside the range covered by the old statistics, the
optimizer assumes it matches 1 row.
2). Based on that 1-row assumption, for the query "col1 = out_of_range_val
AND col2 = any_value", the indexes (col1, col2) and (col1, col3) get exactly
the same cost under the current cost model.
3). If the statistics are wrong, (col1, col3) may produce a very bad plan,
as shown above, while index (col1, col2) should always be better than, or at
least no worse than, (col1, col3) in any case.
4). Extending the rule: for the query "col1 = out_of_range_val AND col2 =
any_value AND col3 = any_val" and the indexes (col1, col2, col_m) and
(col1, col_m, col_n), the former index will always be better than, or no
worse than, the latter one.
5). A statistics issue like this is not uncommon; for example, in a
log-based application, creation_date very easily falls outside the range
covered by the statistics.
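
To make 1) and 2) above concrete, here is a minimal sketch of the kind of
reproduction I have in mind (the table and index names below are made up
for illustration, they are not the ones in the attached sql):

-- Collect statistics first, then load data whose key value is newer than
-- anything the statistics know about.
CREATE TABLE t (a int, b int, c int);
INSERT INTO t SELECT 1, g % 10, g % 10 FROM generate_series(1, 100000) g;
CREATE INDEX t_a_b ON t (a, b);
CREATE INDEX t_a_c ON t (a, c);
ANALYZE t;                         -- statistics only cover a = 1

-- a = 2 is outside the range in the statistics.
INSERT INTO t SELECT 2, g % 10, g % 10 FROM generate_series(1, 100000) g;

-- The planner assumes ~1 row for a = 2, so the (a, b) and (a, c) paths get
-- the same cost, even though with (a, c) the "b = 5" condition becomes a
-- filter instead of part of the Index Cond.
EXPLAIN ANALYZE SELECT * FROM t WHERE a = 2 AND b = 5;

With the (a, c) index every fetched heap row has to be checked against the
filter, which is where the extra execution time goes; with (a, b) the same
condition is evaluated as part of the Index Cond.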

So we need to optimize the cost model for such cases, and the method is the
patch I mentioned above.  I don't have solid data to prove that Oracle does
something similar, but based on my discussions with a customer, Oracle
likely does something like this.

-- 
Best Regards
Andy Fan
