On Wed, Oct 7, 2020 at 7:00 PM Andy Fan <zhihui.fan1...@gmail.com> wrote:
>
> On Wed, Oct 7, 2020 at 5:05 PM Andy Fan <zhihui.fan1...@gmail.com> wrote:
>>
>> On Sun, Oct 4, 2020 at 3:10 PM Andy Fan <zhihui.fan1...@gmail.com> wrote:
>>>>
>>>> Now, in my experience, the current system for custom plans vs. generic plans doesn't approach the problem in this way at all, and in my experience that results in some pretty terrible behavior. It will do things like form a custom plan every time because the estimated cost of the custom plan is lower than the estimated cost of the generic plan even though the two plans are structurally identical; only the estimates differ. It will waste gobs of CPU cycles by replanning a primary key lookup 5 times just on the off chance that a lookup on the primary key index isn't the best option. But this patch isn't going to fix any of that. The best we can probably do is try to adjust the costing for Append paths in some way that reflects the costs and benefits of pruning. I'm tentatively in favor of trying to do something modest in that area, but I don't have a detailed proposal.
>>>>
>>>
>>> I just realized this issue recently and reported it at [1]; Amit then pointed me to the discussion here, so I would like to continue the topic in this thread.
>>>
>>> I think we can split this into two issues. One is initial partition pruning (done at executor startup), which probably happens only in the custom-plan case and is what caused the issue above. The other is run-time partition pruning; I admit that is an important issue to resolve as well, but it looks harder. So I think we can fix the first one first.
>>>
>>> ... When we compute the cost of a generic plan, we can reduce the cost based on such information.
>>
>> That approach doesn't work, since after initial partition pruning it is not only the cost of the Append node that should be reduced; the costs of the plan nodes above it should be reduced as well [1].
>>
>> However, I think that if we can use the partition pruning information from a custom plan at the cost_append_path stage, the issue can be fixed. If so, the idea is similar to David's idea in [2], but Robert didn't agree with it [2]. Can anyone elaborate on that objection? For "partkey > $1" or BETWEEN cases, some real results from the past are probably better than hard-coded assumptions, IMO.
>
> I can understand Robert's idea now: he intended to resolve both the initial partition pruning case and the run-time partition pruning case, while I had always been thinking about the former (Amit reminded me of that at the beginning, but I am only realizing it now).
>
> With Robert's idea, I think we can do some hacking in create_append_path: figure out the pruneinfo at that stage (it is currently built in create_append_plan), and then do a fix_append_path with that pruneinfo. We need to fix not only the cost of the AppendPath but also its row estimate. For example, given SELECT .. FROM t1, t2, p WHERE t1.a = p.partkey AND t1.b = t2.b, if the plan is:
>
>   Nest Loop
>     Nest Loop
>       t1
>       Append (p)
>     t2
>
> then the row estimate of Append (p) will be very important.
>
> For this idea, how to estimate how many partitions/rows can be pruned is a key part. Robert said: "I was thinking, rather, that if we know for example that we're doing pruning on partition_column = $1, then we know that only one partition will match. That's probably a common case. If we've got partition_column > $1, we could assume that, say, 75% of the partitions would match. partition_column BETWEEN $1 and $2 is probably a bit more selective, so maybe we assume 50% of the partitions would match." I don't think we can say that 75% or 50% is a good number, but the key point may be that "partition_column = $1" is a common case. And for the other cases, we probably don't make things worse.
>
> I think we need to do something here; any thoughts? Personally, I like this idea more now.
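To make the quoted problem concrete, here is a rough, hypothetical illustration (the table, the 64-partition count, and the statement name are all made up, not taken from the thread):

    CREATE TABLE orders (order_id bigint, customer_id bigint, note text)
        PARTITION BY HASH (customer_id);

    DO $$
    BEGIN
        FOR i IN 0..63 LOOP
            EXECUTE format('CREATE TABLE orders_p%s PARTITION OF orders
                            FOR VALUES WITH (MODULUS 64, REMAINDER %s)', i, i);
        END LOOP;
    END $$;

    PREPARE get_orders(bigint) AS
        SELECT * FROM orders WHERE customer_id = $1;

    -- With the default plan_cache_mode = auto, each EXECUTE builds a custom
    -- plan that scans only the one matching partition.  The generic plan is
    -- costed with an Append over all 64 partitions, because its estimate
    -- does not account for the pruning that happens at executor startup, so
    -- the custom plan always looks cheaper and the query gets replanned on
    -- every execution.
    EXECUTE get_orders(42);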
Yes, I think we have to do something along those lines. But I am wondering why our stats machinery is failing to do that already. For equality I think it's straightforward. But even for other operators we should use the statistics from the partitioned table to estimate the selectivity, and then take the number of scanned partitions to be (number of partitions * fraction of rows scanned). That might be better than 50% or 75%.

--
Best Wishes,
Ashutosh Bapat
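(To illustrate the suggested adjustment with made-up numbers: if the partitioned table has 100 partitions and the parent table's statistics put the selectivity of "partkey > $1" at about 3% of the rows, the planner would assume roughly 100 * 0.03 = 3 scanned partitions, instead of a hard-coded 75%, i.e. 75 partitions.)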