On 1/8/2024 20:56, Alexander Korotkov wrote:
On Tue, Apr 2, 2024 at 6:07 AM Andrei Lepikhov
<a.lepik...@postgrespro.ru> wrote:
Actually, the idea I tried to express is the combination of #1 and #2:
to build individual plan for every partition, but consider the 'Common
Resources'. Let me explain this a bit more.
Thanks for keeping your eye on it!
My idea is to introduce a new property for paths selection.
3) Usage of common resources. The common resource can be: hash
representation of relation, memoize over relation scan, etc. We can
exclude the cost of common resource generation from the path cost, but
keep the reference for the common resource with its generation cost.
If one path uses more common resources than another path, it could
cost-dominate another one only if its cheaper together with its extra
common resources cost. If one path uses less or equal common
resources than another, it could normally cost-dominate another one.
The most challenging part for me is the cost calculation, which is
bonded with estimations of other paths. To correctly estimate the
effect, we need to remember at least the whole number of paths sharing
resources.
Also, I wonder if it can cause some corner cases where prediction error
on a shared resource will cause an even worse situation upstream.
I think we could push off here from an example and a counter-example,
but I still can't find them.
However, I understand this is huge amount of work given we have to
introduce new basic optimizer concepts. I get that the main
application of this patch is sharding. If we have global tables
residing each shard, we can push down any joins with them. Given this
patch gives some optimization for non-sharded case, I think we
*probably* can accept its concept even that it this optimization is
obviously not perfect.
Yes, right now sharding is the most profitable case. We can push down
parts of the plan which references only some common resources:
FunctionScan, ValueScan, tables which can be proved are existed
everywhere and provide the same output. But for now it is too far from
the core code, IMO. - So, I search for cases that can be helpful for a
single instance.
--
regards,
Andrei Lepikhov
Postgres Professional