2020年11月6日(金) 0:40 Dmitry Dolgov <9erthali...@gmail.com>: > > > On Tue, Jan 14, 2020 at 12:46:02AM +0900, Kohei KaiGai wrote: > > The v2 patch is attached. > > > > This adds two dedicated lists on the RelOptInfo to preserve lesser paths > > if extension required to retain the path-node to be removed in usual manner. > > These lesser paths are kept in the separated list, so it never expand the > > length > > of pathlist and partial_pathlist. That was the arguable point in the > > discussion > > at the last October. > > > > The new hook is called just before the path-node removal operation, and > > gives extension a chance for extra decision. > > If extension considers the path-node to be removed can be used in the upper > > path construction stage, they can return 'true' as a signal to preserve this > > lesser path-node. > > In case when same kind of path-node already exists in the preserved_pathlist > > and the supplied lesser path-node is cheaper than the old one, extension can > > remove the worse one arbitrarily to keep the length of preserved_pathlist. > > (E.g, PG-Strom may need one GpuJoin path-node either pathlist or preserved- > > pathlist for further opportunity of combined usage with GpuPreAgg path-node. > > It just needs "the best GpuJoin path-node" somewhere, not two or more.) > > > > Because PostgreSQL core has no information which preserved path-node can > > be removed, extensions that uses path_removal_decision_hook() has > > responsibility > > to keep the length of preserved_(partial_)pathlist reasonable. > > Hi, > > Thanks for the patch! I had a quick look at it and have a few questions: > Sorry for the very late response. It's my oversight.
> * What would be the exact point/hook at which an extension can use > preserved pathlists? I guess it's important, since I can imagine it's > important for one of the issues mentioned in the thread about such an > extension have to re-do significant part of the calculations from > add_path. > set_join_pathlist_hook and create_upper_paths_hook For example, even if GpuPreAgg may be able to generate cheaper path with GpuJoin result, make_one_rel() may drop GpuJoin results due to its own cost estimation. In this case, if lesser GpuJoin path would be preserved somewhere, the extension invoked by create_upper_paths_hook can make GpuPreAgg path with GpuJoin sub-path; that can reduce data transfer between CPU and GPU. > * Do you have any benchmark results with some extension using this > hook? The idea with another pathlist of "discarded" paths sounds like > a lightweight solution, and indeed I've performed few tests with two > workloads (simple queries, queries with joins of 10 tables) and the > difference between master and patched versions is rather small (no > stable difference for the former, couple of percent for the latter). > But it's of course with an empty hook, so it would be good to see > other benchmarks as well. > Not yet. And, an empty hook will not affect so much. Even if the extension uses the hook, it shall be more lightweight than its own alternative implementation. In case of PG-Strom, it also saves Gpu-related paths in its own hash-table, then we look at the hash-table also to find out the opportunity to merge multiple GPU invocations into single invocation. > * Does it make sense to something similar with add_path_precheck, > which also in some situations excluding paths? > This logic allows to skip the paths creation that will obviously have expensive cost. Its decision is based on the cost estimation. The path_removal_decision_hook also gives extensions a chance to preserve pre-built paths that can be used later even if cost is not best. This decision is not only based on the cost. In my expectations, it allows to preserve the best path in the gpu related ones. > * This part sounds dangerous for me: > > > Because PostgreSQL core has no information which preserved path-node can > > be removed, extensions that uses path_removal_decision_hook() has > > responsibility > > to keep the length of preserved_(partial_)pathlist reasonable. > > since an extension can keep limited number of paths in the list, but > then the same hook could be reused by another extension which will > also try to limit such paths, but together they'll explode. > If Path node has a flag to indicate whether it is referenced by any other upper node, we can simplify the check whether it is safe to release. In the current implementation, the lesser paths except for IndexPath are released at the end of add_path. On the other hand, I'm uncertain whether the pfree(new_path) at the tail of add_path makes sense on the modern hardware, because they allow to recycle just small amount of memory, then entire memory consumption by the optimizer shall be released by MemoryContext mechanism. If add_path does not release path-node, the above portion is much simpler. Best regards, -- HeteroDB, Inc / The PG-Strom Project KaiGai Kohei <kai...@heterodb.com>