On Fri, 8 Dec 2023 at 18:02, Ashutosh Bapat <ashutosh.bapat....@gmail.com> wrote: > given path. E.g. we have three path chains as follows > 1. joinpath_1->joinpath_2->seqpath_1, > 2. joinpath_3->joinpath_4->seqpath_1, > 3. joinpath_5->joinpath_2->seqpath_1 > > Please note that this is not full path tree/forest. It is difficult to > describe the whole path forest in text format. But this should be > sufficient to get the gist of how linking and unlinking works. > > Let's say all these paths are listed in pathlists of respective > RelOptInfos. Assume that joinpath_1, joinpath_3 and joinpath_5 are all > part of the topmost RelOptInfo. Then the refcounts will be as follows > joinpath_1, joinpath_3 and joinpath_5 -> each 1 (RelOptInfo) > joinpath_2 - 3 (joinpath_2, joinpath_5 and RelOptInfo) > joinpath_4 - 2 (joinpath_3, RelOptInfo) > seqpath_1 - 3 (joinpath_2, joinpath_4, RelOptInfo)
I think this likely works fine providing you always unlink paths from the root of the path tree. When you start unlinking from non-root paths I think you could start to see problems unless you increment refcounts recursively. The problem I had when working on improving the union planner was when building the final paths for a union child. Let's say I have the query: SELECT a,b FROM t1 UNION SELECT x,y FROM t2; When planning the first union child, I'd like to provide the union planner with a sorted path and also the cheapest scan path on t1 to allow it to Hash Aggregate on the cheapest path. Let's say the planner produces the following paths on t1. SeqScan_t1 When I create the final union child paths I want to try and produce a few sorted paths to allow MergeAppend to work. I'll loop over each path in t1's pathlist. SeqScan_t1 isn't sorted, so I'll make Sort1->SeqScan_t1 add add_path that to final_rel. Now, I also want to allow cheap Hash Aggregates to implement the UNION, so I'll include SeqScan_t1 without sorting it. If I now do add_path(final_rel, SeqScan_t1), add_path() could see that the SeqScan cost is fuzzily the same as the Sort->SeqScan_t1 and since the sort version has better PathKeys, add_path will unlink SeqScan_t1. Now, I don't think your scheme for refcounting paths is broken by this because you'll increment the refcount of the SeqScan_t1 when the Sort uses it as its subpath, but if instead of a Sort->SeqScan that was IncrementalSort->Sort->SeqScan, then suddenly you're not incrementing the refcount for the SeqScan when you create the Incremental Sort due to the Sort being between the two. So, if I now do add_path(final_rel, Sort2->Incremental_Sort->SeqScan_t1) Sort1 is rejected due to cost and we unlink it. Sort1 refcount gets to 0 and we free the path and the SeqScan_t1's refcount goes down by 1 but does not reach 0. We then add_path(final_rel, SeqScan_t1) but this path is fuzzily the same cost as Sort2 so we reject SeqScan_t1 due to Sort2 having better pathkeys and we unlink SeqScan_t1. Refcount on SeqScan_t1 gets to 0 and we free the path. We need SeqScan_t1 for the incremental sort. Perhaps in reality here, since SeqScan_t1 exists in the base rel's pathlist we'd still have a refcount from that, but I assume at some point you want to start freeing up paths from RelOptInfos that we're done with, in which case the SeqScan_t1's refcount would reach 0 at that point and we'd crash during create plan of the Sort2->Incremental_Sort->SeqScan_t1 plan. Have I misunderstood something here? As far as I see it with your non-recursive refcounts, it's only safe to free from the root of a pathtree and isn't safe when unlinking paths that are the subpath of another path unless the unlinking is done from the root. David