On Mon, 2024-08-26 at 12:32 -0400, Robert Haas wrote:
> I think there are two basic approaches that are possible here. If
> someone sees a third option, let me know. First, we could allow users
> to hook add_path() and add_partial_path().

...

> The other possible approach is to allow extensions to feed some
> information into the planner before path generation and let that
> influence which paths are generated.

Preserving a path for the right amount of time seems like the primary
challenge for most of the use cases you raised (removing paths is
easier than resurrecting one that was pruned too early). If we try to
keep a path around, that implies that we need to keep parent paths
around too, which leads to an explosion if we aren't careful.

But we already solved all of that for pathkeys. We keep a path around
if there's a reason to (a useful pathkey) and there's no other cheaper
path that also satisfies the same reason.

Idea: generalize the idea of "pathkeys" to work for other reasons to
preserve a path.

Mechanically, a hint to use an index could work very similarly: come up
with a custom reason to keep a path around, such as "a hint suggests we
use index foo_idx for table foo", and assign it a unique number. If
there's another hint that says we should also use index bar_idx for
table bar, then that reason would get a different unique reason number.
(In other words, the number of reasons would not be fixed; there could
be one reason for each hint specified in the query, kind of like there
could be many interesting pathkeys for a query.)

Each Path would have a "preserve_for_these_reasons" bitmapset holding
all of the non-cost reasons for which we are preserving that path. If
two paths have exactly the same set of reasons, then add_path() would
only keep the cheaper one.

We could get fancy and have a compare_reasons_hook that would allow you
to take two paths with the same reason and see if there are other
factors to consider that would cause both to still be preserved
(similar to pathkey length).

I suspect that we might see interesting applications of this mechanism
in core as well: for instance, tracking partition keys or other
properties relevant to parallelism.
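
To make the reasons mechanism above concrete, here is a minimal
standalone sketch in C. It is not PostgreSQL code: ToyPath, ToyRel,
toy_add_path() and the REASON_* bits are hypothetical names, a plain
64-bit mask stands in for the proposed bitmapset, and only the
exact-match rule described above is modeled (no compare_reasons_hook,
no pathkeys, no partial paths).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PATHS 16

/* Hypothetical reason numbers, one per hint in the query. */
#define REASON_HINT_FOO_IDX (UINT64_C(1) << 0)
#define REASON_HINT_BAR_IDX (UINT64_C(1) << 1)

/* Toy stand-in for a planner Path (assumption, not the real struct). */
typedef struct ToyPath
{
    const char *name;
    double      cost;
    uint64_t    reasons;    /* plays the role of preserve_for_these_reasons */
} ToyPath;

/* Toy stand-in for a relation's list of surviving paths. */
typedef struct ToyRel
{
    ToyPath     paths[MAX_PATHS];
    int         npaths;
} ToyRel;

/*
 * Keep at most one path per distinct reason set: the cheapest one.
 * Paths with different reason sets never displace each other.
 * Returns 1 if newpath was kept, 0 if it was rejected.
 */
static int
toy_add_path(ToyRel *rel, ToyPath newpath)
{
    assert(rel != NULL);

    for (int i = 0; i < rel->npaths; i++)
    {
        if (rel->paths[i].reasons != newpath.reasons)
            continue;           /* different reasons: both may survive */

        if (rel->paths[i].cost <= newpath.cost)
            return 0;           /* same reasons, existing path is no worse */

        rel->paths[i] = newpath;    /* same reasons, new path is cheaper */
        return 1;
    }

    if (rel->npaths >= MAX_PATHS)
        return 0;               /* toy capacity limit */

    rel->paths[rel->npaths++] = newpath;
    return 1;
}
```

Under this rule, an index scan costing 250 that is tagged with
REASON_HINT_FOO_IDX survives next to an untagged path costing 100,
while a second untagged path costing 120 is rejected because it shares
the (empty) reason set with a cheaper path.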
Tracking such properties could allow us to keep parallel-friendly paths
around and then decide later in the planning process whether to
actually parallelize them or not.

Once we've generalized the "reasons" mechanism, it would be easy enough
to have a hook to add reasons to a path as it's being generated to be
sure it's not lost. These hooks should probably be called in the
individual create_*_path() functions where there's enough context to
know what's happening. There could be many such hooks, but I suspect
only a handful of important ones.

This idea allows the extension author to preserve the right paths long
enough to use set_rel_pathlist_hook/set_join_pathlist_hook, which can
editorialize on costs or do their own pruning.

Regards,
	Jeff Davis