David Rowley <david.row...@2ndquadrant.com> writes: > [ eclass_indexes_v3.patch ]
I looked at this for a little bit. I'm on board with the basic idea of assigning integer indexes to the ECs and keeping bitmapsets of relevant EC indexes in RelOptInfos. However ... man, is that delete_eclass() thing ugly. And slow, and fragile-feeling. I think that maybe we could improve that situation by not trying to maintain the RelOptInfo.eclass_indexes sets at all during the initial construction of the ECs, but only setting them up after we've completed doing that. We already have an assumption that we know when EC merging is done and it's safe to make pathkeys that reference the "canonical" (merged) ECs, so it seems like that would be a point where we could build the indexing bitmapsets safely. This hinges on not needing the bitmapsets till after that point, but it'd be easy to arrange some Asserts verifying that we don't try to use them before that. If that doesn't work (because we need the index info sooner), maybe we could consider never removing ECs from eq_classes, so that their indices never change, then teaching most/all of the loops over eq_classes to ignore entries with non-null ec_merged. But we probably want the indexing bitmapsets to reflect canonical EC numbers, so we'd still have to do some updating I fear -- or else be prepared to chase up the ec_merged links when using the index bitmaps. Stepping back a little bit, I wonder whether now is the time to rethink the overall EC data structure, as foreseen in this old comment: * Note: constructing merged EquivalenceClasses is a standard UNION-FIND * problem, for which there exist better data structures than simple lists. * If this code ever proves to be a bottleneck then it could be sped up --- * but for now, simple is beautiful. I'm not very sure what a better data structure would really look like --- after perusing Sedgewick's section on UNION-FIND, it seems like the ec_merged links are more or less what he's recommending anyway, and the expensive part of this is probably all the equal() tests to find whether two proposed EC member expressions are already in the set of known expressions. So I'm just handwaving here. But my point is that we needn't feel wedded to the idea of keeping the ECs in a list, if we can think of some better data structure for them. regards, tom lane