2010/9/20 Robert Haas <robertmh...@gmail.com>: > 4. TGL: "In any case, I'm amazed that it's not failing regression > tests all over the place with those critical tests in > make_sort_from_pathkeys lobotomized by random #ifdef FIXMEs. Perhaps > we need some more regression tests...". Obviously, we need to remove > that lobotomy and insert the correct fix for whatever problem it was > trying to solve. Adding some regression tests seems wise, too.
So, I spent a lot of time today trying to understand why these random FIXMEs were inserted and just exactly what they break or, uh, fix. I didn't get very far. The regression tests all pass with the latest version of Merge Append, with and without these FIXMEs, and in fact when I extracted just those changes and applied them to HEAD, the regression tests still pass. So I said to myself: "Let's come up with a regression test that illustrates why these changes are Evil and Dangerous, and then we can add that to the regression test suite, per Tom's suggestion." This proved to be easier said than done: apparently our code is so good that it works even with random sections commented out. Go us. As a debugging aide, instead of actually diking these tests out, I made them elog(LOG). This has the same effect as removing them entirely but it makes it possible to see whether you've triggered the relevant code path. Then I tried to trigger those code paths, which I shall hereinafter refer to as FIXME #1, FIXME #2, FIXME #3, and FIXME #4, as per the attached patch. It proved to be pretty easy to trigger FIXME #3; indeed, just about every query I tried did the job. CREATE TABLE parent (a integer NOT NULL, b integer NOT NULL); CREATE TABLE child (b integer, a integer, c text NOT NULL) INHERITS (parent); CREATE TABLE joinme (j integer NOT NULL); I populated joinme with a single row with the value 1, and child with a million rows with a running from 1 to a million, b running from 10 million down to 9 million and 1, and c getting random()::text || random()::text || random()::text || random()::text. Then you can trigger FIXME #3 with either of these two (the second also triggers FIXME #4): select * from joinme, parent where a = j and j = 1; select * from parent order by a; I believe that the first case fires because of the "ec_has_const" condition and the second from the "EC only has at most one member condition". However, it's difficult to see what problem this actually causes. From what I gather from reading the code, em_is_child EquivalenceMembers are only supposed to affect the construction of inner index-scan paths; and certainly if there's at most one EquivalenceMember it's unclear to me how you'd be forming a join. Maybe it's possible to break something in the ec_has_const case, but I haven't been to puzzle out what that thing is. FIXME #1 and FIXME #2 were much harder to trigger. In fact, barring significant further lobotimization of the code, I couldn't. For FIXME #1, the right sort of query seems to be something like this: select 1 from joinme left join parent on a = j where j = 1 order by a; ...but the problem is that in every example I could construct, the em_is_child EquivalenceMembers were *after* the members for the parent rel, so you never get far enough in the loop to see them. For related reasons, I couldn't get FIXME #2 to fire, either: in order to have a chance of firing FIXME #2, you have to get all the way through the loop where FIXME #1 is located without finding a match. I can't believe all of this code is here just for fun, but in every example I could come up with you quickly find a match in the first loop, and never even finish visiting all the members of that list, let alone reach the second loop. Somehow, you need to construct a case where the values to be sorted aren't directly emitted by the node immediately under the sort, but I couldn't figure out how to do that - no matter how I rearranged things, the planner just computed the necessary outputs one level down. Anyone have an idea? I did find one apparent weirdness in the way explain outputs sort keys, namely: rhaas=# explain (costs off) select 2 as x union all select 1 as x order by x; QUERY PLAN -------------------- Sort Sort Key: (2) -> Append -> Result -> Result (5 rows) This seems to have nothing to do with the problem at hand, but it looks pretty weird. The planner is in fact sorting on the column, not on the literal value 2, so this is (I think) just a display error. "x" would be a more reasonable representation of the sort key. All of this leaves me wondering why Greg ended up ifdefing out this code in the first place. There's probably something I'm missing here... but for now I can't think of a better idea than just removing the #ifdefs and hoping that whatever problem they were causing was limited to an earlier version of the code that no longer exists. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
planner_lobotomy.patch
Description: Binary data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers