On Mon, Apr 6, 2020 at 9:46 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
>
> Tomas Vondra <tomas.von...@2ndquadrant.com> writes:
> > I don't know, I've tried running the tests on a number of machines,
> > similar to those failing. Raspberry Pi, Fedora 31, ... and it worked
> > everywhere while the failures seem consistent.
>
> On my machine, it reproduces about one time in six with
> force_parallel_mode = regress. It seems possible given your
> results that reducing max_parallel_workers would make it more
> likely, but I've not tried that.
>
> What I'm seeing, after adding some debug printouts, is that sortMethod is
> frequently zero when we reach the EXPLAIN output for a worker. In many of
> the tests this happens even though there is no visible failure, because
> we've got a filter function hiding the output :-(
>
> So I concur with James' conclusion that the existing code is relying on
> sortMethod initializing to zeroes, and that we did the wrong thing by
> trying to give SORT_TYPE_STILL_IN_PROGRESS a nonzero representation.
> I do not like his patch though, particularly not the type pun with NULL.

Sentinel and NULL? I hadn't caught that at all.

> I think the correct fix is to change the enum declaration.

Hmm. I don't actually really like that, because it means the value
here isn't actually semantically correct. That is, the sort type is
not "in progress"; it's "we never started a sort at all". I don't
really love the conflating of those things that the old enum
declaration had (even if it'd had a helpful comment).

It seems to me that we should make "we don't have a type" and "we
have a type" distinct. We could add a new enum value
SORT_TYPE_UNINITIALIZED or similar though.

James
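
To make the SORT_TYPE_UNINITIALIZED idea concrete, here is a rough sketch
of what I have in mind. It is not the actual tuplesort declaration: every
Demo/demo_ name below is made up for illustration, and the memset just
stands in for however the per-worker stats area gets zero-filled. The
point is only that the all-zeroes value gets its own name, separate from
"in progress".

```c
#include <string.h>

/* Hypothetical enum for illustration; not the real PostgreSQL declaration. */
typedef enum DemoSortMethod
{
    DEMO_SORT_UNINITIALIZED = 0,    /* zero-filled memory: no sort was ever started */
    DEMO_SORT_STILL_IN_PROGRESS,    /* a sort started but has not finished */
    DEMO_SORT_QUICKSORT             /* a sort finished using quicksort */
} DemoSortMethod;

/* Hypothetical per-worker stats slot. */
typedef struct DemoWorkerStats
{
    DemoSortMethod sortMethod;
    long        spaceUsed;
} DemoWorkerStats;

/* Stand-in for the zero-filling that the existing code relies on. */
static void
demo_init_worker_stats(DemoWorkerStats *stats, size_t nworkers)
{
    memset(stats, 0, nworkers * sizeof(DemoWorkerStats));
}

/* Returns nonzero only if the worker actually started a sort. */
static int
demo_worker_has_sort_info(const DemoWorkerStats *stats)
{
    return stats->sortMethod != DEMO_SORT_UNINITIALIZED;
}

/* Maps each value to a label; "uninitialized" is distinct from "in progress". */
static const char *
demo_sort_method_name(DemoSortMethod m)
{
    switch (m)
    {
        case DEMO_SORT_UNINITIALIZED:
            return "uninitialized";
        case DEMO_SORT_STILL_IN_PROGRESS:
            return "still in progress";
        case DEMO_SORT_QUICKSORT:
            return "quicksort";
    }
    return "unknown";
}
```

With something like this, the EXPLAIN-side code could skip any worker for
which demo_worker_has_sort_info() returns false instead of guessing what a
zero sortMethod means.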