On Wed, Jan 24, 2018 at 3:37 PM, Robert Haas <robertmh...@gmail.com> wrote:
> Well, I've been resisting that approach from the very beginning of
> parallel query. Eventually, I hope that we're going to go in the
> direction of changing our mind about how many workers parallel
> operations use "on the fly". For example, if there are 8 parallel
> workers available and 4 of them are in use, and you start a query (or
> index build) that wants 6 but only gets 4, it would be nice if the
> other 2 could join later after the other operation finishes and frees
> some up.
That seems like a worthwhile high-level goal. I remember looking into
Intel Threading Building Blocks many years ago, and seeing some
interesting ideas there. According to Wikipedia, "TBB implements work
stealing to balance a parallel workload across available processing
cores in order to increase core utilization and therefore scaling". The
programmer does not operate in terms of an explicit number of threads,
and there are probably certain types of problems where that approach
has an advantage. That model also has its costs, though, and I don't
think it's ever going to supplant a lower-level approach. In an ideal
world, you have both things, because TBB's approach apparently has high
coordination overhead on many-core systems.

> That, of course, won't work very well if parallel operations
> are coded in such a way that the number of workers must be nailed down
> at the very beginning.

But my whole approach to sorting is based on the idea that each worker
produces a roughly even amount of output to merge. I don't see any
scope to do better for parallel CREATE INDEX. (Other uses for parallel
sort are another matter, though.)

> Now maybe all that seems like pie in the sky, and perhaps it is, but I
> hold out hope. For queries, there is another consideration, which is
> that some queries may run with parallelism but actually finish quite
> quickly - it's not desirable to make the leader wait for workers to
> start when it could be busy computing. That's a lesser consideration
> for bulk operations like parallel CREATE INDEX, but even there I don't
> think it's totally negligible.

Since I don't have to start this wait until the leader stops
participating as a worker, there is no wait in the leader. In the vast
majority of cases, a call to something like
WaitForParallelWorkersToAttach() ends up looking at state in shared
memory, immediately determining that every launched process initialized
successfully. The overhead should be negligible in the real world.
> For both reasons, it's much better, or so it seems to me, if parallel
> operations are coded to work with the number of workers that show up,
> rather than being inflexibly tied to a particular worker count.

I've been clear from day one that my approach to parallel tuplesort
isn't going to be that useful to parallel query in its first version.
You need some kind of partitioning (a distribution sort of some kind)
for that, and probably plenty of cooperation from within the executor.
I've also said that I don't think we can do much better for parallel
CREATE INDEX even *with* support for partitioning, which is something
borne out by comparisons with other systems. My patch was always
presented as an 80/20 solution.

I have given you specific technical reasons why I think that using a
barrier is at least a bad idea for nbtsort.c, and probably for
nodeGather.c, too. Those problems will need to be worked through if
you're not going to concede the point on using a barrier. Your
aspiration that workers could join an operation after the fact seems
like a good one, broadly speaking, but it is not particularly
applicable to how *anything* happens to work now.

Besides all this, I'm not even suggesting that I need to know the
number of workers up front for parallel CREATE INDEX. Perhaps
nworkers_launched can be incremented after the fact following some
later enhancement to the parallel infrastructure, in which case
parallel CREATE INDEX will theoretically be prepared to take advantage
right away (though other parallel sort operations seem more likely to
*actually* benefit). That will be a job for the parallel
infrastructure, though, not for each and every parallel operation --
how else could we possibly hope to add more workers that become
available halfway through, as part of a future enhancement to the
parallel infrastructure? Surely every caller to CreateParallelContext()
should not need to invent its own way of doing this.
All I want is to be able to rely on nworkers_launched. That's not in
tension with this other goal/aspiration, and actually seems to
complement it.

--
Peter Geoghegan