On Tue, Mar 21, 2017 at 2:03 PM, Peter Geoghegan <p...@bowt.ie> wrote: > I think that since the comment refers to code from before 1999, it can > go. Any separate patch to remove it would have an entirely negative > linediff.
It's a good general principle that a patch should do one thing well and not make unrelated changes. I try hard to adhere to that principle in my commits, and I think other committers generally do (and should), too. Of course, different people draw the line in different places. If you can convince another committer to include that change in their commit of this patch, well, that's not my cup of tea, but so be it. If you want me to consider committing this, you're going to have to submit that part separately, preferably on a separate thread with a suitably descriptive subject line. > Obviously iff you write to the file in the leader, there is little > that the worker can do afterwards, but it's not a given that you'd > want to do that, and this patch actually never does. You could equally > well say that PHJ fails to provide for my requirement for having the > leader write to the files sensibly in order to recycle blocks, a > requirement that its shared BufFile mechanism expressly does not > support. >From my point of view, the main point is that having two completely separate mechanisms for managing temporary files that need to be shared across cooperating workers is not a good decision. That's a need that's going to come up over and over again, and it's not reasonable for everybody who needs it to add a separate mechanism for doing it. We need to have ONE mechanism for it. The second point is that I'm pretty convinced that the design you've chosen is fundamentally wrong. I've attempt to explain that multiple times, starting about three months ago with http://postgr.es/m/ca+tgmoyp0vzpw64dfmqt1jhy6szyavjoglkj3ermzzzn2f9...@mail.gmail.com and continuing across many subsequent emails on multiple threads. It's just not OK in my book for a worker to create something that it initially owns and then later transfer it to the leader. The cooperating backends should have joint ownership of the objects from the beginning, and the last process to exit the set should clean up those resources. >> That would cut hundreds of >> lines from this patch with no real disadvantage that I can see -- >> including things like worker_wait(), which are only needed because of >> the shortcomings of the underlying mechanism. > > I think it would definitely be a significant net gain in LOC. And, > worker_wait() will probably be replaced by the use of the barrier > abstraction anyway. No, because if you do it Thomas's way, the worker can exit right away, without waiting. You don't have to wait via a different method; you escape waiting altogether. I understand that your point is that the wait will always be brief, but I think that's probably an optimistic assumption and definitely an unnecessary assumption. It's optimistic because there is absolutely no guarantee that all workers will take the same amount of time to sort the data they read. It is absolutely not the case that all data sets sort at the same speed. Because of the way parallel sequential scan works, we're somewhat insulated from that; workers that sort faster will get a larger chunk of the table. However, that only means that workers will finish generating their sorted runs at about the same time, not that they will finish merging at the same time. And, indeed, if some workers end up with more data than others (so that they finish building runs at about the same time) then some will probably take longer to complete the merging than others. But even if were true that the waits will always be brief, I still think the way you've done it is a bad idea, because now tuplesort.c has to know that it needs to wait because of some detail of lower-level resource management about which it should not have to care. That alone is a sufficient reason to want a better approach. I completely accept that whatever abstraction we use at the BufFile level has to be something that can be plumbed into logtape.c, and if Thomas's mechanism can't be bolted in there in a sensible way then that's a problem. But I feel quite strongly that the solution to that problem isn't to adopt the approach you've taken here. >> + * run. Parallel workers always use quicksort, however. >> >> Comment fails to mention a reason. > > Well, I don't think that there is any reason to use replacement > selection at all, what with the additional merge heap work last year. > But, the theory there remains that RS is good when you can get one big > run and no merge. You're not going to get that with parallel sort in > any case, since the leader must merge. Besides, merging in the workers > happens in the workers. And, the backspace requirement of 32MB of > workMem per participant pretty much eliminates any use of RS that > you'd get otherwise. So, please mention that briefly in the comment. > I believe that the main reason that you like the design I came up with > on the whole is that it's minimally divergent from the serial case. That's part of it, I guess, but it's more that the code you've added to do parallelism here looks an awful lot like what's gotten added to do parallelism in other cases, like parallel query. That's probably a good sign. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers