On 2017-11-15 08:37:11 -0500, Robert Haas wrote:
> On Tue, Nov 14, 2017 at 4:24 PM, Andres Freund <and...@anarazel.de> wrote:
> >> I agree, and I am interested in that subject.  In the meantime, I
> >> think it'd be pretty unfair if parallel-oblivious hash join and
> >> sort-merge join and every other parallel plan get to use work_mem * p
> >> (and in some cases waste it with duplicate data), but Parallel Hash
> >> isn't allowed to do the same (and put it to good use).
> >
> > I'm not sure I care about fairness between pieces of code ;)
>
> I realize you're sort of joking here, but I think it's necessary to
> care about fairness between pieces of code.
Indeed I kinda was.


> I mean, the very first version of this patch that Thomas submitted was
> benchmarked by Rafia and had phenomenally good performance
> characteristics.  That turned out to be because it wasn't respecting
> work_mem; you can often do a lot better with more memory, and
> generally you can't do nearly as well with less.  To make comparisons
> meaningful, they have to be comparisons between algorithms that use
> the same amount of memory.  And it's not just about testing.  If we
> add an algorithm that will run twice as fast with equal memory but
> only allow it half as much, it will probably never get picked and the
> whole patch is a waste of time.

But this does bug me, and I think it's what made me pause here to make
a bad joke.  The way that parallelism treats work_mem makes it an even
more useless config knob than it was before.  Parallelism, especially
after this patch, shouldn't compete with / be benchmarked against a
single-process run with the same work_mem.  To make it "fair" you'd
have to compare parallelism against a single-threaded run with
work_mem * max_parallelism.

Thomas argues that this treats hash joins fairly vis-a-vis
parallel-oblivious hash join etc.  And I think he has somewhat of a
point.  But I don't think it's quite right either: in several of
these cases the planner will not prefer the multi-process plan
because it uses more work_mem; that's a cost to be paid.  Whereas
this'll optimize towards using work_mem *
max_parallel_workers_per_gather amount of memory.

This makes it pretty much impossible to afterwards tune work_mem on a
server in a reasonable manner.  Previously you'd tune it to something
like
  free_server_memory - (max_connections * work_mem * 80%_most_complex_query)
You can't really do that anymore; now you'd also need to multiply by
max_parallel_workers_per_gather (see the back-of-envelope sketch in
the PS below).  Which means that you might end up "forcing"
parallelism on a bunch of plans that'd normally execute in too short
a time to make parallelism worth it.

I don't really have a good answer to "but what should we otherwise
do", but I'm doubtful this is quite the right answer.

Greetings,

Andres Freund
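
PS: To make the budgeting concern concrete, here's a rough
back-of-envelope sketch in Python.  The numbers (server size,
connection count, node count per query, worker count) are made up
purely for illustration, not taken from any benchmark in this thread:

    # Hypothetical numbers, just to show the shape of the arithmetic.
    GB = 1024  # working in MB

    server_memory          = 64 * GB
    shared_buffers         = 16 * GB
    free_server_memory     = server_memory - shared_buffers   # 48 GB
    max_connections        = 100
    work_mem               = 64    # MB
    nodes_in_complex_query = 4     # stand-in for "80%_most_complex_query"
    workers_per_gather     = 4     # max_parallel_workers_per_gather

    # Old rule of thumb: per-backend peak is roughly
    #   work_mem * number-of-memory-hungry-nodes
    old_peak_per_backend = work_mem * nodes_in_complex_query
    old_total = max_connections * old_peak_per_backend
    print(f"old estimate:   {old_total / GB:.0f} GB")   # 25 GB, fits in 48 GB

    # If a node may use work_mem * participants (workers + leader),
    # the same query's worst case grows by that factor.
    new_peak_per_backend = old_peak_per_backend * (workers_per_gather + 1)
    new_total = max_connections * new_peak_per_backend
    print(f"new worst case: {new_total / GB:.0f} GB")   # 125 GB, way over budget

With the illustrative numbers above, the old sizing rule stays within
the free 48 GB, while the parallel-aware worst case overshoots it
several times over - which is the tuning problem described above.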