Moin,

On Wed, April 4, 2018 11:41 pm, David Rowley wrote:
> Hi Tomas,
>
> Thanks for taking another look.
>
> On 5 April 2018 at 07:12, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
>> Other than that, the patch seems fine to me, and it's already marked as
>> RFC so I'll leave it at that.
>
> Thanks.
I have one more comment - sorry for not writing sooner, the flu got to me ...

Somewhere in the code there is a new allocation of memory when the string grows beyond the current size - and that doubles the size. This can lead to a lot of wasted space (think: constructing a string that is a bit over 1 GByte, which would presumably allocate 2 GByte). The same issue happens when each worker allocates 512 MByte for a 256 MByte + 1 result. IMHO a factor of 1.4 or 1.2 would work better here - not sure what the current standard for situations like this in PG is.

>> The last obstacle seems to be the argument about the risks of the patch
>> breaking queries of people relying on the ordering. Not sure what are
>> the right next steps in this regard ...
>
> yeah, seems like a bit of a stalemate.
>
> Personally, I think if the group of people Tom mentions do exist, then
> they've already been through some troubled times since Parallel Query
> was born. I'd hope they've already put up their defenses due to the
> advent of that feature. I stand by my thoughts that it's pretty
> bizarre to draw the line here when we've probably been causing these
> people issues for many years already. I said my piece on this already
> so likely not much point in going on about it. These people are also
> perfectly capable of sidestepping the whole issue by setting
> max_parallel_workers_per_gather TO 0.

Coming from the Perl side, this issue has popped up there a lot with the ordering of hash keys (e.g. "for my $key (keys %$hash) { ... }"): even though there was never a guarantee, the ordering often caught people by surprise. Mostly in test suites, though, because real output usually needs some ordering anyway. Changes were nevertheless made, and code had to be adapted. But the world didn't end, so to speak.

For PG, I think the benefits greatly outweigh the problems with the sort order - after all, if you have code that relies on an implicit order, you've already got a problem, no?
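To illustrate the growth-factor point from above, here is a minimal simulation - not PostgreSQL's actual buffer code, and the 1024-byte starting size is just an illustrative assumption:

```python
def peak_capacity(needed, factor, start=1024):
    """Grow a buffer by `factor` until it can hold `needed` bytes;
    return the final allocated capacity."""
    cap = start
    while cap < needed:
        cap = int(cap * factor)
    return cap

one_gb = 1024 ** 3

# Doubling: a result just over 1 GByte forces a 2 GByte allocation,
# wasting almost 1 GByte.
print(peak_capacity(one_gb + 1, 2.0))   # 2147483648, i.e. 2 GByte

# A smaller factor such as 1.2 overshoots by at most 20 percent,
# at the cost of more realloc/copy cycles while the string grows.
print(peak_capacity(one_gb + 1, 1.2))
```

The usual trade-off: a smaller factor wastes less memory at the end but does more reallocations (and copying) on the way there.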
So my favourite would be something along these lines:

* add the string_agg
* document it in the release notes, and document workarounds/solutions (add ORDER BY, disable workers etc.)
* if not already done, stress in the documentation that if you don't ORDER things, they might not come out in the order you expect - especially with parallel queries, new processing nodes etc.

Best wishes,

Tels

PS: We use string_agg() in a case where we first agg each row, then string_agg() all rows, and the resulting string is really huge. We did run into the "out of memory" problem, so we now use a LIMIT and assemble the resulting parts on the client side. My wish here would be to better know how large the LIMIT can be - I found it quite difficult to predict with how many rows PG runs out of memory for the string buffer, even though all rows have almost the same length as text. But that aside, getting the parts faster with parallel agg would be very cool, too.
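PPS: A rough model for the prediction problem in the PS, under the assumption that the buffer doubles as it grows: the final buffer can be up to ~2x the logical string size, and during the realloc copy the old and new buffers briefly coexist, so the safe budget is roughly half the available memory. The function name and the 2x rule of thumb here are my own assumptions, not anything PG guarantees:

```python
def estimate_rows_before_oom(avail_bytes, row_len, sep_len=1):
    """Crude upper bound on how many fixed-length rows string_agg()
    can absorb before a doubling buffer exceeds `avail_bytes`.
    Assumes peak usage of roughly 2x the logical string size and
    ignores per-row and executor overhead."""
    per_row = row_len + sep_len
    # peak ~ 2 * logical size  =>  logical size must stay <= avail / 2
    return (avail_bytes // 2) // per_row

# e.g. with 1 GByte usable memory and ~200-byte rows plus a 1-byte separator:
print(estimate_rows_before_oom(1024 ** 3, 200))
```

In practice one would leave further headroom, since the real per-row cost (tuple headers, detoasting, executor state) is larger than the bare text length.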