On 7/15/22 4:36 PM, Tomas Vondra wrote:


On 7/13/22 17:32, Andres Freund wrote:
Hi,

On 2022-07-13 09:23:00 -0400, Jonathan S. Katz wrote:
On 7/13/22 12:13 AM, David Rowley wrote:
On Tue, 12 Jul 2022 at 17:15, David Rowley <dgrowle...@gmail.com> wrote:
So far only Robert has raised concerns with this regression for PG15
(see [2]). Tom voted for leaving things as they are for PG15 in [3].
John agrees, as quoted above. Does anyone else have any opinion?

Let me handle this slightly differently.  I've moved the open item for
this into the "won't fix" section.  If nobody shouts at me for that
then I'll let that end the debate.  Otherwise, we can consider the
argument when it arrives.

The RMT discussed this issue at its meeting today (and a few weeks back --
apologies for not writing sooner). While we agree with your analysis that 1/
this issue does appear to be a corner case and 2/ the benefits outweigh the
risks, we still don't know how prevalent it may be in the wild and the
general impact to user experience.

The RMT suggests that you make one more pass at attempting to solve it.

I think without a more concrete analysis from the RMT that's not really
actionable. What risks are we willing to accept to resolve this? This is
mostly a question of tradeoffs.

Several "senior" postgres hackers looked at this and didn't find any solution
that makes sense to apply to 15. I don't think having David butt his head
further against this code is likely to achieve much besides a headache.  Note
that David already has a patch to address this for 16.


I agree with this. It's not clear to me how we'd asses how prevalent it
really is (reports on a mailing list surely are not a very great way to
measure this). My personal opinion is that it's a rare regression. Other
optimization patches have similar rare regressions, except that David
spent so much time investigating this one it seems more serious.

I think it's fine to leave this as is. If we feel we have to fix this
for v15, it's probably best to just apply the v16. I doubt we'll find
anything simpler.

I think the above is reasonable.

If there does not appear to be a clear path forward, we should at least
document how a user can detect and resolve the issue.

To me that doesn't really make sense. We have lots of places were performance
changes once you cross some threshold, and lots of those are related to
work_mem.

Yes, but in this case this is nonobvious to a user. A sort that may be performing just fine on a pre-PG15 version is suddenly degraded, and the user has no guidance as to why or how to remediate.

We don't, e.g., provide tooling to detect when performance in aggregation
regresses due to crossing work_mem and could be fixed by a tiny increase in
work_mem.


Yeah. I find it entirely reasonable to tell people to increase work_mem
a bit to fix this. The problem is knowing you're affected :-(

This is the concern the RMT discussed. Yes it is reasonable to say "increase work_mem to XYZ", but how does a user know how to detect and apply that? This is the part that is worrisome, especially because we don't have any sense of what the overall impact will be.

Maybe it's not much, but we should document that there is the potential for a regression.

Jonathan

Attachment: OpenPGP_signature
Description: OpenPGP digital signature

Reply via email to