Hi Tomas,

Thanks for working on this. I haven't studied this problem recently, but here are some ideas that occur to me:
1. Try to reduce the per-batch overhead.

2. Stop increasing the number of batches when the per-batch overhead exceeds a small percentage of work_mem (10%? 5%? 1%?). If you've reached a point where the per-batch overhead is using up >=10% of your work_mem, then at the next doubling it's going to be using >=20%, which is pretty insane, and the next doubling after that is going to be using >=40%, which is really silly. With 1MB of work_mem and the roughly 16kB/batch I gather from your remarks, we're well past the 10% threshold by the time we reach 16 batches. Somebody might claim that capping the number of batches at 16 is insane, but given that those 16 batches are already using 256kB of memory and we're supposed to finish the entire operation using <= 1MB of memory, it really isn't. We pretty obviously are not going to be able to stay within 1MB no matter what we do. (A rough sketch of what I mean by such a cap is at the end of this mail.)

I think your proposal might be a more refined version of this, where instead of just completely ceasing to create new batches, you try to balance creating new batches against overrunning work_mem to get the best outcome possible overall. Maybe that's a good approach, although perhaps it is more complicated than we need? (A sketch of how I read that trade-off is also below.)

At any rate, I found the vadjust-size patch quite hard to understand. I think if you want to go that route it would need more comments, and the existing ones would need to be rewritten so that they are understandable without having to scour this email thread (e.g. "Try to move on the anti-diagonal and see if we'd consume less memory" doesn't seem like something most people are going to understand without a lot of context).

...Robert
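P.S. To make idea 2 above concrete, here is a minimal sketch of the cap. It is not the actual nodeHash.c batch-doubling code: PER_BATCH_OVERHEAD, BATCH_OVERHEAD_CAP and batch_doubling_allowed() are made-up names, and the ~16kB/batch figure is just the number from the discussion above.

#include <stdbool.h>
#include <stddef.h>

#define PER_BATCH_OVERHEAD  (16 * 1024)   /* assumed ~16kB of BufFile per batch */
#define BATCH_OVERHEAD_CAP  0.10          /* stop once overhead would hit 10% of work_mem */

/*
 * Sketch only: refuse another doubling of the batch count once the
 * per-batch overhead after doubling would exceed a fixed fraction of
 * work_mem.  A caller would skip the ExecHashIncreaseNumBatches()-style
 * growth when this returns false and accept overrunning work_mem instead.
 */
static bool
batch_doubling_allowed(int nbatch, size_t work_mem_bytes)
{
    /* overhead we would carry after doubling the batch count */
    size_t  overhead_after = (size_t) nbatch * 2 * PER_BATCH_OVERHEAD;

    return overhead_after <= (size_t) (work_mem_bytes * BATCH_OVERHEAD_CAP);
}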
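P.P.S. And here is how I read the "balance creating new batches against overrunning work_mem" idea, reduced to the arithmetic rather than to whatever the patch actually does: with inner_bytes of inner data, per-batch overhead c, and n batches, the footprint is roughly inner_bytes/n + c*n, so you stop doubling as soon as another doubling would make that sum bigger. total_footprint() and choose_nbatch() are made-up names for illustration.

#include <stddef.h>

#define PER_BATCH_OVERHEAD  (16 * 1024)   /* assumed ~16kB of BufFile per batch */

/*
 * Total memory if inner_bytes of tuples are split across nbatch batches:
 * one batch's worth of tuples in the hash table plus the per-batch
 * BufFile overhead for all batches.
 */
static size_t
total_footprint(size_t inner_bytes, int nbatch)
{
    return inner_bytes / nbatch + (size_t) nbatch * PER_BATCH_OVERHEAD;
}

/*
 * Walk power-of-two batch counts and stop doubling as soon as doubling
 * would no longer shrink the total footprint, i.e. accept overrunning
 * work_mem once more batches stop paying for themselves.
 */
static int
choose_nbatch(size_t inner_bytes)
{
    int     nbatch = 1;

    while (total_footprint(inner_bytes, nbatch * 2) <
           total_footprint(inner_bytes, nbatch))
        nbatch *= 2;

    return nbatch;
}

For inner_bytes = 1GB and 16kB of overhead per batch this settles on 256 batches (about 4MB of hash table plus 4MB of overhead), rather than doubling indefinitely.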