On Sat, Apr 30, 2016 at 9:30 AM, Bruce Momjian <br...@momjian.us> wrote:
>> I think we can expand "21.6. Tablespaces" to describe the implications
>> of these new performance characteristics.  I'd like to hear opinions on
>> how to approach that before proposing a patch, though.
>
> This seems very detailed.  I think we need much broader coverage of how
> the existing GUC variables affect performance before we could cover
> this.

While my justification is very detailed, I think that the practical
guidance itself is fairly simple. In fact, making sure that people don't
miss that things have become *simpler* across the board is the primary
point that I'd like to make to users.

As of 9.6, external sorting finally benefits from very large amounts of
memory, certainly in the multiple gigabyte range, where before
performance steadily declined as work_mem increased past a fairly small
amount (less than 100MB in my tests). So, DBAs had competing, even
contradictory considerations: keep work_mem high enough to make most
sorts internal, but if a sort must be external, make sure that the
work_mem it has available is quite low -- say, less than 100MB. That set
of characteristics is almost impossible to tune for.

Other operations that use temporary memory bound in size by work_mem
have always benefited from increasing work_mem in a more or less
predictable, linear way, so the *general* picture of what to do becomes
far clearer (we don't really *need* to talk about sorting at all).

Presumably due to the old issues with tuplesort, the closest the docs
get to recommending higher work_mem or maintenance_work_mem settings is:
"Larger [maintenance_work_mem] settings might improve performance for
vacuuming and for restoring database dumps". That's it! Since the
performance characteristics of external sorting are now roughly in line
with everything else, why continue to make such a weak statement in 9.6?
It's not hard to understand why we originally equivocated here, but
things have changed.

I hardly realized that the docs are only lukewarm on the idea that
increasing work_mem will ever help *anything* at all; it's easy to fail
to notice that when you're an expert. We provide *zero* guidance on how
to tune work_mem or maintenance_work_mem. Surely we can do better than
that.

A secondary point I'd like to make is that if and when no further
benefit can be observed from increasing work_mem, it's well worth
considering whether more I/O bandwidth for temp files will help.
Obviously, this assumes the DBA avoids swapping when using all that
memory, for example by setting vm.swappiness appropriately on Linux,
while also avoiding OOMs. Once increasing work_mem stops helping,
sequential I/O should be strongly considered as a possible bottleneck to
target. That's a nice problem to have, because you can buy fairly
inexpensive HDDs for temp files, which raises the point at which higher
work_mem settings stop helping. So, we can talk about this stuff without
necessarily even mentioning external sorting.
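Just to sketch the kind of guidance I have in mind (the table, column,
tablespace name, filesystem path, and work_mem values below are all made
up for illustration -- this is not proposed doc text, just the shape of
it):

  -- Is a given query's sort spilling to disk?  In EXPLAIN ANALYZE output,
  -- "Sort Method: quicksort  Memory: ..." means the sort fit in memory,
  -- while "Sort Method: external merge  Disk: ..." means it spilled.
  SET work_mem = '64MB';
  EXPLAIN ANALYZE
  SELECT customer_id, sum(amount)
  FROM orders                      -- hypothetical table
  GROUP BY customer_id
  ORDER BY customer_id;

  -- Raise work_mem for just this session and compare the runtime:
  SET work_mem = '1GB';
  EXPLAIN ANALYZE
  SELECT customer_id, sum(amount)
  FROM orders
  GROUP BY customer_id
  ORDER BY customer_id;

  -- If more work_mem no longer helps, consider putting temp files on
  -- separate, inexpensive spindles via a dedicated tablespace (the
  -- directory must already exist and be owned by the postgres user):
  CREATE TABLESPACE temp_disk LOCATION '/mnt/tempdisk/pgtemp';
  SET temp_tablespaces = 'temp_disk';

Nothing revolutionary, but it would give DBAs a concrete starting point,
and the temp_tablespaces part ties back naturally to the "21.6.
Tablespaces" section mentioned above.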
""" I think that this must be describing tuplesort's previous use of replacement selection to sort runs -- it might have been the case that an internal sort for the GroupAggregate was "viable, if suboptimal" then, but an external sort was just unacceptable. 9.6 really blurs the distinction between internal and external sorts a lot of the time. It was common to see a big external sort with ~5 runs taking no longer than 110% of the time of an equivalent internal sort when testing the 9.6 sorting stuff. I would like to help DBAs keep up the benefits when they have hundreds of gigabytes or even terabytes of data to aggregate through. If you look at the graph in that article, it isn't that hard to imagine that a well-tuned implementation could avoid that superlinear growth in query runtime, because *superlinear* growth isn't predicted by any theoretical model. The article is incorrect to state "There is no pathology here". Although, that doesn't undermine the author's argument too much; I find it rather convincing overall. The fact that a single node will eventually fall short doesn't mean it isn't important to push your single node further, to maintain an unsharded Postgres instance as the right choice for longer (by getting the full benefit of the hardware). Getting this right could allow unsharded Postgres to go a lot further (parallelism is the final part of this), which could make all the difference for some users. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers