I've looked at another (admittedly sympathetic) dataset that is publicly available: the flickr "tags" dataset [1]. I end up with a single table of tags; it's a large enough table, at 388 MB, but the tuples are not very wide. There are 7,656,031 tags/tuples.
Predictably enough, this query is very fast when an internal sort is used on a patched Postgres: select * from (select * from tags order by tag offset 100000000) ii; Git master takes about 25 seconds to execute the query. Patched takes about 6.8 seconds. That seems very good, but this is not really new information. However, with work_mem set low enough to get an external sort, the difference is more interesting. If I set work_mem to 10 MB, then the query takes about 10.7 seconds to execute with a suitably patched Postgres. Whereas on master, it consistently takes a full 69 seconds. That's the largest improvement I've seen so far, for any case. I must admit that this did surprise me, but then I don't grok tape sort. What's particularly interesting here is that when work_mem is cranked up to 512MB, which is a high setting, but still not high enough to do an internal sort, the difference closes in a bit. Instead of 41 runs, there are only 2. Patched now takes 16.3 seconds. Meanwhile, master is somewhat improved, and consistently takes 65 seconds to complete the sort. This probably has something to do with CPU cache effects. I believe that all world class external sorting algorithms are cache sensitive. I'm not sure what the outcome would have been had there not been a huge amount of memory available for the OS cache to use, which there was. I think there is probably something to learn about how to improve tape sort here. Does anyone recall hearing complaints around higher work_mem settings regressing performance? [1] http://www.isi.edu/integration/people/lerman/load.html?src=http://www.isi.edu/~lerman/downloads/flickr/flickr_taxonomies.html , bottom of page -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers