On Tue, Mar 29, 2016 at 9:11 AM, Robert Haas <robertmh...@gmail.com> wrote: > One test that kind of bothers me in particular is the "SELECT DISTINCT > a FROM numeric_test ORDER BY a" test on the high_cardinality_random > data set. That's a wash at most work_mem values, but at 32MB it's > more than 3x slower. That's very strange, and there are a number of > other results like that, where one particular work_mem value triggers > a large regression. That's worrying.
That case is totally invalid as a benchmark for this patch. Here is the query plan I get (doesn't matter if I run analyze) when I follow Tomas' high_cardinality_random 10M instructions (including setting work_mem to 32MB): postgres=# explain analyze select distinct a from numeric_test order by a; QUERY PLAN ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── Sort (cost=268895.39..270373.10 rows=591082 width=8) (actual time=3907.917..4086.174 rows=999879 loops=1) Sort Key: a Sort Method: external merge Disk: 18536kB -> HashAggregate (cost=206320.50..212231.32 rows=591082 width=8) (actual time=3109.619..3387.599 rows=999879 loops=1) Group Key: a -> Seq Scan on numeric_test (cost=0.00..175844.40 rows=12190440 width=8) (actual time=0.025..601.295 rows=10000000 loops=1) Planning time: 0.088 ms Execution time: 4120.656 ms (8 rows) Does that seem like a fair test of this patch? I must also point out an inexplicable differences between the i5 and Xeon in relation to this query. It took about took 10% less time on the patched Xeon 10M case, not ~200% more (line 53 of the summary page in each 10M case). So even if this case did exercise the patch well, it's far from clear that it has even been regressed at all. It's far easier to imagine that there was some problem with the i5 tests. A complete do-over from Tomas would be best, here. He has already acknowledged that the i5 CREATE INDEX results were completely invalid. Pending a do-over from Tomas, I recommend ignoring the i5 tests completely. Also, I should once again point out that many of the work_mem cases actually had internal sorts at the high end, so once the code in the patches simply wasn't exercised at all at the high end (the 1024MB cases, where the numbers might be expected to get really good). If there is ever a regression, it is only really sensible to talk about it while looking at trace_sort output (and, I guess, the query plan). I've asked Tomas for trace_sort output in all relevant cases. There is no point in "flying blind" and speculating what the problem was from a distance. > Also, it's pretty clear that the patch has more large wins than it > does large losses, but it seems pretty easy to imagine people who > haven't tuned any GUCs writing in to say that 9.6 is way slower on > their workload, because those people are going to be at work_mem=4MB, > maintenance_work_mem=64MB. At those numbers, if Tomas's data is > representative, it's not hard to imagine that the number of people who > see a significant regression might be quite a bit larger than the > number who see a significant speedup. I don't think they are representative. Greg Stark characterized the regressions as being fairly limited, mostly at the very low end. And that was *before* all the memory fragmentation stuff made that better. I haven't done any analysis of how much better that made the problem *across the board* yet, but for int4 cases I could make 1MB work_mem queries faster with gigabytes of data on my laptop. I believe I tested various datum sort cases there, like "select count(distinct(foo)) from bar"; those are a very pure test of the patch. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers