Hi,

On 03/29/2016 09:43 PM, Peter Geoghegan wrote:
On Tue, Mar 29, 2016 at 9:11 AM, Robert Haas <robertmh...@gmail.com> wrote:
One test that kind of bothers me in particular is the "SELECT DISTINCT
a FROM numeric_test ORDER BY a" test on the high_cardinality_random
data set.  That's a wash at most work_mem values, but at 32MB it's
more than 3x slower.  That's very strange, and there are a number of
other results like that, where one particular work_mem value triggers
a large regression.  That's worrying.

That case is totally invalid as a benchmark for this patch. Here is
the query plan I get (doesn't matter if I run analyze) when I follow
Tomas' high_cardinality_random 10M instructions (including setting
work_mem to 32MB):

postgres=# explain analyze select distinct a from numeric_test order by a;
                                                               QUERY
PLAN
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  Sort  (cost=268895.39..270373.10 rows=591082 width=8) (actual
time=3907.917..4086.174 rows=999879 loops=1)
    Sort Key: a
    Sort Method: external merge  Disk: 18536kB
    ->  HashAggregate  (cost=206320.50..212231.32 rows=591082 width=8)
(actual time=3109.619..3387.599 rows=999879 loops=1)
          Group Key: a
          ->  Seq Scan on numeric_test  (cost=0.00..175844.40
rows=12190440 width=8) (actual time=0.025..601.295 rows=10000000
loops=1)
  Planning time: 0.088 ms
  Execution time: 4120.656 ms
(8 rows)

Does that seem like a fair test of this patch?

And why not? I mean, why should it be acceptable to slow down?


I must also point out an inexplicable differences between the i5 and
Xeon in relation to this query. It took about took 10% less time on
the patched Xeon 10M case, not ~200% more (line 53 of the summary page
in each 10M case). So even if this case did exercise the patch well,
it's far from clear that it has even been regressed at all. It's far
easier to imagine that there was some problem with the i5 tests.

That may be easily due to differences between the CPUs and configuration. For example the Xeon uses a way older CPU with different amounts of CPU cache, and it's also a multi-socket system. And so on.

A complete do-over from Tomas would be best, here. He has already
acknowledged that the i5 CREATE INDEX results were completely invalid.
Pending a do-over from Tomas, I recommend ignoring the i5 tests
completely. Also, I should once again point out that many of the
work_mem cases actually had internal sorts at the high end, so once
the code in the patches simply wasn't exercised at all at the high end
(the 1024MB cases, where the numbers might be expected to get really
good).

If there is ever a regression, it is only really sensible to talk
about it while looking at trace_sort output (and, I guess, the query
plan). I've asked Tomas for trace_sort output in all relevant cases.
There is no point in "flying blind" and speculating what the problem
was from a distance.

The updated benchmarks are currently running. I'm out of office until Friday, and I'd like to process the results over the weekend. FWIW I'll have results for these cases:

1) unpatched (a414d96a)
2) patched, default settings
3) patched, replacement_sort_mem=64

Also, I'll have trace_sort=on output for all the queries, so we can investigate further.


Also, it's pretty clear that the patch has more large wins than it
does large losses, but it seems pretty easy to imagine people who
haven't tuned any GUCs writing in to say that 9.6 is way slower on
their workload, because those people are going to be at
work_mem=4MB, maintenance_work_mem=64MB. At those numbers, if
Tomas's data is representative, it's not hard to imagine that the
number of people who see a significant regression might be quite a
bit larger than the number who see a significant speedup.

Yeah. That was one of the goals of the benchmark, to come up with some tuning recommendations. On some systems significantly increasing memory GUCs may not be possible, though - say, on very small systems with very limited amounts of RAM.


I don't think they are representative. Greg Stark characterized the
regressions as being fairly limited, mostly at the very low end. And
that was *before* all the memory fragmentation stuff made that
better. I haven't done any analysis of how much better that made the
problem *across the board* yet, but for int4 cases I could make 1MB
work_mem queries faster with gigabytes of data on my laptop. I
believe I tested various datum sort cases there, like "select
count(distinct(foo)) from bar"; those are a very pure test of the
patch.


Well, I'd guess those conclusions may be a bit subjective.

regards


--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to