Hi,

On 03/22/2016 11:07 PM, Peter Geoghegan wrote:
> On Tue, Mar 22, 2016 at 2:27 PM, Tomas Vondra
> <tomas.von...@2ndquadrant.com> wrote:
>> Each query was executed 5x for each work_mem value (between 8MB and 1GB),
>> and then a median of the runs was computed and that's what's in the
>> "comparison". This compares a414d96ad2b without (master) and with the
>> patches applied (patched). The last set of columns is simply a "speedup",
>> where <1.0 means the patched code is faster, while >1.0 means it's slower.
>> Values below 0.9 or above 1.1 have a green or red background, respectively,
>> to make the most significant improvements or regressions clearly visible.

>> For the smaller data set (1M rows), things work pretty well. There are
>> pretty much no red cells (so no significant regressions), but quite a few
>> green ones (with duration reduced by up to 50%). There are some results in
>> the 1.0-1.05 range, but considering how short the queries are, I don't think
>> this is a problem. Overall, the total duration was reduced by ~20%, which is
>> nice.
>>
>> For the 10M data sets, the total speedup is also almost ~20%, and the
>> speedups for most queries are also very nice (often ~50%).

> To be clear, you seem to mean that ~50% of the runtime of the query
> was removed. In other words, the quicksort version is twice as fast.

Yes, that's what I meant. Sorry for the inaccuracy.
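
To spell out what the comparison columns mean: each cell is the median of the
five runs, and the speedup is the ratio of the patched and master medians.
Roughly like this (illustrative Python, not the actual benchmark scripts; the
timings are made-up examples):

from statistics import median

def speedup(master_runs, patched_runs):
    """Ratio of per-cell medians; <1.0 means the patched build is faster."""
    return median(patched_runs) / median(master_runs)

# Made-up timings (seconds) for one query at one work_mem setting.
ratio = speedup(master_runs=[12.1, 11.9, 12.3, 12.0, 12.2],
                patched_runs=[6.2, 6.0, 6.1, 6.3, 6.1])
colour = "green" if ratio < 0.9 else "red" if ratio > 1.1 else "none"
print(f"{ratio:.2f} ({colour})")   # 0.50 (green), i.e. roughly twice as fast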


>> But the number of regressions is considerably higher - there's a
>> small number of queries that got significantly slower for multiple
>> data sets, particularly for smaller work_mem values.

> No time to fully consider these benchmarks right now, but: Did you
> make sure to set replacement_sort_mem very low so that it was never
> used when patched? And, was this on the latest version of the patch,
> where memory contexts were reset (i.e. the version that got
> committed recently)? You said something about memory batching, so
> ISTM that you should set that to '64', to make sure you don't get one
> longer run. That might mess with merging.

I've tested the patch you sent on 2016/3/11, which I believe is the latest version. I haven't tuned replacement_sort_mem at all, because my understanding was that it's not 9.6 material (per your message). So my intent was to test the configuration people are likely to use by default.

I'm not sure about the batching - that was merely a guess at what might be causing the problem.
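
For reference, pinning those GUCs per session in the benchmark driver would
look roughly like this - an illustrative sketch only, assuming a
psycopg2-based driver and assuming the patched build exposes the
replacement_sort_mem GUC you mention (the DSN, table and column names are
made up):

import psycopg2

conn = psycopg2.connect("dbname=bench")          # hypothetical DSN
cur = conn.cursor()

for work_mem in ("8MB", "32MB", "128MB", "512MB", "1GB"):
    cur.execute("SET work_mem = %s", (work_mem,))
    # Keep replacement selection out of the picture entirely, per your
    # suggestion ('64kB' is assumed to be the minimum for that GUC).
    cur.execute("SET replacement_sort_mem = '64kB'")
    cur.execute("EXPLAIN ANALYZE SELECT * FROM bench_10m ORDER BY num_col")
    plan = [row[0] for row in cur.fetchall()]
    print(work_mem, plan[-1])                    # "Execution time: ... ms"

That should at least take replacement selection out of the picture regardless
of work_mem, if a follow-up run with that configuration is useful.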


> Note that the master branch has the memory batching patch as of a
> few days back, so if that's the problem at the low end, then that's
> bad.

I'm not sure which commit you're referring to. The benchmark was done on a414d96a (from 2016/3/10). However, I'd expect that to affect both sets of measurements, although it's possible that it affects the patched version differently.

> But I don't think it is: I think that the regressions at the low end
> are about abbreviated keys, particularly the numeric cases. There is
> a huge gulf in the cost of those comparisons (abbreviated vs
> authoritative), and it is legitimately a weakness of the patch that
> it reduces the number of abbreviated comparisons in play. I think it's
> still well worth it, but it is a downside. There is no reason why the
> authoritative numeric comparator has to allocate memory, but right now
> that case isn't optimized.

Yes, numeric and text are the most severely affected cases.
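
For anyone skimming the thread, the trade-off is roughly this: abbreviated
keys let most comparisons look only at a cheap fixed-size prefix, and the
expensive authoritative comparator is consulted only on ties. A toy sketch
(not the actual tuplesort code; abbrev_key and full_compare are just
stand-ins):

def abbrev_key(value: str) -> int:
    """Cheap fixed-size proxy: pack the first 8 bytes into an integer."""
    raw = value.encode("utf-8")[:8].ljust(8, b"\x00")
    return int.from_bytes(raw, "big")

def full_compare(a: str, b: str) -> int:
    """Stand-in for the expensive authoritative comparator (think of the
    numeric comparator discussed above, which may even allocate memory)."""
    return (a > b) - (a < b)

def compare_sorttuples(a, b):
    """a and b are (abbreviated, original) pairs, tuplesort-style."""
    if a[0] != b[0]:
        return -1 if a[0] < b[0] else 1      # resolved by the cheap comparison
    return full_compare(a[1], b[1])          # abbreviation tied: pay full cost

The regressions would then correspond to the last line firing much more often
for numeric and text.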


> I find it weird that the patch is exactly the same as master in a
> lot of cases. ISTM that with a case where you use 1GB of memory to
> sort 1 million rows, you're so close to an internal sort that it
> hardly matters (master will not need a merge step at all, most
> likely). The patch works best with sorts that take tens of seconds,
> and I don't think I see any here, nor any high-memory tests where
> replacement selection flops. Now, I think you focused on regressions
> because that was what was interesting, which is good. I just want to
> put that in context.

I don't think the tests on 1M rows are particularly interesting, and I don't see any noticeable regressions there. Perhaps you mean the tests on 10M rows instead?

Yes, you're correct - I was mostly looking for regressions. However, the worst regressions are on relatively long sorts, e.g. slowing down from 35 seconds to 64 seconds. So those sorts are quite long, and surely use a non-trivial amount of memory. Or am I missing something?
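
As a rough sanity check on the memory side - a back-of-envelope sketch only,
with an assumed ~100 bytes per in-memory tuple (the real footprint depends on
the schema and on tuplesort overhead):

# Assumed average in-memory footprint per tuple (payload + SortTuple overhead).
bytes_per_tuple = 100                      # made-up figure for the sketch

for rows in (1_000_000, 10_000_000):
    need_mb = rows * bytes_per_tuple / (1024 * 1024)
    print(f"{rows:>10,} rows: ~{need_mb:,.0f} MB")
# ~95 MB for the 1M set, ~954 MB for the 10M set.

So under that (made-up) assumption the regressed 10M-row sorts can plausibly
be genuinely external sorts, while the 1M-row sorts fit comfortably at the
higher work_mem settings.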

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

