On Wed, Apr 08, 2020 at 11:13:26AM -0400, James Coleman wrote:
On Wed, Apr 8, 2020 at 11:02 AM Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
On Wed, Apr 08, 2020 at 04:08:39PM +0200, Tomas Vondra wrote:
>On Wed, Apr 08, 2020 at 09:54:42AM -0400, James Coleman wrote:
>>On Wed, Apr 8, 2020 at 9:43 AM Tomas Vondra
>><tomas.von...@2ndquadrant.com> wrote:
>>>
>>>On Wed, Apr 08, 2020 at 12:51:05PM +0200, Tomas Vondra wrote:
>>>>On Tue, Apr 07, 2020 at 11:54:23PM -0400, Tom Lane wrote:
>>>>>hyrax is not too happy with this test:
>>>>>
>>>>>https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2020-04-07%2004%3A55%3A15
>>>>>
>>>>>It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking
>>>>>EXPLAIN output, but it evidently is.
>>>>>
>>>>
>>>>Thanks, I'll investigate. It's not clear to me either what might be
>>>>causing this, but I guess something must have gone wrong in
>>>>estimation/planning.
>>>>
>>>
>>>OK, I know what's going on - it's a rather embarassing issue in the
>>>regression test. There's no analyze on the test tables, so it uses
>>>default estimates for number of groups etc. But with clobber cache the
>>>test runs long enough for autoanalyze to kick in and collect stats, so
>>>we generate better estimates which changes the plan.
>>>
>>>I'll get this fixed - explicit analyze and tweaking the data a bit
>>>should do the trick.
>>
>>Looking at the tests that failed, I think we should consider just adding:
>>set enable_sort = off;
>>because several of those tests have very specific amounts of data to
>>ensure we test the transition points around the different modes in the
>>incremental sort node.
>>
>
>Maybe, but I'd much rather tweak the data so that we test both the
>costing and execution part.
>
I do think this does the trick by increasing the number of rows a bit
(from 100 to 1000) to make the Sort more expensive than Incremental
Sort, while still testing the transition points.
James, can you verify it that's still true?
Those changes all look good to me from a "testing correctness" POV.
Also I like that we now test multiple sort methods in the explain
output, like: "Sort Methods: top-N heapsort, quicksort".
OK, good. I'll push the fix.
I personally find the `i/100` notation harder to read than a case, but
that's just an opinion...
Yeah, but with 1000 rows we'd need a more complex CASE statement (I
don't think simply having two groups - small+large would work).
Should we change `analyze` to `analyze t` to avoid unnecessarily
re-analyzing all other tables in the regression db?
Ah, definitely. That was a mistake. Thanks for noticing.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services