On Wed, Apr 8, 2020 at 11:02 AM Tomas Vondra <tomas.von...@2ndquadrant.com> wrote: > > On Wed, Apr 08, 2020 at 04:08:39PM +0200, Tomas Vondra wrote: > >On Wed, Apr 08, 2020 at 09:54:42AM -0400, James Coleman wrote: > >>On Wed, Apr 8, 2020 at 9:43 AM Tomas Vondra > >><tomas.von...@2ndquadrant.com> wrote: > >>> > >>>On Wed, Apr 08, 2020 at 12:51:05PM +0200, Tomas Vondra wrote: > >>>>On Tue, Apr 07, 2020 at 11:54:23PM -0400, Tom Lane wrote: > >>>>>hyrax is not too happy with this test: > >>>>> > >>>>>https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2020-04-07%2004%3A55%3A15 > >>>>> > >>>>>It's not too clear to me why CLOBBER_CACHE_ALWAYS would be breaking > >>>>>EXPLAIN output, but it evidently is. > >>>>> > >>>> > >>>>Thanks, I'll investigate. It's not clear to me either what might be > >>>>causing this, but I guess something must have gone wrong in > >>>>estimation/planning. > >>>> > >>> > >>>OK, I know what's going on - it's a rather embarassing issue in the > >>>regression test. There's no analyze on the test tables, so it uses > >>>default estimates for number of groups etc. But with clobber cache the > >>>test runs long enough for autoanalyze to kick in and collect stats, so > >>>we generate better estimates which changes the plan. > >>> > >>>I'll get this fixed - explicit analyze and tweaking the data a bit > >>>should do the trick. > >> > >>Looking at the tests that failed, I think we should consider just adding: > >>set enable_sort = off; > >>because several of those tests have very specific amounts of data to > >>ensure we test the transition points around the different modes in the > >>incremental sort node. > >> > > > >Maybe, but I'd much rather tweak the data so that we test both the > >costing and execution part. > > > > I do think this does the trick by increasing the number of rows a bit > (from 100 to 1000) to make the Sort more expensive than Incremental > Sort, while still testing the transition points. > > James, can you verify it that's still true?
Those changes all look good to me from a "testing correctness" POV. Also I like that we now test multiple sort methods in the explain output, like: "Sort Methods: top-N heapsort, quicksort". I personally find the `i/100` notation harder to read than a case, but that's just an opinion... Should we change `analyze` to `analyze t` to avoid unnecessarily re-analyzing all other tables in the regression db? James