On Sat, Dec 3, 2016 at 7:23 PM, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
> I do share your concerns about unpredictable behavior - that's
> particularly worrying for pg_restore, which may be used for time-
> sensitive use cases (DR, migrations between versions), so unpredictable
> changes in behavior / duration are unwelcome.
Right.

> But isn't this more a deficiency in pg_restore, than in CREATE INDEX?
> The issue seems to be that the reltuples value may or may not get
> updated, so maybe forcing ANALYZE (even very low statistics_target
> values would do the trick, I think) would be more appropriate solution?
> Or maybe it's time add at least some rudimentary statistics into the
> dumps (the reltuples field seems like a good candidate).

I think there are a number of reasonable ways of looking at it. It
might also be worthwhile to have a minimal ANALYZE performed by CREATE
INDEX directly, iff there are no preexisting statistics (there is
definitely going to be something pg_restore-like that we cannot fix --
some ETL tool, for example). Perhaps, as an additional condition for
proceeding with such an ANALYZE, it should also only happen when there
is any chance at all of parallelism being used (but then you get into
having to establish the relation size reliably in the absence of any
pg_class.relpages, which isn't very appealing when there are many tiny
indexes).

In summary, I would really like it if a consensus emerged on how
parallel CREATE INDEX should handle the ecosystem of tools like
pg_restore, reindexdb, and so on. Personally, I'm neutral on which
general approach should be taken. Proposals from other hackers about
what to do here are particularly welcome.

--
Peter Geoghegan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers