On Sat, 2016-12-03 at 18:37 -0800, Peter Geoghegan wrote:
> On Sat, Dec 3, 2016 at 5:45 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
> > I don't think a patch must necessarily consider all possible uses
> > that the new feature may have.  If we introduce parallel index
> > creation, that's great; if pg_restore doesn't start using it right
> > away, that's okay.  You, or somebody else, can still patch it
> > later.  The patch is still a step forward.
>
> While I agree, right now pg_restore will tend to use or not use
> parallelism for CREATE INDEX more or less by accident, based on
> whether or not pg_class.reltuples has already been set by something
> else (e.g., an earlier CREATE INDEX against the same table in the
> restoration). That seems unacceptable. I haven't just suppressed the
> use of parallel CREATE INDEX within pg_restore because that would be
> taking a position on something I have a hard time defending any
> particular position on. And so, I am slightly concerned about the
> entire ecosystem of tools that could implicitly use parallel CREATE
> INDEX, with undesirable consequences. Especially pg_restore.
>
> It's not so much a hard question as it is an awkward one. I want to
> handle any possible objection about there being future compatibility
> issues with going one way or the other ("This paints us into a corner
> with..."). And, there is no existing, simple way for pg_restore and
> other tools to disable the use of parallelism due to the cost model
> automatically kicking in, while still allowing the proposed new index
> storage parameter ("parallel_workers") to force the use of
> parallelism, which seems like something that should happen. (I might
> have to add a new GUC like "enable_maintenance_parallelism", since
> "max_parallel_workers_maintenance = 0" disables parallelism no matter
> how it might be invoked).

I do share your concerns about unpredictable behavior. That's
particularly worrying for pg_restore, which may be used for
time-sensitive use cases (DR, migrations between versions), so
unpredictable changes in behavior or duration are unwelcome.

But isn't this more a deficiency in pg_restore than in CREATE INDEX?
The issue seems to be that the reltuples value may or may not get
updated, so maybe forcing ANALYZE (even very low statistics_target
values would do the trick, I think) would be a more appropriate
solution? Or maybe it's time to add at least some rudimentary
statistics to the dumps (the reltuples field seems like a good
candidate).

Trying to fix this by adding more GUCs seems a bit strange to me.

> In general, I have a positive outlook on this patch, since it appears
> to compete well with similar implementations in other systems
> scalability-wise.  It does what it's supposed to do.

+1 to that

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers