On Mon, Dec 30, 2019 at 10:40:39AM +0530, Amit Kapila wrote:
On Mon, Dec 30, 2019 at 2:53 AM Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
On Sun, Dec 29, 2019 at 10:06:23PM +0900, Masahiko Sawada wrote:
>>
>> v40-0003-Add-FAST-option-to-vacuum-command.patch
>> ------------------------------------------------
>>
>> I do have a bit of an issue with this part - I'm not quite convinved we
>> actually need a FAST option, and I actually suspect we'll come to regret
>> it sooner than later. AFAIK it pretty much does exactly the same thing
>> as setting vacuum_cost_delay to 0, and IMO it's confusing to provide
>> multiple ways to do the same thing - I do expect reports from confused
>> users on pgsql-bugs etc. Why is setting vacuum_cost_delay directly not a
>> sufficient solution?
>
>I think the motivation of this option is similar to FREEZE. I think
>it's sometimes a good idea to have a shortcut of popular usage and
>make it have an name corresponding to its job. From that perspective I
>think having FAST option would make sense but maybe we need more
>discussion the combination parallel vacuum and vacuum delay.
>
OK. I think it's mostly independent piece, so maybe we should move it to
a separate patch. It's more likely to get attention/feedback when not
buried in this thread.
+1. It is already a separate patch and I think we can even discuss
more on it in a new thread once the main patch is committed or do you
think we should have a conclusion about it now itself? To me, this
option appears to be an extension to the main feature which can be
useful for some users and people might like to have a separate option,
so we can discuss it and get broader feedback after the main patch is
committed.
I don't think it's an extension of the main feature - it does not depend
on it, it could be committed before or after the parallel vacuum (with
some conflicts, but the feature itself is not affected).
My point was that by moving it into a separate thread we're more likely
to get feedback on it, e.g. from people who don't feel like reviewing
the parallel vacuum feature and/or feel intimidated by t100+ messages in
this thread.
>>
>> The same thing applies to the PARALLEL flag, added in 0002, BTW. Why do
>> we need a separate VACUUM option, instead of just using the existing
>> max_parallel_maintenance_workers GUC?
>>
How will user specify parallel degree? The parallel degree is helpful
because in some cases users can decide how many workers should be
launched based on size and type of indexes.
By setting max_maintenance_parallel_workers.
>> It's good enough for CREATE INDEX
>> so why not here?
>
That is a different feature and I think here users can make a better
judgment based on the size of indexes. Moreover, users have an option
to control a parallel degree for 'Create Index' via Alter Table
<tbl_name> Set (parallel_workers = <n>) which I am not sure is a good
idea for parallel vacuum as the parallelism is more derived from size
and type of indexes. Now, we can think of a similar parameter at the
table/index level for parallel vacuum, but I don't see it equally
useful in this case.
I'm a bit skeptical about users being able to pick good parallel degree.
If we (i.e. experienced developers/hackers with quite a bit of
knowledge) can't come up with a reasonable heuristics, how likely is it
that a regular user will come up with something better?
Not sure I understand why "parallel_workers" would not be suitable for
parallel vacuum? I mean, even for CREATE INDEX it certainly matters the
size/type of indexes, no?
I may be wrong in both cases, of course.
>AFAIR There was no such discussion so far but I think one reason could
>be that parallel vacuum should be disabled by default. If the parallel
>vacuum uses max_parallel_maintenance_workers (2 by default) rather
>than having the option the parallel vacuum would work with default
>setting but I think that it would become a big impact for user because
>the disk access could become random reads and writes when some indexes
>are on the same tablespace.
>
I'm not quite convinced VACUUM should have parallelism disabled by
default. I know some people argued we should do that because making
vacuum faster may put pressure on other parts of the system. Which is
true, but I don't think the solution is to make vacuum slower by
default. IMHO we should do the opposite - have it parallel by default
(as driven by max_parallel_maintenance_workers), and have an option
to disable parallelism.
I think driving parallelism for vacuum by
max_parallel_maintenance_workers might not be sufficient. We need to
give finer control as it depends a lot on the size of indexes. Also,
unlike Create Index, Vacuum can be performed on an entire database and
it is quite possible that some tables/indexes are relatively smaller
and forcing parallelism on them by default might slow down the
operation.
Why wouldn't it be sufficient? Why couldn't this use similar logic to
what we have in plan_create_index_workers for CREATE INDEX?
Sure, it may be useful to give power users a way to override the default
logic, but I very much doubt users can make reliable judgments about
parallelism.
Also, it's not like the risks are comparable in those two cases. If you
have very large table with a lot of indexes, the gains with parallel
vacuum are pretty much bound to be significant, possibly 10x or more.
OTOH if the table is small, parallelism may not give you much and it may
even be less efficient, but I doubt it's going to be 10x slower. And
considering min_parallel_index_scan_size already protects us against
this, at least partially.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services