On Fri, Mar 31, 2017 at 11:44 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Wed, Mar 29, 2017 at 2:23 AM, Masahiko Sawada <sawada.m...@gmail.com> wrote:
>> I was thinking that the status of this patch is still "Needs review"
>> because I sent latest version patch[1].
>
> I think you're right.
>
> I took a look at this today.  I think there is some problem with the
> design of this patch.  I originally proposed a threshold based on the
> percentage of not-all-visible pages on the theory that we'd just skip
> looking at the indexes altogether in that case.  But that's not what
> the patch does: it only avoids the index *cleanup*, not the index
> *vacuum*.
Hmm, I guess there is a misunderstanding on this thread (my English might
have confused you, sorry). I've been proposing a patch that allows lazy
vacuum to skip only the index cleanup. The problem reported by xu jian[1]
is that a second VACUUM FREEZE takes a long time even if the table has not
been modified since the first vacuum. The reason is that we cannot skip
lazy_cleanup_index even if the table has not changed at all since the
previous vacuum. If the table has garbage, lazy vacuum does
lazy_vacuum_index; on the other hand, if the table has no garbage, which
means that only insertions were executed or the table was not modified at
all, lazy vacuum does lazy_cleanup_index instead. Since I knew about this
restriction, I proposed the patch this way. That's why the proposed new
GUC parameter name has been "vacuum_cleanup_index_scale_factor".

> And the comments in btvacuumcleanup say this:
>
>     /*
>      * If btbulkdelete was called, we need not do anything, just return the
>      * stats from the latest btbulkdelete call. If it wasn't called, we must
>      * still do a pass over the index, to recycle any newly-recyclable pages
>      * and to obtain index statistics.
>      *
>      * Since we aren't going to actually delete any leaf items, there's no
>      * need to go through all the vacuum-cycle-ID pushups.
>      */

In the current design, we can skip lazy_cleanup_index when the number of
pages that needed to be vacuumed is less than the threshold. For example,
after the whole table has been frozen by an aggressive vacuum, vacuum can
skip scanning the index if only a few insertions were executed on that
table. But if a transaction created garbage on the table, this patch
cannot prevent the index vacuum, as you mentioned. By skipping the index
cleanup we could leave recyclable pages that are not yet marked as
recyclable, but they can be reclaimed by both the next index vacuum and
the next index cleanup, because btvacuumpage calls RecordFreeIndexPage for
recyclable pages. So I think it doesn't become a problem. The next
concern, pointed out by Peter, is that we stash an XID when a btree page
is deleted, which is used to determine when it's finally safe to recycle
the page. I prevent this situation by not allowing lazy vacuum to skip the
index cleanup during an aggressive vacuum. But since this weakens the
advantage of this patch, I'm considering a better idea.

> So, if I'm reading this correctly, the only time this patch saves
> substantial work - at least in the case of a btree index - is in the
> case where there are no dead tuples at all.  But if we only want to
> avoid the work in that case, then a threshold based on the percentage
> of all-visible pages seems like the wrong thing, because the other
> stuff btvacuumcleanup() is doing doesn't have anything to do with the
> number of all-visible pages.

I thought that if lazy vacuum skipped almost all of the table according to
the visibility map, it means the state of the table has changed only a
little, and we can skip updating the index statistics. Am I missing
something?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
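P.S. To make the proposed behavior more concrete, here is a rough,
self-contained sketch of the decision I have in mind. The function and
parameter names (decide_index_pass, scanned_frac, cleanup_scale_factor)
are hypothetical and only for illustration; this is not the actual patch
code and not how vacuumlazy.c is structured.

    #include <stdbool.h>
    #include <stdio.h>

    /* Possible outcomes for the index pass after the heap scan. */
    typedef enum
    {
        INDEX_PASS_SKIP,        /* do neither bulk delete nor cleanup */
        INDEX_PASS_CLEANUP,     /* lazy_cleanup_index: stats and page recycling */
        INDEX_PASS_VACUUM       /* lazy_vacuum_index: bulk delete dead tuples */
    } IndexPass;

    /*
     * Choose which index pass to run.  Garbage always forces a bulk delete;
     * otherwise cleanup runs only for an aggressive vacuum or when the
     * fraction of heap pages that had to be scanned exceeds the proposed
     * vacuum_cleanup_index_scale_factor threshold.
     */
    IndexPass
    decide_index_pass(long num_dead_tuples, bool aggressive,
                      double scanned_frac, double cleanup_scale_factor)
    {
        if (num_dead_tuples > 0)
            return INDEX_PASS_VACUUM;
        if (aggressive || scanned_frac > cleanup_scale_factor)
            return INDEX_PASS_CLEANUP;
        return INDEX_PASS_SKIP;
    }

    int
    main(void)
    {
        /* No garbage, tiny fraction scanned, not aggressive: cleanup is skipped. */
        printf("%d\n", decide_index_pass(0, false, 0.001, 0.01));
        return 0;
    }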