Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2025-05-18 Thread Mihail Nikalayeu
Hello, Álvaro! > I didn't understand why you have a few "v19" patches and also a separate > series of "v19-only-part-3-" patches. Is there duplication? How do > people know which series comes first? This was explained in the previous email [0]: > Patch itself contains 4 parts, some of them may

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2025-05-18 Thread Álvaro Herrera
Hello Mihail, On 2025-May-18, Mihail Nikalayeu wrote: > Hello, everyone! > > Rebased version + materials from PGConf.dev 2025 Poster Session :) I agree with Matthias that this work is important, so thank you for persisting on it. I didn't understand why you have a few "v19" patches and also a

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2025-01-03 Thread Matthias van de Meent
On Wed, 1 Jan 2025 at 17:17, Michail Nikolaev wrote: > > Hello, everyone! > > I’ve added several updates to the patch set: > > * Automatic auxiliary index removal where applicable. > * Documentation updates to reflect recent changes. > * Optimization for STIR indexes: skipping datum setup, as they

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2025-01-01 Thread Michail Nikolaev
Hello everyone, My apologies, I probably forgot to attach the images with the benchmark results in my previous email. Please find them attached to this message. Best regards, Mikhail

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2024-12-25 Thread Michail Nikolaev
Hello, Michael! Thank you for your comments and feedback! Yes, this patch set contains a significant amount of code, which makes it challenging to review. Some details are explained in the commit messages, but I’m doing my best to structure the patch set in a way that is as committable as possibl

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2024-12-24 Thread Michael Paquier
On Tue, Dec 24, 2024 at 02:06:26PM +0100, Michail Nikolaev wrote: > Now STIR used for validation (but without resetting of snapshot during > that phase for now). Perhaps I am the only one, but what you are doing here is confusing. There is a dependency between one patch and the follow-up ones, bu

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2024-11-12 Thread Michail Nikolaev
Hello, everyone! With winter approaching, it’s the perfect time to dive back into work on this patch! :) The first attached patch implements Matthias's idea of periodically resetting the snapshot during the initial heap scan. The next step will be to add support for parallel builds. Additionally

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2024-09-01 Thread Michail Nikolaev
Hello, Matthias! Just wanted to update you with some information about the next steps in work. > In heapam_index_build_range_scan, it seems like you're popping the > snapshot and registering a new one while holding a tuple from > heap_getnext(), thus while holding a page lock. I'm not so sure tha

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2024-08-08 Thread Michail Nikolaev
Hello, Matthias! > While waiting for this, here are some initial comments on the github diffs: Thanks for your review! While stress testing the POC, I found some issues unrelated to the patch that need to be fixed first. This is [1] and [2]. >> Additional index is lightweight and does not produc

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2024-08-06 Thread Matthias van de Meent
On Tue, 11 Jun 2024 at 10:58, Michail Nikolaev wrote: > > Hello. > > I did the POC (1) of the method described in the previous email, and it looks > promising. > > It doesn't block the VACUUM, indexes are built about 30% faster (22 mins vs > 15 mins). That's a nice improvement. > Additional in

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2024-06-11 Thread Michail Nikolaev
Hello. I did the POC (1) of the method described in the previous email, and it looks promising. It doesn't block the VACUUM, indexes are built about 30% faster (22 mins vs 15 mins). Additional index is lightweight and does not produce any WAL. I'll continue the more stress testing for a while. A

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2024-05-09 Thread Michail Nikolaev
Hello, Matthias and others! Realized new horizon was applied only during validation phase (once index is marked as ready). Now it applied if index is not marked as valid yet. Updated version in attach. -- > I think the best way for this to work wo

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2024-05-07 Thread Michail Nikolaev
Hi again! Made an error in `GlobalVisHorizonKindForRel` logic, and it was caught by a new test. Fixed version in attach. > From 9a8ea366f6d2d144979e825c4ac0bdd2937bf7c1 Mon Sep 17 00:00:00 2001 From: nkey Date: Tue, 7 May 2024 22:10:56 +0200 Subject: [PATCH v3] WIP: fix d9d076222f5b "VACUUM: ig

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2024-05-07 Thread Michail Nikolaev
Hello, Matthias and others! Updated WIP in attach. Changes are: * Renaming, now it feels better for me * More reliable approach in `GlobalVisHorizonKindForRel` to make sure we have not missed `rd_safeindexconcurrentlybuilding` by calling `RelationGetIndexList` if required * Optimization to avoid

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2024-05-05 Thread Michail Nikolaev
Hello, Matthias! I just realized there is a much simpler and safe way to deal with the problem. So, d9d076222f5b94a85e0e318339cfc44b8f26022d(1) had a bug because the scan was not protected by a snapshot. At the same time, we want this snapshot to affect not all the relations, but only a subset of

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2024-05-04 Thread Michail Nikolaev
Hello, Matthias! > We can just release the current snapshot, and get a new one, right? I > mean, we don't actually use the transaction for much else than > visibility during the first scan, and I don't think there is a need > for an actual transaction ID until we're ready to mark the index entry >

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2024-03-12 Thread Matthias van de Meent
On Thu, 7 Mar 2024 at 19:37, Michail Nikolaev wrote: > > Hello! > > > I'm not a fan of this approach. Changing visibility and cleanup > > semantics to only benefit R/CIC sounds like a pain to work with in > > essentially all visibility-related code. I'd much rather have to deal > > with another in

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2024-03-07 Thread Michail Nikolaev
Hello! > I'm not a fan of this approach. Changing visibility and cleanup > semantics to only benefit R/CIC sounds like a pain to work with in > essentially all visibility-related code. I'd much rather have to deal > with another index AM, even if it takes more time: the changes in > semantics will

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2024-03-05 Thread Matthias van de Meent
On Wed, 21 Feb 2024 at 12:37, Michail Nikolaev wrote: > > Hi! > > > How do you suppose this would work differently from a long-lived > > normal snapshot, which is how it works right now? > > Difference in the ability to take new visibility snapshot periodically > during the second phase with reche

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2024-02-21 Thread Michail Nikolaev
Hi! > How do you suppose this would work differently from a long-lived > normal snapshot, which is how it works right now? Difference in the ability to take new visibility snapshot periodically during the second phase with rechecking visibility of tuple according to the "reference" snapshot (whic

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2024-02-21 Thread Matthias van de Meent
On Wed, 21 Feb 2024 at 09:35, Michail Nikolaev wrote: > > One more idea - is just forbid HOT prune while the second phase is > running. It is not possible anyway currently because of snapshot held. > > Possible enhancements: > * we may apply restriction only to particular tables > * we may apply r

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2024-02-21 Thread Matthias van de Meent
On Wed, 21 Feb 2024 at 00:33, Michail Nikolaev wrote: > > Hello! > > I think the best way for this to work would be an index method that > > exclusively stores TIDs, and of which we can quickly determine new > > tuples, too. I was thinking about something like GIN's format, but > > using (generati

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2024-02-21 Thread Michail Nikolaev
One more idea - is just forbid HOT prune while the second phase is running. It is not possible anyway currently because of snapshot held. Possible enhancements: * we may apply restriction only to particular tables * we may apply restrictions only to part of the tables (not yet scanned by R/CICs).

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2024-02-20 Thread Michail Nikolaev
Hello! > I think the best way for this to work would be an index method that > exclusively stores TIDs, and of which we can quickly determine new > tuples, too. I was thinking about something like GIN's format, but > using (generation number, tid) instead of ([colno, colvalue], tid) as > key data

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2024-02-17 Thread Matthias van de Meent
On Thu, 1 Feb 2024, 17:06 Michail Nikolaev, wrote: > > > > > I just realised there is one issue with this design: We can't cheaply > > > > reset the snapshot during the second table scan: > > > > It is critically important that the second scan of R/CIC uses an index > > > > contents summary (made

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2024-02-01 Thread Michail Nikolaev
> > > I just realised there is one issue with this design: We can't cheaply > > > reset the snapshot during the second table scan: > > > It is critically important that the second scan of R/CIC uses an index > > > contents summary (made with index_bulk_delete) that was created while > > > the curre

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2024-01-09 Thread Michail Nikolaev
Hello, Melanie! Sorry to interrupt you, just a quick question. > Correct, but there are changes being discussed where we would freeze > tuples during pruning as well [0], which would invalidate that > implementation detail. And, if I had to choose between improved > opportunistic freezing and imp

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2024-01-04 Thread Michail Nikolaev
Hello! > Correct, but there are changes being discussed where we would freeze > tuples during pruning as well [0], which would invalidate that > implementation detail. And, if I had to choose between improved > opportunistic freezing and improved R/CIC, I'd probably choose > improved freezing over

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2024-01-04 Thread Matthias van de Meent
On Mon, 25 Dec 2023 at 15:12, Michail Nikolaev wrote: > > Hello! > > It seems like the idea of "old" snapshot is still a valid one. > > > Should this deal with any potential XID wraparound, too? > > As far as I understand in our case, we are not affected by this in any way. > Vacuum in our table i

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2023-12-25 Thread Michail Nikolaev
Hello! It seems like the idea of "old" snapshot is still a valid one. > Should this deal with any potential XID wraparound, too? As far as I understand in our case, we are not affected by this in any way. Vacuum in our table is not possible because of locking, so, nothing may be frozen (see belo

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2023-12-21 Thread Michail Nikolaev
Hello. Realized my last idea is invalid (because tuples are frozen by using dynamically calculated horizon) - so, don't waste your time on it :) Need to think a little bit more here. Thanks, Mikhail.

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2023-12-20 Thread Michail Nikolaev
> Yes, good catch. > Assuming we have somehow prevented vac_truncate_clog from occurring > during CIC, we can leave frozen and potentially frozen > (xmin

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2023-12-20 Thread Michail Nikolaev
Hello! > How would this deal with tuples not visible to the old snapshot? > Presumably we can assume they're newer than that snapshot (the old > snapshot didn't have it, but the new one does, so it's committed after > the old snapshot, making them newer), so that backend must have > inserted it in

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2023-12-20 Thread Matthias van de Meent
On Wed, 20 Dec 2023 at 10:56, Michail Nikolaev wrote: > > Note that the use of such expressions would be a violation of the > > function's definition; it would depend on data from other tables which > > makes the function behave like a STABLE function, as opposed to the > > IMMUTABLE that is requi

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2023-12-20 Thread Michail Nikolaev
Hello! > Also, feels like we may apply this to both phases (first and the second > scans). > The original patch (1) was helping only to the second one (after call > to set_indexsafe_procflags). Oops, I was wrong here. The original version of the patch was also applied to both phases. > Note tha

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2023-12-17 Thread Matthias van de Meent
On Sun, 17 Dec 2023, 21:14 Michail Nikolaev, wrote: > > Hello! > > > I've thought about alternative solutions, too: how about getting a new > > snapshot every so often? > > We don't really care about the liveness of the already-scanned data; the > > snapshots used for RIC > > are used only durin

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2023-12-17 Thread Michail Nikolaev
Hello! > I've thought about alternative solutions, too: how about getting a new > snapshot every so often? > We don't really care about the liveness of the already-scanned data; the > snapshots used for RIC > are used only during the scan. C/RIC's relation's lock level means vacuum > can't run

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

2023-12-15 Thread Matthias van de Meent
On Fri, 15 Dec 2023, 20:07 Michail Nikolaev, wrote: > Hello, hackers! > > I think about revisiting (1) ({CREATE INDEX, REINDEX} CONCURRENTLY > improvements) in some lighter way. > > Yes, a serious bug was (2) caused by this optimization and now it reverted. > > But what about a more safe idea in