On Sat, Dec 14, 2024 at 1:24 PM Tomas Vondra <to...@vondra.me> wrote:
>
> On 12/13/24 00:04, Tomas Vondra wrote:
> > ...
> >
> > The main difference is here:
> >
> > master / no parallel workers:
> >
> >   pages: 0 removed, 221239 remain, 221239 scanned (100.00% of total)
> >
> > 1 parallel worker:
> >
> >   pages: 0 removed, 221239 remain, 10001 scanned (4.52% of total)
> >
> > Clearly, with parallel vacuum we scan only a tiny fraction of the pages,
> > essentially just those with deleted tuples, which is ~1/20 of pages.
> > That's close to the 15x speedup.
> >
> > This effect is clearest without indexes, but it does affect even runs
> > with indexes - having to scan the indexes makes it much less pronounced,
> > though. However, these indexes are pretty massive (about the same size
> > as the table) - multiple times larger than the table. Chances are it'd
> > be clearer on realistic data sets.
> >
> > So the question is - is this correct? And if yes, why doesn't the
> > regular (serial) vacuum do that?
> >
> > There are some more strange things, though. For example, how come the
> > avg read rate is 0.000 MB/s?
> >
> >   avg read rate: 0.000 MB/s, avg write rate: 525.533 MB/s
> >
> > It scanned 10k pages, i.e. ~80MB of data, in 0.15 seconds. Surely that's
> > not 0.000 MB/s? I guess it's calculated from buffer misses, and all the
> > pages are in shared buffers (thanks to the DELETE earlier in that
> > session).
> >
>
> OK, after looking into this a bit more, I think the reason is rather
> simple - SKIP_PAGES_THRESHOLD.
>
> With serial runs we end up scanning all pages, because even with an
> update every 5000 tuples, that's still only ~25 pages apart, well within
> the 32-page window. So we end up skipping no pages, and scan and vacuum
> everything.
>
> But parallel runs have this skipping logic disabled, or rather the logic
> that switches to sequential scans if the gap is less than 32 pages.
>
> IMHO this raises two questions:
>
> 1) Shouldn't parallel runs use SKIP_PAGES_THRESHOLD too, i.e. switch to
> sequential scans if the pages are close enough? Maybe there is a reason
> for this difference? Workers can reduce the difference between random
> and sequential I/O, similarly to prefetching. But that just means the
> workers should use a lower threshold, e.g.
>
>   SKIP_PAGES_THRESHOLD / nworkers
>
> or something like that? I don't see this discussed in this thread.

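Just to make sure we're talking about the same rule, below is a simplified,
self-contained sketch of the serial-path decision. It is not the actual
vacuumlazy.c code, only a model of the SKIP_PAGES_THRESHOLD behavior;
block_is_all_visible() is a stand-in for the visibility map check, with
dead tuples every ~25 blocks as in your test.

/*
 * Simplified model of the serial skipping rule, not the actual
 * vacuumlazy.c code: a run of all-visible blocks is skipped only when
 * it is at least SKIP_PAGES_THRESHOLD blocks long; shorter runs are
 * read anyway to keep the I/O sequential.
 */
#include <stdbool.h>
#include <stdio.h>

#define SKIP_PAGES_THRESHOLD 32

typedef unsigned int BlockNumber;

/* stand-in for the visibility map: one block in every ~25 has dead tuples */
static bool
block_is_all_visible(BlockNumber blkno)
{
    return (blkno % 25) != 0;
}

int
main(void)
{
    BlockNumber nblocks = 200;
    BlockNumber scanned = 0;
    BlockNumber blkno = 0;

    while (blkno < nblocks)
    {
        /* measure the run of consecutive skippable (all-visible) blocks */
        BlockNumber run_end = blkno;

        while (run_end < nblocks && block_is_all_visible(run_end))
            run_end++;

        if (run_end - blkno >= SKIP_PAGES_THRESHOLD)
            blkno = run_end;            /* long run: skip it */
        else
            for (; blkno <= run_end && blkno < nblocks; blkno++)
                scanned++;              /* short run: read the blocks anyway */
    }

    printf("scanned %u of %u blocks\n", scanned, nblocks);
    return 0;
}

With that pattern no run of skippable blocks ever reaches 32 blocks, so the
sketch ends up scanning all 200 blocks - the same effect as the 100%
scanned in your serial run.
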
Each parallel heap scan worker allocates a chunk of blocks, which is 8192
blocks at maximum, so we would need to use the SKIP_PAGES_THRESHOLD
optimization within each chunk (a rough sketch of what I mean is at the
bottom of this message). I agree that we need to evaluate the difference
anyway; I'll do the benchmark test and share the results.

> 2) It seems the current SKIP_PAGES_THRESHOLD is awfully high for good
> storage. If I can get an order of magnitude improvement (or more than
> that) by disabling the threshold, and just doing random I/O, maybe
> it's time to adjust it a bit.

Yeah, you've started a thread for this, so let's discuss it there.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
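
Here is the rough sketch of applying the rule within a worker's chunk. This
is not actual PostgreSQL code - scan_chunk(), block_is_skippable(),
PARALLEL_CHUNK_MAX, and the 4-worker example below are made up for
illustration, and the scaled threshold simply follows the
SKIP_PAGES_THRESHOLD / nworkers idea above.

/*
 * Rough sketch, not actual PostgreSQL code: apply the skipping rule
 * within a single worker's chunk, with the threshold scaled down by
 * the number of workers as suggested above.
 */
#include <stdbool.h>
#include <stdio.h>

#define SKIP_PAGES_THRESHOLD 32
#define PARALLEL_CHUNK_MAX   8192       /* max blocks assigned to one worker */

typedef unsigned int BlockNumber;

/* stand-in for the visibility map: one block in every ~25 has dead tuples */
static bool
block_is_skippable(BlockNumber blkno)
{
    return (blkno % 25) != 0;
}

/*
 * Scan one chunk.  Runs of skippable blocks are skipped only when they
 * are at least "threshold" blocks long, and a run never extends past
 * the chunk boundary, so the decision is made per chunk.
 */
static BlockNumber
scan_chunk(BlockNumber chunk_start, BlockNumber chunk_nblocks, int nworkers)
{
    BlockNumber threshold = SKIP_PAGES_THRESHOLD / (nworkers > 0 ? nworkers : 1);
    BlockNumber end = chunk_start + chunk_nblocks;
    BlockNumber scanned = 0;
    BlockNumber blkno = chunk_start;

    if (threshold == 0)
        threshold = 1;

    while (blkno < end)
    {
        BlockNumber run_end = blkno;

        while (run_end < end && block_is_skippable(run_end))
            run_end++;

        if (run_end - blkno >= threshold)
            blkno = run_end;            /* long run: skip it */
        else
            for (; blkno <= run_end && blkno < end; blkno++)
                scanned++;              /* short run: read the blocks anyway */
    }

    return scanned;
}

int
main(void)
{
    /* one 8192-block chunk, pretending 4 workers share the vacuum */
    printf("scanned %u of %u blocks\n",
           scan_chunk(0, PARALLEL_CHUNK_MAX, 4), (BlockNumber) PARALLEL_CHUNK_MAX);
    return 0;
}

With a threshold of 32 / 4 = 8 and dead tuples every ~25 blocks, every run
of all-visible blocks is long enough to skip, so only the ~330 blocks that
actually need vacuuming are read from the 8192-block chunk.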