On 12/13/24 00:04, Tomas Vondra wrote:
> ...
>
> The main difference is here:
>
> master / no parallel workers:
>
>   pages: 0 removed, 221239 remain, 221239 scanned (100.00% of total)
>
> 1 parallel worker:
>
>   pages: 0 removed, 221239 remain, 10001 scanned (4.52% of total)
>
> Clearly, with parallel vacuum we scan only a tiny fraction of the pages,
> essentially just those with deleted tuples, which is ~1/20 of the pages.
> That's close to the 15x speedup.
>
> This effect is clearest without indexes, but it does affect even runs
> with indexes - having to scan the indexes makes it much less pronounced,
> though. However, these indexes are pretty massive (each about the same
> size as the table), so together they are multiple times larger than the
> table. Chances are it'd be clearer on realistic data sets.
>
> So the question is - is this correct? And if yes, why doesn't the
> regular (serial) vacuum do that?
>
> There are some more strange things, though. For example, how come the
> avg read rate is 0.000 MB/s?
>
>   avg read rate: 0.000 MB/s, avg write rate: 525.533 MB/s
>
> It scanned 10k pages, i.e. ~80MB of data in 0.15 seconds. Surely that's
> not 0.000 MB/s? I guess it's calculated from buffer misses, and all the
> pages are in shared buffers (thanks to the DELETE earlier in that
> session).
OK, after looking into this a bit more I think the reason is rather
simple - SKIP_PAGES_THRESHOLD.

With serial runs, we end up scanning all pages, because even with an
update every 5000 tuples, that's still only ~25 pages apart, well within
the 32-page window. So we end up skipping no pages, and scan and vacuum
everything.

But parallel runs have this skipping logic disabled - or rather the
logic that switches to sequential scans if the gap is less than 32
pages.

IMHO this raises two questions:

1) Shouldn't parallel runs use SKIP_PAGES_THRESHOLD too, i.e. switch to
sequential scans if the pages are close enough? Maybe there is a reason
for this difference - workers can reduce the difference between random
and sequential I/O, similarly to prefetching. But that just means the
workers should use a lower threshold, e.g. SKIP_PAGES_THRESHOLD /
nworkers or something like that (see the sketch below). I don't see this
discussed in this thread.

2) It seems the current SKIP_PAGES_THRESHOLD is awfully high for good
storage. If I can get an order of magnitude improvement (or more than
that) by disabling the threshold and just doing random I/O, maybe it's
time to adjust it a bit.

regards

-- 
Tomas Vondra
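
To make (1) a bit more concrete, here is a rough sketch of the decision
I have in mind. This is just an illustration, not the actual
vacuumlazy.c code - the function name, the parameters and the per-worker
scaling are all made up:

    #include <stdbool.h>
    #include <stdio.h>

    /* run length required before serial vacuum skips all-visible pages */
    #define SKIP_PAGES_THRESHOLD 32

    /*
     * Decide whether to skip a run of consecutive all-visible pages.
     * Hypothetical: with parallel workers the threshold shrinks, on the
     * assumption that workers hide part of the random I/O penalty.
     */
    static bool
    skip_this_run(unsigned run_of_skippable_pages, int nworkers)
    {
        int threshold = SKIP_PAGES_THRESHOLD / (nworkers > 0 ? nworkers : 1);

        return run_of_skippable_pages >= (unsigned) threshold;
    }

    int
    main(void)
    {
        /* serial vacuum: a run of 25 pages stays below the 32-page threshold */
        printf("serial, run of 25:    %d\n", skip_this_run(25, 0));

        /* with 4 workers the (made-up) threshold drops to 8, so we'd skip */
        printf("4 workers, run of 25: %d\n", skip_this_run(25, 4));

        return 0;
    }

The point being that serial vacuum would behave exactly as today, while
workers could get more aggressive about skipping, instead of dropping
the threshold entirely.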