On Sat, Apr 24, 2021 at 12:22 AM Robert Haas <robertmh...@gmail.com> wrote: > > On Fri, Apr 23, 2021 at 7:04 AM Masahiko Sawada <sawada.m...@gmail.com> wrote: > > I think we can divide the TID fork into 16MB or 32MB chunks like WAL > > segment files so that we can easily remove old chunks. Regarding the > > efficient search part, I think we need to consider the case where the > > TID fork gets bigger than maintenance_work_mem. In that case, during > > the heap scan, we need to check if the discovered TID exists in a > > chunk of the TID fork that could be on the disk. Even if all > > known-dead-TIDs are loaded into an array on the memory, it could get > > much slower than the current heap scan to bsearch over the array for > > each dead TID discovered during heap scan. So it would be better to > > have a way to skip searching by already recorded TIDs. For example, > > during heap scan or HOT pruning, I think that when marking TIDs dead > > and recording it to the dead TID fork we can mark them “dead and > > recorded” instead of just “dead” so that future heap scans can skip > > those TIDs without existence check. > > I'm not very excited about this. If we did this, and if we ever > generated dead-but-not-recorded TIDs, then we will potentially dirty > those blocks again later to mark them recorded.
Since the idea I imagined is that we always mark a TID recorded at the same time when marking it dead we don't dirty the page again, but, yes, if we do that the recorded flag is not necessary. We can simply think that TID marked dead is recorded to the TID fork. Future vacuum can skip TID that are already marked dead. > > Also, if bsearch() is a bottleneck, how about just using an O(1) > algorithm instead of an O(lg n) algorithm, rather than changing the > on-disk format? > > Also, can you clarify exactly what you think the problem case is here? > It seems to me that: > > 1. If we're pruning the heap to collect dead TIDs, we should stop when > the number of TIDs we've accumulated reaches maintenance_work_mem. It > is possible that we could find when starting to prune that there are > *already* more dead TIDs than will fit, because maintenance_work_mem > might have been reduced since they were gathered. But that's OK: we > can figure out that there are more than will fit without loading them > all, and since we shouldn't do additional pruning in this case, > there's no issue. The case I'm thinking is that pruning the heap and sanitizing indexes are running concurrently as you mentioned that concurrency is one of the benefits of decoupling vacuum phases. In that case, one process is doing index vacuuming using known-dead-TIDs in the TID fork while another process is appending new dead TIDs. We can suspend heap pruning until the size of the TID fork gets smaller as you mentioned but it seems inefficient. > > 2. If we're sanitizing indexes, we should normally discover that there > are few enough TIDs that we can still fit them all in memory. But if > that proves not to be the case, again because for example > maintenance_work_mem has been reduced, then we can handle that with > multiple index passes just as we do today. Yeah, there seems to be room for improvement but not worse than today. I imagine users will want to set a high maintenance_work_mem for sanitizing global index separately from the setting for heap pruning. > > 3. If we're going back to the heap to permit TIDs to be recycled by > setting dead line pointers to unused, we can load in as many of those > as will fit in maintenance_work_mem, sort them by block number, and go > through block by block and DTRT. Then, we can release all that memory > and, if necessary, do the whole thing again. This isn't even > particularly inefficient. Agreed. Just an idea: during pruning the heap, we can buffer the collected dead TIDs before writing the TID fork to the disk so that we can sort the dead TIDs in a chunk (say a 16MB chunk consists of 8KB blocks)? We write the chunk to the disk either when the chunk filled with dead TIDs or when index sanitizing starts. The latter timing is required to remember the chunk ID or uint64 ID of dead TID indicating how far index sanitizing removed dead TIDs up to. One of the benefits would be to reduce the disk I/O for the dead TID fork. Another would be we’re likely to complete the recycle phase in one heap scan since we load only one block per chunk during scanning the heap. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/