On Mon, Jun 9, 2025 at 3:28 PM Dilip Kumar <dilipbal...@gmail.com> wrote: > > On Mon, Jun 9, 2025 at 2:03 PM Nikita Malakhov <huku...@gmail.com> wrote:
> > 4) Update-heavy partitioned tables that should run vacuum frequently. > > Significant > > vacuum slowdown would result in going beyond SLAs without corresponding > > significant improvements. > > > You've got it. I'm on board with prioritizing a VACUUM optimization > solution for partitioned tables with global indexes. My initial > proposal touched on a proof-of-concept experiment, which indicated no > significant performance hit with global index after the optimization. > I'll share the detailed VACUUM optimization proposal in this thread > within the next couple of days. As discussed earlier the one of main problems is that the global indexes are vacuumed along with each partition whereas logically it should be vacuumed only once when all the partitions are vacuum in an ideal world. So my proposal is that we make some infrastructure change in vacuum api's such that there is option to tell heap_vacuum_rel() to skip the global index vacuum and also return back the deadtid_store, vacuum layer will store these deadtid_store, hash it by the reloid until it vacuum all the partitions which are covered by a global index. Once that is done it will vacuum the global index, we also need to modify the vac_tid_reaped() so that it can take additional input of reloid so that it can find the appropriate deadtid_store by looking into the hash and then look up the dead tid in respective deadtid_store. We also need to do something for the autovacuum, because currently autovacuum workers scan the pg_class and identify the relation which needs to be vacuumed and vacuum it one at a time. However once we have educated vacuum machinery to first vacuum all the partitions and then perform global index vacuum, the autovacuum worker should know which all partitions are supposed to be vacuum together and the autovacuum worker can pass a list of all those partitions together to the vacuum machinery. I believe this enhancement can be implemented in autovacuum without significant difficulty. Currently, autovacuum scans pg_class to generate a list of relations requiring vacuuming. For partitioned tables that have a global index, we can extend this process by additionally maintaining a list of all their leaf relations within the parent table's entry. I believe this enhancement can be implemented in autovacuum without significant difficulty. Currently, autovacuum scans pg_class to generate a list of relations requiring vacuuming. For partitioned tables that have a global index, we can extend this process by additionally maintaining a list of all their leaf relations within the parent table's entry. An autovacuum worker would then process all the leaf relations in this complete hierarchy. To effectively prevent other workers from attempting to vacuum the same hierarchy concurrently, the worker would publish the top-most partitioned relation ID (which identifies the table with the global index) as MyWorkerInfo->wi_tableoid. This mechanism ensures that if one worker is processing a partitioned table, the entire set of child relations under that top-level ID is automatically skipped by other workers While this approach offers benefits, it's important to acknowledge certain potential drawbacks. One concern is a possible impact on parallelism, as currently each partition might be vacuumed by a separate worker, but with this change, all partitions covered by the same global index would have to be processed by a single worker. Another significant challenge lies in effectively choosing which partitions to vacuum; for instance, if a table with a global index has 1000 partitions and only 10 meet the vacuum threshold in a given cycle, this method might not be very efficient. Although we wouldn't vacuum the global index once for every partition, we could still end up vacuuming it numerous times by the time all partitions are eventually processed. To mitigate this, an alternative strategy could involve proactively vacuuming partitions that haven't fully met their vacuum threshold but are close (e.g., reached 50% of the threshold). This would allow us to combine more partitions for a single vacuum operation, thereby reducing the number of times the global index needs to be vacuumed. -- Regards, Dilip Kumar Google