Hello postgres hackers,

I have recently been working on speeding up pg_upgrade for databases with over a million tables, and I would like to share some findings that may be optimizable, or at least interesting.
1: Skip Compatibility Check In "pg_upgrade"
===========================================

Concisely: we have several databases, each with a million-plus tables. Running the compatibility check before pg_dump can eat up half an hour. If I have already performed an online check before the actual upgrade, repeating it is unnecessary and, in many situations, only adds to the downtime. So I propose adding a "--skip-check" option to pg_upgrade. See "1-Skip_Compatibility_Check_v1.patch", and the rough sketch at the end of this mail.

2: Accelerate "FastPathTransferRelationLocks"
=============================================

In this scenario, pg_restore costs much more time than pg_dump, and profiling the "postgres" backend with perf shows that much of the time is spent in "LWLockAcquire" and "LWLockRelease". Digging deeper, I think I found the reason: when we create an index (very common during pg_restore), a "ShareLock" on the relation must be held first. That is a "strong" lock, so to acquire it, before we change the global lock hash table, we must traverse every proc to transfer its fast-path relation locks. The issue arises in "FastPathTransferRelationLocks": we acquire "fpInfoLock" before testing "proc->databaseId", so we end up acquiring and releasing that LWLock "MaxBackends" times for every index. The reason is recorded in the comment:

```
/*
 * proc->databaseId is set at backend startup time and never changes
 * thereafter, so it might be safe to perform this test before
 * acquiring &proc->fpInfoLock. In particular, it's certainly safe to
 * assume that if the target backend holds any fast-path locks, it
 * must have performed a memory-fencing operation (in particular, an
 * LWLock acquisition) since setting proc->databaseId. However, it's
 * less clear that our backend is certain to have performed a memory
 * fencing operation since the other backend set proc->databaseId. So
 * for now, we test it after acquiring the LWLock just to be safe.
 */
```

I agree with the reasoning, but it seems OK to replace the LWLockAcquire with a memory barrier before the "proc->databaseId" test, and this can save some time. See "2-Accelerate_FastPathTransferRelationLocks_v1.patch", and the sketch at the end of this mail.

3: Optimize Toast Index Creation
================================

While tracing the behavior described in point "2", I noticed something interesting about toast index creation. In function "create_toast_table":

```
/* ShareLock is not really needed here, but take it anyway */
toast_rel = table_open(toast_relid, ShareLock);

/* some operations */

index_create(xxxx)
```

Indeed, ShareLock is not really needed here, since ours is the only transaction to which the toast relation is visible. But by design (see "relation_open"), NoLock is only meant to be used when the caller already holds the lock. So I wonder whether it is acceptable to use NoLock in a scenario like this, where the relation was created by the current transaction. See "3-Optimize_Toast_Index_Creating_v1.patch", and the sketch at the end of this mail.
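To make the three ideas easier to discuss, rough sketches follow; the attached patches are authoritative. For point "1", this is roughly the shape I have in mind. The "skip_check" field and the exact guard placement are hypothetical here, and splitting the checks away from the schema dump inside the existing code path is hand-waved:

```c
/* pg_upgrade.h: hypothetical new member of struct UserOpts */
bool		skip_check;		/* --skip-check: trust a prior online check */

/* pg_upgrade.c: run the compatibility checks only when the user has
 * not opted out.  The schema dump itself must of course still run. */
if (!user_opts.skip_check)
	check_and_dump_old_cluster(live_check);	/* checks + dump, as today */
else
	generate_old_dump();	/* hypothetical: dump without the checks */
```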
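For point "2", here is a minimal sketch of the loop in "FastPathTransferRelationLocks" (storage/lmgr/lock.c) with the "databaseId" test hoisted out from under "fpInfoLock". The barrier placement is my assumption, and whether a re-check under the lock is still needed is part of the question:

```c
uint32		i;

for (i = 0; i < ProcGlobal->allProcCount; i++)
{
	PGPROC	   *proc = &ProcGlobal->allProcs[i];

	/*
	 * Test databaseId before taking fpInfoLock.  The read barrier is
	 * meant to pair with the memory-fencing operation the target
	 * backend performed after setting databaseId at startup (see the
	 * existing comment quoted above).
	 */
	pg_read_barrier();
	if (proc->databaseId != locktag->locktag_field1)
		continue;			/* skip this proc without touching its LWLock */

	LWLockAcquire(&proc->fpInfoLock, LW_EXCLUSIVE);

	/* ... transfer any conflicting fast-path locks, as before ... */

	LWLockRelease(&proc->fpInfoLock);
}
```

With a million-plus tables to restore, avoiding "MaxBackends" uncontended LWLock round trips per index adds up quickly.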
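And for point "3", the change amounts to something like the following in "create_toast_table" (catalog/toasting.c). Whether this is actually safe is exactly my question, so treat it as a sketch:

```c
/*
 * The toast relation was created earlier in this very transaction, so
 * no other transaction can see it yet; opening it with NoLock skips a
 * needless trip through the lock manager.
 */
toast_rel = table_open(toast_relid, NoLock);
```

If I read "relation_open" correctly, assert-enabled builds insist that a NoLock caller already holds some lock on the relation, so this relies on the creating transaction formally holding a lock on the brand-new toast relation; if it does not, that assertion would be the first thing to complain.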
That's what I've got. Any response is appreciated.

Best regards,
Yang Boyu

1-Skip_Compatibility_Check_v1.patch
2-Accelerate_FastPathTransferRelationLocks_v1.patch
3-Optimize_Toast_Index_Creating_v1.patch