On Sun, 19 Jan 2025 at 17:56, Yura Sokolov <y.soko...@postgrespro.ru> wrote: > 17.01.2025 17:00, Zhou, Zhiguo пишет: >> On 1/16/2025 10:00 PM, Yura Sokolov wrote: >>> >>> Good day, Zhiguo. >>> >>> Excuse me, I feel sneaky a bit, but I've started another thread >>> just about increase of NUM_XLOGINSERT_LOCK, because I can measure >>> its effect even on my working notebook (it is another one: Ryzen >>> 5825U limited to @2GHz). >>> >>> http://postgr.es/m/flat/3b11fdc2-9793-403d- >>> b3d4-67ff9a00d447%40postgrespro.ru >>> >>> ----- >>> >>> regards >>> Yura Sokolov aka funny-falcon >>> >>> >> Good day, Yura! >> Thank you for keeping me informed. I appreciate your proactive >> approach and understand the importance of exploring different angles >> for optimization. Your patch is indeed fundamental to our ongoing >> work on the lock-free xlog reservation, and I'm eager to see how it >> can further enhance our efforts. >> I will proceed to test the performance impact of your latest patch >> when combined with the lock-free xlog reservation patch. This will >> help us determine if there's potential for additional >> optimization. Concurrently, with your permission, I'll try to refine >> the hash-table- based implementation for your further review. WDYT? >> > > Good day, Zhiguo > > Here's version of "hash-table reservation" with both 32bit and 64bit > operations (depending on PG_HAVE_ATOMIC_U64_SIMULATION, or may be > switched by hand). > > 64bit version uses other protocol with a bit lesser atomic > operations. I suppose it could be a bit faster. But I can't prove it > now. > > btw, you wrote: > >>> Major issue: >>> - `SetPrevRecPtr` and `GetPrevRecPtr` do non-atomic write/read > with on >>> platforms where MAXALIGN != 8 or without native 64 > load/store. Branch >>> with 'memcpy` is rather obvious, but even pointer de-referencing on >>> "lucky case" is not safe either. >>> >>> I have no idea how to fix it at the moment. >>> >> >> Indeed, non-atomic write/read operations can lead to safety issues in >> some situations. My initial thought is to define a bit near the >> prev-link to flag the completion of the update. In this way, we could >> allow non-atomic or even discontinuous write/read operations on the >> prev-link, while simultaneously guaranteeing its atomicity through >> atomic operations (as well as memory barriers) on the flag bit. What >> do you think of this as a viable solution? > > There is a way to order operations: > - since SetPrevRecPtr stores start of record as LSN, its lower 32bits > are certainly non-zero (record could not start at the beginning of a > page). > - so SetPrevRecPtr should write high 32bits, issue write barrier, and > then write lower 32bits, > - and then GetPrevRecPtr should first read lower 32bits, and if it is > not zero, then issue read barrier and read upper 32bits. > > This way you will always read correct prev-rec-ptr on platform without > 64bit atomics. (because MAXALING >= 4 and PostgreSQL requires 4 byte > atomicity for several years). >
Hi, Yura Sokolov Thanks for updating the patch. I test the v2 patch using BenchmarkSQL 1000 warehouse, and here is the tpmC result: case | min | avg | max --------------------+------------+------------+-------------- master (patched) | 988,461.89 | 994,916.50 | 1,000,362.40 master (44b61efb79) | 857,028.07 | 863,174.59 | 873,856.92 The patch provides a significant improvement. I just looked through the patch, here are some comments. 1. The v2 patch can't be applied cleanly. Applying: Lock-free XLog Reservation using lock-free hash-table .git/rebase-apply/patch:33: trailing whitespace. .git/rebase-apply/patch:37: space before tab in indent. { .git/rebase-apply/patch:38: space before tab in indent. int i; .git/rebase-apply/patch:39: trailing whitespace. .git/rebase-apply/patch:46: space before tab in indent. LWLockReleaseClearVar(&WALInsertLocks[i].l.lock, warning: squelched 4 whitespace errors warning: 9 lines add whitespace errors. 2. And there is a typo: + * PrevLinksHash is a lock-free hash table based on Cuckoo algorith. It is + * mostly 4 way: for every element computed two positions h1, h2, and s/algorith/algorithm/g -- Regrads, Japin Li