> Initial debug logging of a test on one Xeon system demonstrating this issue showed a very large number of unattributed semop() calls. We are still following up on this.

Postgres has its own user space spinlock and semaphore implementation. Both fall back to semop if there is contention.
Hmm. You wrote that the problem is Xeon-specific and that AthlonMP systems are unaffected. Perhaps Xeon CPUs do not like the s_lock implementation? It doesn't follow Intel's recommendations:
- no pause instructions.
- always TAS (an atomic test-and-set on every attempt). The recommended approach is nonatomic tests until the value is 0, and only then an atomic TAS.
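
To make the two points above concrete, here is a minimal sketch of a test-and-test-and-set spin loop with a pause in the wait path. It is not the code in s_lock.c: the names sketch_lock_t, cpu_relax, sketch_try_lock, sketch_lock and sketch_unlock are made up for illustration, it assumes gcc (or a compatible compiler) for the __sync builtins and the inline asm, and it leaves out any fallback to sleeping after too many spins.

```c
/* Hypothetical sketch; none of these names come from s_lock.c. */
typedef volatile int sketch_lock_t;

/* Spin-wait hint. "rep; nop" is the encoding of the x86 pause instruction;
 * on other architectures this is only a compiler barrier. */
static inline void cpu_relax(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("rep; nop" : : : "memory");
#else
    __asm__ __volatile__("" : : : "memory");
#endif
}

/* One atomic TAS attempt. Returns 1 if the lock was acquired, 0 if not. */
static inline int sketch_try_lock(sketch_lock_t *lock)
{
    return __sync_lock_test_and_set(lock, 1) == 0;
}

static inline void sketch_lock(sketch_lock_t *lock)
{
    for (;;)
    {
        /* Nonatomic test: plain reads while the lock is held, so waiters
         * do not keep issuing locked bus operations. */
        while (*lock != 0)
            cpu_relax();

        /* The lock looked free; only now do the atomic TAS. */
        if (sketch_try_lock(lock))
            return;
    }
}

static inline void sketch_unlock(sketch_lock_t *lock)
{
    __sync_lock_release(lock);  /* stores 0 with release semantics */
}
```

A caller would bracket the critical section with sketch_lock(&l) and sketch_unlock(&l). Whether this changes anything on the affected Xeon systems is exactly what would have to be measured.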
Attached is a gross hack that adds pause instructions. If this doesn't magically fix your problem, then we must figure out what causes the semop calls, and avoid them.
Could you ask your Linux hackers why they blame the shared memory implementation in postgres? I don't see any link between shared memory and lock contention.
-- Manfred
Index: backend/storage/lmgr/s_lock.c
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/backend/storage/lmgr/s_lock.c,v
retrieving revision 1.16
diff -c -r1.16 s_lock.c
*** backend/storage/lmgr/s_lock.c	8 Aug 2003 21:42:00 -0000	1.16
--- backend/storage/lmgr/s_lock.c	19 Dec 2003 20:01:33 -0000
***************
*** 111,116 ****
--- 111,117 ----
  			spins = 0;
  		}
+ 		__asm__ __volatile__("rep;nop\n": : : "memory");
  	}
  }