Konstantin Knizhnik <k.knizh...@postgrespro.ru> writes: > There are several thousand clients, most of which are executing complex > queries.
So, that's really the core of your problem. We don't promise that you can run several thousand backends at once. Usually it's recommended that you stick a connection pooler in front of a server with (at most) a few hundred backends. > So them are not able to process all this invalidation messages and their > invalidation message buffer is overflown. > Size of this buffer is hardcoded (MAXNUMMESSAGES = 4096) and can not be > changed without recompilation of Postgres. > This is problem N1. No, this isn't a problem. Or at least you haven't shown a reason to think it is. Sinval overruns are somewhat routine, and we certainly test that code path (see CLOBBER_CACHE_ALWAYS buildfarm animals). > But then we come to the next show stopper: NUM_LOCK_PARTITIONS. > It is also hardcoded and can't be changed without recompilation: > #define LOG2_NUM_LOCK_PARTITIONS 4 > #define NUM_LOCK_PARTITIONS (1 << LOG2_NUM_LOCK_PARTITIONS) > Having just 16 LW-Locks greatly increase conflict probability (taken in > account that there are 5k tables and totally about 25k relations). > It cause huge lw-lock acquisition time for heap_open and planning stage > of some queries is increased from milliseconds to several minutes! Really? > This is problem number 2. But there is one more flaw we have faced with. > We have increased LOG2_NUM_LOCK_PARTITIONS to 8 > and ... clients start to report "too many LWLocks taken" error. > There is yet another hardcoded constant MAX_SIMUL_LWLOCKS = 200 > which relation with NUM_LOCK_PARTITIONS was not mentioned anywhere. Seems like self-inflicted damage. I certainly don't recall anyplace in the docs where we suggest that you can alter that constant without worrying about consequences. > So looks like NUM_LOCK_PARTITIONS and MAXNUMMESSAGES constants have to > be replaced with GUCs. I seriously doubt we'd do that. regards, tom lane