Hello, apologies for the long post, but I want to make sure I’ve got enough
details to describe the problem for y’all.



I’ve got a 64-core (Ubuntu 18.04 – 240GB RAM running at GCP) instance
running PG 13.2 and PostGIS 3.1.1 and we’re having troubles getting it to
run more than 30 or so large queries at the same time accessing the same
tables.  With 60 threads, each thread is only running at ~30% CPU and no
diskIO/IOWait (once the tables become cached).



Boiling the complex queries down to their simplest form, we test running 60
of this query simultaneously:



select

  count(*)

from

  travel_processing_v5.llc_zone z,

  parent_set10.usca_trip_points7 t

where t.year_num = 2019 and t.month_num = 9

and st_intersects(t.lock_geom, z.s_geom)

and st_intersects(t.lock_geom, z.e_geom);



llc_zone = 981 rows (568k disk size) with s_geom and e_geom both of
datatype geometry(Multipolygon, 2163)

usca_trip_points7 = 79 million rows (469G disk size) with t.lock_geom
datatype geometry(Linestring, 2163)

(more detailed schema/stats can be provided if helpful)



postgresql.conf is pretty normal for a large system like this (with
appropriate shared_buffer, work_mem, etc. – can be provided if helpful, too)



What I’m finding in pg_stat_activity when running this is lots of
wait_events of type ‘LockManager’.

Rebuilding with CFLAGS=" -fno-omit-frame-pointer"
--prefix=/usr/local/pgsql_13debug --enable-dtrace CPPFLAGS='-DLOCK_DEBUG'
and then setting trace_lwlocks yields lots of records looking like:



[39691] LOG:  39691: LWLockAcquire(LockManager 0x7fab2cc09d80): excl 0
shared 0 haswaiters 1 waiters 6 rOK 1



Does anyone have any advice on how to alleviate LockManager’s LWlock issue?



Thanks for any assistance!



---Paul



Paul Friedman

CTO



677 Harrison St  |  San Francisco, CA 94107

*M:* (650) 270-7676

*E-mail:* paul.fried...@streetlightdata.com

Reply via email to