Hi, I'm stumbling on an issue which seems like this one: https://www.postgresql.org/message-id/20170719152038.19353.71475%40wrigleys.postgresql.org, but I hope someone can shed some light on my specific case.
Software: POSTGIS="3.1.4 ded6c34" [EXTENSION] PGSQL="140" GEOS="3.8.0-CAPI-1.13.1 " PROJ="6.3.1" LIBXML="2.9.10" LIBJSON="0.13.1" LIBPROTOBUF="1.3.3" WAGYU="0.5.0 (Internal)" PostgreSQL 14.0 (Ubuntu 14.0-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit 2 relevant tables and indices: CREATE TABLE osm_current.planet_osm_point ( osm_id bigint NOT NULL, tags hstore, way geometry(Point,3857) ); CREATE INDEX planet_osm_poi_bigfunc_geo ON osm_current.planet_osm_point USING gist (bigfunc(tags), way) WHERE bigfunc(tags) <= 14; CREATE TABLE osm_current.planet_osm_polygon ( osm_id bigint NOT NULL, tags hstore, way geometry(Polygon,3857) ); CREATE INDEX planet_osm_polygon_bigfunc_geo ON osm_current.planet_osm_polygon USING gist (bigfunc(tags), way) WHERE bigfunc(tags) <= 14; Query: SELECT * FROM osm_current.planet_osm_polygon WHERE bigfunc(tags) <= 7 AND NOT EXISTS( SELECT * FROM osm_current.planet_osm_point WHERE bigfunc(planet_osm_point.tags) <= 7 AND ST_Intersects(planet_osm_point.way,planet_osm_polygon.way) AND bigfunc2(planet_osm_point.tags) = bigfunc2(planet_osm_polygon.tags) ) AND ST_Intersects( 'SRID=3857;POLYGON((15012477.510296581 3741379.0533562037,15012477.510296581 4398859.837299369,15669958.252794353 4398859.837299369,15669958.252794353 3741379.0533562037,15012477.510296581 3741379.0533562037))'::geometry, way) Normal execution: I canceled it after 1 hour... Explain gives: Gather (cost=22998304.12..81977433.81 rows=2628686 width=262) Workers Planned: 2 -> Parallel Hash Anti Join (cost=22997304.12..81713565.21 rows=1095286 width=262) Hash Cond: (bigfunc2(planet_osm_polygon.tags) = bigfunc2(planet_osm_point.tags)) Join Filter: st_intersects(planet_osm_point.way, planet_osm_polygon.way) -> Parallel Bitmap Heap Scan on planet_osm_polygon (cost=51152.38..30790214.58 rows=1096787 width=262) Recheck Cond: (bigfunc(tags) <= 7) Filter: st_intersects('0103000020110F000001000000050000007F5954B04FA26C414760D486618B4C417F5954B04FA26C41185096F5C2C750412FE416C858E36D41185096F5C2C750412FE416C858E36D414760D486618B4C417F5954B04FA26C414760D486618B4C41'::geometry, way) -> Bitmap Index Scan on planet_osm_polygon_bigfunc_geo (cost=0.00..50494.31 rows=2632289 width=0) Index Cond: ((bigfunc(tags) <= 7) AND (way && '0103000020110F000001000000050000007F5954B04FA26C414760D486618B4C417F5954B04FA26C41185096F5C2C750412FE416C858E36D41185096F5C2C750412FE416C858E36D414760D486618B4C417F5954B04FA26C414760D486618B4C41'::geometry)) -> Parallel Hash (cost=22204690.21..22204690.21 rows=23875962 width=126) -> Parallel Bitmap Heap Scan on planet_osm_point (cost=309564.90..22204690.21 rows=23875962 width=126) Recheck Cond: (bigfunc(tags) <= 7) -> Bitmap Index Scan on planet_osm_poi_bigfunc_geo (cost=0.00..295239.32 rows=57302310 width=0) Index Cond: (bigfunc(tags) <= 7) When setting enable_hashjoin to false it gets radically different: Gather (cost=52152.79..169588182414.71 rows=2628686 width=262) (actual time=11.162..1037.116 rows=5381 loops=1) Output: planet_osm_polygon.osm_id, planet_osm_polygon.tags, planet_osm_polygon.way Workers Planned: 2 Workers Launched: 2 Buffers: shared hit=380500 read=5531 -> Nested Loop Anti Join (cost=51152.79..169587918546.11 rows=1095286 width=262) (actual time=2.867..1015.295 rows=1794 loops=3) Output: planet_osm_polygon.osm_id, planet_osm_polygon.tags, planet_osm_polygon.way Buffers: shared hit=380500 read=5531 Worker 0: actual time=1.168..1011.822 rows=1834 loops=1 Buffers: shared hit=129515 read=1663 Worker 1: actual time=1.236..1010.438 rows=1858 loops=1 Buffers: shared hit=129837 read=1632 -> Parallel Bitmap Heap Scan on osm_current.planet_osm_polygon (cost=51152.38..30790214.58 rows=1096787 width=262) (actual time=1.846..23.809 rows=1853 loops=3) Output: planet_osm_polygon.osm_id, planet_osm_polygon.tags, planet_osm_polygon.way Recheck Cond: (bigfunc(planet_osm_polygon.tags) <= 7) Filter: st_intersects('0103000020110F000001000000050000007F5954B04FA26C414760D486618B4C417F5954B04FA26C41185096F5C2C750412FE416C858E36D41185096F5C2C750412FE416C858E36D414760D486618B4C417F5954B04FA26C414760D486618B4C41'::geometry, planet_osm_polygon.way) Heap Blocks: exact=1235 Buffers: shared hit=9 read=4104 Worker 0: actual time=0.135..22.343 rows=1902 loops=1 Buffers: shared hit=2 read=1317 Worker 1: actual time=0.174..21.743 rows=1904 loops=1 Buffers: shared hit=3 read=1262 -> Bitmap Index Scan on planet_osm_polygon_bigfunc_geo (cost=0.00..50494.31 rows=2632289 width=0) (actual time=4.552..4.564 rows=5560 loops=1) Index Cond: ((bigfunc(planet_osm_polygon.tags) <= 7) AND (planet_osm_polygon.way && '0103000020110F000001000000050000007F5954B04FA26C414760D486618B4C417F5954B04FA26C41185096F5C2C750412FE416C858E36D41185096F5C2C750412FE416C858E36D414760D486618B4C417F5954B04FA26C414760D486618B4C41'::geometry)) Buffers: shared read=294 -> Index Scan using planet_osm_poi_bigfunc_geo on osm_current.planet_osm_point (cost=0.42..154805.97 rows=29 width=126) (actual time=0.534..0.534 rows=0 loops=5560) Output: planet_osm_point.osm_id, planet_osm_point.tags, planet_osm_point.way Index Cond: ((bigfunc(planet_osm_point.tags) <= 7) AND (planet_osm_point.way && planet_osm_polygon.way)) Filter: ((bigfunc2(planet_osm_point.tags) = bigfunc2(planet_osm_polygon.tags)) AND st_intersects(planet_osm_point.way, planet_osm_polygon.way)) Rows Removed by Filter: 0 Buffers: shared hit=380491 read=1427 Worker 0: actual time=0.519..0.519 rows=0 loops=1902 Buffers: shared hit=129513 read=346 Worker 1: actual time=0.518..0.518 rows=0 loops=1904 Buffers: shared hit=129834 read=370 Planning Time: 8.837 ms Execution Time: 1037.867 ms so finishes in a second. where bigfunc(tags) is a huge case when list wich returns an integer between 1 and 20 and bigfunc2(tags) is a huge case which returns a text[] with mostly 1 item in it The primary cause seems to be the selectivity, but table+index is vacuumed and analyzed... This query is used in a cursor so it's not easy for me to disable the hashjoin in our workflow, the only thing I can do to fix it is to keep adding AND ST_Intersects( 'SRID=3857;POLYGON((15012477.510296581 3741379.0533562037,15012477.510296581 4398859.837299369,15669958.252794353 4398859.837299369,15669958.252794353 3741379.0533562037,15012477.510296581 3741379.0533562037))'::geometry, way) until the queryplan is reasonable (in the most extreme case I needed 5 of them). This is ofcourse not the way to go (also disabling things in production is not what I want) so any pointers on how to let postgres use a better estimation on the number of rows is welcome! Paul P.S. when replying, please include me too