On Fri, Jun 2, 2023, at 10:01, Ants Aasma wrote: > Have you looked at roaring bitmaps? There is a pg_roaringbitmap > extension [1] already available that offers very fast unions, > intersections and membership tests over integer sets. I used it to get > some pretty impressive performance results for faceting search on > large document sets. [2]
Many thanks for the tip! New benchmark: We already had since before: > wget https://snap.stanford.edu/data/soc-pokec-relationships.txt.gz > gunzip soc-pokec-relationships.txt.gz > CREATE TABLE edges (from_node INT, to_node INT); > \copy edges from soc-pokec-relationships.txt; > ALTER TABLE edges ADD PRIMARY KEY (from_node, to_node); I've created a new `users` table from the `edges` table, with a new `friends` roaringbitmap column: CREATE TABLE users AS SELECT from_node AS id, rb_build_agg(to_node) AS friends FROM edges GROUP BY 1; ALTER TABLE users ADD PRIMARY KEY (id); Old query from before: CREATE OR REPLACE VIEW friends_of_friends AS WITH RECURSIVE friends_of_friends AS ( SELECT ARRAY[5867::bigint] AS current, 0 AS depth UNION ALL SELECT new_current, friends_of_friends.depth + 1 FROM friends_of_friends CROSS JOIN LATERAL ( SELECT array_agg(DISTINCT edges.to_node) AS new_current FROM edges WHERE from_node = ANY(friends_of_friends.current) ) q WHERE friends_of_friends.depth < 3 ) SELECT coalesce(array_length(current, 1), 0) AS count_friends_at_depth_3 FROM friends_of_friends WHERE depth = 3; ; New roaringbitmap-based query using users instead: CREATE OR REPLACE VIEW friends_of_friends_roaringbitmap AS WITH RECURSIVE friends_of_friends AS ( SELECT friends, 1 AS depth FROM users WHERE id = 5867 UNION ALL SELECT new_friends, friends_of_friends.depth + 1 FROM friends_of_friends CROSS JOIN LATERAL ( SELECT rb_or_agg(users.friends) AS new_friends FROM users WHERE users.id = ANY(rb_to_array(friends_of_friends.friends)) ) q WHERE friends_of_friends.depth < 3 ) SELECT rb_cardinality(friends) AS count_friends_at_depth_3 FROM friends_of_friends WHERE depth = 3 ; Note, depth is 1 at first level since we already have user 5867's friends in the users column. Maybe there is a better way to make use of the btree index on users.id, than to convert the roaringbitmap to an array in order to use = ANY(...), that is, this part: `users.id = ANY(rb_to_array(friends_of_friends.friends))`? Or maybe there is some entirely different but equivalent way of writing the query in a more efficient way? SELECT * FROM friends_of_friends; count_friends_at_depth_3 -------------------------- 1035293 (1 row) SELECT * FROM friends_of_friends_roaringbitmap; count_friends_at_depth_3 -------------------------- 1035293 (1 row) EXPLAIN ANALYZE SELECT * FROM friends_of_friends; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- CTE Scan on friends_of_friends (cost=5722.03..5722.73 rows=1 width=4) (actual time=2232.896..2233.289 rows=1 loops=1) Filter: (depth = 3) Rows Removed by Filter: 3 CTE friends_of_friends -> Recursive Union (cost=0.00..5722.03 rows=31 width=36) (actual time=0.003..2228.707 rows=4 loops=1) -> Result (cost=0.00..0.01 rows=1 width=36) (actual time=0.001..0.001 rows=1 loops=1) -> Subquery Scan on "*SELECT* 2" (cost=190.59..572.17 rows=3 width=36) (actual time=556.806..556.837 rows=1 loops=4) -> Nested Loop (cost=190.59..572.06 rows=3 width=36) (actual time=553.748..553.748 rows=1 loops=4) -> WorkTable Scan on friends_of_friends friends_of_friends_1 (cost=0.00..0.22 rows=3 width=36) (actual time=0.000..0.001 rows=1 loops=4) Filter: (depth < 3) Rows Removed by Filter: 0 -> Aggregate (cost=190.59..190.60 rows=1 width=32) (actual time=737.427..737.427 rows=1 loops=3) -> Sort (cost=179.45..185.02 rows=2227 width=4) (actual time=577.192..649.812 rows=3486910 loops=3) Sort Key: edges.to_node Sort Method: quicksort Memory: 393217kB -> Index Only Scan using edges_pkey on edges (cost=0.56..55.62 rows=2227 width=4) (actual time=0.027..225.609 rows=3486910 loops=3) Index Cond: (from_node = ANY (friends_of_friends_1.current)) Heap Fetches: 0 Planning Time: 0.294 ms Execution Time: 2240.446 ms EXPLAIN ANALYZE SELECT * FROM friends_of_friends_roaringbitmap; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------- CTE Scan on friends_of_friends (cost=799.30..800.00 rows=1 width=8) (actual time=492.925..492.930 rows=1 loops=1) Filter: (depth = 3) Rows Removed by Filter: 2 CTE friends_of_friends -> Recursive Union (cost=0.43..799.30 rows=31 width=118) (actual time=0.061..492.842 rows=3 loops=1) -> Index Scan using users_pkey on users (cost=0.43..2.65 rows=1 width=118) (actual time=0.060..0.062 rows=1 loops=1) Index Cond: (id = 5867) -> Nested Loop (cost=26.45..79.63 rows=3 width=36) (actual time=164.244..164.244 rows=1 loops=3) -> WorkTable Scan on friends_of_friends friends_of_friends_1 (cost=0.00..0.22 rows=3 width=36) (actual time=0.000..0.001 rows=1 loops=3) Filter: (depth < 3) Rows Removed by Filter: 0 -> Aggregate (cost=26.45..26.46 rows=1 width=32) (actual time=246.359..246.359 rows=1 loops=2) -> Index Scan using users_pkey on users users_1 (cost=0.43..26.42 rows=10 width=114) (actual time=0.074..132.318 rows=116336 loops=2) Index Cond: (id = ANY (rb_to_array(friends_of_friends_1.friends))) Planning Time: 0.257 ms Execution Time: 493.134 ms