Em dom., 25 de jun. de 2023 às 17:49, David Rowley <dgrowle...@gmail.com> escreveu:
> There's no reason in the world > that those will speed up Bitmapsets, so why include them? > Of course optimization is the most important thing, but since you're going to touch the source, why not make it more readable. > > > v7 outperforms v4 by 29% (Query A) > > I tested v7 with query-a and I also see additional gains. However, > it's entirely down to your changes to bms_is_subset(). It seems, by > chance, with the given Bitmapsets that looping backwards for the given > sets is able to determine the result more quickly > I redid the tests and it seems that most of the difference comes from bms_subset. > > Here's some results from "perf top" > > query-a > v4 > > 30.08% postgres [.] bms_is_subset > 15.84% postgres [.] create_join_clause > 13.54% postgres [.] bms_equal > 11.03% postgres [.] get_eclass_for_sort_expr > 8.53% postgres [.] generate_implied_equalities_for_column > 3.11% postgres [.] generate_join_implied_equalities_normal > 1.03% postgres [.] add_child_rel_equivalences > 0.82% postgres [.] SearchCatCacheInternal > 0.73% postgres [.] AllocSetAlloc > 0.53% postgres [.] find_ec_member_matching_expr > 0.40% postgres [.] hash_search_with_hash_value > 0.36% postgres [.] palloc > 0.36% postgres [.] palloc0 > > latency average = 452.480 ms > > v7 > 20.51% postgres [.] create_join_clause > 15.33% postgres [.] bms_equal > 14.17% postgres [.] get_eclass_for_sort_expr > 12.05% postgres [.] bms_is_subset > 10.40% postgres [.] generate_implied_equalities_for_column > 3.90% postgres [.] generate_join_implied_equalities_normal > 1.34% postgres [.] add_child_rel_equivalences > 1.06% postgres [.] AllocSetAlloc > 1.00% postgres [.] SearchCatCacheInternal > 0.72% postgres [.] find_ec_member_matching_expr > 0.58% postgres [.] palloc0 > 0.49% postgres [.] palloc > 0.47% postgres [.] hash_search_with_hash_value > 0.44% libc.so.6 [.] __memmove_avx_unaligned_erms > > > latency average = 350.543 ms > > modified v7's bms_is_subset to go forwards then I get latency average > = 445.987 ms. > > If I add some debugging to bms_is_subset to have it record how many > words it checks, I see: > > postgres=# select sum(nwords) from forward; > sum > ----------- > 181490660 > (1 row) > > postgres=# select sum(nwords) from backwards; > sum > ---------- > 11322564 > (1 row) > > So, it took about 181 million loops in bms_is_member to plan query-a > when looping forward, but only 11 million when looping backwards. > > I think unless you've got some reason that you're able to justify why > we're always more likely to have to perform fewer word checks in > bms_is_subset() by looping backwards instead of forwards then I think > the performance gains you're showing here only happen to show better > results due to the given workload. It's just as easy to imagine that > you'll apply the equivalent slowdown for some other workload where the > initial words differ but all the remaining words all match. > Have you seen bms_compare? For some reason someone thought it would be better to loop backward the array. Since bms_subset performed very well with backward, I think that's a good reason to leave it as bms_compare. As in most cases, the array size is small, in general in both modes the performance will be equivalent. Anyway, I made a new patch v8, based on v4 but with some changes that I believe improve it. Windows 64 bits msvc 2019 64 bits == Query A == psql -U postgres -f c:\postgres_bench\tmp\bitmapset\create-tables-a.sql psql -U postgres -f c:\postgres_bench\tmp\bitmapset\query-a.sql ============= patched v4: Time: 2305,445 ms (00:02,305) Time: 2185,972 ms (00:02,186) Time: 2177,434 ms (00:02,177) Time: 2169,883 ms (00:02,170) patched v8: Time: 2143,532 ms (00:02,144) Time: 2140,313 ms (00:02,140) Time: 2138,481 ms (00:02,138) Time: 2130,290 ms (00:02,130) == Query B == psql -U postgres -f c:\postgres_bench\tmp\bitmapset\create-tables-b.sql psql -U postgres -f c:\postgres_bench\tmp\bitmapset\query-b.sql patched v4: Time: 2684,360 ms (00:02,684) Time: 2482,571 ms (00:02,483) Time: 2452,699 ms (00:02,453) Time: 2465,223 ms (00:02,465) patched v8: Time: 2493,281 ms (00:02,493) Time: 2490,090 ms (00:02,490) Time: 2432,515 ms (00:02,433) Time: 2426,860 ms (00:02,427) linux Ubuntu 64 bit gcc 64 bits == Query A == /usr/local/pgsql/bin/psql -U postgres -f /home/ranier/Documentos/benchmarks/bitmapset/create-tables-a.sql /usr/local/pgsql/bin/psql -U postgres -f /home/ranier/Documentos/benchmarks/bitmapset/query-a.sql ============= patched v4: Time: 933,181 ms Time: 931,520 ms Time: 906,496 ms Time: 872,446 ms patched v8: Time: 937,079 ms Time: 930,408 ms Time: 865,548 ms Time: 865,382 ms == Query B == /usr/local/pgsql/bin/psql -U postgres -f /home/ranier/Documentos/benchmarks/bitmapset/create-tables-b.sql /usr/local/pgsql/bin/psql -U postgres -f /home/ranier/Documentos/benchmarks/bitmapset/query-b.sql patched v4: Time: 1581,317 ms (00:01,581) Time: 1568,371 ms (00:01,568) Time: 1468,036 ms (00:01,468) Time: 1445,698 ms (00:01,446) patched v8: Time: 1437,997 ms (00:01,438) Time: 1437,435 ms (00:01,437) Time: 1440,422 ms (00:01,440) Time: 1436,112 ms (00:01,436) regards, Ranier Vilela
speeding-up-bitmapset-v8.patch
Description: Binary data