On Mon, 26 Jun 2023 at 00:29, Ranier Vilela <ranier...@gmail.com> wrote:
> Yuya Watari presented a series of patches, with the objective of improving
> the Bitmapset [1].
> After reading through the patches, I saw a lot of good ideas and thought I'd
> help.
> Unfortunately, my suggestions were not well received.

For the future, I recommend that if you have ideas you wish to convey
and think a patch is the easiest way to do that, then it's best to send
those with an extension that won't cause the CFbot to pick up your patch
instead.

Your patch did and still does seem to have a load of extra changes that
are unjustified by your claims. There are quite a lot of whitespace
changes going on. There's no reason in the world those will speed up
Bitmapsets, so why include them?

> v7 outperforms v4 by 29% (Query A)

I tested v7 with query-a and I also see additional gains. However, it's
entirely down to your changes to bms_is_subset(). It seems that, by
chance, looping backwards over the given sets is able to determine the
result more quickly.

Here are some results from "perf top" while planning query-a:

v4

 30.08%  postgres   [.] bms_is_subset
 15.84%  postgres   [.] create_join_clause
 13.54%  postgres   [.] bms_equal
 11.03%  postgres   [.] get_eclass_for_sort_expr
  8.53%  postgres   [.] generate_implied_equalities_for_column
  3.11%  postgres   [.] generate_join_implied_equalities_normal
  1.03%  postgres   [.] add_child_rel_equivalences
  0.82%  postgres   [.] SearchCatCacheInternal
  0.73%  postgres   [.] AllocSetAlloc
  0.53%  postgres   [.] find_ec_member_matching_expr
  0.40%  postgres   [.] hash_search_with_hash_value
  0.36%  postgres   [.] palloc
  0.36%  postgres   [.] palloc0

latency average = 452.480 ms

v7

 20.51%  postgres   [.] create_join_clause
 15.33%  postgres   [.] bms_equal
 14.17%  postgres   [.] get_eclass_for_sort_expr
 12.05%  postgres   [.] bms_is_subset
 10.40%  postgres   [.] generate_implied_equalities_for_column
  3.90%  postgres   [.] generate_join_implied_equalities_normal
  1.34%  postgres   [.] add_child_rel_equivalences
  1.06%  postgres   [.] AllocSetAlloc
  1.00%  postgres   [.] SearchCatCacheInternal
  0.72%  postgres   [.] find_ec_member_matching_expr
  0.58%  postgres   [.] palloc0
  0.49%  postgres   [.] palloc
  0.47%  postgres   [.] hash_search_with_hash_value
  0.44%  libc.so.6  [.] __memmove_avx_unaligned_erms

latency average = 350.543 ms

If I modify v7's bms_is_subset() to go forwards, then I get latency
average = 445.987 ms.

If I add some debugging to bms_is_subset() to have it record how many
words it checks, I see:

postgres=# select sum(nwords) from forward;
    sum
-----------
 181490660
(1 row)

postgres=# select sum(nwords) from backwards;
   sum
----------
 11322564
(1 row)

So, it took about 181 million loop iterations in bms_is_subset() to plan
query-a when looping forwards, but only 11 million when looping
backwards.

Unless you've got some reason to justify why we're always more likely to
perform fewer word checks in bms_is_subset() by looping backwards
instead of forwards, I think the performance gains you're showing here
only look better due to the given workload. It's just as easy to imagine
that you'll see the equivalent slowdown for some other workload where
the initial words differ but all the remaining words match.

This seems equivalent to someone suggesting that looping backwards in
list_member() is better than looping forwards because it'll find the
member more quickly. I can't imagine us ever considering that would be a
good change to make, and I think the same for bms_is_subset().

David
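
P.S. For anyone who wants to see the effect in isolation, here's a toy
standalone sketch (not the actual Bitmapset code or either patch, just a
mock-up using a plain uint64_t word array) showing how the scan
direction alone decides how many words a subset check touches before it
can bail out:

/*
 * Toy sketch comparing the two loop orders for a subset check over
 * equal-length word arrays.  The only point is that the direction of
 * the scan decides how soon a mismatch is found: if sets tend to
 * differ in their high words, scanning backwards exits sooner; if they
 * differ in their low words, scanning forwards wins.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t bitmapword;

/* forward scan: stop at the first word of 'a' with bits not in 'b' */
static bool
is_subset_forward(const bitmapword *a, const bitmapword *b, int nwords,
				  int *words_checked)
{
	for (int i = 0; i < nwords; i++)
	{
		(*words_checked)++;
		if ((a[i] & ~b[i]) != 0)
			return false;
	}
	return true;
}

/* backward scan: same test, starting from the last word */
static bool
is_subset_backward(const bitmapword *a, const bitmapword *b, int nwords,
				   int *words_checked)
{
	for (int i = nwords - 1; i >= 0; i--)
	{
		(*words_checked)++;
		if ((a[i] & ~b[i]) != 0)
			return false;
	}
	return true;
}

int
main(void)
{
	/* 'a' has a stray bit in its last word, so it is not a subset of 'b' */
	bitmapword	a[4] = {0x01, 0x02, 0x04, 0x80};
	bitmapword	b[4] = {0xff, 0xff, 0xff, 0x00};
	int			fwd = 0,
				bwd = 0;

	printf("forward:  %d (checked %d words)\n",
		   (int) is_subset_forward(a, b, 4, &fwd), fwd);
	printf("backward: %d (checked %d words)\n",
		   (int) is_subset_backward(a, b, 4, &bwd), bwd);
	return 0;
}

With this particular pair, the backward scan bails out after the first
word it looks at while the forward scan has to touch all four; put the
differing word at the other end and the result reverses, which is the
workload dependence I'm talking about above.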
For the future, I recommend if you do have ideas that you wish to convey and think a patch is the easiest way to do that, then it's best to send those with an extension that won't cause the CFbot to pickup your patch instead. You patch did and still does seem to have a load of extra changes that are unjustified by your claims. There are quite a lot of white space changes going on. There's no reason in the world that those will speed up Bitmapsets, so why include them? > v7 outperforms v4 by 29% (Query A) I tested v7 with query-a and I also see additional gains. However, it's entirely down to your changes to bms_is_subset(). It seems, by chance, with the given Bitmapsets that looping backwards for the given sets is able to determine the result more quickly Here's some results from "perf top" query-a v4 30.08% postgres [.] bms_is_subset 15.84% postgres [.] create_join_clause 13.54% postgres [.] bms_equal 11.03% postgres [.] get_eclass_for_sort_expr 8.53% postgres [.] generate_implied_equalities_for_column 3.11% postgres [.] generate_join_implied_equalities_normal 1.03% postgres [.] add_child_rel_equivalences 0.82% postgres [.] SearchCatCacheInternal 0.73% postgres [.] AllocSetAlloc 0.53% postgres [.] find_ec_member_matching_expr 0.40% postgres [.] hash_search_with_hash_value 0.36% postgres [.] palloc 0.36% postgres [.] palloc0 latency average = 452.480 ms v7 20.51% postgres [.] create_join_clause 15.33% postgres [.] bms_equal 14.17% postgres [.] get_eclass_for_sort_expr 12.05% postgres [.] bms_is_subset 10.40% postgres [.] generate_implied_equalities_for_column 3.90% postgres [.] generate_join_implied_equalities_normal 1.34% postgres [.] add_child_rel_equivalences 1.06% postgres [.] AllocSetAlloc 1.00% postgres [.] SearchCatCacheInternal 0.72% postgres [.] find_ec_member_matching_expr 0.58% postgres [.] palloc0 0.49% postgres [.] palloc 0.47% postgres [.] hash_search_with_hash_value 0.44% libc.so.6 [.] __memmove_avx_unaligned_erms latency average = 350.543 ms modified v7's bms_is_subset to go forwards then I get latency average = 445.987 ms. If I add some debugging to bms_is_subset to have it record how many words it checks, I see: postgres=# select sum(nwords) from forward; sum ----------- 181490660 (1 row) postgres=# select sum(nwords) from backwards; sum ---------- 11322564 (1 row) So, it took about 181 million loops in bms_is_member to plan query-a when looping forward, but only 11 million when looping backwards. I think unless you've got some reason that you're able to justify why we're always more likely to have to perform fewer word checks in bms_is_subset() by looping backwards instead of forwards then I think the performance gains you're showing here only happen to show better results due to the given workload. It's just as easy to imagine that you'll apply the equivalent slowdown for some other workload where the initial words differ but all the remaining words all match. This seems equivalent to someone suggesting that looping backwards in list_member() is better than looping forwards because it'll find the member more quickly. I can't imagine us ever considering that would be a good change to make and I think the same for bms_is_member(). David