Em dom., 25 de jun. de 2023 às 17:49, David Rowley <dgrowle...@gmail.com>
escreveu:

> There's no reason in the world
> that those will speed up Bitmapsets, so why include them?
>
Of course optimization is the most important thing,
but since you're going to touch the source, why not make it more readable.


>
> > v7 outperforms v4 by 29% (Query A)
>
> I tested v7 with query-a and I also see additional gains. However,
> it's entirely down to your changes to bms_is_subset(). It seems, by
> chance, with the given Bitmapsets that looping backwards for the given
> sets is able to determine the result more quickly
>
I redid the tests and it seems that most of the difference comes from
bms_subset.


>
> Here's some results from "perf top"
>
> query-a
> v4
>
>   30.08%  postgres          [.] bms_is_subset
>   15.84%  postgres          [.] create_join_clause
>   13.54%  postgres          [.] bms_equal
>   11.03%  postgres          [.] get_eclass_for_sort_expr
>    8.53%  postgres          [.] generate_implied_equalities_for_column
>    3.11%  postgres          [.] generate_join_implied_equalities_normal
>    1.03%  postgres          [.] add_child_rel_equivalences
>    0.82%  postgres          [.] SearchCatCacheInternal
>    0.73%  postgres          [.] AllocSetAlloc
>    0.53%  postgres          [.] find_ec_member_matching_expr
>    0.40%  postgres          [.] hash_search_with_hash_value
>    0.36%  postgres          [.] palloc
>    0.36%  postgres          [.] palloc0
>
> latency average = 452.480 ms
>
> v7
>   20.51%  postgres          [.] create_join_clause
>   15.33%  postgres          [.] bms_equal
>   14.17%  postgres          [.] get_eclass_for_sort_expr
>   12.05%  postgres          [.] bms_is_subset
>   10.40%  postgres          [.] generate_implied_equalities_for_column
>    3.90%  postgres          [.] generate_join_implied_equalities_normal
>    1.34%  postgres          [.] add_child_rel_equivalences
>    1.06%  postgres          [.] AllocSetAlloc
>    1.00%  postgres          [.] SearchCatCacheInternal
>    0.72%  postgres          [.] find_ec_member_matching_expr
>    0.58%  postgres          [.] palloc0
>    0.49%  postgres          [.] palloc
>    0.47%  postgres          [.] hash_search_with_hash_value
>    0.44%  libc.so.6         [.] __memmove_avx_unaligned_erms
>
>
> latency average = 350.543 ms
>
> modified v7's bms_is_subset to go forwards then I get latency average
> = 445.987 ms.
>
> If I add some debugging to bms_is_subset to have it record how many
> words it checks, I see:
>
> postgres=# select sum(nwords) from forward;
>     sum
> -----------
>  181490660
> (1 row)
>
> postgres=# select sum(nwords) from backwards;
>    sum
> ----------
>  11322564
> (1 row)
>
> So, it took about 181 million loops in bms_is_member to plan query-a
> when looping forward, but only 11 million when looping backwards.
>
> I think unless you've got some reason that you're able to justify why
> we're always more likely to have to perform fewer word checks in
> bms_is_subset() by looping backwards instead of forwards then I think
> the performance gains you're showing here only happen to show better
> results due to the given workload.  It's just as easy to imagine that
> you'll apply the equivalent slowdown for some other workload where the
> initial words differ but all the remaining words all match.
>
Have you seen bms_compare?
For some reason someone thought it would be better to loop backward the
array.

Since bms_subset performed very well with backward, I think that's a good
reason to leave it as bms_compare.
As in most cases, the array size is small, in general in both modes the
performance will be equivalent.

Anyway, I made a new patch v8, based on v4 but with some changes that I
believe improve it.

Windows 64 bits
msvc 2019 64 bits

== Query A ==
psql -U postgres -f c:\postgres_bench\tmp\bitmapset\create-tables-a.sql
psql -U postgres -f c:\postgres_bench\tmp\bitmapset\query-a.sql
=============

patched v4:
Time: 2305,445 ms (00:02,305)
Time: 2185,972 ms (00:02,186)
Time: 2177,434 ms (00:02,177)
Time: 2169,883 ms (00:02,170)

patched v8:
Time: 2143,532 ms (00:02,144)
Time: 2140,313 ms (00:02,140)
Time: 2138,481 ms (00:02,138)
Time: 2130,290 ms (00:02,130)

== Query B ==
psql -U postgres -f c:\postgres_bench\tmp\bitmapset\create-tables-b.sql
psql -U postgres -f c:\postgres_bench\tmp\bitmapset\query-b.sql

patched v4:
Time: 2684,360 ms (00:02,684)
Time: 2482,571 ms (00:02,483)
Time: 2452,699 ms (00:02,453)
Time: 2465,223 ms (00:02,465)

patched v8:
Time: 2493,281 ms (00:02,493)
Time: 2490,090 ms (00:02,490)
Time: 2432,515 ms (00:02,433)
Time: 2426,860 ms (00:02,427)


linux Ubuntu 64 bit
gcc 64 bits

== Query A ==
/usr/local/pgsql/bin/psql -U postgres -f
/home/ranier/Documentos/benchmarks/bitmapset/create-tables-a.sql
/usr/local/pgsql/bin/psql -U postgres -f
/home/ranier/Documentos/benchmarks/bitmapset/query-a.sql
=============

patched v4:
Time: 933,181 ms
Time: 931,520 ms
Time: 906,496 ms
Time: 872,446 ms

patched v8:
Time: 937,079 ms
Time: 930,408 ms
Time: 865,548 ms
Time: 865,382 ms

== Query B ==
/usr/local/pgsql/bin/psql -U postgres -f
/home/ranier/Documentos/benchmarks/bitmapset/create-tables-b.sql
/usr/local/pgsql/bin/psql -U postgres -f
/home/ranier/Documentos/benchmarks/bitmapset/query-b.sql

patched v4:
Time: 1581,317 ms (00:01,581)
Time: 1568,371 ms (00:01,568)
Time: 1468,036 ms (00:01,468)
Time: 1445,698 ms (00:01,446)

patched v8:
Time: 1437,997 ms (00:01,438)
Time: 1437,435 ms (00:01,437)
Time: 1440,422 ms (00:01,440)
Time: 1436,112 ms (00:01,436)

regards,
Ranier Vilela

Attachment: speeding-up-bitmapset-v8.patch
Description: Binary data

Reply via email to