But the two direct questions I have are:

1. What is the structure of the Bloom index? Can you please let me know
what the fields of a Bloom index are? Is it just the Item Pointer
and BloomSignatureWord?

I'm not sure about Postgres' actual implementation; I have only looked
at the implementation of the underlying hash functions.

When I describe my bloom index, it looks like the following.

postgres=# \d+ foo.idx_bloom_bar
                  Index "foo.idx_bloom_bar"
 Column  |  Type   | Key? | Definition | Storage | Stats target
---------+---------+------+------------+---------+--------------
 id      | integer | yes  | id         | plain   |
 dept    | integer | yes  | dept       | plain   |
 id2     | integer | yes  | id2        | plain   |
 id3     | integer | yes  | id3        | plain   |
 id4     | integer | yes  | id4        | plain   |
 id5     | integer | yes  | id5        | plain   |
 id6     | integer | yes  | id6        | plain   |
 zipcode | integer | yes  | zipcode    | plain   |
bloom, for table "foo.bar"

The bloom index associates a signature, i.e. a bitfield whose size is
given by the "length" parameter, with the tuple location. The bitfield
is computed by hashing the values of the columns listed in the index
definition, so that the many column values are, in effect, compressed
into one small signature.

Options: length=64, col1=4, col2=4, col3=4, col4=4, col5=4, col6=4, col7=4,
col8=4
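
For reference, such an index is declared along these lines (just a
sketch, assuming the "bloom" contrib extension is installed and the
column list matches the table):

  CREATE EXTENSION IF NOT EXISTS bloom;

  -- one 64-bit signature per row, 4 bits set per indexed column
  CREATE INDEX idx_bloom_bar ON foo.bar USING bloom
    (id, dept, id2, id3, id4, id5, id6, zipcode)
    WITH (length = 64, col1 = 4, col2 = 4, col3 = 4, col4 = 4,
          col5 = 4, col6 = 4, col7 = 4, col8 = 4);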

I doubt that these parameters are good. There is no point in including
a unique column in a bloom index. If you look at my blog, the number of
bits associated with each field should depend on the expected
selectivity, which in turn depends on the entropy of each field and on
the signature size. The column entropy can be computed with a query.
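
For instance, a rough sketch of such an entropy query for one column,
say "dept" (Shannon entropy in bits, assuming the current table
contents are representative):

  SELECT -sum(p * ln(p)) / ln(2.0) AS entropy_bits
  FROM (
    SELECT count(*)::float8 / sum(count(*)) OVER () AS p
    FROM foo.bar
    GROUP BY dept
  ) AS freq;

The result can then be fed into the kind of formula discussed in the
blog.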

2. If we are storing just one signature word per row, how is this
aggregated for all column values of that row into one signature, at a
high level?

The aggregation, if performed, is not very useful in practice because
it can only be effective on a few (first) bits, which are randomly
computed anyway, so the queried value is not likely to hit them.

Fundamentally, all bitfields are scanned to find which tuples are
possibly of interest, i.e. which are not excluded by the index. This
"full scan" of the bloom index is not a bad thing if the index is much
smaller than the table itself.
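
As a toy illustration of that exclusion test (a sketch only, using a
hypothetical table "sigs" that holds each row's 64-bit signature in a
bigint column "sig", with :qsig the signature computed from the
queried values):

  -- a row can only match if every bit of the query signature
  -- is also set in the row signature
  SELECT heap_tid
  FROM sigs
  WHERE (sig & :qsig) = :qsig;

Rows that pass this test still have to be rechecked against the heap,
which is where the false positives of the signature are weeded out.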

For example, if length = 64, does it mean that a bit array of 64 bits
is generated for each row?

Yes.

If col1=4, does it mean the value of col1 is passed to 4 hash functions
that generate 4 positions that can be set to 1 in the bit array?

Yes.
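
A sketch of that idea in SQL, perturbing the value with a counter and
hashing with hashint4 as a crude stand-in for 4 independent hash
functions (the actual hashing in the bloom code differs):

  -- 4 pseudo-random positions in 0..63 for the value 42,
  -- OR-ed into a single 64-bit signature word
  SELECT bit_or(1::bigint << (hashint4(42 + k) & 63)) AS signature
  FROM generate_series(0, 3) AS k;

Each indexed column contributes its set bits to the same per-row bit
array, as the two answers above imply.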

Try to apply the formula in the blog to see what you get for your
parameters; the result is likely to be < 4.

--
Fabien.

