I am confused.

If we restrict what Shape stores to "m" and "t" we can provide methods to
estimate "n" and "p" for the estimated "n"

If the constructors for Shape remains as as they are then developers don't
need to go online to get numbers to plug into Shape.  They only need to go
online if they want to understand the interplay between the values.

I think that most of the time developers will know the number of items and
a probability and let "k" and "m" default.

It seems like the API is built into the Shape constructor.  Pass in any
hashFunctionIdentity and various parameters and you can get the other
parameters.

On Sat, Mar 14, 2020 at 9:50 AM Alex Herbert <alex.d.herb...@gmail.com>
wrote:

> Should we not provide an API to compute this in code. Otherwise every
> developer who uses the code will have to go to the online Bloom filter
> calculator to check. They cannot do this dynamically in a program. If the
> equations are standard then I think we should provide a calculator.
>
> And Shape does not need to store the number of items to support a Bloom
> filter. Strictly speaking it only needs the number of bits. But bundling
> the hash function identity and number of hash functions saves you having to
> pass that separately to any Bloom filter and removes the requirement to
> specify these separately in the Bloom filter interface.
>
>
>
> On Sat, 14 Mar 2020, 09:31 Claude Warren, <cla...@xenei.com> wrote:
>
> > Shape is not intended to "Perform the standard computations using some of
> > n, m, k, p to produce optimal values for the other values of n, m, k, p:"
> > that is left to the developer to determine possibly with the help of
> > https://hur.st/bloomfilter/ as referenced in the class javadoc. However,
> > writing the constructors ends up performing the standard computations.
> >
> > On Sat, Mar 14, 2020 at 8:54 AM Alex Herbert <alex.d.herb...@gmail.com>
> > wrote:
> >
> > >
> > >
> > > > On 10 Mar 2020, at 01:36, Alex Herbert <alex.d.herb...@gmail.com>
> > wrote:
> > > >
> > > > I was going to modify the BloomFilter interface as discussed on this
> > > thread.
> > > >
> > > > However before that I have made some changes to the current code to
> > tidy
> > > it up. PR is [1]. The changes target the following:
> > >
> > > >
> > > > [1] https://github.com/apache/commons-collections/pull/140 <
> > > https://github.com/apache/commons-collections/pull/140>
> > > >
> > >
> > > This PR has updated the validations that the Shape does during
> > > construction. Number of bits has changed from >= 8 to > 0. But working
> on
> > > this it seems to me that the Shape is a composite class and should be
> > > refactored.
> > >
> > > I think that the Shape is combining two functionalities and should be
> > > separated:
> > >
> > > 1. Perform the standard computations using some of n, m, k, p to
> produce
> > > optimal values for the other values of n, m, k, p:
> > >
> > > n = number of items
> > > m = number of bits
> > > k = number of hash functions
> > > p = Probability of false positives (when storing n items)
> > >
> > > n = ceil(m / (-k / ln(1 - exp(ln(p) / k))))
> > > p = pow(1 - exp(-k / (m / n)), k)
> > > m = ceil((n * ln(p)) / ln(1 / pow(2, ln(2))))
> > > k = round((m / n) * ln(2))
> > >
> > > 2. Provide the functional shape for a Bloom filter
> > >
> > > This only requires storing m and k. It also requires storing the hash
> > > function identity.
> > >
> > > I suggest:
> > >
> > > - Moving all the computations to a ShapeCalculator
> > >
> > > - Only storing k and m in the Shape.
> > >
> > > - If you want to know n given p or p given n then you just call the
> > method
> > > in the ShapeCalculator.
> > >
> > > - Change constructors in the Shape to allow construction with:
> > >
> > > n, m; where n < m; compute k
> > > n, m, k; where n < m; validate p
> > > n, p; compute k and m
> > > m, k, p; compute n
> > > k, m;
> > >
> > > This adds the constructor (k, m) which is all you actually need for a
> > > working Bloom filter. Possible we should validate k < m.
> > >
> > > The constructor using n, m and k only serves to validate p is less
> than 1
> > > when the filter is saturated since n is only used to compute p. Perhaps
> > > this constructor should be removed.
> > >
> > > My argument is that a Bloom filter does not require anything to
> function
> > > other than the number of bits (m) and a hash functions (k). Typical use
> > > case is to construct using (n, p) if you know your number of items and
> > > filter efficiency, or a constructor using m if you know your memory
> > > limitations.
> > >
> > > Do we need any other constructors, e.g.?
> > >
> > > m, p; compute k
> > >
> > > Note that bundling the HashFunctionIdentity with the Shape ties the
> Shape
> > > to a functional Bloom filter. It could be renamed to BloomFilterShape
> to
> > > make it clear that it is directly linked to the Bloom filter. But that
> is
> > > over verbose.
> > >
> > > However the name Shape is generic and does not incorporate the fact
> that
> > > it also has a HashFunctionIdentity. There should be something in the
> > Shape
> > > javadoc to make it clear the Shape is not just a store for values
> > > associated with any Bloom filter (m, k) but is specific to a working
> > Bloom
> > > filter as it also defines the hash function to be used with a Bloom
> > filter.
> > >
> > > Alex
> > >
> > >
> > >
> >
> > --
> > I like: Like Like - The likeliest place on the web
> > <http://like-like.xenei.com>
> > LinkedIn: http://www.linkedin.com/in/claudewarren
> >
>


-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren

Reply via email to