On Sat, Sep 14, 2019 at 2:15 AM Claude Warren <cla...@xenei.com> wrote:

> @Gilles
>
> I am happy to rename the package without the plural if that is the
> standard, I will also fix the indent issue.  Is there a definition that can
> be quickly imported into Eclipse to do the proper formatting?
>
> I am adding/updating all comments in the code.
>
> FilterConfig contains a main method to make it easy to assess how the
> various settings influence the length and number of functions in the
> filter.  This could go into another class or a Utility class but neither of
> those seem like a better solution.
>
> I agree with your assessment of final and immutable for methods, variables
> and classes.
>
> I am removing the FilterConfig constructor for BloomFilter and ensuring the
> BloomFilter makes a defensive copy.
>
> The difference between match() and equals() is that equals() is a bit
> vector equality and match is the standard bloom filter comparison T&C = T.
> Documentation is being added to the match() and inverseMatch() to explain
> this.
>
> STORED_SIZE is similar to Long.BYTES (perhaps it should be renamed).  It
> identifies how many bytes the FilterConfig will use when written to a
> DataOuptut or read from a DataInput.
>
> I think the Utils class can be removed once the other code is moved to
> Java8 streams and not extended iterators.
>
> Hash function is changed to Objects.hash()
>
> Constructors are/will be private where factories make sense.  For some the
> simple constructor makes more sense.  (e.g. BloomFilter(BitSet) )
>
> I have no problem removing the Serializable on FilterConfig, but can you
> explain (or point to a post) why it should be avoided?
>

>From the master:
https://www.youtube.com/watch?v=V1vQf4qyMXg&feature=youtu.be&t=56m12s

Also related:
https://opensource.googleblog.com/2017/03/operation-rosehub.html

Gary


> The update/build methods are being documented and an explanation of the
> difference included.
>
> getLog() is also being documented
>
> The asBitSet() method was to extract the bitset from the bloom filter so
> that it could be stored otherwise used with BitSet based processes.  The
> name of the method is being chagned to getBitSet() and it makes a defensive
> copy.
>
> The ProtoBloomFilter is not a factory.  It is really a bean and there are
> times when it makes sense to use the ProtoBloomFilter.  The name is
> important as it identifies what it is.  It contains the necessary
> information to build any configuration of BloomFilter.  There is a proposal
> to use ProtoBloomFilters in service calls.  It could be defined as an
> interface with a concrete implementation in the ProtoBloomFilterBuilder.
>
> As an example of ProtoBloomFilter usage.  Say you have a Person object that
> has the user's first name, last name, eye color, hair color.
>
> The ProtoBloomFilterBuilder would accept the 4 values as 4 calls to
> update() followed by a build() to build the protoBloomFilter.
> ProtoBloomFilter proto = ne wProtoBloomFilterBuilder().update( "Claude"
> ).update( "Warren" ).update( "brown" ).update( "silver" ).build();
>
> I can create a bloom filter configuration that has 1x10^7  items and
> accepts 1 / 1x10^6 collisions.
> FilterConfig config1 = new FilterConfig( 10000000, 1000000)
>
> and build the BloomFilter
> BloomFilter filter1 = proto.create( config1 )
>
> I can use that filter to determine if the person is in a collection of
> 1x10^7 items.  The implementation of that collection could be multiple
> buckets of  1x10^4 items with a collision rate of 5x10^3 items.  To check
> that I create an appropriate FilterConfig
> FilterConfig config2 = new FilterConfig( 10000, 5000 )
>
> and then create the proper bloom filter
> BloomFilter filter2 = proto.create( config2 )
>
> Claude
>
>
> On Fri, Sep 13, 2019 at 2:22 PM Gilles Sadowski <gillese...@gmail.com>
> wrote:
>
> > > > [...]
> > >
> > > Gilles,
> > >
> > > Take a look at the PR, I added comments to allow the PR to have 0 deps.
> > >
> > > Gary
> > >
> > >
> >
> > IMO, the package should be named "bloomfilter" (without "s").
> > Naming seems inconsistent in [Collections]: Some (package)
> > names are singular, other plural.
> >
> > * Indent must be 4 spaces.
> > * All fields and methods must be commented.
> > * "FilterConfig" contains a "main" method.
> > * Local variables and fields must be declared "final" whenever constant.
> > * Some methods are declared "final", others not.
> > * "BloomFilter" should be made immutable.
> > * "BloomFilter" could do without a second constructor: "FilterConfig"
> > should have a method that returns a "BitSet".
> > * "BloomFilter" must make a defensive copy of its "bitSet" argument.
> > * What's the difference between "match" and "equals"?
> > * Why is "STORED_SIZE" public?
> > * "Utils" classes should be avoided.
> > * Hash function should be imported/shaded from "Commons Codec"
> > (and defined there if missing).
> > * Constructor(s) should be private (instantiation through factory
> > methods (cf. ValJO).
> > * "Serializable" should be avoided.
> > * The "update" methods are missing explanations (or reference) about
> > their purpose.  Also, it seems that "update" and "build" are redundant.
> > * Does "getLog" return a standard value?  If so, the reference should
> > appear in the Javadoc (not just as code comment).
> > * What is the purpose of method "asBitSet()"?
> > * (IIUC) "ProtoBloomFilter" should be renamed "BloomFilterFactory".
> > "ProtoBloomFilterBuilder" should be a static inner class of that
> > factory.
> >
> > Regards,
> > Gilles
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> > For additional commands, e-mail: dev-h...@commons.apache.org
> >
> >
>
> --
> I like: Like Like - The likeliest place on the web
> <http://like-like.xenei.com>
> LinkedIn: http://www.linkedin.com/in/claudewarren
>

Reply via email to