On Sun, Sep 15, 2019 at 8:17 PM sebb <seb...@gmail.com> wrote:

> On Mon, 16 Sep 2019 at 00:17, Gilles Sadowski <gillese...@gmail.com>
> wrote:
> >
> > Hi.
> >
> > Le sam. 14 sept. 2019 à 08:15, Claude Warren <cla...@xenei.com> a écrit
> :
> > >
> > > @Gilles
> > >
> > > I am happy to rename the package without the plural if that is the
> > > standard, I will also fix the indent issue.  Is there a definition
> that can
> > > be quickly imported into Eclipse to do the proper formatting?
> >
> > Hopefully, someone else can answer that one.
> >
> > > I am adding/updating all comments in the code.
> > >
> > > FilterConfig contains a main method to make it easy to assess how the
> > > various settings influence the length and number of functions in the
> > > filter.  This could go into another class or a Utility class but
> neither of
> > > those seem like a better solution.
> >
> > I don't think there should be a "main" method in a library.
> > Also, just printing to stdout does not seem very flexible.
> > Perhaps define a "toString()" method?
> >
> > > I agree with your assessment of final and immutable for methods,
> variables
> > > and classes.
> > >
> > > I am removing the FilterConfig constructor for BloomFilter and
> ensuring the
> > > BloomFilter makes a defensive copy.
> > >
> > > The difference between match() and equals() is that equals() is a bit
> > > vector equality and match is the standard bloom filter comparison T&C
> = T.
> > > Documentation is being added to the match() and inverseMatch() to
> explain
> > > this.
> > >
> > > STORED_SIZE is similar to Long.BYTES (perhaps it should be renamed).
> It
> > > identifies how many bytes the FilterConfig will use when written to a
> > > DataOuptut or read from a DataInput.
> >
> > Why is it necessary to know such internal details?
> > Anyways, I think that persistence is best left to higher-level code.
> >
> > >
> > > I think the Utils class can be removed once the other code is moved to
> > > Java8 streams and not extended iterators.
> > >
> > > Hash function is changed to Objects.hash()
> > >
> > > Constructors are/will be private where factories make sense.  For some
> the
> > > simple constructor makes more sense.  (e.g. BloomFilter(BitSet) )
> >
> > See
> >     https://www.baeldung.com/java-constructors-vs-static-factory-methods
> > and
> >     https://blog.joda.org/2014/03/valjos-value-java-objects.html
> >
> > > I have no problem removing the Serializable on FilterConfig, but can
> you
> > > explain (or point to a post) why it should be avoided?
> >
> > Same as above (not prejudging how application want to handle
> serialization).
>
> It's difficult to code Serializable classes properly, and just as hard
> to maintain them.
> Also the serialised format becomes part of the API making changes harder.
>
> We should not promise what we likely cannot continue to deliver,
> especially since it may not be needed.
>
> Serializable classes are also a target for code exploits.
>

+1 to removing Serializable support in its current implementation. If
serialization is a requirement, then it should be implemented using a
Serialization Proxy a la J. Block:

https://www.youtube.com/watch?v=V1vQf4qyMXg&feature=youtu.be&t=56m12s

Gary


> > > The update/build methods are being documented and an explanation of the
> > > difference included.
> >
> > Am I wrong that the "build(...)" methods only spare the typing of 2
> > characters?
> > If so, it's not worth the increase of the API size.
> >
> > >
> > > getLog() is also being documented
> > >
> > > The asBitSet() method was to extract the bitset from the bloom filter
> so
> > > that it could be stored otherwise used with BitSet based processes.
> The
> > > name of the method is being chagned to getBitSet() and it makes a
> defensive
> > > copy.
> > >
> > > The ProtoBloomFilter is not a factory.  It is really a bean and there
> are
> > > times when it makes sense to use the ProtoBloomFilter.  The name is
> > > important as it identifies what it is.  It contains the necessary
> > > information to build any configuration of BloomFilter.  There is a
> proposal
> > > to use ProtoBloomFilters in service calls.  It could be defined as an
> > > interface with a concrete implementation in the
> ProtoBloomFilterBuilder.
> > >
> > > As an example of ProtoBloomFilter usage.  Say you have a Person object
> that
> > > has the user's first name, last name, eye color, hair color.
> > >
> > > The ProtoBloomFilterBuilder would accept the 4 values as 4 calls to
> > > update() followed by a build() to build the protoBloomFilter.
> > > ProtoBloomFilter proto = ne wProtoBloomFilterBuilder().update( "Claude"
> > > ).update( "Warren" ).update( "brown" ).update( "silver" ).build();
> >
> > I'm still missing why it cannot be a factory, but maybe it is a
> > question of design habits.  What about the following:
> >
> > public class BloomFilter {
> >     private final BitSet bitSet;
> >
> >     private BloomFilter(BitSet bs) {
> >         bitSet = (BitSet) (bs.clone());
> >     }
> >
> >     /** Factory method. (cf. "ValJO") */
> >     public BloomFilter of(BitSet bs) {
> >         return new BloomFilter(bs);
> >     }
> >
> >     // Combined "ProtoBloomFilter" and "ProtoBloomFilterBuilder" logic.
> >     public static class Builder {
> >          private final Set<Hash> hashes = new HashSet<>();
> >
> >          private Builder() {}
> >
> >          /** Factory method(s). */
> >          public Builder with(Builder other) {
> >               hashes.addAll(other.hashes);
> >          }
> >          public Builder with(ByteBuffer b) { /* ... */ }
> >          // etc.
> >
> >         /** Factory (converter). */
> >         public BloomFilter create(FilterConfig cfg) {
> >              final BitSet set = new BitSet(cfg.getNumberOfBits());
> >              for (Hash hash : hashes) {
> >                   hash.populate(set, cfg);
> >              }
> >              return BloomFilter.of(set);
> >         }
> >     }
> > }
> >
> > Perhaps I'm taking too many shortcuts... But, as pointed out by
> > Gary, more coverage is required; and unit tests will help figure out
> > the leanest design.
> >
> > >
> > > I can create a bloom filter configuration that has 1x10^7  items and
> > > accepts 1 / 1x10^6 collisions.
> > > FilterConfig config1 = new FilterConfig( 10000000, 1000000)
> > >
> > > and build the BloomFilter
> > > BloomFilter filter1 = proto.create( config1 )
> > >
> > > I can use that filter to determine if the person is in a collection of
> > > 1x10^7 items.  The implementation of that collection could be multiple
> > > buckets of  1x10^4 items with a collision rate of 5x10^3 items.  To
> check
> > > that I create an appropriate FilterConfig
> > > FilterConfig config2 = new FilterConfig( 10000, 5000 )
> > >
> > > and then create the proper bloom filter
> > > BloomFilter filter2 = proto.create( config2 )
> > >
> >
> > With my above attempt at an alternative API, we'd have (untested):
> >
> > BloomFilter.Builder proto = BloomFilter
> >      .with("Claude")
> >      .with("Warren")
> >      .with("brown").
> >      .with("silver");
> >
> > BloomFilter filter1 = proto.create(config1);
> > BloomFilter filter2 = proto.create(config2);
> >
> >
> > IIUC, in your current implementation of "ProtoBloomFilter", there
> > is a side-effect of "build()" (i.e. the call to "clear()"); thus calling
> > "build()" twice will not yield the same result.  I don't think that's
> > a desirable feature.
> >
> > Regards,
> > Gilles
> >
> > > Claude
> > >
> > >
> > > On Fri, Sep 13, 2019 at 2:22 PM Gilles Sadowski <gillese...@gmail.com>
> > > wrote:
> > >
> > > > > > [...]
> > > > >
> > > > > Gilles,
> > > > >
> > > > > Take a look at the PR, I added comments to allow the PR to have 0
> deps.
> > > > >
> > > > > Gary
> > > > >
> > > > >
> > > >
> > > > IMO, the package should be named "bloomfilter" (without "s").
> > > > Naming seems inconsistent in [Collections]: Some (package)
> > > > names are singular, other plural.
> > > >
> > > > * Indent must be 4 spaces.
> > > > * All fields and methods must be commented.
> > > > * "FilterConfig" contains a "main" method.
> > > > * Local variables and fields must be declared "final" whenever
> constant.
> > > > * Some methods are declared "final", others not.
> > > > * "BloomFilter" should be made immutable.
> > > > * "BloomFilter" could do without a second constructor: "FilterConfig"
> > > > should have a method that returns a "BitSet".
> > > > * "BloomFilter" must make a defensive copy of its "bitSet" argument.
> > > > * What's the difference between "match" and "equals"?
> > > > * Why is "STORED_SIZE" public?
> > > > * "Utils" classes should be avoided.
> > > > * Hash function should be imported/shaded from "Commons Codec"
> > > > (and defined there if missing).
> > > > * Constructor(s) should be private (instantiation through factory
> > > > methods (cf. ValJO).
> > > > * "Serializable" should be avoided.
> > > > * The "update" methods are missing explanations (or reference) about
> > > > their purpose.  Also, it seems that "update" and "build" are
> redundant.
> > > > * Does "getLog" return a standard value?  If so, the reference should
> > > > appear in the Javadoc (not just as code comment).
> > > > * What is the purpose of method "asBitSet()"?
> > > > * (IIUC) "ProtoBloomFilter" should be renamed "BloomFilterFactory".
> > > > "ProtoBloomFilterBuilder" should be a static inner class of that
> > > > factory.
> > > >
> > > > Regards,
> > > > Gilles
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> > For additional commands, e-mail: dev-h...@commons.apache.org
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>
>

Reply via email to