Due to numerous security flaws in Java serialization, if you do use
it, make sure to use serialization proxies and make them as simple as
possible.

On Sun, 15 Sep 2019 at 20:32, Claude Warren <cla...@xenei.com> wrote:
>
> I am refactoring some of the code to make the Builder and Hash classes
> enclosed in the ProtoBloomFilter class.  I have also add the proxy
> serializer as noted in the effective java talk.  This simplifies the
> classes significantly.  I will have unit tests and fix the javadocs before
> the next push.
>
> Claude
>
> On Mon, Sep 16, 2019 at 2:01 AM Gary Gregory <garydgreg...@gmail.com> wrote:
>
> > On Sun, Sep 15, 2019 at 8:17 PM sebb <seb...@gmail.com> wrote:
> >
> > > On Mon, 16 Sep 2019 at 00:17, Gilles Sadowski <gillese...@gmail.com>
> > > wrote:
> > > >
> > > > Hi.
> > > >
> > > > Le sam. 14 sept. 2019 à 08:15, Claude Warren <cla...@xenei.com> a
> > écrit
> > > :
> > > > >
> > > > > @Gilles
> > > > >
> > > > > I am happy to rename the package without the plural if that is the
> > > > > standard, I will also fix the indent issue.  Is there a definition
> > > that can
> > > > > be quickly imported into Eclipse to do the proper formatting?
> > > >
> > > > Hopefully, someone else can answer that one.
> > > >
> > > > > I am adding/updating all comments in the code.
> > > > >
> > > > > FilterConfig contains a main method to make it easy to assess how the
> > > > > various settings influence the length and number of functions in the
> > > > > filter.  This could go into another class or a Utility class but
> > > neither of
> > > > > those seem like a better solution.
> > > >
> > > > I don't think there should be a "main" method in a library.
> > > > Also, just printing to stdout does not seem very flexible.
> > > > Perhaps define a "toString()" method?
> > > >
> > > > > I agree with your assessment of final and immutable for methods,
> > > variables
> > > > > and classes.
> > > > >
> > > > > I am removing the FilterConfig constructor for BloomFilter and
> > > ensuring the
> > > > > BloomFilter makes a defensive copy.
> > > > >
> > > > > The difference between match() and equals() is that equals() is a bit
> > > > > vector equality and match is the standard bloom filter comparison T&C
> > > = T.
> > > > > Documentation is being added to the match() and inverseMatch() to
> > > explain
> > > > > this.
> > > > >
> > > > > STORED_SIZE is similar to Long.BYTES (perhaps it should be renamed).
> > > It
> > > > > identifies how many bytes the FilterConfig will use when written to a
> > > > > DataOuptut or read from a DataInput.
> > > >
> > > > Why is it necessary to know such internal details?
> > > > Anyways, I think that persistence is best left to higher-level code.
> > > >
> > > > >
> > > > > I think the Utils class can be removed once the other code is moved
> > to
> > > > > Java8 streams and not extended iterators.
> > > > >
> > > > > Hash function is changed to Objects.hash()
> > > > >
> > > > > Constructors are/will be private where factories make sense.  For
> > some
> > > the
> > > > > simple constructor makes more sense.  (e.g. BloomFilter(BitSet) )
> > > >
> > > > See
> > > >
> > https://www.baeldung.com/java-constructors-vs-static-factory-methods
> > > > and
> > > >     https://blog.joda.org/2014/03/valjos-value-java-objects.html
> > > >
> > > > > I have no problem removing the Serializable on FilterConfig, but can
> > > you
> > > > > explain (or point to a post) why it should be avoided?
> > > >
> > > > Same as above (not prejudging how application want to handle
> > > serialization).
> > >
> > > It's difficult to code Serializable classes properly, and just as hard
> > > to maintain them.
> > > Also the serialised format becomes part of the API making changes harder.
> > >
> > > We should not promise what we likely cannot continue to deliver,
> > > especially since it may not be needed.
> > >
> > > Serializable classes are also a target for code exploits.
> > >
> >
> > +1 to removing Serializable support in its current implementation. If
> > serialization is a requirement, then it should be implemented using a
> > Serialization Proxy a la J. Block:
> >
> > https://www.youtube.com/watch?v=V1vQf4qyMXg&feature=youtu.be&t=56m12s
> >
> > Gary
> >
> >
> > > > > The update/build methods are being documented and an explanation of
> > the
> > > > > difference included.
> > > >
> > > > Am I wrong that the "build(...)" methods only spare the typing of 2
> > > > characters?
> > > > If so, it's not worth the increase of the API size.
> > > >
> > > > >
> > > > > getLog() is also being documented
> > > > >
> > > > > The asBitSet() method was to extract the bitset from the bloom filter
> > > so
> > > > > that it could be stored otherwise used with BitSet based processes.
> > > The
> > > > > name of the method is being chagned to getBitSet() and it makes a
> > > defensive
> > > > > copy.
> > > > >
> > > > > The ProtoBloomFilter is not a factory.  It is really a bean and there
> > > are
> > > > > times when it makes sense to use the ProtoBloomFilter.  The name is
> > > > > important as it identifies what it is.  It contains the necessary
> > > > > information to build any configuration of BloomFilter.  There is a
> > > proposal
> > > > > to use ProtoBloomFilters in service calls.  It could be defined as an
> > > > > interface with a concrete implementation in the
> > > ProtoBloomFilterBuilder.
> > > > >
> > > > > As an example of ProtoBloomFilter usage.  Say you have a Person
> > object
> > > that
> > > > > has the user's first name, last name, eye color, hair color.
> > > > >
> > > > > The ProtoBloomFilterBuilder would accept the 4 values as 4 calls to
> > > > > update() followed by a build() to build the protoBloomFilter.
> > > > > ProtoBloomFilter proto = ne wProtoBloomFilterBuilder().update(
> > "Claude"
> > > > > ).update( "Warren" ).update( "brown" ).update( "silver" ).build();
> > > >
> > > > I'm still missing why it cannot be a factory, but maybe it is a
> > > > question of design habits.  What about the following:
> > > >
> > > > public class BloomFilter {
> > > >     private final BitSet bitSet;
> > > >
> > > >     private BloomFilter(BitSet bs) {
> > > >         bitSet = (BitSet) (bs.clone());
> > > >     }
> > > >
> > > >     /** Factory method. (cf. "ValJO") */
> > > >     public BloomFilter of(BitSet bs) {
> > > >         return new BloomFilter(bs);
> > > >     }
> > > >
> > > >     // Combined "ProtoBloomFilter" and "ProtoBloomFilterBuilder" logic.
> > > >     public static class Builder {
> > > >          private final Set<Hash> hashes = new HashSet<>();
> > > >
> > > >          private Builder() {}
> > > >
> > > >          /** Factory method(s). */
> > > >          public Builder with(Builder other) {
> > > >               hashes.addAll(other.hashes);
> > > >          }
> > > >          public Builder with(ByteBuffer b) { /* ... */ }
> > > >          // etc.
> > > >
> > > >         /** Factory (converter). */
> > > >         public BloomFilter create(FilterConfig cfg) {
> > > >              final BitSet set = new BitSet(cfg.getNumberOfBits());
> > > >              for (Hash hash : hashes) {
> > > >                   hash.populate(set, cfg);
> > > >              }
> > > >              return BloomFilter.of(set);
> > > >         }
> > > >     }
> > > > }
> > > >
> > > > Perhaps I'm taking too many shortcuts... But, as pointed out by
> > > > Gary, more coverage is required; and unit tests will help figure out
> > > > the leanest design.
> > > >
> > > > >
> > > > > I can create a bloom filter configuration that has 1x10^7  items and
> > > > > accepts 1 / 1x10^6 collisions.
> > > > > FilterConfig config1 = new FilterConfig( 10000000, 1000000)
> > > > >
> > > > > and build the BloomFilter
> > > > > BloomFilter filter1 = proto.create( config1 )
> > > > >
> > > > > I can use that filter to determine if the person is in a collection
> > of
> > > > > 1x10^7 items.  The implementation of that collection could be
> > multiple
> > > > > buckets of  1x10^4 items with a collision rate of 5x10^3 items.  To
> > > check
> > > > > that I create an appropriate FilterConfig
> > > > > FilterConfig config2 = new FilterConfig( 10000, 5000 )
> > > > >
> > > > > and then create the proper bloom filter
> > > > > BloomFilter filter2 = proto.create( config2 )
> > > > >
> > > >
> > > > With my above attempt at an alternative API, we'd have (untested):
> > > >
> > > > BloomFilter.Builder proto = BloomFilter
> > > >      .with("Claude")
> > > >      .with("Warren")
> > > >      .with("brown").
> > > >      .with("silver");
> > > >
> > > > BloomFilter filter1 = proto.create(config1);
> > > > BloomFilter filter2 = proto.create(config2);
> > > >
> > > >
> > > > IIUC, in your current implementation of "ProtoBloomFilter", there
> > > > is a side-effect of "build()" (i.e. the call to "clear()"); thus
> > calling
> > > > "build()" twice will not yield the same result.  I don't think that's
> > > > a desirable feature.
> > > >
> > > > Regards,
> > > > Gilles
> > > >
> > > > > Claude
> > > > >
> > > > >
> > > > > On Fri, Sep 13, 2019 at 2:22 PM Gilles Sadowski <
> > gillese...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > > > [...]
> > > > > > >
> > > > > > > Gilles,
> > > > > > >
> > > > > > > Take a look at the PR, I added comments to allow the PR to have 0
> > > deps.
> > > > > > >
> > > > > > > Gary
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > IMO, the package should be named "bloomfilter" (without "s").
> > > > > > Naming seems inconsistent in [Collections]: Some (package)
> > > > > > names are singular, other plural.
> > > > > >
> > > > > > * Indent must be 4 spaces.
> > > > > > * All fields and methods must be commented.
> > > > > > * "FilterConfig" contains a "main" method.
> > > > > > * Local variables and fields must be declared "final" whenever
> > > constant.
> > > > > > * Some methods are declared "final", others not.
> > > > > > * "BloomFilter" should be made immutable.
> > > > > > * "BloomFilter" could do without a second constructor:
> > "FilterConfig"
> > > > > > should have a method that returns a "BitSet".
> > > > > > * "BloomFilter" must make a defensive copy of its "bitSet"
> > argument.
> > > > > > * What's the difference between "match" and "equals"?
> > > > > > * Why is "STORED_SIZE" public?
> > > > > > * "Utils" classes should be avoided.
> > > > > > * Hash function should be imported/shaded from "Commons Codec"
> > > > > > (and defined there if missing).
> > > > > > * Constructor(s) should be private (instantiation through factory
> > > > > > methods (cf. ValJO).
> > > > > > * "Serializable" should be avoided.
> > > > > > * The "update" methods are missing explanations (or reference)
> > about
> > > > > > their purpose.  Also, it seems that "update" and "build" are
> > > redundant.
> > > > > > * Does "getLog" return a standard value?  If so, the reference
> > should
> > > > > > appear in the Javadoc (not just as code comment).
> > > > > > * What is the purpose of method "asBitSet()"?
> > > > > > * (IIUC) "ProtoBloomFilter" should be renamed "BloomFilterFactory".
> > > > > > "ProtoBloomFilterBuilder" should be a static inner class of that
> > > > > > factory.
> > > > > >
> > > > > > Regards,
> > > > > > Gilles
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> > > > For additional commands, e-mail: dev-h...@commons.apache.org
> > > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> > > For additional commands, e-mail: dev-h...@commons.apache.org
> > >
> > >
> >
>
>
> --
> I like: Like Like - The likeliest place on the web
> <http://like-like.xenei.com>
> LinkedIn: http://www.linkedin.com/in/claudewarren



-- 
Matt Sicker <boa...@gmail.com>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Reply via email to