On Sun, Sep 15, 2019 at 8:17 PM sebb <seb...@gmail.com> wrote: > On Mon, 16 Sep 2019 at 00:17, Gilles Sadowski <gillese...@gmail.com> > wrote: > > > > Hi. > > > > Le sam. 14 sept. 2019 à 08:15, Claude Warren <cla...@xenei.com> a écrit > : > > > > > > @Gilles > > > > > > I am happy to rename the package without the plural if that is the > > > standard, I will also fix the indent issue. Is there a definition > that can > > > be quickly imported into Eclipse to do the proper formatting? > > > > Hopefully, someone else can answer that one. > > > > > I am adding/updating all comments in the code. > > > > > > FilterConfig contains a main method to make it easy to assess how the > > > various settings influence the length and number of functions in the > > > filter. This could go into another class or a Utility class but > neither of > > > those seem like a better solution. > > > > I don't think there should be a "main" method in a library. > > Also, just printing to stdout does not seem very flexible. > > Perhaps define a "toString()" method? > > > > > I agree with your assessment of final and immutable for methods, > variables > > > and classes. > > > > > > I am removing the FilterConfig constructor for BloomFilter and > ensuring the > > > BloomFilter makes a defensive copy. > > > > > > The difference between match() and equals() is that equals() is a bit > > > vector equality and match is the standard bloom filter comparison T&C > = T. > > > Documentation is being added to the match() and inverseMatch() to > explain > > > this. > > > > > > STORED_SIZE is similar to Long.BYTES (perhaps it should be renamed). > It > > > identifies how many bytes the FilterConfig will use when written to a > > > DataOuptut or read from a DataInput. > > > > Why is it necessary to know such internal details? > > Anyways, I think that persistence is best left to higher-level code. > > > > > > > > I think the Utils class can be removed once the other code is moved to > > > Java8 streams and not extended iterators. > > > > > > Hash function is changed to Objects.hash() > > > > > > Constructors are/will be private where factories make sense. For some > the > > > simple constructor makes more sense. (e.g. BloomFilter(BitSet) ) > > > > See > > https://www.baeldung.com/java-constructors-vs-static-factory-methods > > and > > https://blog.joda.org/2014/03/valjos-value-java-objects.html > > > > > I have no problem removing the Serializable on FilterConfig, but can > you > > > explain (or point to a post) why it should be avoided? > > > > Same as above (not prejudging how application want to handle > serialization). > > It's difficult to code Serializable classes properly, and just as hard > to maintain them. > Also the serialised format becomes part of the API making changes harder. > > We should not promise what we likely cannot continue to deliver, > especially since it may not be needed. > > Serializable classes are also a target for code exploits. >
+1 to removing Serializable support in its current implementation. If serialization is a requirement, then it should be implemented using a Serialization Proxy a la J. Block: https://www.youtube.com/watch?v=V1vQf4qyMXg&feature=youtu.be&t=56m12s Gary > > > The update/build methods are being documented and an explanation of the > > > difference included. > > > > Am I wrong that the "build(...)" methods only spare the typing of 2 > > characters? > > If so, it's not worth the increase of the API size. > > > > > > > > getLog() is also being documented > > > > > > The asBitSet() method was to extract the bitset from the bloom filter > so > > > that it could be stored otherwise used with BitSet based processes. > The > > > name of the method is being chagned to getBitSet() and it makes a > defensive > > > copy. > > > > > > The ProtoBloomFilter is not a factory. It is really a bean and there > are > > > times when it makes sense to use the ProtoBloomFilter. The name is > > > important as it identifies what it is. It contains the necessary > > > information to build any configuration of BloomFilter. There is a > proposal > > > to use ProtoBloomFilters in service calls. It could be defined as an > > > interface with a concrete implementation in the > ProtoBloomFilterBuilder. > > > > > > As an example of ProtoBloomFilter usage. Say you have a Person object > that > > > has the user's first name, last name, eye color, hair color. > > > > > > The ProtoBloomFilterBuilder would accept the 4 values as 4 calls to > > > update() followed by a build() to build the protoBloomFilter. > > > ProtoBloomFilter proto = ne wProtoBloomFilterBuilder().update( "Claude" > > > ).update( "Warren" ).update( "brown" ).update( "silver" ).build(); > > > > I'm still missing why it cannot be a factory, but maybe it is a > > question of design habits. What about the following: > > > > public class BloomFilter { > > private final BitSet bitSet; > > > > private BloomFilter(BitSet bs) { > > bitSet = (BitSet) (bs.clone()); > > } > > > > /** Factory method. (cf. "ValJO") */ > > public BloomFilter of(BitSet bs) { > > return new BloomFilter(bs); > > } > > > > // Combined "ProtoBloomFilter" and "ProtoBloomFilterBuilder" logic. > > public static class Builder { > > private final Set<Hash> hashes = new HashSet<>(); > > > > private Builder() {} > > > > /** Factory method(s). */ > > public Builder with(Builder other) { > > hashes.addAll(other.hashes); > > } > > public Builder with(ByteBuffer b) { /* ... */ } > > // etc. > > > > /** Factory (converter). */ > > public BloomFilter create(FilterConfig cfg) { > > final BitSet set = new BitSet(cfg.getNumberOfBits()); > > for (Hash hash : hashes) { > > hash.populate(set, cfg); > > } > > return BloomFilter.of(set); > > } > > } > > } > > > > Perhaps I'm taking too many shortcuts... But, as pointed out by > > Gary, more coverage is required; and unit tests will help figure out > > the leanest design. > > > > > > > > I can create a bloom filter configuration that has 1x10^7 items and > > > accepts 1 / 1x10^6 collisions. > > > FilterConfig config1 = new FilterConfig( 10000000, 1000000) > > > > > > and build the BloomFilter > > > BloomFilter filter1 = proto.create( config1 ) > > > > > > I can use that filter to determine if the person is in a collection of > > > 1x10^7 items. The implementation of that collection could be multiple > > > buckets of 1x10^4 items with a collision rate of 5x10^3 items. To > check > > > that I create an appropriate FilterConfig > > > FilterConfig config2 = new FilterConfig( 10000, 5000 ) > > > > > > and then create the proper bloom filter > > > BloomFilter filter2 = proto.create( config2 ) > > > > > > > With my above attempt at an alternative API, we'd have (untested): > > > > BloomFilter.Builder proto = BloomFilter > > .with("Claude") > > .with("Warren") > > .with("brown"). > > .with("silver"); > > > > BloomFilter filter1 = proto.create(config1); > > BloomFilter filter2 = proto.create(config2); > > > > > > IIUC, in your current implementation of "ProtoBloomFilter", there > > is a side-effect of "build()" (i.e. the call to "clear()"); thus calling > > "build()" twice will not yield the same result. I don't think that's > > a desirable feature. > > > > Regards, > > Gilles > > > > > Claude > > > > > > > > > On Fri, Sep 13, 2019 at 2:22 PM Gilles Sadowski <gillese...@gmail.com> > > > wrote: > > > > > > > > > [...] > > > > > > > > > > Gilles, > > > > > > > > > > Take a look at the PR, I added comments to allow the PR to have 0 > deps. > > > > > > > > > > Gary > > > > > > > > > > > > > > > > > > IMO, the package should be named "bloomfilter" (without "s"). > > > > Naming seems inconsistent in [Collections]: Some (package) > > > > names are singular, other plural. > > > > > > > > * Indent must be 4 spaces. > > > > * All fields and methods must be commented. > > > > * "FilterConfig" contains a "main" method. > > > > * Local variables and fields must be declared "final" whenever > constant. > > > > * Some methods are declared "final", others not. > > > > * "BloomFilter" should be made immutable. > > > > * "BloomFilter" could do without a second constructor: "FilterConfig" > > > > should have a method that returns a "BitSet". > > > > * "BloomFilter" must make a defensive copy of its "bitSet" argument. > > > > * What's the difference between "match" and "equals"? > > > > * Why is "STORED_SIZE" public? > > > > * "Utils" classes should be avoided. > > > > * Hash function should be imported/shaded from "Commons Codec" > > > > (and defined there if missing). > > > > * Constructor(s) should be private (instantiation through factory > > > > methods (cf. ValJO). > > > > * "Serializable" should be avoided. > > > > * The "update" methods are missing explanations (or reference) about > > > > their purpose. Also, it seems that "update" and "build" are > redundant. > > > > * Does "getLog" return a standard value? If so, the reference should > > > > appear in the Javadoc (not just as code comment). > > > > * What is the purpose of method "asBitSet()"? > > > > * (IIUC) "ProtoBloomFilter" should be renamed "BloomFilterFactory". > > > > "ProtoBloomFilterBuilder" should be a static inner class of that > > > > factory. > > > > > > > > Regards, > > > > Gilles > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > > For additional commands, e-mail: dev-h...@commons.apache.org > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > >