On Sat, Sep 14, 2019 at 2:15 AM Claude Warren <cla...@xenei.com> wrote:
> @Gilles > > I am happy to rename the package without the plural if that is the > standard, I will also fix the indent issue. Is there a definition that can > be quickly imported into Eclipse to do the proper formatting? > > I am adding/updating all comments in the code. > > FilterConfig contains a main method to make it easy to assess how the > various settings influence the length and number of functions in the > filter. This could go into another class or a Utility class but neither of > those seem like a better solution. > > I agree with your assessment of final and immutable for methods, variables > and classes. > > I am removing the FilterConfig constructor for BloomFilter and ensuring the > BloomFilter makes a defensive copy. > > The difference between match() and equals() is that equals() is a bit > vector equality and match is the standard bloom filter comparison T&C = T. > Documentation is being added to the match() and inverseMatch() to explain > this. > > STORED_SIZE is similar to Long.BYTES (perhaps it should be renamed). It > identifies how many bytes the FilterConfig will use when written to a > DataOuptut or read from a DataInput. > > I think the Utils class can be removed once the other code is moved to > Java8 streams and not extended iterators. > > Hash function is changed to Objects.hash() > > Constructors are/will be private where factories make sense. For some the > simple constructor makes more sense. (e.g. BloomFilter(BitSet) ) > > I have no problem removing the Serializable on FilterConfig, but can you > explain (or point to a post) why it should be avoided? > >From the master: https://www.youtube.com/watch?v=V1vQf4qyMXg&feature=youtu.be&t=56m12s Also related: https://opensource.googleblog.com/2017/03/operation-rosehub.html Gary > The update/build methods are being documented and an explanation of the > difference included. > > getLog() is also being documented > > The asBitSet() method was to extract the bitset from the bloom filter so > that it could be stored otherwise used with BitSet based processes. The > name of the method is being chagned to getBitSet() and it makes a defensive > copy. > > The ProtoBloomFilter is not a factory. It is really a bean and there are > times when it makes sense to use the ProtoBloomFilter. The name is > important as it identifies what it is. It contains the necessary > information to build any configuration of BloomFilter. There is a proposal > to use ProtoBloomFilters in service calls. It could be defined as an > interface with a concrete implementation in the ProtoBloomFilterBuilder. > > As an example of ProtoBloomFilter usage. Say you have a Person object that > has the user's first name, last name, eye color, hair color. > > The ProtoBloomFilterBuilder would accept the 4 values as 4 calls to > update() followed by a build() to build the protoBloomFilter. > ProtoBloomFilter proto = ne wProtoBloomFilterBuilder().update( "Claude" > ).update( "Warren" ).update( "brown" ).update( "silver" ).build(); > > I can create a bloom filter configuration that has 1x10^7 items and > accepts 1 / 1x10^6 collisions. > FilterConfig config1 = new FilterConfig( 10000000, 1000000) > > and build the BloomFilter > BloomFilter filter1 = proto.create( config1 ) > > I can use that filter to determine if the person is in a collection of > 1x10^7 items. The implementation of that collection could be multiple > buckets of 1x10^4 items with a collision rate of 5x10^3 items. To check > that I create an appropriate FilterConfig > FilterConfig config2 = new FilterConfig( 10000, 5000 ) > > and then create the proper bloom filter > BloomFilter filter2 = proto.create( config2 ) > > Claude > > > On Fri, Sep 13, 2019 at 2:22 PM Gilles Sadowski <gillese...@gmail.com> > wrote: > > > > > [...] > > > > > > Gilles, > > > > > > Take a look at the PR, I added comments to allow the PR to have 0 deps. > > > > > > Gary > > > > > > > > > > IMO, the package should be named "bloomfilter" (without "s"). > > Naming seems inconsistent in [Collections]: Some (package) > > names are singular, other plural. > > > > * Indent must be 4 spaces. > > * All fields and methods must be commented. > > * "FilterConfig" contains a "main" method. > > * Local variables and fields must be declared "final" whenever constant. > > * Some methods are declared "final", others not. > > * "BloomFilter" should be made immutable. > > * "BloomFilter" could do without a second constructor: "FilterConfig" > > should have a method that returns a "BitSet". > > * "BloomFilter" must make a defensive copy of its "bitSet" argument. > > * What's the difference between "match" and "equals"? > > * Why is "STORED_SIZE" public? > > * "Utils" classes should be avoided. > > * Hash function should be imported/shaded from "Commons Codec" > > (and defined there if missing). > > * Constructor(s) should be private (instantiation through factory > > methods (cf. ValJO). > > * "Serializable" should be avoided. > > * The "update" methods are missing explanations (or reference) about > > their purpose. Also, it seems that "update" and "build" are redundant. > > * Does "getLog" return a standard value? If so, the reference should > > appear in the Javadoc (not just as code comment). > > * What is the purpose of method "asBitSet()"? > > * (IIUC) "ProtoBloomFilter" should be renamed "BloomFilterFactory". > > "ProtoBloomFilterBuilder" should be a static inner class of that > > factory. > > > > Regards, > > Gilles > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > > For additional commands, e-mail: dev-h...@commons.apache.org > > > > > > -- > I like: Like Like - The likeliest place on the web > <http://like-like.xenei.com> > LinkedIn: http://www.linkedin.com/in/claudewarren >